Preface:

Kylin depends on Hive and HBase, so make sure both are installed and working properly before you begin.

Hive installation tutorial [link]

HBase installation tutorial [link]

Introduction:

Apache Kylin is an open-source distributed analytics engine that provides a SQL query interface and multidimensional analysis (OLAP) capabilities on top of Hadoop for extremely large datasets, and it can query huge Hive tables with sub-second latency. Originally developed at eBay and contributed to the open-source community, Kylin is a home-grown big-data component and a point of pride for Chinese developers: it was open-sourced in October 2014, became an Apache Incubator project in November of the same year, and was promoted by the ASF to a top-level Apache project in December 2015. Its rapid rise rode the big-data OLAP wave, and it stands as a testament to the greatness of open-source contribution.

Installation:

Step 1: Download the installation package and upload it to the Linux system

Official download page [link] — the site even offers docs in Chinese, a point of pride

File upload tutorial [link]

Step 2: Extract the archive

tar -zxvf apache-kylin-2.6.3-bin-hbase1x.tar.gz -C ../servers/
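
Optionally, you can export KYLIN_HOME so later steps can refer to the install directory by name — a small convenience sketch, assuming the extraction target used above:

# Optional convenience: the path below assumes the extraction target used above
export KYLIN_HOME=/export/servers/apache-kylin-2.6.3-bin-hbase1x
export PATH=$PATH:$KYLIN_HOME/bin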

Step 3: Configure Kylin

1. Add the configuration files of Kylin's dependent components

cd /export/servers/apache-kylin-2.6.3-bin-hbase1x/conf
ln -s $HADOOP_HOME/etc/hadoop/hdfs-site.xml hdfs-site.xml
ln -s $HADOOP_HOME/etc/hadoop/core-site.xml core-site.xml
ln -s $HBASE_HOME/conf/hbase-site.xml hbase-site.xml
ln -s $HIVE_HOME/conf/hive-site.xml hive-site.xml
ln -s $SPARK_HOME/conf/spark-defaults.conf spark-defaults.conf
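
A quick sanity check (assuming HADOOP_HOME, HBASE_HOME, HIVE_HOME and SPARK_HOME are set in your environment) confirms that every symlink resolves to a real file:

# -L dereferences the links; a broken link reports "No such file or directory"
ls -lL hdfs-site.xml core-site.xml hbase-site.xml hive-site.xml spark-defaults.conf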

2. Configure conf/kylin.properties

Delete the original contents of conf/kylin.properties and replace them with the following:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#    http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# The commented values below take effect as the default settings
# Uncomment and override them if necessary
#### METADATA | ENV ###
## The metadata store in hbase
#kylin.metadata.url=kylin_metadata@hbase
## metadata cache sync retry times
#kylin.metadata.sync-retries=3
## Working folder in HDFS; preferably a fully qualified absolute path. Make sure the user has the right permissions on this directory
kylin.env.hdfs-working-dir=/apps/kylin
## DEV|QA|PROD. DEV will turn on some dev features, QA and PROD has no difference in terms of functions.
#kylin.env=QA
## kylin zk base path
kylin.env.zookeeper-base-path=/kylin
#### SERVER | WEB | RESTCLIENT ###
## Kylin server mode, valid value [all, query, job]
#kylin.server.mode=all
## List of web servers in use, this enables one web server instance to sync up with other servers.
#kylin.server.cluster-servers=localhost:7070
## Display timezone on UI,format like[GMT+N or GMT-N]
#kylin.web.timezone=
## Timeout value for the queries submitted through the Web UI, in milliseconds
#kylin.web.query-timeout=300000
#kylin.web.cross-domain-enabled=true
##allow user to export query result
#kylin.web.export-allow-admin=true
#kylin.web.export-allow-other=true
## Hide measures in measure list of cube designer, separate by comma
#kylin.web.hide-measures=RAW
##max connections of one route
#kylin.restclient.connection.default-max-per-route=20
##max connections of one rest-client
#kylin.restclient.connection.max-total=200
#### PUBLIC CONFIG ###
#kylin.engine.default=2
#kylin.storage.default=2
#kylin.web.hive-limit=20
#kylin.web.help.length=4
#kylin.web.help.0=start|Getting Started|http://kylin.apache.org/docs/tutorial/kylin_sample.html
#kylin.web.help.1=odbc|ODBC Driver|http://kylin.apache.org/docs/tutorial/odbc.html
#kylin.web.help.2=tableau|Tableau Guide|http://kylin.apache.org/docs/tutorial/tableau_91.html
#kylin.web.help.3=onboard|Cube Design Tutorial|http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
#kylin.web.link-streaming-guide=http://kylin.apache.org/
#kylin.htrace.show-gui-trace-toggle=false
#kylin.web.link-hadoop=
#kylin.web.link-diagnostic=
#kylin.web.contact-mail=
#kylin.server.external-acl-provider=
#### SOURCE ###
## Hive client, valid value [cli, beeline]
#kylin.source.hive.client=cli
## Absolute path to beeline shell, can be set to spark beeline instead of the default hive beeline on PATH
#kylin.source.hive.beeline-shell=beeline
## Parameters for beeline client, only necessary if hive client is beeline
##kylin.source.hive.beeline-params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://localhost:10000
## While hive client uses above settings to read hive table metadata,
## table operations can go through a separate SparkSQL command line, given SparkSQL connects to the same Hive metastore.
#kylin.source.hive.enable-sparksql-for-table-ops=false
#kylin.source.hive.sparksql-beeline-shell=/path/to/spark-client/bin/beeline
#kylin.source.hive.sparksql-beeline-params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://localhost:10000
kylin.source.hive.keep-flat-table=false
## Hive database name for putting the intermediate flat tables
kylin.source.hive.database-for-flat-table=default
## Whether redistribute the intermediate flat table before building
kylin.source.hive.redistribute-flat-table=true
#### STORAGE ###
## The storage for final cube file in hbase
kylin.storage.url=hbase
## The prefix of hbase table
kylin.storage.hbase.table-name-prefix=KYLIN_
## The namespace for hbase storage
kylin.storage.hbase.namespace=default
## Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
kylin.storage.hbase.compression-codec=none
## HBase cluster FileSystem serving HBase, in the format hdfs://hbase-cluster:8020
## Leave empty if hbase running on same cluster with hive and mapreduce
##kylin.storage.hbase.cluster-fs=
## The cut size for hbase region, in GB.
#kylin.storage.hbase.region-cut-gb=5
## The hfile size in GB; a smaller hfile gives the hfile-converting MR job more reducers and makes it faster.
## Set 0 to disable this optimization.
#kylin.storage.hbase.hfile-size-gb=2
#kylin.storage.hbase.min-region-count=1
#kylin.storage.hbase.max-region-count=500
## Optional information for the owner of kylin platform, it can be your team's email
## Currently it will be attached to each kylin's htable attribute
#kylin.storage.hbase.owner-tag=whoami@kylin.apache.org
#kylin.storage.hbase.coprocessor-mem-gb=3
## By default kylin can spill query's intermediate results to disks when it's consuming too much memory.
## Set it to false if you want query to abort immediately in such condition.
#kylin.storage.partition.aggr-spill-enabled=true
## The maximum number of bytes each coprocessor is allowed to scan.
## To allow arbitrary large scan, you can set it to 0.
#kylin.storage.partition.max-scan-bytes=3221225472
## The default coprocessor timeout is (hbase.rpc.timeout * 0.9) / 1000 seconds,
## You can set it to a smaller value. 0 means use default.
## kylin.storage.hbase.coprocessor-timeout-seconds=0
## clean real storage after delete operation
## if you want to delete the real storage like htable of deleting segment, you can set it to true
#kylin.storage.clean-after-delete-operation=false
#### JOB ###
## Max job retry on error, default 0: no retry
#kylin.job.retry=0
## Max count of concurrent jobs running
#kylin.job.max-concurrent-jobs=10
## The percentage of the sampling, default 100%
#kylin.job.sampling-percentage=100
## If true, will send email notification on job complete
##kylin.job.notification-enabled=true
##kylin.job.notification-mail-enable-starttls=true
##kylin.job.notification-mail-host=smtp.office365.com
##kylin.job.notification-mail-port=587
##kylin.job.notification-mail-username=kylin@example.com
##kylin.job.notification-mail-password=mypassword
##kylin.job.notification-mail-sender=kylin@example.com
#
#
#### ENGINE ###
#
## Time interval to check hadoop job status
#kylin.engine.mr.yarn-check-interval-seconds=10
#
#kylin.engine.mr.reduce-input-mb=500
#
#kylin.engine.mr.max-reducer-number=500
#
#kylin.engine.mr.mapper-input-rows=1000000
#
## Enable dictionary building in MR reducer
#kylin.engine.mr.build-dict-in-reducer=true
#
## Number of reducers for fetching UHC column distinct values
#kylin.engine.mr.uhc-reducer-count=3
#
## Whether using an additional step to build UHC dictionary
#kylin.engine.mr.build-uhc-dict-in-additional-step=false
#
#
#### CUBE | DICTIONARY ###
#
#kylin.cube.cuboid-scheduler=org.apache.kylin.cube.cuboid.DefaultCuboidScheduler
#kylin.cube.segment-advisor=org.apache.kylin.cube.CubeSegmentAdvisor
#
## 'auto', 'inmem', 'layer' or 'random' for testing
#kylin.cube.algorithm=layer
#
## A smaller threshold prefers layer, a larger threshold prefers in-mem
#kylin.cube.algorithm.layer-or-inmem-threshold=7
#
## auto use inmem algorithm:
## 1, cube planner optimize job
## 2, no source record
#kylin.cube.algorithm.inmem-auto-optimize=true
#
#kylin.cube.aggrgroup.max-combination=32768
#
#kylin.snapshot.max-mb=300
#
#kylin.cube.cubeplanner.enabled=true
#kylin.cube.cubeplanner.enabled-for-existing-cube=true
#kylin.cube.cubeplanner.expansion-threshold=15.0
#kylin.cube.cubeplanner.recommend-cache-max-size=200
#kylin.cube.cubeplanner.mandatory-rollup-threshold=1000
#kylin.cube.cubeplanner.algorithm-threshold-greedy=8
#kylin.cube.cubeplanner.algorithm-threshold-genetic=23
#
#
#### QUERY ###
#
## Controls the maximum number of bytes a query is allowed to scan storage.
## The default value 0 means no limit.
## The counterpart kylin.storage.partition.max-scan-bytes sets the maximum per coprocessor.
#kylin.query.max-scan-bytes=0
#
#kylin.query.cache-enabled=true
#
## Controls extras properties for Calcite jdbc driver
## all extras properties should be under the prefix "kylin.query.calcite.extras-props."
## case sensitive, default: true, to enable case insensitive set it to false
## @see org.apache.calcite.config.CalciteConnectionProperty.CASE_SENSITIVE
#kylin.query.calcite.extras-props.caseSensitive=true
## how to handle unquoted identifiers, default: TO_UPPER, available options: UNCHANGED, TO_UPPER, TO_LOWER
## @see org.apache.calcite.config.CalciteConnectionProperty.UNQUOTED_CASING
#kylin.query.calcite.extras-props.unquotedCasing=TO_UPPER
## quoting method, default: DOUBLE_QUOTE, available options: DOUBLE_QUOTE, BACK_TICK, BRACKET
## @see org.apache.calcite.config.CalciteConnectionProperty.QUOTING
#kylin.query.calcite.extras-props.quoting=DOUBLE_QUOTE
## change SqlConformance from DEFAULT to LENIENT to enable group by ordinal
## @see org.apache.calcite.sql.validate.SqlConformance.SqlConformanceEnum
#kylin.query.calcite.extras-props.conformance=LENIENT
#
## TABLE ACL
#kylin.query.security.table-acl-enabled=true
#
## Usually should not modify this
#kylin.query.interceptors=org.apache.kylin.rest.security.TableInterceptor
#
#kylin.query.escape-default-keyword=false
#
## Usually should not modify this
#kylin.query.transformers=org.apache.kylin.query.util.DefaultQueryTransformer,org.apache.kylin.query.util.KeywordDefaultDirtyHack
#
#### SECURITY ###
#
## Spring security profile, options: testing, ldap, saml
## with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login
#kylin.security.profile=testing
#
## Admin roles in LDAP, for ldap and saml
#kylin.security.acl.admin-role=admin
#
## LDAP authentication configuration
#kylin.security.ldap.connection-server=ldap://ldap_server:389
#kylin.security.ldap.connection-username=
#kylin.security.ldap.connection-password=
#
## LDAP user account directory;
#kylin.security.ldap.user-search-base=
#kylin.security.ldap.user-search-pattern=
#kylin.security.ldap.user-group-search-base=
#kylin.security.ldap.user-group-search-filter=(|(member={0})(memberUid={1}))
#
## LDAP service account directory
#kylin.security.ldap.service-search-base=
#kylin.security.ldap.service-search-pattern=
#kylin.security.ldap.service-group-search-base=
#
### SAML configurations for SSO
## SAML IDP metadata file location
#kylin.security.saml.metadata-file=classpath:sso_metadata.xml
#kylin.security.saml.metadata-entity-base-url=https://hostname/kylin
#kylin.security.saml.keystore-file=classpath:samlKeystore.jks
#kylin.security.saml.context-scheme=https
#kylin.security.saml.context-server-name=hostname
#kylin.security.saml.context-server-port=443
#kylin.security.saml.context-path=/kylin
#
#### SPARK ENGINE CONFIGS ###
#
## Hadoop conf folder, will export this as "HADOOP_CONF_DIR" to run spark-submit
## This must contain site xmls of core, yarn, hive, and hbase in one folder
kylin.env.hadoop-conf-dir=/export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop
#
## Estimate the RDD partition numbers
kylin.engine.spark.rdd-partition-cut-mb=10
#
## Minimal partition numbers of rdd
kylin.engine.spark.min-partition=1
#
## Max partition numbers of rdd
kylin.engine.spark.max-partition=1000
#
## Spark conf (default is in spark/conf/spark-defaults.conf)
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.yarn.queue=default
kylin.engine.spark-conf.spark.driver.memory=512M
kylin.engine.spark-conf.spark.executor.memory=1G
kylin.engine.spark-conf.spark.executor.instances=2
kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=512
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.eventLog.dir=hdfs://node01:8020/apps/spark2/spark-history
kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs://node01:8020/apps/spark2/spark-history
kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
#
#### Spark conf for specific job
kylin.engine.spark-conf-mergedict.spark.executor.memory=1G
kylin.engine.spark-conf-mergedict.spark.memory.fraction=0.2
#
## manually upload the spark-assembly jar to HDFS and set this property to avoid repeatedly uploading the jar at runtime
kylin.engine.spark-conf.spark.yarn.archive=hdfs://node01:8020/apps/spark2/lib/spark-libs.jar
kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
##kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
##kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
##kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
#
#
#### QUERY PUSH DOWN ###
#
##kylin.query.pushdown.runner-class-name=org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl
#
##kylin.query.pushdown.update-enabled=false
##kylin.query.pushdown.jdbc.url=jdbc:hive2://sandbox:10000/default
##kylin.query.pushdown.jdbc.driver=org.apache.hive.jdbc.HiveDriver
##kylin.query.pushdown.jdbc.username=hive
##kylin.query.pushdown.jdbc.password=
#
##kylin.query.pushdown.jdbc.pool-max-total=8
##kylin.query.pushdown.jdbc.pool-max-idle=8
##kylin.query.pushdown.jdbc.pool-min-idle=0
#
#### JDBC Data Source
##kylin.source.jdbc.connection-url=
##kylin.source.jdbc.driver=
##kylin.source.jdbc.dialect=
##kylin.source.jdbc.user=
##kylin.source.jdbc.pass=
##kylin.source.jdbc.sqoop-home=
##kylin.source.jdbc.filed-delimiter=|
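
After saving the file, it helps to list just the active (uncommented) settings and check them against your cluster — a quick sanity check, assuming you are still in the conf directory:

# Print only the effective properties (skip comments and blank lines)
grep -vE '^[[:space:]]*#|^[[:space:]]*$' kylin.properties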

3. Configure kylin.sh

cd /export/servers/apache-kylin-2.6.3-bin-hbase1x/bin
vim kylin.sh

Append the following to the end of kylin.sh:

export HADOOP_HOME=${HADOOP_HOME}
export HIVE_HOME=${HIVE_HOME}
export HBASE_HOME=${HBASE_HOME}
export SPARK_HOME=${SPARK_HOME}
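
Note that these lines only re-export variables that must already be defined in your environment (for example in /etc/profile); if any of them is empty, Kylin's scripts will not find the component. A quick check, run in your login shell rather than inside kylin.sh:

# Each variable should print a non-empty path; define it in /etc/profile if it does not
for v in HADOOP_HOME HIVE_HOME HBASE_HOME SPARK_HOME; do echo "$v=${!v}"; done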

4. Initialize Kylin's data path on HDFS

hadoop fs -mkdir -p /apps/kylin
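
This path must match the kylin.env.hdfs-working-dir set in kylin.properties above. Kylin also ships a bin/check-env.sh script that validates the working directory, environment variables, and client tools in one pass; running it now (HDFS must already be up, as it must be for the mkdir above to succeed) catches most misconfigurations before startup:

/export/servers/apache-kylin-2.6.3-bin-hbase1x/bin/check-env.sh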

Step 4: Preparation before starting Kylin

1. Start ZooKeeper

zkServer.sh start
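
Run this on every ZooKeeper node, then verify each node's role; in a healthy ensemble one node reports leader and the others follower:

zkServer.sh status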

2. Start HDFS

/export/servers/hadoop-2.6.0-cdh5.14.0/sbin/start-all.sh
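
Once HDFS is up, a short report confirms the NameNode sees its DataNodes:

hdfs dfsadmin -report | head -n 20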

3. Start the YARN cluster

No separate start is needed; YARN comes up together with Hadoop (start-all.sh starts both HDFS and YARN).

4. Start the HBase cluster

start-hbase.sh
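
To confirm HBase is serving requests, ask the shell for cluster status; it should list the live region servers:

echo "status" | hbase shell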

5. Start the Hive metastore

nohup hive --service metastore &
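
The metastore listens on port 9083 by default, so checking the port is a quick way to confirm it came up:

# 9083 is the default hive.metastore.port
netstat -lnpt | grep 9083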

6. Start HiveServer2

nohup hive --service hiveserver2 &
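
HiveServer2 can take a little while to start accepting connections; a beeline smoke test (connection URL and user assumed from the beeline-params example in kylin.properties above) verifies the JDBC endpoint:

beeline -u jdbc:hive2://localhost:10000 -n root -e "show databases;"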

7. Start the MapReduce job history server

/export/servers/hadoop-2.6.0-cdh5.14.0/sbin/mr-jobhistory-daemon.sh start historyserver
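
At this point jps should show all of the required daemons; the exact set depends on which node you check, but on a single-node setup it looks roughly like this:

# Expected (roughly): NameNode, DataNode, ResourceManager, NodeManager,
# QuorumPeerMain, HMaster, HRegionServer, RunJar (metastore and hiveserver2),
# JobHistoryServer
jps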

Step 5: Start Kylin

/export/servers/apache-kylin-2.6.3-bin-hbase1x/bin/kylin.sh start
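
Startup takes a minute or two; kylin.sh first runs its environment checks and then launches the server. You can follow the log while it comes up, and stop the server with the same script:

# Follow the startup log (path relative to the Kylin install directory)
tail -f /export/servers/apache-kylin-2.6.3-bin-hbase1x/logs/kylin.log
# Stop Kylin when needed
/export/servers/apache-kylin-2.6.3-bin-hbase1x/bin/kylin.sh stop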

If the log reports that Kylin has started successfully, you're done: open the Web UI at http://node01:7070/kylin (substitute your own hostname) and log in with the default account ADMIN / KYLIN.
