Preface:

Kylin depends on Hive and HBase, so make sure both are installed and working properly before you begin.

Hive installation tutorial [link]

HBase installation tutorial [link]

Introduction:

Apache Kylin is an open-source distributed analytics engine that provides a SQL query interface and multidimensional analysis (OLAP) capabilities on top of Hadoop for extremely large datasets, and it can query huge Hive tables with sub-second latency. Originally developed at eBay and contributed to the open-source community, Kylin is a home-grown big-data component and a point of pride for Chinese developers: it was open-sourced in October 2014, became an Apache Incubator project in November of the same year, and was promoted by the ASF to a top-level Apache project in December 2015. Its rapid rise rode the big-data OLAP wave, and it stands as a testament to the greatness of open-source contribution.

Installation:

Step 1: Download the installation package and upload it to the Linux system

Official download page [link] — the site even offers docs in Chinese, a point of pride

File upload tutorial [link]

Step 2: Extract the archive

tar -zxvf apache-kylin-2.6.3-bin-hbase1x.tar.gz -C ../servers/
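
Optionally, you can export KYLIN_HOME so later steps can refer to the install directory by name — a small convenience sketch, assuming the extraction target used above:

# Optional convenience: the path below assumes the extraction target used above
export KYLIN_HOME=/export/servers/apache-kylin-2.6.3-bin-hbase1x
export PATH=$PATH:$KYLIN_HOME/bin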

Step 3: Configure Kylin

1. Add the configuration files of Kylin's dependent components

cd /export/servers/apache-kylin-2.6.3-bin-hbase1x/conf
ln -s $HADOOP_HOME/etc/hadoop/hdfs-site.xml hdfs-site.xml
ln -s $HADOOP_HOME/etc/hadoop/core-site.xml core-site.xml
ln -s $HBASE_HOME/conf/hbase-site.xml hbase-site.xml
ln -s $HIVE_HOME/conf/hive-site.xml hive-site.xml
ln -s $SPARK_HOME/conf/spark-defaults.conf spark-defaults.conf
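
A quick sanity check (assuming HADOOP_HOME, HBASE_HOME, HIVE_HOME and SPARK_HOME are set in your environment) confirms that every symlink resolves to a real file:

# -L dereferences the links; a broken link reports "No such file or directory"
ls -lL hdfs-site.xml core-site.xml hbase-site.xml hive-site.xml spark-defaults.conf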

2. Configure conf/kylin.properties

Delete the original contents of conf/kylin.properties and replace them with the following:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#    http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# The commented values below take effect as the default settings
# Uncomment and override them if necessary
#### METADATA | ENV ###
## The metadata store in hbase
#kylin.metadata.url=kylin_metadata@hbase
## metadata cache sync retry times
#kylin.metadata.sync-retries=3
## Working folder in HDFS; preferably a fully qualified absolute path. Make sure the user has the right permissions on this directory
kylin.env.hdfs-working-dir=/apps/kylin
## DEV|QA|PROD. DEV will turn on some dev features, QA and PROD has no difference in terms of functions.
#kylin.env=QA
## kylin zk base path
kylin.env.zookeeper-base-path=/kylin
#### SERVER | WEB | RESTCLIENT ###
## Kylin server mode, valid value [all, query, job]
#kylin.server.mode=all
## List of web servers in use, this enables one web server instance to sync up with other servers.
#kylin.server.cluster-servers=localhost:7070
## Display timezone on UI,format like[GMT+N or GMT-N]
#kylin.web.timezone=
## Timeout value for the queries submitted through the Web UI, in milliseconds
#kylin.web.query-timeout=300000
#kylin.web.cross-domain-enabled=true
##allow user to export query result
#kylin.web.export-allow-admin=true
#kylin.web.export-allow-other=true
## Hide measures in measure list of cube designer, separate by comma
#kylin.web.hide-measures=RAW
##max connections of one route
#kylin.restclient.connection.default-max-per-route=20
##max connections of one rest-client
#kylin.restclient.connection.max-total=200
#### PUBLIC CONFIG ###
#kylin.engine.default=2
#kylin.storage.default=2
#kylin.web.hive-limit=20
#kylin.web.help.length=4
#kylin.web.help.0=start|Getting Started|http://kylin.apache.org/docs/tutorial/kylin_sample.html
#kylin.web.help.1=odbc|ODBC Driver|http://kylin.apache.org/docs/tutorial/odbc.html
#kylin.web.help.2=tableau|Tableau Guide|http://kylin.apache.org/docs/tutorial/tableau_91.html
#kylin.web.help.3=onboard|Cube Design Tutorial|http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
#kylin.web.link-streaming-guide=http://kylin.apache.org/
#kylin.htrace.show-gui-trace-toggle=false
#kylin.web.link-hadoop=
#kylin.web.link-diagnostic=
#kylin.web.contact-mail=
#kylin.server.external-acl-provider=
#### SOURCE ###
## Hive client, valid value [cli, beeline]
#kylin.source.hive.client=cli
## Absolute path to beeline shell, can be set to spark beeline instead of the default hive beeline on PATH
#kylin.source.hive.beeline-shell=beeline
## Parameters for beeline client, only necessary if hive client is beeline
##kylin.source.hive.beeline-params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://localhost:10000
## While hive client uses above settings to read hive table metadata,
## table operations can go through a separate SparkSQL command line, given SparkSQL connects to the same Hive metastore.
#kylin.source.hive.enable-sparksql-for-table-ops=false
#kylin.source.hive.sparksql-beeline-shell=/path/to/spark-client/bin/beeline
#kylin.source.hive.sparksql-beeline-params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://localhost:10000
kylin.source.hive.keep-flat-table=false
## Hive database name for putting the intermediate flat tables
kylin.source.hive.database-for-flat-table=default
## Whether redistribute the intermediate flat table before building
kylin.source.hive.redistribute-flat-table=true
#### STORAGE ###
## The storage for final cube file in hbase
kylin.storage.url=hbase
## The prefix of hbase table
kylin.storage.hbase.table-name-prefix=KYLIN_
## The namespace for hbase storage
kylin.storage.hbase.namespace=default
## Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
kylin.storage.hbase.compression-codec=none
## HBase cluster FileSystem serving HBase, in the format hdfs://hbase-cluster:8020
## Leave empty if hbase running on same cluster with hive and mapreduce
##kylin.storage.hbase.cluster-fs=
## The cut size for hbase region, in GB.
#kylin.storage.hbase.region-cut-gb=5
## The hfile size in GB; a smaller hfile gives the hfile-converting MR job more reducers and makes it faster.
## Set 0 to disable this optimization.
#kylin.storage.hbase.hfile-size-gb=2
#kylin.storage.hbase.min-region-count=1
#kylin.storage.hbase.max-region-count=500
## Optional information for the owner of kylin platform, it can be your team's email
## Currently it will be attached to each kylin's htable attribute
#kylin.storage.hbase.owner-tag=whoami@kylin.apache.org
#kylin.storage.hbase.coprocessor-mem-gb=3
## By default kylin can spill query's intermediate results to disks when it's consuming too much memory.
## Set it to false if you want query to abort immediately in such condition.
#kylin.storage.partition.aggr-spill-enabled=true
## The maximum number of bytes each coprocessor is allowed to scan.
## To allow arbitrary large scan, you can set it to 0.
#kylin.storage.partition.max-scan-bytes=3221225472
## The default coprocessor timeout is (hbase.rpc.timeout * 0.9) / 1000 seconds,
## You can set it to a smaller value. 0 means use default.
## kylin.storage.hbase.coprocessor-timeout-seconds=0
## clean real storage after delete operation
## if you want to delete the real storage like htable of deleting segment, you can set it to true
#kylin.storage.clean-after-delete-operation=false
#### JOB ###
## Max job retry on error, default 0: no retry
#kylin.job.retry=0
## Max count of concurrent jobs running
#kylin.job.max-concurrent-jobs=10
## The percentage of the sampling, default 100%
#kylin.job.sampling-percentage=100
## If true, will send email notification on job complete
##kylin.job.notification-enabled=true
##kylin.job.notification-mail-enable-starttls=true
##kylin.job.notification-mail-host=smtp.office365.com
##kylin.job.notification-mail-port=587
##kylin.job.notification-mail-username=kylin@example.com
##kylin.job.notification-mail-password=mypassword
##kylin.job.notification-mail-sender=kylin@example.com
#
#
#### ENGINE ###
#
## Time interval to check hadoop job status
#kylin.engine.mr.yarn-check-interval-seconds=10
#
#kylin.engine.mr.reduce-input-mb=500
#
#kylin.engine.mr.max-reducer-number=500
#
#kylin.engine.mr.mapper-input-rows=1000000
#
## Enable dictionary building in MR reducer
#kylin.engine.mr.build-dict-in-reducer=true
#
## Number of reducers for fetching UHC column distinct values
#kylin.engine.mr.uhc-reducer-count=3
#
## Whether using an additional step to build UHC dictionary
#kylin.engine.mr.build-uhc-dict-in-additional-step=false
#
#
#### CUBE | DICTIONARY ###
#
#kylin.cube.cuboid-scheduler=org.apache.kylin.cube.cuboid.DefaultCuboidScheduler
#kylin.cube.segment-advisor=org.apache.kylin.cube.CubeSegmentAdvisor
#
## 'auto', 'inmem', 'layer' or 'random' for testing
#kylin.cube.algorithm=layer
#
## A smaller threshold prefers layer, a larger threshold prefers in-mem
#kylin.cube.algorithm.layer-or-inmem-threshold=7
#
## auto use inmem algorithm:
## 1, cube planner optimize job
## 2, no source record
#kylin.cube.algorithm.inmem-auto-optimize=true
#
#kylin.cube.aggrgroup.max-combination=32768
#
#kylin.snapshot.max-mb=300
#
#kylin.cube.cubeplanner.enabled=true
#kylin.cube.cubeplanner.enabled-for-existing-cube=true
#kylin.cube.cubeplanner.expansion-threshold=15.0
#kylin.cube.cubeplanner.recommend-cache-max-size=200
#kylin.cube.cubeplanner.mandatory-rollup-threshold=1000
#kylin.cube.cubeplanner.algorithm-threshold-greedy=8
#kylin.cube.cubeplanner.algorithm-threshold-genetic=23
#
#
#### QUERY ###
#
## Controls the maximum number of bytes a query is allowed to scan storage.
## The default value 0 means no limit.
## The counterpart kylin.storage.partition.max-scan-bytes sets the maximum per coprocessor.
#kylin.query.max-scan-bytes=0
#
#kylin.query.cache-enabled=true
#
## Controls extras properties for Calcite jdbc driver
## all extras properties should be under the prefix "kylin.query.calcite.extras-props."
## case sensitive, default: true, to enable case insensitive set it to false
## @see org.apache.calcite.config.CalciteConnectionProperty.CASE_SENSITIVE
#kylin.query.calcite.extras-props.caseSensitive=true
## how to handle unquoted identifiers, default: TO_UPPER, available options: UNCHANGED, TO_UPPER, TO_LOWER
## @see org.apache.calcite.config.CalciteConnectionProperty.UNQUOTED_CASING
#kylin.query.calcite.extras-props.unquotedCasing=TO_UPPER
## quoting method, default: DOUBLE_QUOTE, available options: DOUBLE_QUOTE, BACK_TICK, BRACKET
## @see org.apache.calcite.config.CalciteConnectionProperty.QUOTING
#kylin.query.calcite.extras-props.quoting=DOUBLE_QUOTE
## change SqlConformance from DEFAULT to LENIENT to enable group by ordinal
## @see org.apache.calcite.sql.validate.SqlConformance.SqlConformanceEnum
#kylin.query.calcite.extras-props.conformance=LENIENT
#
## TABLE ACL
#kylin.query.security.table-acl-enabled=true
#
## Usually should not modify this
#kylin.query.interceptors=org.apache.kylin.rest.security.TableInterceptor
#
#kylin.query.escape-default-keyword=false
#
## Usually should not modify this
#kylin.query.transformers=org.apache.kylin.query.util.DefaultQueryTransformer,org.apache.kylin.query.util.KeywordDefaultDirtyHack
#
#### SECURITY ###
#
## Spring security profile, options: testing, ldap, saml
## with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login
#kylin.security.profile=testing
#
## Admin roles in LDAP, for ldap and saml
#kylin.security.acl.admin-role=admin
#
## LDAP authentication configuration
#kylin.security.ldap.connection-server=ldap://ldap_server:389
#kylin.security.ldap.connection-username=
#kylin.security.ldap.connection-password=
#
## LDAP user account directory;
#kylin.security.ldap.user-search-base=
#kylin.security.ldap.user-search-pattern=
#kylin.security.ldap.user-group-search-base=
#kylin.security.ldap.user-group-search-filter=(|(member={0})(memberUid={1}))
#
## LDAP service account directory
#kylin.security.ldap.service-search-base=
#kylin.security.ldap.service-search-pattern=
#kylin.security.ldap.service-group-search-base=
#
### SAML configurations for SSO
## SAML IDP metadata file location
#kylin.security.saml.metadata-file=classpath:sso_metadata.xml
#kylin.security.saml.metadata-entity-base-url=https://hostname/kylin
#kylin.security.saml.keystore-file=classpath:samlKeystore.jks
#kylin.security.saml.context-scheme=https
#kylin.security.saml.context-server-name=hostname
#kylin.security.saml.context-server-port=443
#kylin.security.saml.context-path=/kylin
#
#### SPARK ENGINE CONFIGS ###
#
## Hadoop conf folder, will export this as "HADOOP_CONF_DIR" to run spark-submit
## This must contain site xmls of core, yarn, hive, and hbase in one folder
kylin.env.hadoop-conf-dir=/export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop
#
## Estimate the RDD partition numbers
kylin.engine.spark.rdd-partition-cut-mb=10
#
## Minimal partition numbers of rdd
kylin.engine.spark.min-partition=1
#
## Max partition numbers of rdd
kylin.engine.spark.max-partition=1000
#
## Spark conf (default is in spark/conf/spark-defaults.conf)
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.yarn.queue=default
kylin.engine.spark-conf.spark.driver.memory=512M
kylin.engine.spark-conf.spark.executor.memory=1G
kylin.engine.spark-conf.spark.executor.instances=2
kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=512
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.eventLog.dir=hdfs://node01:8020/apps/spark2/spark-history
kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs://node01:8020/apps/spark2/spark-history
kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
#
#### Spark conf for specific job
kylin.engine.spark-conf-mergedict.spark.executor.memory=1G
kylin.engine.spark-conf-mergedict.spark.memory.fraction=0.2
#
## manually upload the spark-assembly jar to HDFS and set this property to avoid repeatedly uploading the jar at runtime
kylin.engine.spark-conf.spark.yarn.archive=hdfs://node01:8020/apps/spark2/lib/spark-libs.jar
kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
##kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
##kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
##kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
#
#
#### QUERY PUSH DOWN ###
#
##kylin.query.pushdown.runner-class-name=org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl
#
##kylin.query.pushdown.update-enabled=false
##kylin.query.pushdown.jdbc.url=jdbc:hive2://sandbox:10000/default
##kylin.query.pushdown.jdbc.driver=org.apache.hive.jdbc.HiveDriver
##kylin.query.pushdown.jdbc.username=hive
##kylin.query.pushdown.jdbc.password=
#
##kylin.query.pushdown.jdbc.pool-max-total=8
##kylin.query.pushdown.jdbc.pool-max-idle=8
##kylin.query.pushdown.jdbc.pool-min-idle=0
#
#### JDBC Data Source
##kylin.source.jdbc.connection-url=
##kylin.source.jdbc.driver=
##kylin.source.jdbc.dialect=
##kylin.source.jdbc.user=
##kylin.source.jdbc.pass=
##kylin.source.jdbc.sqoop-home=
##kylin.source.jdbc.filed-delimiter=|
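
After saving the file, it helps to list just the active (uncommented) settings and check them against your cluster — a quick sanity check, assuming you are still in the conf directory:

# Print only the effective properties (skip comments and blank lines)
grep -vE '^[[:space:]]*#|^[[:space:]]*$' kylin.properties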

3. Configure kylin.sh

cd /export/servers/apache-kylin-2.6.3-bin-hbase1x/bin
vim kylin.sh

Append the following to the end of kylin.sh:

export HADOOP_HOME=${HADOOP_HOME}
export HIVE_HOME=${HIVE_HOME}
export HBASE_HOME=${HBASE_HOME}
export SPARK_HOME=${SPARK_HOME}
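
Note that these lines only re-export variables that must already be defined in your environment (for example in /etc/profile); if any of them is empty, Kylin's scripts will not find the component. A quick check, run in your login shell rather than inside kylin.sh:

# Each variable should print a non-empty path; define it in /etc/profile if it does not
for v in HADOOP_HOME HIVE_HOME HBASE_HOME SPARK_HOME; do echo "$v=${!v}"; done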

4. Initialize Kylin's data path on HDFS

hadoop fs -mkdir -p /apps/kylin
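
This path must match the kylin.env.hdfs-working-dir set in kylin.properties above. Kylin also ships a bin/check-env.sh script that validates the working directory, environment variables, and client tools in one pass; running it now (HDFS must already be up, as it must be for the mkdir above to succeed) catches most misconfigurations before startup:

/export/servers/apache-kylin-2.6.3-bin-hbase1x/bin/check-env.sh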

Step 4: Preparation before starting Kylin

1. Start ZooKeeper

zkServer.sh start
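
Run this on every ZooKeeper node, then verify each node's role; in a healthy ensemble one node reports leader and the others follower:

zkServer.sh status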

2. Start HDFS

/export/servers/hadoop-2.6.0-cdh5.14.0/sbin/start-all.sh
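
Once HDFS is up, a short report confirms the NameNode sees its DataNodes:

hdfs dfsadmin -report | head -n 20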

3. Start the YARN cluster

No separate start is needed; YARN comes up together with Hadoop (start-all.sh starts both HDFS and YARN).

4. Start the HBase cluster

start-hbase.sh
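
To confirm HBase is serving requests, ask the shell for cluster status; it should list the live region servers:

echo "status" | hbase shell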

5. Start the Hive metastore

nohup hive --service metastore &
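
The metastore listens on port 9083 by default, so checking the port is a quick way to confirm it came up:

# 9083 is the default hive.metastore.port
netstat -lnpt | grep 9083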

6. Start HiveServer2

nohup hive --service hiveserver2 &
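
HiveServer2 can take a little while to start accepting connections; a beeline smoke test (connection URL and user assumed from the beeline-params example in kylin.properties above) verifies the JDBC endpoint:

beeline -u jdbc:hive2://localhost:10000 -n root -e "show databases;"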

7. Start the MapReduce job history server

/export/servers/hadoop-2.6.0-cdh5.14.0/sbin/mr-jobhistory-daemon.sh start historyserver
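
At this point jps should show all of the required daemons; the exact set depends on which node you check, but on a single-node setup it looks roughly like this:

# Expected (roughly): NameNode, DataNode, ResourceManager, NodeManager,
# QuorumPeerMain, HMaster, HRegionServer, RunJar (metastore and hiveserver2),
# JobHistoryServer
jps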

Step 5: Start Kylin

/export/servers/apache-kylin-2.6.3-bin-hbase1x/bin/kylin.sh start
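
Startup takes a minute or two; kylin.sh first runs its environment checks and then launches the server. You can follow the log while it comes up, and stop the server with the same script:

# Follow the startup log (path relative to the Kylin install directory)
tail -f /export/servers/apache-kylin-2.6.3-bin-hbase1x/logs/kylin.log
# Stop Kylin when needed
/export/servers/apache-kylin-2.6.3-bin-hbase1x/bin/kylin.sh stop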

If the log reports that Kylin has started successfully, you're done: open the Web UI at http://node01:7070/kylin (substitute your own hostname) and log in with the default account ADMIN / KYLIN.
