127. Using Lily HBase Indexer

127.1 Flowchart

127.2 Creating a collection in Solr

Creating the schema file for the Solr collection

[root@ip-xxx-xx-x-xxx conf]# cat schema.xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="content" type="text_ch" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="text_ch" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
        <filter class="solr.SmartChineseWordTokenFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
</schema>

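The SmartChinese tokenizer and filter used by the text_ch type are provided by the lucene-analyzers-smartcn jar (CDH 5.14.2 build), available from:
https://repository.cloudera.com/artifactory/cdh-releases-rcs/org/apache/lucene/lucene-analyzers-smartcn/4.10.3-cdh5.14.2/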

Script to create the collection (create.sh):

# ZK: hostname of one of the ZooKeeper servers
ZK="ip-xxx-xx-x-xxx.ap-southeast-1.compute.internal"
# COLLECTION: name of the collection to create
COLLECTION="collection1"
BASE=`pwd`
# SHARD: number of shards to create
SHARD=3
# REPLICA: number of replicas per shard
REPLICA=1
echo "create solr collection"
rm -rf tmp/*
# Generate a template instance dir, swap in our schema.xml, upload it to ZooKeeper, then create the collection
solrctl --zk $ZK:2181/solr instancedir --generate tmp/${COLLECTION}_configs
cp conf/schema.xml tmp/${COLLECTION}_configs/conf/
solrctl --zk $ZK:2181/solr instancedir --create $COLLECTION tmp/${COLLECTION}_configs
solrctl --zk $ZK:2181/solr collection --create $COLLECTION -s $SHARD -r $REPLICA
solrctl --zk $ZK:2181/solr collection --list
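Because create.sh clears tmp/ and uploads configuration to ZooKeeper, it can help to preview the exact solrctl calls before running them. The sketch below is a hypothetical dry-run variant: the run helper and the DRY_RUN switch are illustrative and not solrctl features. With DRY_RUN=1 it only prints the commands; otherwise it executes the same sequence as the script above.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run variant of create.sh; run() and DRY_RUN are illustrative only.
ZK="${ZK:-ip-xxx-xx-x-xxx.ap-southeast-1.compute.internal}"
COLLECTION="${COLLECTION:-collection1}"
SHARD="${SHARD:-3}"
REPLICA="${REPLICA:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_collection() {
  local zk="$ZK:2181/solr" conf="tmp/${COLLECTION}_configs"
  # Remove only this collection's old instance dir instead of everything under tmp/
  run rm -rf "$conf"
  run solrctl --zk "$zk" instancedir --generate "$conf"
  run cp conf/schema.xml "$conf/conf/"
  run solrctl --zk "$zk" instancedir --create "$COLLECTION" "$conf"
  run solrctl --zk "$zk" collection --create "$COLLECTION" -s "$SHARD" -r "$REPLICA"
  run solrctl --zk "$zk" collection --list
}
```

After sourcing the file, DRY_RUN=1 create_collection prints the six commands; calling create_collection without DRY_RUN runs them. Deleting only tmp/collection1_configs is a deliberate narrowing of the original rm -rf tmp/*.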

Run it:

[root@ip-xxx-xx-x-xxx solr-hbase]# sh create.sh
create solr collection
Uploading configs from tmp/collection1_configs/conf to ip-172-31-5-171.ap-southeast-1.compute.internal:2181/solr. This may take up to a minute.
collection1 (2)
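The last output line comes from solrctl collection --list, which prints each collection name (here followed by a number in parentheses). A small hypothetical helper, assuming that "name (...)" line format as seen in the run above, can turn the listing into a pass/fail check for scripts:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a line on stdin is exactly the collection name,
# or the name followed by a space (line format assumed from the solrctl output above).
collection_exists() {
  local name="$1" line
  while IFS= read -r line; do
    case "$line" in
      "$name"|"$name "*) return 0 ;;
    esac
  done
  return 1
}
```

For example: solrctl --zk "$ZK:2181/solr" collection --list | collection_exists collection1 || echo "collection1 missing". Matching on the name plus a space avoids treating collection10 as collection1.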

127.3 Morphline and Lily Indexer configuration files

Morphline configuration (morphlines.conf, referenced by the indexer configuration below):

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]
    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "textinfo:content"
              outputField : "content"
              type : "string"
              source : value
            }
          ]
        }
      }
    ]
  }
]

Lily Indexer configuration (conf/indexer-config.xml, passed to the batch job via --hbase-indexer-file):

<?xml version="1.0"?>
<indexer table="TextHbase" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper" mapping-type="row">
  <!-- The relative or absolute path on the local file system to the morphline configuration file. -->
  <!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager -->
  <param name="morphlineFile" value="/root/solr-hbase/conf/morphlines.conf"/>
  <!-- The optional morphlineId identifies a morphline if there are multiple morphlines in morphlines.conf -->
  <!-- <param name="morphlineId" value="morphline1"/> -->
</indexer>
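Together these files form one mapping chain: extractHBaseCells reads the HBase column textinfo:content from table TextHbase, emits it as the morphline field content, and that name must match a field declared in schema.xml, otherwise Solr will typically reject the document as containing an unknown field. The hypothetical check below compares the two files. It is a rough text match, assuming one outputField entry per line as in the formatted morphline above, not a real HOCON or XML parser.

```shell
#!/usr/bin/env bash
# Hypothetical check: every outputField in the morphline file must appear
# as <field name="..."> in schema.xml. Plain sed/grep text matching only.
check_fields() {
  local morphline="$1" schema="$2" f missing=0
  for f in $(sed -n 's/.*outputField *: *"\([^"]*\)".*/\1/p' "$morphline"); do
    if ! grep -q "<field name=\"$f\"" "$schema"; then
      echo "missing in schema: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_fields conf/morphlines.conf conf/schema.xml returns non-zero and names each output field that the schema does not declare.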

127.4 Building the full-text index in batch

Download the Chinese word segmentation (smartcn) jar and copy it into the YARN and Solr library directories:

https://repository.cloudera.com/artifactory/cdh-releases-rcs/org/apache/lucene/lucene-analyzers-smartcn/4.10.3-cdh5.14.2/

[root@ip-xxx-xx-x-xxx solr-hdfs]# cp lucene-analyzers-smartcn-4.10.3-cdh5.14.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn
[root@ip-xxx-xx-x-xxx solr-hdfs]# cp lucene-analyzers-smartcn-4.10.3-cdh5.14.2.jar /opt/cloudera/parcels/CDH/lib/solr/webapps/solr/WEB-INF/lib

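Distribute the jar to the other nodes in the cluster. bk_cp.sh is a batch-copy script (not shown) that takes a host list file, the file to copy, and the target directory: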

[root@ip-xxx-xx-x-xxx shell]# sh bk_cp.sh node.list  /opt/cloudera/parcels/CDH/lib/hadoop-yarn/lucene-analyzers-smartcn-4.10.3-cdh5.14.2.jar  /opt/cloudera/parcels/CDH/lib/hadoop-yarn
lucene-analyzers-smartcn-4.10.3-cdh5.14.2.jar
[root@ip-xxx-xx-x-xxx shell]# sh bk_cp.sh node.list /opt/cloudera/parcels/CDH/lib/solr/webapps/solr/WEB-INF/lib/lucene-analyzers-smartcn-4.10.3-cdh5.14.2.jar /opt/cloudera/parcels/CDH/lib/solr/webapps/solr/WEB-INF/lib
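The original does not show bk_cp.sh. A minimal hypothetical sketch of such a batch-copy script follows, assuming node.list holds one hostname per line and that passwordless SSH as root is configured (both are assumptions, not taken from the original). The SCP variable only exists so the copy command can be swapped out for testing.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of bk_cp.sh: copy one file into the same directory on every host.
# Usage: bk_cp <host-list-file> <source-file> <target-dir>
# Assumes one hostname per line and passwordless SSH as root; SCP may override scp.
bk_cp() {
  local list="$1" src="$2" dest="$3" host
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  while IFS= read -r host; do
    # Skip blank lines and comment lines in the host list
    case "$host" in ''|'#'*) continue ;; esac
    # </dev/null keeps the copy command from consuming the host list on stdin
    ${SCP:-scp} "$src" "root@${host}:${dest}/" </dev/null || return 1
  done < "$list"
}
```

Called as bk_cp node.list lucene-analyzers-smartcn-4.10.3-cdh5.14.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn, it stops at the first host where the copy fails.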

Indexing script (batch.sh):

# Name of the target Solr collection
COLLECTION='collection1'
# Hostname of one of the ZooKeeper servers
ZK='ip-172-31-5-38.ap-southeast-1.compute.internal'
echo 'Delete previous docs...'
solrctl collection --deletedocs $COLLECTION
echo 'Lily HBase MapReduce indexing...'
config="/etc/hadoop/conf.cloudera.yarn"
parcel="/opt/cloudera/parcels/CDH"
# Glob patterns: $jar and $log4j are used unquoted below so the shell expands them
jar="$parcel/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar"
hbase_conf="/etc/hbase/conf/hbase-site.xml"
# No embedded quotes here: they would become part of the property name
opts="mapred.child.java.opts=-Xmx1024m"
log4j="$parcel/share/doc/search*/examples/solr-nrt/log4j.properties"
zk="$ZK:2181/solr"
# smartcn jar shipped to the MapReduce tasks
libjars="lib/lucene-analyzers-smartcn-4.10.3-cdh5.14.2.jar"
# JAAS config for ZooKeeper SASL; only relevant when ZooKeeper requires authentication
export HADOOP_OPTS="-Djava.security.auth.login.config=conf/jaas.conf"
hadoop --config $config jar $jar --conf $hbase_conf --libjars $libjars \
  -D "$opts" --log4j $log4j --hbase-indexer-file conf/indexer-config.xml \
  --verbose --go-live --zk-host $zk --collection $COLLECTION
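Two variables in batch.sh, jar and log4j, hold glob patterns that are only expanded when used unquoted. If nothing matches, the literal pattern is handed to hadoop and the job fails later with a less obvious error. A hypothetical pre-flight helper (resolve_one is an illustrative name) that requires exactly one matching file could run before the hadoop command:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper for batch.sh: expand a glob, require exactly one match.
resolve_one() {
  local pattern="$1" matches
  # Unquoted on purpose so the shell performs pathname expansion
  matches=( $pattern )
  # With no match, bash keeps the literal pattern, which then fails the -e test
  if [ "${#matches[@]}" -ne 1 ] || [ ! -e "${matches[0]}" ]; then
    echo "expected exactly one match for: $pattern" >&2
    return 1
  fi
  printf '%s\n' "${matches[0]}"
}
```

In batch.sh this would look like jar="$(resolve_one "$parcel/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar")" || exit 1, which also catches the case where two job jar versions are installed side by side.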

Run it:

[root@ip-xxx-xx-x-xxx solr-hbase]# sh batch.sh
Delete previous docs...
Lily HBase MapReduce indexing...
0    [main] INFO  org.apache.solr.common.cloud.SolrZkClient  - Using default ZkCredentialsProvider
21   [main] INFO  org.apache.solr.common.cloud.ConnectionManager  - Waiting for client to connect to ZooKeeper
25   [main-SendThread(ip-172-31-5-38.ap-southeast-1.compute.internal:2181)] WARN  org.apache.zookeeper.ClientCnxn  - SASL configuration failed: javax.security.auth.login.LoginException: Zookeeper client cannot authenticate using the 'Client' section of the supplied JAAS configuration: 'conf/jaas.conf' because of a RuntimeException: java.lang.SecurityException: java.io.IOException: conf/jaas.conf (No such file or directory) Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.

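The WARN line above only reports that conf/jaas.conf does not exist. As the message says, the client then connects to ZooKeeper without SASL authentication, which works as long as the ZooKeeper server does not require it.

Query the results in the Solr and Hue web interfaces.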
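Besides the Solr and Hue interfaces, the index can be queried over HTTP through Solr's standard /select handler, where Chinese search terms have to be percent-encoded as UTF-8. The sketch below builds such a URL; solr_select_url is a hypothetical helper, and the host placeholder and port 8983 (Solr's default HTTP port) are assumptions to adjust for your cluster.

```shell
#!/usr/bin/env bash
# Percent-encode a string byte by byte (UTF-8), keeping RFC 3986 unreserved characters.
urlencode() {
  local b out=""
  for b in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$b" in
      # Hex codes of 0-9, A-Z, a-z, and - . _ ~
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        out+="$(printf "\\x$b")" ;;
      *)
        out+="%$(printf '%s' "$b" | tr 'a-f' 'A-F')" ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: /select URL searching the "content" field, JSON response.
# Port 8983 is Solr's default; adjust host and port for your cluster.
solr_select_url() {
  local host="$1" collection="$2" term="$3"
  printf 'http://%s:8983/solr/%s/select?q=content:%s&wt=json\n' \
    "$host" "$collection" "$(urlencode "$term")"
}
```

For example, curl -s "$(solr_select_url solr-host collection1 '中文')" queries collection1 for 中文 (sent as %E4%B8%AD%E6%96%87). Encoding through od works on raw bytes, so it does not depend on the shell's locale settings.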
