1. How IK fetches remote dictionaries for hot words and synonyms

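A quick look at the source. The points worth noting:
1. Every polling check sets the request headers "If-Modified-Since" and "If-None-Match".
2. The "ETag" and "Last-Modified" response headers are used to decide whether the file has changed.
3. When the dictionary has changed, Dictionary.getSingleton().reLoadMainDict() is called. reLoadMainDict calls loadRemoteExtDict() (via loadMainDict) to load the remote dictionaries, and getRemoteWords and getRemoteWordsUnprivileged then fetch the entries. The request that fetches the entries does not carry the two headers above.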

package org.wltea.analyzer.dic;
public class Monitor implements Runnable {
    .........

    /**
     * Monitoring flow:
     *  ① Send a HEAD request to the dictionary server
     *  ② Read the Last-Modified and ETag values from the response and check whether they changed
     *  ③ If nothing changed, sleep 1 min and go back to step ①
     *  ④ If something changed, reload the dictionary
     *  ⑤ Sleep 1 min and go back to step ①
     */
    public void runUnprivileged() {

        // Timeout settings
        RequestConfig rc = RequestConfig.custom().setConnectionRequestTimeout(10*1000)
                .setConnectTimeout(10*1000).setSocketTimeout(15*1000).build();

        HttpHead head = new HttpHead(location);
        head.setConfig(rc);

        // Set the conditional request headers
        if (last_modified != null) {
            head.setHeader("If-Modified-Since", last_modified);
        }
        if (eTags != null) {
            head.setHeader("If-None-Match", eTags);
        }

        CloseableHttpResponse response = null;
        try {

            response = httpclient.execute(head);

            // Only act on a 200 response
            if(response.getStatusLine().getStatusCode()==200){

                if (((response.getLastHeader("Last-Modified")!=null) && !response.getLastHeader("Last-Modified").getValue().equalsIgnoreCase(last_modified))
                        ||((response.getLastHeader("ETag")!=null) && !response.getLastHeader("ETag").getValue().equalsIgnoreCase(eTags))) {

                    // The remote dictionary was updated: reload it and update last_modified and eTags
                    Dictionary.getSingleton().reLoadMainDict();
                    last_modified = response.getLastHeader("Last-Modified")==null?null:response.getLastHeader("Last-Modified").getValue();
                    eTags = response.getLastHeader("ETag")==null?null:response.getLastHeader("ETag").getValue();
                }
            }else if (response.getStatusLine().getStatusCode()==304) {
                // Not modified, nothing to do
                //noop
            }else{
                logger.info("remote_ext_dict {} return bad code {}" , location , response.getStatusLine().getStatusCode() );
            }

        } catch (Exception e) {
            logger.error("remote_ext_dict {} error!",e , location);
        }finally{
            try {
                if (response != null) {
                    response.close();
                }
            } catch (IOException e) {
                logger.error(e.getMessage(), e);
            }
        }
    }
}
....
....

/**
 * Dictionary manager class, singleton pattern
 */
public class Dictionary {
    ...
    ...

    void reLoadMainDict() {
        logger.info("start to reload ik dict.");
        // Load into a new instance to reduce the impact of loading on the dictionary currently in use
        Dictionary tmpDict = new Dictionary(configuration);
        tmpDict.configuration = getSingleton().configuration;
        tmpDict.loadMainDict();
        tmpDict.loadStopWordDict();
        _MainDict = tmpDict._MainDict;
        _StopWords = tmpDict._StopWords;
        logger.info("reload ik dict finished.");
    }

    /**
     * Load the main dictionary and the extension dictionaries
     */
    private void loadMainDict() {
        // Create the main dictionary instance
        _MainDict = new DictSegment((char) 0);

        // Read the main dictionary file
        Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_MAIN);
        loadDictFile(_MainDict, file, false, "Main Dict");
        // Load the extension dictionaries
        this.loadExtDict();
        // Load the remote custom dictionaries
        this.loadRemoteExtDict();
    }

    /**
     * Load the remote extension dictionaries into the main dictionary
     */
    private void loadRemoteExtDict() {
        List<String> remoteExtDictFiles = getRemoteExtDictionarys();
        for (String location : remoteExtDictFiles) {
            logger.info("[Dict Loading] " + location);
            List<String> lists = getRemoteWords(location);
            // Skip this location if the extension dictionary cannot be found
            if (lists == null) {
                logger.error("[Dict Loading] " + location + " load failed");
                continue;
            }
            for (String theWord : lists) {
                if (theWord != null && !"".equals(theWord.trim())) {
                    // Load the extension dictionary entries into the in-memory main dictionary
                    logger.info(theWord);
                    _MainDict.fillSegment(theWord.trim().toLowerCase().toCharArray());
                }
            }
        }
    }

    private static List<String> getRemoteWords(String location) {
        SpecialPermission.check();
        return AccessController.doPrivileged((PrivilegedAction<List<String>>) () -> {
            return getRemoteWordsUnprivileged(location);
        });
    }

    /**
     * Download the custom entries from the remote server
     */
    private static List<String> getRemoteWordsUnprivileged(String location) {

        List<String> buffer = new ArrayList<String>();
        RequestConfig rc = RequestConfig.custom().setConnectionRequestTimeout(10 * 1000).setConnectTimeout(10 * 1000)
                .setSocketTimeout(60 * 1000).build();
        CloseableHttpClient httpclient = HttpClients.createDefault();
        CloseableHttpResponse response;
        BufferedReader in;
        HttpGet get = new HttpGet(location);
        get.setConfig(rc);
        try {
            response = httpclient.execute(get);
            if (response.getStatusLine().getStatusCode() == 200) {

                String charset = "UTF-8";
                // Determine the charset, defaulting to UTF-8
                HttpEntity entity = response.getEntity();
                if(entity!=null){
                    Header contentType = entity.getContentType();
                    if(contentType!=null&&contentType.getValue()!=null){
                        String typeValue = contentType.getValue();
                        if(typeValue!=null&&typeValue.contains("charset=")){
                            charset = typeValue.substring(typeValue.lastIndexOf("=") + 1);
                        }
                    }

                    if (entity.getContentLength() > 0 || entity.isChunked()) {
                        in = new BufferedReader(new InputStreamReader(entity.getContent(), charset));
                        String line;
                        while ((line = in.readLine()) != null) {
                            buffer.add(line);
                        }
                        in.close();
                        response.close();
                        return buffer;
                    }
                }
            }
            response.close();
        } catch (IllegalStateException | IOException e) {
            logger.error("getRemoteWords {} error", e, location);
        }
        return buffer;
    }
    ......
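
The change check in Monitor.runUnprivileged above boils down to comparing the validators in the current response with the ones remembered from the previous poll. Below is a minimal, dependency-free sketch of that condition. The class and method names (DictChangeCheck, hasChanged) are hypothetical and not part of IK; only the comparison itself is taken from the source.

```java
/** Hypothetical helper that isolates the change check used by the IK monitor. */
final class DictChangeCheck {

    private DictChangeCheck() {
    }

    /**
     * @param newLastModified Last-Modified value of the current response, or null if absent
     * @param newETag         ETag value of the current response, or null if absent
     * @param oldLastModified value remembered from the previous poll, or null on the first poll
     * @param oldETag         value remembered from the previous poll, or null on the first poll
     * @return true if either header is present and differs from the remembered value
     */
    static boolean hasChanged(String newLastModified, String newETag,
                              String oldLastModified, String oldETag) {
        // equalsIgnoreCase(null) is false, so with nothing remembered yet
        // any header that is present counts as a change.
        boolean lastModifiedChanged = newLastModified != null
                && !newLastModified.equalsIgnoreCase(oldLastModified);
        boolean eTagChanged = newETag != null
                && !newETag.equalsIgnoreCase(oldETag);
        return lastModifiedChanged || eTagChanged;
    }

    public static void main(String[] args) {
        String date = "Wed, 21 Oct 2015 07:28:00 GMT";
        System.out.println(hasChanged(date, "\"v1\"", null, null));     // true
        System.out.println(hasChanged(date, "\"v1\"", date, "\"v1\"")); // false
        System.out.println(hasChanged(date, "\"v2\"", date, "\"v1\"")); // true
    }
}
```

One consequence of the OR condition: a server that keeps the ETag constant still triggers a reload as long as Last-Modified changes, and vice versa.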

Synonyms work in much the same way, so I won't go through them in detail; here is a short excerpt.
Git repository: https://github.com/bells/elasticsearch-analysis-dynamic-synonym
Note that the analysis settings have to be configured for this:

"analysis": {
    "analyzer": {
        "my_ik_max_word": {
            "tokenizer": "ik_max_word",
            "filter": ["remote_synonym"]
        }
    },
    "filter": {
        "remote_synonym": {
            "type": "dynamic_synonym",
            "synonyms_path": "http://xxxx/${type}/remote_dic.txt",
            "interval": 30
        }
    }
}
public class Monitor implements Runnable {

    private SynonymFile synonymFile;

    Monitor(SynonymFile synonymFile) {
        this.synonymFile = synonymFile;
    }

    @Override
    public void run() {
        if (synonymFile.isNeedReloadSynonymMap()) {
            synonymMap = synonymFile.reloadSynonymMap();
            for (AbsSynonymFilter dynamicSynonymFilter : dynamicSynonymFilters.keySet()) {
                dynamicSynonymFilter.update(synonymMap);
                logger.debug("success reload synonym");
            }
        }
    }
}

public class RemoteSynonymFile implements SynonymFile {
    ......

    @Override
    public boolean isNeedReloadSynonymMap() {
        RequestConfig rc = RequestConfig.custom()
                .setConnectionRequestTimeout(10 * 1000)
                .setConnectTimeout(10 * 1000).setSocketTimeout(15 * 1000)
                .build();
        HttpHead head = AccessController.doPrivileged((PrivilegedAction<HttpHead>) () -> new HttpHead(location));
        head.setConfig(rc);

        // Set the conditional request headers
        if (lastModified != null) {
            head.setHeader("If-Modified-Since", lastModified);
        }
        if (eTags != null) {
            head.setHeader("If-None-Match", eTags);
        }

        CloseableHttpResponse response = null;
        try {
            response = executeHttpRequest(head);
            if (response.getStatusLine().getStatusCode() == 200) { // Only act on a 200 response
                if (!response.getLastHeader(LAST_MODIFIED_HEADER).getValue()
                        .equalsIgnoreCase(lastModified)
                        || !response.getLastHeader(ETAG_HEADER).getValue()
                        .equalsIgnoreCase(eTags)) {

                    lastModified = response.getLastHeader(LAST_MODIFIED_HEADER) == null ? null
                            : response.getLastHeader(LAST_MODIFIED_HEADER).getValue();
                    eTags = response.getLastHeader(ETAG_HEADER) == null ? null
                            : response.getLastHeader(ETAG_HEADER).getValue();
                    return true;
                }
            } else if (response.getStatusLine().getStatusCode() == 304) {
                return false;
            } else {
                logger.info("remote synonym {} return bad code {}", location,
                        response.getStatusLine().getStatusCode());
            }
        } finally {
            try {
                if (response != null) {
                    response.close();
                }
            } catch (IOException e) {
                logger.error("failed to close http response", e);
            }
        }
        return false;
    }

    ......

    /**
     * Download custom terms from a remote server
     */
    public Reader getReader() {
        Reader reader;
        RequestConfig rc = RequestConfig.custom()
                .setConnectionRequestTimeout(10 * 1000)
                .setConnectTimeout(10 * 1000).setSocketTimeout(60 * 1000)
                .build();
        CloseableHttpResponse response = null;
        BufferedReader br = null;
        HttpGet get = new HttpGet(location);
        get.setConfig(rc);
        try {
            response = executeHttpRequest(get);
            if (response.getStatusLine().getStatusCode() == 200) {
                String charset = "UTF-8"; // Determine the charset, defaulting to UTF-8
                if (response.getEntity().getContentType().getValue().contains("charset=")) {
                    String contentType = response.getEntity().getContentType().getValue();
                    charset = contentType.substring(contentType.lastIndexOf('=') + 1);
                }

                br = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), charset));
                StringBuilder sb = new StringBuilder();
                String line;
                while ((line = br.readLine()) != null) {
                    logger.debug("reload remote synonym: {}", line);
                    sb.append(line).append(System.getProperty("line.separator"));
                }
                reader = new StringReader(sb.toString());
            } else reader = new StringReader("");
        } catch (Exception e) {
            logger.error("get remote synonym reader {} error!", location, e);
//            throw new IllegalArgumentException(
//                    "Exception while reading remote synonyms file", e);
            // Fix #54 Returns blank if synonym file has be deleted.
            reader = new StringReader("");
        } finally {
            try {
                if (br != null) {
                    br.close();
                }
            } catch (IOException e) {
                logger.error("failed to close bufferedReader", e);
            }
            try {
                if (response != null) {
                    response.close();
                }
            } catch (IOException e) {
                logger.error("failed to close http response", e);
            }
        }
        return reader;
    }
}

2. Implementation

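The idea is fairly simple:
store a last-modified time for the dictionary;
if the last-modified time changes, entries have been added;
also store the time of the last re-analysis of the index;
if the last-modified time is later than the last re-analysis time, the index has to be re-analyzed.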
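
The rule above can be sketched as a pure function. This is only an illustration under stated assumptions: the class and method names (RebuildDecision, needsRebuild) are hypothetical, both timestamps are assumed to be epoch milliseconds, and the in-progress flag stands for whatever status marker is used to avoid starting a second rebuild while one is running.

```java
/** Hypothetical helper: decides whether the index has to be re-analyzed. */
final class RebuildDecision {

    private RebuildDecision() {
    }

    /**
     * @param lastModifiedMillis when the dictionary was last modified (epoch millis, assumed)
     * @param lastRebuildMillis  when the index was last re-analyzed (epoch millis, assumed)
     * @param rebuildInProgress  true while a previous rebuild is still running
     * @return true if a new rebuild should be started
     */
    static boolean needsRebuild(long lastModifiedMillis, long lastRebuildMillis,
                                boolean rebuildInProgress) {
        // Rebuild only when the dictionary is newer than the last rebuild
        // and no rebuild is currently running.
        return !rebuildInProgress && lastModifiedMillis > lastRebuildMillis;
    }

    public static void main(String[] args) {
        System.out.println(needsRebuild(2_000L, 1_000L, false)); // true
        System.out.println(needsRebuild(1_000L, 2_000L, false)); // false
        System.out.println(needsRebuild(2_000L, 1_000L, true));  // false
    }
}
```

Comparing the timestamps as numbers rather than as strings avoids relying on both values having the same number of digits.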

/** Dictionary entry class **/
@Data
@TableName("ext_dict")
public class ExtDict extends BaseEntity {

    /**
     * id
     */
    @TableId(value = "id", type = IdType.AUTO)
    private Integer id;

    /**
     * Extension word
     */
    @NotNull(message = "Hot word must not be empty", groups = { AddGroup.class})
    private String word;

    /**
     * Type: 0 hot word, 1 synonym, 2 stop word
     */
    private Integer type;

    /**
     * Synonym
     */
    private String synonym;
}

/** Controller class **/

/**
 * Get the remote dictionary
 * @param type dictionary type: 0 hot word, 1 synonym, 2 stop word
 * @param request
 * @param response
 */
@GetMapping("/{type}/remote_dic.txt")
public void  getRemotDic(@PathVariable("type") int type, HttpServletRequest request, HttpServletResponse response) {
    response.setContentType("text/plain");
    response.setCharacterEncoding("utf-8");

    // Read the cached last-modified time and fall back to the database on a cache miss.
    // Check for null first: calling toString() on a missing cache entry throws a NullPointerException.
    Object cached = RedisUtils.getCacheObject(REMOTE_DIC_LAST_MODIFY + type + ":");
    String lastModified = cached == null ? null : cached.toString();
    if (StringUtils.isEmpty(lastModified)) {
        lastModified = extDictService.queryLastModified(type) + "";
        RedisUtils.setCacheObject(REMOTE_DIC_LAST_MODIFY + type + ":", lastModified);
    }
    response.setHeader("ETag", "xxxxxxxxxxxxxxxxdsa");
    response.setDateHeader("Last-Modified", Long.valueOf(lastModified));

    // Polling check from ES: no need to return the dictionary data.
    // getHeader() matches header names case-insensitively, unlike a contains() check on the name list.
    if (request.getHeader("If-None-Match") != null || request.getHeader("If-Modified-Since") != null) {
        return;
    }

    // Not a polling check: return the actual content
    List<ExtDict> list = extDictService.queryListByType(type);
    PrintWriter writer = null;
    try {
        writer = response.getWriter();
        if (type == 1) {
            // Synonym entries are written as word=>synonym
            list = list.stream().map(exdict -> {
                exdict.setWord(exdict.getWord() + "=>" + exdict.getSynonym());
                return exdict;
            }).collect(Collectors.toList());
        }
        for (int i = 0; i < list.size(); i++) {
            writer.write(list.get(i).getWord() + "\n");
        }
        writer.flush();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (writer != null) {
            writer.close();
        }
    }

    String status = RedisUtils.getCacheObject(REBUILD_ANALYSIS_STATUS);
    String time = RedisUtils.getCacheObject(REBUILD_ANALYSIS_TIME);

    // No status yet: initialize
    if (status == null) {
        RedisUtils.setCacheObject(REBUILD_ANALYSIS_STATUS, ANALYSIS_STATUS_SUCCESS);
        RedisUtils.setCacheObject(REBUILD_ANALYSIS_TIME, System.currentTimeMillis() + "");
        return;
    }

    // Rebuild the analysis
    if (!ANALYSIS_STATUS_UPDATING.equals(status) && StringUtils.compare(lastModified, time) > 0) {
        elasticSearchService.rebuildAnalysis();
    }
}

/** Service **/

/**
 * Rebuild the analysis (re-analyze the indexed documents).
 * Used when the remote extension dictionary has been updated.
 */
@Override
public void rebuildAnalysis() {
    UpdateByQueryRequest request = new UpdateByQueryRequest(DEFULT_INDEX_NAME);
    request.setConflicts("proceed");
    request.setQuery(QueryBuilders.matchAllQuery());
    request.setRefresh(true);

    // Mark the rebuild as running before issuing the async request,
    // so that a fast callback cannot be overwritten by this status.
    RedisUtils.setCacheObject(REBUILD_ANALYSIS_STATUS, ANALYSIS_STATUS_UPDATING);

    restHighLevelClient.updateByQueryAsync(request, RequestOptions.DEFAULT, new ActionListener<BulkByScrollResponse>() {
        @Override
        public void onResponse(BulkByScrollResponse bulkByScrollResponse) {
            log.info("------------------ analysis rebuild succeeded");
            RedisUtils.setCacheObject(REBUILD_ANALYSIS_STATUS, ANALYSIS_STATUS_SUCCESS);
            RedisUtils.setCacheObject(REBUILD_ANALYSIS_TIME, System.currentTimeMillis() + "");
        }

        @Override
        public void onFailure(Exception e) {
            log.error("------------------ analysis rebuild failed", e);
            RedisUtils.setCacheObject(REBUILD_ANALYSIS_STATUS, ANALYSIS_STATUS_FAILED);
        }
    });
}
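
For reference, the response body built by the controller above has one entry per line, with synonym entries written as word=>synonym. The sketch below shows that formatting on its own, without mutating the loaded entities. DictLines, toLine and toBody are hypothetical names, entries are passed as {word, synonym} pairs purely for illustration, and the type code follows the ExtDict comment (1 means synonym).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that renders dictionary entries as response lines. */
final class DictLines {

    /** Type code for synonym entries, as documented on ExtDict. */
    static final int TYPE_SYNONYM = 1;

    private DictLines() {
    }

    /** Synonym entries become "word=>synonym"; every other type is just the word. */
    static String toLine(int type, String word, String synonym) {
        return type == TYPE_SYNONYM ? word + "=>" + synonym : word;
    }

    /** Joins all entries into the plain-text body, one entry per line. */
    static String toBody(int type, List<String[]> entries) {
        StringBuilder sb = new StringBuilder();
        for (String[] entry : entries) {
            // entry[0] is the word, entry[1] the synonym (may be null for non-synonym types)
            sb.append(toLine(type, entry[0], entry[1])).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> entries = new ArrayList<>();
        entries.add(new String[] {"tomato", "love apple"});
        entries.add(new String[] {"potato", "spud"});
        System.out.print(toBody(TYPE_SYNONYM, entries));
        // prints:
        // tomato=>love apple
        // potato=>spud
    }
}
```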
