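Table of Contents

  • 1. Download and install (only Elasticsearch and Kibana are needed)
    • Plugin installation
    • Define the text extraction pipeline
  • 2. Integrating Elasticsearch with Spring Boot
    • application.yml
    • Entity class
    • Controller class
  • Testing
    • Create the index
    • Upload a document
    • Search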

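1. Download and install (only Elasticsearch and Kibana are needed)

  • For download and installation steps, see "Integrating Spring Boot/Spring Cloud with the ELK platform: log collection and management with Filebeat (Elasticsearch + Logstash + Filebeat + Kibana)"
  • Elastic Chinese community download page

This guide uses Elasticsearch 7.6.2 because the project runs on Spring Boot 2.3.x. Pairing an older client with a newer Elasticsearch server risks incompatibilities, so the server is kept at the lower version to match the client.

Plugin installation

  • ik analyzer (Chinese word segmentation)

  • ingest-attachment (change the version in the download link to match your Elasticsearch version)

Once the plugins are downloaded, extract each archive into its own subdirectory under the Elasticsearch plugins directory, then restart Elasticsearch.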
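
After the restart, it is worth confirming that both plugins were actually loaded. The following is a minimal sketch, assuming the node listens on http://127.0.0.1:9200 and that the plugins report themselves as `analysis-ik` and `ingest-attachment` in the plain-text output of `GET /_cat/plugins`. The class `PluginCheck` and its methods are hypothetical helpers, not part of the original project, and `java.net.http` requires Java 11 or later.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PluginCheck {
    // Plugin names expected in the _cat/plugins output (assumed names for the two plugins above).
    static final List<String> REQUIRED = Arrays.asList("analysis-ik", "ingest-attachment");

    // Returns the required plugins that do not appear in the _cat/plugins output.
    // Each output line is expected to look like: <node name> <plugin name> <version>
    static List<String> missingPlugins(String catPluginsOutput) {
        List<String> installed = new ArrayList<>();
        for (String line : catPluginsOutput.split("\\R")) {
            String[] columns = line.trim().split("\\s+");
            if (columns.length >= 2) {
                installed.add(columns[1]);
            }
        }
        List<String> missing = new ArrayList<>(REQUIRED);
        missing.removeAll(installed);
        return missing;
    }

    // Fetches the plugin list from a running node, e.g. baseUrl = "http://127.0.0.1:9200".
    static String fetchPlugins(String baseUrl) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/_cat/plugins"))
                .GET()
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }
}
```

Calling `PluginCheck.missingPlugins(PluginCheck.fetchPlugins("http://127.0.0.1:9200"))` should return an empty list when both plugins are installed on the node.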

Define the text extraction pipeline

PUT /_ingest/pipeline/attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "data",
        "indexed_chars": -1,
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "data"
      }
    }
  ]
}
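
The pipeline reads Base64-encoded file content from the `data` field, extracts the text into `attachment.content`, and then removes `data`. One way to try it before writing any Spring code is the ingest simulate endpoint, `POST /_ingest/pipeline/attachment/_simulate`. The sketch below only builds the request body; the class `PipelineSimulation` and its methods are hypothetical names used for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class PipelineSimulation {
    // Base64-encodes raw file bytes, the form the attachment processor expects in "data".
    static String encode(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    // Builds the body for POST /_ingest/pipeline/attachment/_simulate.
    // Base64 output contains only [A-Za-z0-9+/=], so it needs no JSON escaping.
    static String simulateBody(byte[] fileBytes) {
        return "{\"docs\":[{\"_source\":{\"data\":\"" + encode(fileBytes) + "\"}}]}";
    }

    public static void main(String[] args) {
        // A plain-text sample stands in for a real PDF or Word file.
        byte[] sample = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(simulateBody(sample));
    }
}
```

Pasting the printed body into Kibana Dev Tools under the simulate request should show the extracted text in the response, assuming the pipeline above has been created.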

2. Integrating Elasticsearch with Spring Boot

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.58</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.20</version>
    </dependency>
</dependencies>

application.yml

server:
  port: 9090
spring:
  application:
    name: elasticsearch-service
  elasticsearch:
    rest:
      uris: http://127.0.0.1:9200

Entity class

package top.fate.entity;

import lombok.Data;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

/**
 * @author Wangxl
 * Email: 18335844494@163.com
 * Time: 2020/11/2 14:15
 */
@Data
@Document(indexName = "filedata")
public class FileData {

    @Field(type = FieldType.Keyword)
    private String filePk;

    @Field(type = FieldType.Keyword)
    private String fileName;

    @Field(type = FieldType.Keyword)
    private Integer page;

    @Field(type = FieldType.Keyword)
    private String departmentId;

    @Field(type = FieldType.Keyword)
    private String ljdm;

    // Holds the Base64 file content on upload and the highlighted fragments on search
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String data;

    @Field(type = FieldType.Keyword)
    private String realName;

    @Field(type = FieldType.Keyword)
    private String url;

    @Field(type = FieldType.Keyword)
    private String type;
}

Controller class

package top.fate.controller;

import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.IndexOperations;
import org.springframework.data.elasticsearch.core.document.Document;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
import org.springframework.util.Base64Utils;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import top.fate.entity.FileData;

import java.io.File;
import java.lang.reflect.Method;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * @author Wangxl
 * Email: 18335844494@163.com
 * Time: 2022/6/1 16:33
 */
@RestController
@RequestMapping(value = "fullTextSearch")
public class FullTextSearchController {

    // Every endpoint below works on this index
    private static final String INDEX_NAME = "testindex";

    @Autowired
    private ElasticsearchRestTemplate elasticsearchRestTemplate;

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @GetMapping("createIndex")
    public void add() {
        IndexOperations indexOperations = elasticsearchRestTemplate.indexOps(IndexCoordinates.of(INDEX_NAME));
        indexOperations.create();
        // Derive the field mapping from the annotations on FileData
        Document mapping = indexOperations.createMapping(FileData.class);
        indexOperations.putMapping(mapping);
    }

    @GetMapping("deleteIndex")
    public void deleteIndex() {
        // Target the same index as createIndex; indexOps(FileData.class) would resolve to "filedata" instead
        IndexOperations indexOperations = elasticsearchRestTemplate.indexOps(IndexCoordinates.of(INDEX_NAME));
        indexOperations.delete();
    }

    @GetMapping("uploadFileToEs")
    public void uploadFileToEs() {
        try {
            // File file = new File("D:\\desktop\\Java开发工程师-4年-王晓龙-2022-05.pdf");
            File file = new File("D:\\desktop\\Java开发工程师-4年-王晓龙-2022-05.docx");
            // Read the whole file; a single InputStream.read call is not guaranteed to fill the buffer
            byte[] buffer = Files.readAllBytes(file.toPath());
            // Encode the file content as Base64 for the attachment processor
            String fileString = Base64Utils.encodeToString(buffer);
            FileData fileData = new FileData();
            fileData.setFileName(file.getName());
            fileData.setFilePk(file.getName());
            fileData.setData(fileString);
            IndexRequest indexRequest = new IndexRequest(INDEX_NAME).id(fileData.getFilePk());
            indexRequest.source(JSON.toJSONString(fileData), XContentType.JSON);
            // Route the document through the "attachment" ingest pipeline
            indexRequest.setPipeline("attachment");
            restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @GetMapping("search")
    public Object search(@RequestParam("txt") String txt) {
        List<FileData> list = new ArrayList<>();
        try {
            SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
            SearchSourceBuilder builder = new SearchSourceBuilder();
            builder.query(QueryBuilders.matchQuery("attachment.content", txt).analyzer("ik_max_word"));
            // Return the actual number of hits
            builder.trackTotalHits(true);
            // Highlighting
            HighlightBuilder highlightBuilder = new HighlightBuilder();
            highlightBuilder.field("attachment.content");
            // Highlight the field even if the query did not match on it
            highlightBuilder.requireFieldMatch(false);
            highlightBuilder.preTags("<span style='color:red'>");
            highlightBuilder.postTags("</span>");
            builder.highlighter(highlightBuilder);
            searchRequest.source(builder);
            SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
            if (search.getHits() != null) {
                for (SearchHit hit : search.getHits().getHits()) {
                    Map<String, HighlightField> highlightFields = hit.getHighlightFields();
                    HighlightField content = highlightFields.get("attachment.content");
                    Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                    if (content != null) {
                        StringBuilder highlighted = new StringBuilder();
                        for (Text fragment : content.fragments()) {
                            highlighted.append(fragment);
                        }
                        // Return the highlighted fragments in the "data" field
                        sourceAsMap.put("data", highlighted.toString());
                    }
                    list.add(dealObject(sourceAsMap, FileData.class));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return list;
    }

    /*
    public static void ignoreSource(Map<String, Object> map) {
        for (String key : IGNORE_KEY) {
            map.remove(key);
        }
    }
    */

    // Copies map entries onto a new instance of clazz through matching setters
    public static <T> T dealObject(Map<String, Object> sourceAsMap, Class<T> clazz) {
        try {
            // ignoreSource(sourceAsMap);
            T t = clazz.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> entry : sourceAsMap.entrySet()) {
                String key = entry.getKey();
                Object value = entry.getValue();
                // Null values carry no type to match a setter against, so skip them
                if (value == null || key.isEmpty()) {
                    continue;
                }
                String setterName = "set" + key.substring(0, 1).toUpperCase() + key.substring(1);
                Method method;
                try {
                    method = clazz.getMethod(setterName, value.getClass());
                } catch (NoSuchMethodException e) {
                    // No setter for this key and value type (for example the nested "attachment" object)
                    continue;
                }
                method.invoke(t, value);
            }
            return t;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}
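
The `dealObject` helper looks up each setter by the runtime type of the map value, so a key is copied only when a setter with exactly that parameter type exists. The standalone sketch below reproduces that lookup to show which keys survive. The `SourceMapper` class and the `Hit` bean are hypothetical stand-ins for the controller and `FileData`, and the sketch rethrows reflection failures instead of printing them.

```java
import java.lang.reflect.Method;
import java.util.Map;

final class SourceMapper {
    // Hypothetical bean standing in for FileData.
    public static class Hit {
        private String fileName;
        private Integer page;

        public void setFileName(String fileName) { this.fileName = fileName; }
        public void setPage(Integer page) { this.page = page; }
        public String getFileName() { return fileName; }
        public Integer getPage() { return page; }
    }

    // Copies each map entry onto a new bean when a setter with the value's exact runtime type exists.
    static <T> T map(Map<String, Object> source, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> entry : source.entrySet()) {
                String key = entry.getKey();
                Object value = entry.getValue();
                if (value == null || key.isEmpty()) {
                    continue; // nothing to match a setter against
                }
                String setter = "set" + Character.toUpperCase(key.charAt(0)) + key.substring(1);
                try {
                    Method method = type.getMethod(setter, value.getClass());
                    method.invoke(target, value);
                } catch (NoSuchMethodException e) {
                    // Unknown key, or the value type differs from the setter parameter: skip it
                }
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot populate " + type.getName(), e);
        }
    }
}
```

With this lookup, a map containing `fileName` (String) and `page` (Integer) fills both properties, while a nested `attachment` map, a null value, or a `page` that arrives as a `Long` is silently skipped. That is worth keeping in mind when a field unexpectedly comes back empty.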

Testing

Create the index

localhost:9090/fullTextSearch/createIndex

Upload a document

localhost:9090/fullTextSearch/uploadFileToEs

Search

localhost:9090/fullTextSearch/search?txt=索引库
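
The search term in the URL above is Chinese (it means "index library"). A browser percent-encodes it automatically, but a programmatic client usually has to do so itself. Here is a minimal sketch, assuming Java 10 or later for the `URLEncoder.encode(String, Charset)` overload; the class `SearchUrl` is a hypothetical helper, not part of the original project.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SearchUrl {
    // Builds the search URL with the txt parameter percent-encoded as UTF-8.
    static String build(String baseUrl, String txt) {
        return baseUrl + "/fullTextSearch/search?txt="
                + URLEncoder.encode(txt, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The argument is the search term from the URL above, written with Unicode escapes
        // so the source file stays ASCII-only.
        System.out.println(build("http://localhost:9090", "\u7D22\u5F15\u5E93"));
    }
}
```

Running `main` prints `http://localhost:9090/fullTextSearch/search?txt=%E7%B4%A2%E5%BC%95%E5%BA%93`, which can be passed to any HTTP client.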
