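Lucene and Solr, Part 2

Recap

Part 1 covered:
1) creating an index with Lucene;

2) what analyzers do, how they compare, and how to choose one.

I. Maintaining the index

1. Adding documents
2. Deleting documents
3. Updating documents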
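
Before the Lucene code, here is a minimal sketch of these three operations on a toy in-memory inverted index. It is plain Java, not Lucene's API: ToyIndex and its methods are hypothetical names made up for illustration, and the whitespace tokenizer stands in for a real analyzer. The point it shows is that an update is a delete by term followed by an add.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory inverted index; an illustration only, not how Lucene is implemented. */
final class ToyIndex {
    private final Map<Integer, String> docs = new HashMap<>();          // doc id -> stored text
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // term -> ids of docs containing it
    private int nextId = 0;

    /** Adds a document and indexes each lowercase, whitespace-separated token. */
    int add(String text) {
        int id = nextId++;
        docs.put(id, text);
        for (String term : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Deletes every document that contains the term; returns how many were removed. */
    int deleteByTerm(String term) {
        Set<Integer> ids = new TreeSet<>(search(term)); // copy, because the postings are modified below
        for (int id : ids) {
            docs.remove(id);
            for (Set<Integer> posting : postings.values()) {
                posting.remove(id);
            }
        }
        return ids.size();
    }

    /** Update is modelled as delete-then-add; returns the id of the new document. */
    int update(String term, String newText) {
        deleteByTerm(term);
        return add(newText);
    }

    /** Returns the ids of the documents that contain the term. */
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("java basics");
        index.add("spring and java");
        index.add("apache lucene");
        index.update("java", "new document");     // removes both "java" documents, adds one
        System.out.println(index.size());         // prints 2
        System.out.println(index.search("java")); // prints []
    }
}
```

Note that the update replaces every document matching the term with a single new one, which is why the index shrinks from three documents to two.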

II. Lucene queries

1. Querying with subclasses of Query
1) MatchAllDocsQuery
2) TermQuery
3) NumericRangeQuery
4) BooleanQuery

2. Querying with QueryParser
1) QueryParser
2) MultiFieldQueryParser
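
To make the difference between the two parsers concrete, here is a rough sketch in plain Java of how one input sentence expands into per-field clauses. It is not Lucene code: ParserExpansionSketch, its methods, and the three-word stop list are assumptions made for illustration, and a real analyzer such as IKAnalyzer tokenizes with its own dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrates single-field versus multi-field expansion of an analyzed query string. */
final class ParserExpansionSketch {
    // Hypothetical stop words; a real analyzer ships its own list.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("is", "a", "of"));

    /** Lowercases, splits on whitespace, and drops stop words. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String word : input.toLowerCase().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    /** One clause per token, all against the same field. */
    static String singleField(String field, String input) {
        List<String> clauses = new ArrayList<>();
        for (String token : tokenize(input)) {
            clauses.add(field + ":" + token);
        }
        return String.join(" ", clauses);
    }

    /** One parenthesised group per token, with one clause per field inside it. */
    static String multiField(String[] fields, String input) {
        List<String> groups = new ArrayList<>();
        for (String token : tokenize(input)) {
            List<String> clauses = new ArrayList<>();
            for (String field : fields) {
                clauses.add(field + ":" + token);
            }
            groups.add("(" + String.join(" ", clauses) + ")");
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        String input = "lucene is a project of apache";
        // prints name:lucene name:project name:apache
        System.out.println(singleField("name", input));
        // prints (name:lucene content:lucene) (name:project content:project) (name:apache content:apache)
        System.out.println(multiField(new String[] {"name", "content"}, input));
    }
}
```

The two printed strings are in the same form that query.toString() prints for the parsed queries in the test class.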

package cn.itcast.lucenen.further.test;

import org.apache.commons.io.FileUtils;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser; // only needed if the 4.4 example below is enabled
import org.apache.lucene.search.*;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

import java.io.File;
import java.io.IOException;

public class LuceneFurtherTest {

    /*
     A field has three properties:
     - Analyzed: whether the value goes through the analyzer. Without analysis the whole value is kept as one term.
     - Indexed: whether the terms are indexed. Only indexed terms can be searched. Indexing links each term
       to its documents, so a query on a term yields document ids, and each id leads to its document.
     - Stored: whether the original field value is kept in the index so it can be read back from a hit.

     Field class    Data type          Analyzed    Indexed    Stored
     StringField    String             no          yes        yes/no
     LongField      Long               yes         yes        yes/no
     TextField      String or stream   yes         yes        yes/no
    */
    @Test
    public void addDocument() throws IOException {
        // 1. Where the index is stored
        FSDirectory directory = FSDirectory.open(new File("/Users/benjamin/idea-workspace/lucene/indexX"));
        // 2. Create the IndexWriter
        IndexWriterConfig conf = new IndexWriterConfig(Version.LATEST, new IKAnalyzer());
        IndexWriter indexWriter = new IndexWriter(directory, conf);
        // 3. Read the source files
        File srcFile = new File("/Users/benjamin/idea-workspace/lucene/上课用的查询资料searchsource");
        File[] listFiles = srcFile.listFiles();
        for (File file : listFiles) {
            // 4. One Document per file. Creating it outside the loop would pile every file's fields
            //    into a single document. Different documents may have different fields, and one
            //    document may contain several fields with the same name.
            Document document = new Document();
            // 4.1 File name
            String fileName = file.getName();
            TextField nameField = new TextField("name", fileName, Field.Store.YES);
            document.add(nameField);
            // 4.2 File size, in bytes
            long fileSize = FileUtils.sizeOf(file);
            LongField sizeField = new LongField("size", fileSize, Field.Store.YES);
            document.add(sizeField);
            // 4.3 File path
            String filePath = file.getPath();
            TextField pathField = new TextField("path", filePath, Field.Store.YES);
            document.add(pathField);
            // 4.4 File content
            String fileContent = FileUtils.readFileToString(file);
            TextField contentField = new TextField("content", fileContent, Field.Store.YES);
            document.add(contentField);
            // 5. Write the document to the index
            indexWriter.addDocument(document);
        }
        // 6. Release resources
        indexWriter.close();
    }

    /*
     Delete everything in the index: use with care.
    */
    @Test
    public void deleteDocument() throws IOException {
        // 1. Where the index is stored
        FSDirectory directory = FSDirectory.open(new File("/Users/benjamin/idea-workspace/lucene/indexX"));
        // 2. Create the IndexWriter
        IndexWriterConfig conf = new IndexWriterConfig(Version.LATEST, new IKAnalyzer());
        IndexWriter indexWriter = new IndexWriter(directory, conf);
        // 3. Delete every document
        indexWriter.deleteAll();
        // 4. Release resources
        indexWriter.close();
    }

    /*
     Delete by query.

     From the Term javadoc: "A Term represents a word from text. This is the unit of search. It is
     composed of two elements, the text of the word, as a string, and the name of
     the field that the text occurred in."

     In short, a term is the unit of search.
    */
    @Test
    public void deleteDocumentByQuery() throws IOException {
        // 1. Where the index is stored
        FSDirectory directory = FSDirectory.open(new File("/Users/benjamin/idea-workspace/lucene/indexX"));
        // 2. Create the IndexWriter
        IndexWriterConfig conf = new IndexWriterConfig(Version.LATEST, new IKAnalyzer());
        IndexWriter indexWriter = new IndexWriter(directory, conf);
        // 3. Build the query. The field is "name" because that is what addDocument() indexes;
        //    a term on a field that does not exist, such as "filename", matches nothing.
        Query query = new TermQuery(new Term("name", "apache"));
        // Query query2 = new TermQuery(new Term("name", "apache"));
        // 4. Delete the documents that match the query
        indexWriter.deleteDocuments(query);
        // indexWriter.deleteDocuments(query, query2);
        // 5. Release resources
        indexWriter.close();
    }

    /*
     Updating the index is really a delete followed by an add.
    */
    @Test
    public void updateIndex() throws IOException {
        // 1. Where the index is stored
        FSDirectory directory = FSDirectory.open(new File("/Users/benjamin/idea-workspace/lucene/indexX"));
        // 2. Create the IndexWriter
        IndexWriterConfig conf = new IndexWriterConfig(Version.LATEST, new IKAnalyzer());
        IndexWriter indexWriter = new IndexWriter(directory, conf);
        // 3. Build the replacement document
        Document document = new Document();
        TextField textField = new TextField("name", "新文档新indexWriter新文档新文档新文updateDocument新文档新文档新文档新文档apache", Field.Store.YES);
        textField.setBoost(10);
        document.add(textField);
        TextField textField2 = new TextField("content", "新文档内容", Field.Store.YES);
        document.add(textField2);
        // 4. Delete every document whose content contains "java", then add the new document
        indexWriter.updateDocument(new Term("content", "java"), document);
        // 5. Release resources
        indexWriter.close();
    }

    /*
     II. Lucene queries

     1. Querying with subclasses of Query
        1) MatchAllDocsQuery
           In Luke: *:*
        2) TermQuery
        3) NumericRangeQuery
           Use case: find files between 0 and 100 KB in size.
        4) BooleanQuery
           Use case: combine several conditions.

     2. Querying with QueryParser
        Use case: the user types a sentence, which is analyzed first and then searched term by term.
        1) QueryParser
           name:lucene name:project name:apache
        2) MultiFieldQueryParser
           (name:lucene content:lucene) (name:project content:project) (name:apache content:apache)
    */
    @Test
    @SuppressWarnings("unused")
    public void searchIndex() throws IOException, ParseException {
        // 1. Location of the index
        FSDirectory directory = FSDirectory.open(new File("/Users/benjamin/idea-workspace/lucene/indexX"));
        // 2. Create the index reader
        IndexReader reader = DirectoryReader.open(directory);
        // 3. Create the index searcher
        IndexSearcher searcher = new IndexSearcher(reader);

        // 4. Run a query with search(query, n): query is the condition, n caps the number of hits.
        // TopDocs topDocs = searcher.search(new TermQuery(new Term("content", "spring")), 100);
        // TopDocs topDocs = searcher.search(new TermQuery(new Term("content", "apache")), 100);

        // 4.1 Query by file size. The "size" field holds bytes, so 100 KB is 100 * 1024.
        NumericRangeQuery<Long> size = NumericRangeQuery.newLongRange("size", 0L, 100L * 1024, true, true);
        // TopDocs topDocs = searcher.search(size, 100);

        // 4.2 Combine several conditions
        // BooleanQuery booleanClauses = new BooleanQuery();
        // TermQuery query1 = new TermQuery(new Term("name", "apache"));
        // TermQuery query2 = new TermQuery(new Term("content", "spring"));
        //
        // BooleanClause.Occur.MUST      the clause must match
        // BooleanClause.Occur.SHOULD    the clause may or may not match
        // BooleanClause.Occur.MUST_NOT  the clause must not match
        //
        // MUST + MUST            in Luke: +name:apache +content:spring
        // MUST + SHOULD          in Luke: +name:apache content:spring
        // SHOULD + MUST_NOT      in Luke: name:apache -content:spring
        // MUST_NOT + MUST_NOT    returns no results
        //
        // booleanClauses.add(query1, BooleanClause.Occur.MUST);
        // booleanClauses.add(query2, BooleanClause.Occur.SHOULD);
        // TopDocs topDocs = searcher.search(booleanClauses, 100);

        // 4.3 MatchAllDocsQuery. In Luke: *:*
        // MatchAllDocsQuery matchAllDocsQuery = new MatchAllDocsQuery();
        // TopDocs topDocs = searcher.search(matchAllDocsQuery, 100);

        // 4.4 QueryParser: single-field search on a sentence typed by the user.
        // Input: "lucene is a project of apache"
        // In Luke: name:lucene name:project name:apache (the clauses are ORed)
        // QueryParser queryParser = new QueryParser("name", new IKAnalyzer());
        // Query query = queryParser.parse("lucene is a project of apache");
        // TopDocs topDocs = searcher.search(query, 100);

        // 4.5 MultiFieldQueryParser: search several fields at once
        String[] fields = {"name", "content"};
        MultiFieldQueryParser queryParser = new MultiFieldQueryParser(fields, new IKAnalyzer());
        Query query = queryParser.parse("lucene is a project of apache");
        TopDocs topDocs = searcher.search(query, 100);

        // 5. Read the results
        System.out.println("Total hits: " + topDocs.totalHits);
        // 5.1 Print the query in Luke syntax
        System.out.println(query.toString());
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            int docId = scoreDoc.doc;
            Document document = searcher.doc(docId);
            // Print the stored fields of the document
            System.out.println("File name: " + document.get("name"));
            System.out.println("File size: " + document.get("size"));
            System.out.println("File path: " + document.get("path"));
            // System.out.println("File content: " + document.get("content"));
        }
        // 6. Release resources
        reader.close();
    }
}
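
The Occur combinations listed in the comments of searchIndex() can be read as set operations on document ids. The sketch below models that in plain Java under simplifying assumptions: OccurSketch, Clause, and evaluate are made-up names, each clause is given as the set of ids it matches, and scoring is ignored. Real Lucene also uses SHOULD clauses to raise the score of matching documents, which this model leaves out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Models which documents a boolean query returns; scoring is deliberately left out. */
final class OccurSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** A clause reduced to the set of document ids it matches. */
    static final class Clause {
        final Occur occur;
        final Set<Integer> matches;

        Clause(Occur occur, Set<Integer> matches) {
            this.occur = occur;
            this.matches = matches;
        }
    }

    static Clause clause(Occur occur, Integer... ids) {
        return new Clause(occur, new TreeSet<>(Arrays.asList(ids)));
    }

    static Set<Integer> evaluate(List<Clause> clauses) {
        Set<Integer> required = null;            // intersection of all MUST clauses
        Set<Integer> optional = new TreeSet<>(); // union of all SHOULD clauses
        Set<Integer> excluded = new TreeSet<>(); // union of all MUST_NOT clauses
        for (Clause c : clauses) {
            switch (c.occur) {
                case MUST:
                    if (required == null) {
                        required = new TreeSet<>(c.matches);
                    } else {
                        required.retainAll(c.matches);
                    }
                    break;
                case SHOULD:
                    optional.addAll(c.matches);
                    break;
                case MUST_NOT:
                    excluded.addAll(c.matches);
                    break;
            }
        }
        // With a MUST clause present, SHOULD clauses do not change the result set.
        // Without one, a document has to match at least one SHOULD clause.
        Set<Integer> result = (required != null) ? required : optional;
        result.removeAll(excluded);
        return result;
    }

    public static void main(String[] args) {
        Integer[] apache = {1, 2, 3}; // hypothetical ids of documents matching name:apache
        Integer[] spring = {2, 3, 4}; // hypothetical ids of documents matching content:spring
        // +name:apache +content:spring, prints [2, 3]
        System.out.println(evaluate(Arrays.asList(clause(Occur.MUST, apache), clause(Occur.MUST, spring))));
        // +name:apache content:spring, prints [1, 2, 3]
        System.out.println(evaluate(Arrays.asList(clause(Occur.MUST, apache), clause(Occur.SHOULD, spring))));
        // name:apache -content:spring, prints [1]
        System.out.println(evaluate(Arrays.asList(clause(Occur.SHOULD, apache), clause(Occur.MUST_NOT, spring))));
        // -name:apache -content:spring, prints []
        System.out.println(evaluate(Arrays.asList(clause(Occur.MUST_NOT, apache), clause(Occur.MUST_NOT, spring))));
    }
}
```

The last case is the one described above as returning no results: with only MUST_NOT clauses there is nothing to select documents from before the exclusions are applied.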

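III. Relevance ranking

IV. What is Solr

Lucene is a jar, a base library; Solr is a web application built on top of Lucene.

Lucene and Solr are released in step, so the Solr version and the Lucene version have to match.

V. Installing and configuring Solr

Follow the tutorial on the official site.

solrHome is server/solr

VI. Using the Solr admin console

1) Create Solr cores or collections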

The create command detects the mode that Solr is running in (standalone or SolrCloud) and then creates a core or collection depending on the mode.


promote:bin benjamin$ ./solr create -c mycore

Instance: /Users/benjamin/idea-workspace/solr/solr-7.5.0/example/techproducts/solr/mycore
Data: /Users/benjamin/idea-workspace/solr/solr-7.5.0/example/techproducts/solr/mycore/data
Index: /Users/benjamin/idea-workspace/solr/solr-7.5.0/example/techproducts/solr/mycore/data/index

VII. Maintaining the index with SolrJ
