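I. Understanding Search Engines

The Core Idea Behind Search Design

II. Project Overview

III. Implementing the Indexing Module

Building the Index

Environment Setup

Implementing the Indexer Class

Implementing the FileScanner Class

Implementing the IndexManager Class

Implementing the IndexerProperties Class

Implementing the Document Class

Creating the Database Tables

Implementing the InvertedRecord Class

IV. Implementing the Search Module

Implementing the Front-End Pages

Implementing Document

Implementing the DescBuilder Class

V. Project Showcase

Home Page

Entering a Query

Search Results

Video Demo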

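I. Understanding Search Engines

Definition:

A search engine is a system that collects information from the internet according to a defined strategy using dedicated programs, organizes and processes that information, and then provides a retrieval service that presents the relevant results to the user.

For example, searching Baidu for “康熙” (Kangxi) returns a results page listing the matching entries.

The Core Idea Behind Search Design

From the user's point of view: the user enters a query (one or more words), and the system finds every document that contains those words and returns them as a list. The naive approach scans every document for each query. With m documents of average length n (title plus content), a single query costs O(m*n). In practice m is very large, so this approach is ruled out on performance grounds.

The standard solution is an inverted index.

Document: anything that can be retrieved, such as an html page, a pdf, an image, or a video.

Inverted index: maps a word to the documents that reference it. Each entry describes one word: which documents it appears in and how important it is within each of them.

Forward index: maps a document to the words it contains. Each entry describes one document, whose text is segmented into words and stored.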
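To make the two index types concrete, here is a minimal in-memory sketch. It is not the project's code: the class name InvertedIndexSketch, the whitespace tokenizer, and the use of the list position as docId are all assumptions made for illustration, and the occurrence count stands in for the weight.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InvertedIndexSketch {
    // Forward index: docId -> words of that document (docId is the list position, an assumption).
    static List<List<String>> buildForward(List<String> docs) {
        List<List<String>> forward = new ArrayList<>();
        for (String doc : docs) {
            List<String> words = new ArrayList<>();
            // Naive whitespace tokenizer; a real system would use a proper word segmenter.
            for (String w : doc.toLowerCase().split("\\s+")) {
                if (!w.isEmpty()) {
                    words.add(w);
                }
            }
            forward.add(words);
        }
        return forward;
    }

    // Inverted index: word -> (docId -> occurrence count, used here as the weight).
    static Map<String, Map<Integer, Integer>> buildInverted(List<List<String>> forward) {
        Map<String, Map<Integer, Integer>> inverted = new HashMap<>();
        for (int docId = 0; docId < forward.size(); docId++) {
            for (String word : forward.get(docId)) {
                inverted.computeIfAbsent(word, k -> new HashMap<>())
                        .merge(docId, 1, Integer::sum);
            }
        }
        return inverted;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("ArrayList is a List", "HashMap is a Map", "List of Map");
        Map<String, Map<Integer, Integer>> inverted = buildInverted(buildForward(docs));
        // A query becomes one map lookup instead of a scan over every document.
        System.out.println(inverted.get("list"));
    }
}
```

Once the inverted index is built, answering a query is a single lookup per word, which is what removes the O(m*n) scan from the query path.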

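II. Project Overview

This project searches only the html pages of the JDK API documentation. Download: documentation download

It consists of two main modules:

Index-building module (does not need any web functionality)
Search module (can only run after the index has been built; needs web functionality)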

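III. Implementing the Indexing Module

Building the Index

forward.json: stores the forward index. Performance is not a concern here, so the easy-to-read JSON format is used;

inverted.json: stores the inverted index;

Steps for building the index:

1. Scan every document under the document directory. This directory traversal is handled by FileScanner;

2. Analyze and process each document to obtain its title, the URL it will finally be reached at, and its content;

3. For each document we now have the title, the url, the content, and every word in the title and content. This information is enough to build the index;

4. Save the index (either as a file in the file system or as records in a database table).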

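Environment Setup

Create a Spring Boot project.

To build the index, Lombok is the only dependency that needs to be selected;

Click Finish to complete the creation of the Spring Boot project;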

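Implementing the Indexer Class

Create command.Indexer under the indexer package. This class builds the index and is the logical entry point of the whole program;

import lombok.extern.slf4j.Slf4j;
import org.ansj.splitWord.analysis.ToAnalysis; // assumed: the ToAnalysis API used here matches the ansj_seg library
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

import java.io.File;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.stream.Collectors;

/**
 * Builds the index; the logical entry point of the whole program.
 */
@Slf4j      // Lombok: generates the `log` field used below
@Component  // register as a Spring bean
//@Profile("run") // keeps this bean from being loaded when tests run (run != test)
public class Indexer implements CommandLineRunner {
    // Collaborators, all injected by Spring
    private final FileScanner fileScanner;
    private final IndexerProperties properties;
    private final IndexManager indexManager;
    private final ExecutorService executorService;

    @Autowired  // constructor injection: the Spring container passes the beans in (DI)
    public Indexer(FileScanner fileScanner, IndexerProperties properties,
                   IndexManager indexManager, ExecutorService executorService) {
        this.fileScanner = fileScanner;
        this.properties = properties;
        this.indexManager = indexManager;
        this.executorService = executorService;
    }

    @Override
    public void run(String... args) throws Exception {
        // Warm up the segmenter: its first call is very slow and would distort later timings
        ToAnalysis.parse("随便分个什么，进行预热，避免优化的时候计算第一次特别慢的时间");
        log.info("Entry point of the indexing program.");

        // 1. Scan for all html files
        log.debug("Scanning directory for html files: {}", properties.getDocRootPath());
        List<File> htmlFileList = fileScanner.scanFile(properties.getDocRootPath(),
                file -> file.isFile() && file.getName().endsWith(".html"));
        log.debug("Scan finished, {} files found.", htmlFileList.size());

        // 2. For each html file, extract title, URL, and body text and wrap them in a Document
        File rootFile = new File(properties.getDocRootPath());
        // Note: because this is a Stream pipeline, adding .parallel() runs it across multiple cores
        List<Document> documentList = htmlFileList.stream().parallel()
                .map(file -> new Document(file, properties.getUrlPrefix(), rootFile))
                .collect(Collectors.toList());
        log.debug("Documents built, {} in total.", documentList.size());

        // 3. Save the forward index
        indexManager.saveForwardIndexesConcurrent(documentList);
        log.debug("Forward index saved.");

        // 4. Generate and save the inverted index
        indexManager.saveInvertedIndexesConcurrent(documentList);
        log.debug("Inverted index saved.");

        // 5. Shut down the thread pool
        executorService.shutdown();
    }
}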

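Implementing the FileScanner Class

Scans the file tree and collects the files that match a condition;

import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

@Slf4j      // logging
@Service    // register as a Spring bean
public class FileScanner {
    /**
     * Scans the tree rooted at rootPath and returns every File that matches the filter.
     * @param rootPath path of the root directory; the caller must make sure it exists and is a directory
     * @param filter filter.accept(file) is called on each file to decide whether it matches
     * @return all matching files
     */
    public List<File> scanFile(String rootPath, FileFilter filter) {
        List<File> resultList = new ArrayList<>();
        File rootFile = new File(rootPath);
        // Depth-first or breadth-first both work, as long as every file is visited.
        // Here we use depth-first traversal, implemented with recursion.
        traversal(rootFile, filter, resultList);
        return resultList;
    }

    private void traversal(File directoryFile, FileFilter filter, List<File> resultList) {
        // 1. List the children of this directory
        File[] files = directoryFile.listFiles();
        if (files == null) {
            // Something went wrong (usually a permission problem); rare in practice, so just skip
            return;
        }

        // 2. Check each child against the filter
        for (File file : files) {
            if (filter.accept(file)) {
                // Matches, so add it to the result list
                resultList.add(file);
            }
        }

        // 3. Recurse into each child that is a directory
        for (File file : files) {
            if (file.isDirectory()) {
                traversal(file, filter, resultList);
            }
        }
    }
}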
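The same depth-first scan can also be written with the JDK's java.nio.file API. The sketch below is an alternative, not what the project uses; the class name NioFileScanSketch and the scanHtml method are hypothetical, and the ".html" suffix check mirrors the filter passed in by Indexer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NioFileScanSketch {
    // Walks the tree under root and returns all regular files whose name ends with ".html".
    static List<Path> scanHtml(Path root) throws IOException {
        // Files.walk holds directory handles open, so the stream is closed via try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".html"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

One difference worth knowing: Files.walk throws when it hits an unreadable directory, whereas the recursive version above silently skips it.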

Implementing the IndexManager Class

Manages the indexes: builds them and saves them to the database in batches.

import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;

@Slf4j
@Component
public class IndexManager {
    private final IndexDatabaseMapper mapper;
    private final ExecutorService executorService;

    @Autowired
    public IndexManager(IndexDatabaseMapper mapper, ExecutorService executorService) {
        this.mapper = mapper;
        this.executorService = executorService;
    }

    // Generate and save the forward index in batches (single-threaded version)
    public void saveForwardIndexes(List<Document> documentList) {
        // 1. Records per batch insert (each record is large, so 10 is enough)
        int batchSize = 10;
        // 2. Number of SQL statements to run: ceil(documentList.size() / batchSize)
        int listSize = documentList.size();
        int times = (int) Math.ceil(1.0 * listSize / batchSize);
        log.debug("{} batches in total.", times);

        // 3. Insert batch by batch
        for (int i = 0; i < listSize; i += batchSize) {
            // Take this batch from documentList with List.subList(int from, int to)
            int from = i;
            int to = Integer.min(from + batchSize, listSize);
            List<Document> subList = documentList.subList(from, to);
            mapper.batchInsertForwardIndexes(subList);
        }
    }

    @Timing("Build + save forward index (multi-threaded version)")
    @SneakyThrows
    public void saveForwardIndexesConcurrent(List<Document> documentList) {
        // 1. Records per batch insert (each record is large, so 10 is enough)
        int batchSize = 10;
        // 2. Number of SQL statements to run: ceil(documentList.size() / batchSize)
        int listSize = documentList.size();
        int times = (int) Math.ceil(1.0 * listSize / batchSize);
        log.debug("{} batches in total.", times);

        // Tracks task completion; the initial count is the number of batches
        CountDownLatch latch = new CountDownLatch(times);

        // 3. Submit one task per batch
        for (int i = 0; i < listSize; i += batchSize) {
            int from = i;
            int to = Integer.min(from + batchSize, listSize);
            // Variables captured by a lambda or inner class must be final or effectively final
            Runnable task = () -> {
                try {
                    List<Document> subList = documentList.subList(from, to);
                    mapper.batchInsertForwardIndexes(subList);
                } finally {
                    // Count down even if the insert throws; otherwise latch.await() would block forever
                    latch.countDown();
                }
            };
            // The main thread only submits tasks; the pool threads do the actual inserts
            executorService.submit(task);
        }

        // 4. The loop ending only means all tasks were submitted, not that they finished.
        // The main thread waits on the latch until its count reaches 0, i.e. every task is done.
        latch.await();
    }

    @SneakyThrows
    public void saveInvertedIndexes(List<Document> documentList) {
        int batchSize = 10000;  // at most 10000 records per batch insert
        List<InvertedRecord> recordList = new ArrayList<>();    // records waiting to be inserted

        for (Document document : documentList) {
            Map<String, Integer> wordToWeight = document.segWordAndCalcWeight();
            for (Map.Entry<String, Integer> entry : wordToWeight.entrySet()) {
                String word = entry.getKey();
                int docId = document.getDocId();
                int weight = entry.getValue();
                recordList.add(new InvertedRecord(word, docId, weight));

                // A full batch has accumulated: insert it and start a new one
                if (recordList.size() == batchSize) {
                    mapper.batchInsertInvertedIndexes(recordList);
                    recordList.clear();
                }
            }
        }

        // Insert whatever is left over (fewer than batchSize records); skip the call if nothing remains
        if (!recordList.isEmpty()) {
            mapper.batchInsertInvertedIndexes(recordList);
            recordList.clear();
        }
    }

    static class InvertedInsertTask implements Runnable {
        private final CountDownLatch latch;
        private final int batchSize;
        private final List<Document> documentList;
        private final IndexDatabaseMapper mapper;

        InvertedInsertTask(CountDownLatch latch, int batchSize, List<Document> documentList, IndexDatabaseMapper mapper) {
            this.latch = latch;
            this.batchSize = batchSize;
            this.documentList = documentList;
            this.mapper = mapper;
        }

        @Override
        public void run() {
            try {
                List<InvertedRecord> recordList = new ArrayList<>();    // records waiting to be inserted
                for (Document document : documentList) {
                    Map<String, Integer> wordToWeight = document.segWordAndCalcWeight();
                    for (Map.Entry<String, Integer> entry : wordToWeight.entrySet()) {
                        String word = entry.getKey();
                        int docId = document.getDocId();
                        int weight = entry.getValue();
                        recordList.add(new InvertedRecord(word, docId, weight));

                        // A full batch has accumulated: insert it and start a new one
                        if (recordList.size() == batchSize) {
                            mapper.batchInsertInvertedIndexes(recordList);
                            recordList.clear();
                        }
                    }
                }

                // Insert whatever is left over (fewer than batchSize records)
                if (!recordList.isEmpty()) {
                    mapper.batchInsertInvertedIndexes(recordList);
                    recordList.clear();
                }
            } finally {
                // Always count down so the waiting thread is released even on failure
                latch.countDown();
            }
        }
    }

    @Timing("Build + save inverted index (multi-threaded version)")
    @SneakyThrows
    public void saveInvertedIndexesConcurrent(List<Document> documentList) {
        int batchSize = 10000;  // at most 10000 records per batch insert
        int groupSize = 50;     // documents handled by one task
        int listSize = documentList.size();
        int times = (int) Math.ceil(listSize * 1.0 / groupSize);
        CountDownLatch latch = new CountDownLatch(times);

        for (int i = 0; i < listSize; i += groupSize) {
            int from = i;
            int to = Integer.min(from + groupSize, listSize);
            List<Document> subList = documentList.subList(from, to);
            Runnable task = new InvertedInsertTask(latch, batchSize, subList, mapper);
            executorService.submit(task);
        }

        latch.await();
    }
}
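The batching pattern used by the concurrent methods can be tried out in isolation. The sketch below assumes nothing from the project: BatchLatchSketch and processInBatches are hypothetical names, and recording the batch size stands in for the real batch insert. It shows the ceiling division for the batch count, subList slicing, and a countDown() inside finally so that a failing task cannot leave the waiting thread blocked.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;

class BatchLatchSketch {
    // Splits items into batches of at most batchSize, processes each batch on the pool,
    // and returns the size of every processed batch (in completion order).
    static List<Integer> processInBatches(List<Integer> items, int batchSize, ExecutorService pool)
            throws InterruptedException {
        int total = items.size();
        int batches = (total + batchSize - 1) / batchSize; // integer ceiling division
        CountDownLatch latch = new CountDownLatch(batches);
        List<Integer> batchSizes = Collections.synchronizedList(new ArrayList<>());

        for (int from = 0; from < total; from += batchSize) {
            List<Integer> batch = items.subList(from, Math.min(from + batchSize, total));
            pool.submit(() -> {
                try {
                    batchSizes.add(batch.size()); // stand-in for the real batch insert
                } finally {
                    latch.countDown(); // runs even if the work above throws
                }
            });
        }

        latch.await(); // block until every batch has finished
        return batchSizes;
    }
}
```

With 25 items and a batch size of 10, three tasks are submitted, covering 10, 10, and 5 items.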

Implementing the IndexerProperties Class

import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Component  // register as a Spring bean
@ConfigurationProperties("searcher.indexer")
@Data // = @Getter + @Setter + @ToString + @EqualsAndHashCode
public class IndexerProperties {
    // bound to searcher.indexer.doc-root-path in application.yml
    private String docRootPath;
    // bound to searcher.indexer.url-prefix in application.yml
    private String urlPrefix;
    // bound to searcher.indexer.index-root-path in application.yml
    private String indexRootPath;
}

Implementing the Document Class

The Document class holds a document's docId, title, url, and content, and computes the weight of each word in it;

@Slf4j
@Data
public class Document {
    private Integer docId;  // assigned only after the forward index row has been inserted
    private String title;   // parsed from the file name
    private String url;     // built from two extra inputs: (1) the prefix https://docs.oracle.com/javase/8/docs/api/ and (2) the file's path relative to the doc root
    private String content; // read from the file and cleaned up

    // Not shown in this excerpt: the Document(File, String, File) constructor used by Indexer
    // and the ignoredWordSet stop-word set referenced below.

    // Segment the document into words and compute a weight for each word
    public Map<String, Integer> segWordAndCalcWeight() {
        // Segment the title into words, dropping ignored words
        List<String> wordInTitle = ToAnalysis.parse(title)
                .getTerms()
                .stream().parallel()
                .map(Term::getName)
                .filter(s -> !ignoredWordSet.contains(s))
                .collect(Collectors.toList());

        // Count how often each word occurs in the title
        Map<String, Integer> titleWordCount = new HashMap<>();
        for (String word : wordInTitle) {
            int count = titleWordCount.getOrDefault(word, 0);
            titleWordCount.put(word, count + 1);
        }

        // Segment the content and count how often each word occurs
        List<String> wordInContent = ToAnalysis.parse(content)
                .getTerms()
                .stream().parallel()
                .map(Term::getName)
                .collect(Collectors.toList());
        Map<String, Integer> contentWordCount = new HashMap<>();
        for (String word : wordInContent) {
            int count = contentWordCount.getOrDefault(word, 0);
            contentWordCount.put(word, count + 1);
        }

        // Compute the weights
        Map<String, Integer> wordToWeight = new HashMap<>();
        // First collect the distinct words across title and content
        Set<String> wordSet = new HashSet<>(wordInTitle);
        wordSet.addAll(wordInContent);
        for (String word : wordSet) {
            int titleCount = titleWordCount.getOrDefault(word, 0);
            int contentCount = contentWordCount.getOrDefault(word, 0);
            // A hit in the title counts ten times as much as a hit in the content
            int weight = titleCount * 10 + contentCount;
            wordToWeight.put(word, weight);
        }

        return wordToWeight;
    }
}
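The weighting rule (title occurrences times 10, plus content occurrences) can be checked with a small standalone sketch. This is only an illustration: WeightSketch is a hypothetical class, and a simple split on non-word characters replaces the project's word segmenter, so it suits English text only.

```java
import java.util.HashMap;
import java.util.Map;

class WeightSketch {
    // Counts lowercase words in text, splitting on runs of non-word characters.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    // weight(word) = titleCount * 10 + contentCount
    static Map<String, Integer> weights(String title, String content) {
        Map<String, Integer> result = new HashMap<>();
        count(title).forEach((w, c) -> result.merge(w, c * 10, Integer::sum));
        count(content).forEach((w, c) -> result.merge(w, c, Integer::sum));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> w = weights("ArrayList", "ArrayList implements List. ArrayList is resizable");
        System.out.println(w.get("arraylist")); // 1 title hit * 10 + 2 content hits = 12
        System.out.println(w.get("list"));      // 0 title hits + 1 content hit = 1
    }
}
```

The factor of 10 means a single title match outranks up to nine matches in the body, which pushes pages named after the query term toward the top.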

Creating the Database Tables

Create the forward_indexes and inverted_indexes tables:

CREATE SCHEMA `searcher_refactor` DEFAULT CHARACTER SET utf8mb4;

CREATE TABLE `searcher_refactor`.`forward_indexes` (
    `docid` INT NOT NULL AUTO_INCREMENT,
    `title` VARCHAR(100) NOT NULL,
    `url` VARCHAR(200) NOT NULL,
    `content` LONGTEXT NOT NULL,
    PRIMARY KEY (`docid`)
)
COMMENT = 'Forward index\ndocid -> full document information';

CREATE TABLE `searcher_refactor`.`inverted_indexes` (
    `id` INT NOT NULL AUTO_INCREMENT,
    `word` VARCHAR(100) NOT NULL,
    `docid` INT NOT NULL,
    `weight` INT NOT NULL,
    PRIMARY KEY (`id`)
)
COMMENT = 'Inverted index\nword -> [ { docid + weight }, { docid + weight }, ... ]';

Implementing the InvertedRecord Class

import lombok.Data;

// Maps to one row of the inverted_indexes table (the id column is of no interest here, so it is left out)
@Data
public class InvertedRecord {
    private String word;
    private int docId;
    private int weight;

    public InvertedRecord(String word, int docId, int weight) {
        this.word = word;
        this.docId = docId;
        this.weight = weight;
    }
}

IV. Implementing the Search Module

Implementing the Front-End Pages

index.html

<!DOCTYPE html>
<html lang="zh-hans">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>神马搜索</title>
    <link rel="stylesheet" href="style.css">
</head>
<body>
    <div class="container">
        <i class="fa-brands fa-windows item"></i>
        <div class="search-box">
            <input type="text" class="search-btn" placeholder="搜索">
        </div>
        <i class="fa-solid fa-magnifying-glass item search-submit"></i>
    </div>
    <div class="time-box"></div>
    <div class="poem">
        <p>「世间行乐亦如此，古来万事东流水。」</p>
        <p class="author">—— 《梦游天姥吟留别》</p>
    </div>
    <div class="background"></div>
    <script src="https://kit.fontawesome.com/44e73cd2d1.js" crossorigin="anonymous"></script>
    <script>
        // Open the results page for the query in a new tab
        const search = (query) => {
            window.open('/web?query=' + encodeURIComponent(query), '_blank')
        }

        const oSearch = document.querySelector('.search-btn')
        // Hide the placeholder while the input has focus
        oSearch.addEventListener('focus', () => {
            oSearch.placeholder = ''
        })
        oSearch.addEventListener('blur', () => {
            oSearch.placeholder = '搜索'
        })
        // Search when Enter is pressed on a non-empty query
        oSearch.addEventListener('keydown', (event) => {
            if (event.key === 'Enter' && oSearch.value.trim().length !== 0) {
                search(oSearch.value.trim())
                oSearch.value = ''
                oSearch.blur()
            }
        })
        // Search when the magnifier icon is clicked
        document.querySelector('.search-submit').addEventListener('click', () => {
            if (oSearch.value.trim().length !== 0) {
                search(oSearch.value.trim())
                oSearch.value = ''
            }
        })

        // Clock: show HH:mm and refresh at the start of each minute
        const oTimeBox = document.querySelector('.time-box')
        const updateTime = () => {
            let now = new Date()
            let hour = now.getHours()
            let minute = now.getMinutes()
            if (hour < 10) {
                hour = '0' + hour
            }
            if (minute < 10) {
                minute = '0' + minute
            }
            oTimeBox.textContent = `${hour}:${minute}`
            let second = now.getSeconds()
            let r = 60 - second
            setTimeout(updateTime, r * 1000)
        }
        updateTime()
    </script>
</body>
</html>

search.html

<!DOCTYPE html>
<html lang="zh-hans" xmlns:th="https://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <title th:text="${query} + ' - 神马搜索'"></title>
    <link rel="stylesheet" href="/query.css">
</head>
<body>
    <!-- th:xxx is thymeleaf syntax -->
    <!--    <div th:text="'Hello ' + ${name} + ' world'"></div>-->
    <div class="header">
        <div class="brand"><a href="/">神马搜索</a></div>
        <form class="input-shell" method="get" action="/web">
            <input type="text" name="query" th:value="${query}">
            <button>神马搜索</button>
        </form>
    </div>

    <div class="result">
        <!-- Difference between th:utext and th:text: whether the value is HTML-escaped -->
        <!--        <div th:text="'<span>Hello th:text</span>'"></div>-->
        <!--        <div th:utext="'<span>Hello th:utext</span>'"></div>-->
        <div class="result-item" th:each="doc : ${docList}">
            <a th:href="${doc.url}" th:text="${doc.title}"></a>
            <div class="desc" th:utext="${doc.desc}"></div>
            <div class="url" th:text="${doc.url}"></div>
        </div>
    </div>

    <!--    <div class="result">-->
    <!--        <div th:each="item : ${testList}">-->
    <!--            <span th:text="${item}"></span>-->
    <!--        </div>-->
    <!--    </div>-->

    <!-- Clicking "previous" repeatedly can reach page <= 0 -->
    <!-- Clicking "next" repeatedly can reach page > the last page -->
    <!-- The @{...} link syntax URL-encodes the query parameter values -->
    <div class="pagination">
        <a th:href="@{/web(query=${query}, page=${page - 1})}">上一页</a>
        <a th:href="@{/web(query=${query}, page=${page + 1})}">下一页</a>
    </div>
</body>
</html>
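The comments in search.html point out that repeated clicks on the pagination links can push page below 1 or past the last page. One way to guard against that on the server side might look like the sketch below. PageSketch, clampPage, offset, and the page size are all hypothetical and do not come from the project.

```java
class PageSketch {
    // Forces page into the range [1, lastPage]; an empty result set still has page 1.
    static int clampPage(int page, int totalItems, int pageSize) {
        int lastPage = Math.max(1, (totalItems + pageSize - 1) / pageSize); // ceiling division
        return Math.min(Math.max(page, 1), lastPage);
    }

    // Zero-based row offset of the first item on the given (already clamped) page.
    static int offset(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(clampPage(0, 95, 10));  // below the range, clamped to 1
        System.out.println(clampPage(11, 95, 10)); // 95 items fill 10 pages, clamped to 10
        System.out.println(offset(3, 10));         // page 3 starts at row 20
    }
}
```

Clamping in the controller keeps the template simple, since the page value handed to the view is always valid.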

query.css

* {
    margin: 0;
    padding: 0;
    box-sizing: border-box;
}

.header {
    width: 100%;
    height: 80px;
    position: fixed;    /* stays in place while scrolling */
    left: 0;
    top: 0;
    background-color: #eee;
    border-bottom: 1px solid #ccc;
    padding-left: 120px;
    display: flex;
    align-items: center;
}

.brand {
    margin-right: 120px;
}

.brand a {
    color: inherit;
    text-decoration: none;
}

.input-shell {
    width: 800px;
    height: 52px;
    border: 1px solid #aaa;
    border-radius: 4px;
    display: flex;
    align-items: stretch;
    justify-content: space-between;
}

.input-shell:focus,     /* :focus : the element has focus */
.input-shell:hover {    /* :hover : the mouse is over the element */
    border: 1px solid #888;
}

.input-shell input {
    border: none;
    outline: none;
    width: 600px;
    padding-left: 8px;
    font-size: 22px;
}

.input-shell button {
    border: none;
    outline: none;
    width: 200px;
    border-left: 1px solid #ccc;
}

.result {
    margin-top: 88px;
    width: 100%;
    padding-left: 120px;
}

.result-item {
    display: flex;
    flex-direction: column;
    margin-bottom: 20px;
    align-items: start;
}

.result-item a {
    font-size: 22px;
    font-weight: 700;
    color: rgb(42, 107, 205);
}

.result-item .desc {
    font-size: 18px;
}

.result-item .url {
    font-size: 18px;
    color: rgb(0, 128, 0);
}

.result-item .desc i {
    color: red;
    font-style: normal;
}

.pagination {
    display: flex;
    align-items: center;
    justify-content: space-around;
    margin-bottom: 12px;
}

style.css

* {
    margin: 0;
    padding: 0;
    box-sizing: border-box;
}

body {
    width: 100vw;
    height: 100vh;
    display: flex;
    align-items: center;
    justify-content: center;
    position: relative;
    overflow: hidden;
}

.container {
    z-index: 1;
    height: 60px;
    background-color: rgba(255, 255, 255, .7);
    padding: 0 8px;
    border-radius: 30px;
    backdrop-filter: blur(4px);
    box-shadow: 0 0 5px 1px gray;
    display: flex;
    align-items: center;
    justify-content: space-around;
}

.time-box {
    z-index: 1;
    position: absolute;
    background-color: transparent;
    height: 40px;
    top: 40%;
    line-height: 40px;
    font-size: 40px;
    text-align: center;
    color: #fff;
    text-shadow: 0 0 4px #000;
}

.search-box {
    width: 200px;
    transition: all .3s ease-in-out;
}

.container:hover .search-box,
.container:focus-within .search-box {
    width: 440px;
}

.container .item {
    margin: auto 20px;
    font-size: 20px;
    opacity: 0;
    /* the shorthand must come first: placed after transition-delay it would reset the delay to 0 */
    transition: all .3s ease;
    transition-delay: .3s;
}

.container:focus-within .item {
    opacity: 1;
}

.container .search-submit {
    display: inline-block;
    height: 40px;
    width: 40px;
    text-align: center;
    line-height: 40px;
    border-radius: 50%;
    cursor: pointer;
}

.container .search-submit:hover {
    background-color: rgba(255, 255, 255, .6);
}

.container .search-btn {
    width: 100%;
    border: none;
    outline: none;
    text-align: center;
    background: inherit;
    font-size: 20px;
    transition: all .5s ease-in-out;
}

.container .search-btn::placeholder {
    color: rgba(230, 230, 230, .9);
    text-shadow: 0 0 4px #000;
    transition: all .2s ease-in-out;
}

.container:hover .search-btn::placeholder,
.container:focus-within .search-btn::placeholder {
    color: rgba(119, 119, 119, .9);
    text-shadow: 0 0 4px #f3f3f3;
}

.background {
    position: absolute;
    top: 0;
    right: 0;
    bottom: 0;
    left: 0;
    background-image: url(./bg.jpg);
    background-repeat: no-repeat;
    background-size: cover;
    background-position: center;
    object-fit: cover;
    transition: all .2s ease-in-out;
}

.container:focus-within ~ .background {
    filter: blur(20px);
    transform: scale(1.2);
}

.poem {
    z-index: 1;
    position: absolute;
    top: 70%;
    color: #ddd;
    text-shadow: 0 0 2px #000;
    opacity: 0;
    transition: all .2s ease-in-out;
    padding: 12px 32px;
    border-radius: 8px;
    line-height: 2;
}

.poem .author {
    opacity: 0;
    text-align: center;
    transition: all .2s ease-in-out;
}

.container:focus-within ~ .poem {
    opacity: 1;
}

.container:focus-within ~ .poem:hover {
    background-color: rgba(255, 255, 255, .3);
    opacity: 1;
}

.container:focus-within ~ .poem:hover .author {
    opacity: 1;
}

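Implementing Document

The search module's version of Document adds a desc field that holds the result snippet.

import lombok.Data;

@Data
public class Document {
    private Integer docId;
    private String title;
    private String url;
    private String content;
    private String desc;

    // Leave content and desc out of toString so log lines stay short
    @Override
    public String toString() {
        return String.format("Document{docId=%d, title=%s, url=%s}", docId, title, url);
    }
}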

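Implementing the DescBuilder Class

Builds the snippet shown under each search result: a window of text around the first matching query word, with the word highlighted.

import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

import java.util.List;

@Slf4j
@Component
public class DescBuilder {
    public Document build(List<String> queryList, Document doc) {
        // Find where a keyword occurs in content
        // query = "list"
        // content = "..... hello list go come do ...."
        // desc = "hello <i>list</i> go com..."
        String content = doc.getContent().toLowerCase();
        String word = "";
        int i = -1;
        for (String query : queryList) {
            i = content.indexOf(query);
            if (i != -1) {
                word = query;
                break;
            }
        }

        if (i == -1) {
            // If this happens, the inverted index was built incorrectly
            log.error("docId = {} does not contain {}", doc.getDocId(), queryList);
            throw new RuntimeException();
        }

        // Take 120 characters before the match and 120 after it
        int from = i - 120;
        if (from < 0) {
            // Fewer than 120 characters precede the match
            from = 0;
        }

        int to = i + 120;
        if (to > content.length()) {
            // Fewer than 120 characters follow the match
            to = content.length();
        }

        // Note: the snippet is cut from the lowercased content, so the original casing is lost
        String desc = content.substring(from, to);
        desc = desc.replace(word, "<i>" + word + "</i>");
        doc.setDesc(desc);

        return doc;
    }
}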
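Because the snippet is rendered with th:utext, any markup that happens to be in the document text reaches the browser unescaped, and lowercasing the content changes how the snippet reads. The sketch below shows one possible variant that keeps the original casing and escapes the text around the highlight. It is an illustration under assumptions, not the project's implementation: SnippetSketch, build, indexOfIgnoreCase, escapeHtml, and the radius parameter are hypothetical, and unlike the original it highlights only the first match.

```java
class SnippetSketch {
    // Minimal HTML escaping for text placed between tags.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Case-insensitive indexOf that works on the original string, so indices stay valid.
    static int indexOfIgnoreCase(String content, String query) {
        for (int i = 0; i + query.length() <= content.length(); i++) {
            if (content.regionMatches(true, i, query, 0, query.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns up to `radius` characters on each side of the first match, with the match
    // wrapped in <i>...</i>, or null when the query does not occur in content.
    static String build(String content, String query, int radius) {
        int i = indexOfIgnoreCase(content, query);
        if (i < 0) {
            return null;
        }
        int end = i + query.length();
        int from = Math.max(0, i - radius);
        int to = Math.min(content.length(), end + radius);
        return escapeHtml(content.substring(from, i))
                + "<i>" + escapeHtml(content.substring(i, end)) + "</i>"
                + escapeHtml(content.substring(end, to));
    }

    public static void main(String[] args) {
        System.out.println(build("Hello List go", "list", 3)); // lo <i>List</i> go
    }
}
```

Escaping each piece before adding the `<i>` tags means the only markup in the result is the highlight itself.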

V. Project Showcase

Home Page

Entering a Query

Search Results

Video Demo

神马搜索
