Advanced, Part 18: Deep Dive into Search Techniques: Proximity Matching with the slop Parameter, How It Works, and Experiments
A first taste
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "java spark",
        "slop": 1
      }
    }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 0, "max_score": null, "hits": [] }
}
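What does slop (moves) mean?
The terms in the query string (the search text) may have to be moved a number of times before they line up with a document. That number of moves is the slop.
A worked example of slop moves
Take a concrete case: work out how many moves a query string needs before it matches a document, then set slop accordingly.
hello world, java is very good, spark is also very good.
A match_phrase query for java spark finds nothing, because the two terms are not adjacent in the doc.
If we specify slop, the terms of java spark are allowed to move in an attempt to line up with the doc.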
java   is    very   good   spark   is
java   spark
java   -->   spark                       (1 move)
java   -->   -->    spark                (2 moves)
java   -->   -->    -->    spark         (3 moves)
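Here the slop is 3: in the phrase java spark, spark has to move 3 times before the phrase lines up with the doc.
slop does not mean that the query string terms move exactly that many times to match a doc. It is the maximum number of moves the terms may make while trying to line up with a doc.
With slop set to 3, the match succeeds: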
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "java spark",
        "slop": 3
      }
    }
  }
}
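This matches the doc above, and the doc is returned as a result.
If slop is set to 2 instead, spark in java spark can move at most 2 times. That is not enough to line up with the doc, so the doc is not returned.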
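The "maximum number of moves" reading can be sketched in a few lines of Python. This is only a toy model of the counting rule described above, not how Elasticsearch is implemented: `tokenize`, `moves_needed` and `phrase_matches` are hypothetical helper names, the regex tokenizer is a rough stand-in for a real analyzer, and the sketch assumes each term occurs once and in query order.

```python
import re


def tokenize(text):
    # Rough stand-in for an analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def moves_needed(doc, first, second):
    """Moves the second term must make to the right so that
    "first second" lines up with the doc.

    Assumes both terms occur once and in query order.
    """
    tokens = tokenize(doc)
    gap = tokens.index(second) - tokens.index(first)
    if gap < 1:
        raise ValueError("terms are not in query order")
    # Adjacent terms (gap == 1) need no move at all.
    return gap - 1


def phrase_matches(doc, first, second, slop):
    # slop is an upper bound: any value >= the moves needed is a match.
    return moves_needed(doc, first, second) <= slop


doc = "hello world, java is very good, spark is also very good."
print(moves_needed(doc, "java", "spark"))  # 3
for slop in (2, 3, 4):
    print(slop, phrase_matches(doc, "java", "spark", slop))
```

A slop of 2 is rejected while 3 and 4 are both accepted, which is the point of the "at most" wording.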
Experiments to verify the meaning of slop
Experiment 1
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "spark data",
        "slop": 1
      }
    }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 0, "max_score": null, "hits": [] }
}
Experiment 2
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "spark data",
        "slop": 2
      }
    }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 0, "max_score": null, "hits": [] }
}
Experiment 3
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "spark data",
        "slop": 3
      }
    }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.21824157,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.21824157,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2017-03-01",
          "tag": [ "elasticsearch" ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith"
        }
      }
    ]
  }
}
spark   is    best   big   data   solution based on scala ,an programming language similar to java spark
spark   data
spark   -->   data                       (1 move)
spark   -->   -->    data                (2 moves)
spark   -->   -->    -->   data          (3 moves)
Experiment 4 (extended)
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "data spark",
        "slop": 5
      }
    }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.154366,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.154366,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2017-03-01",
          "tag": [ "elasticsearch" ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith"
        }
      }
    ]
  }
}
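Here the query terms are in the reverse order of the doc, so they first have to swap places, and that costs 2 moves:
spark   is    best   big   data
data    spark
data/spark                               (1 move: data and spark share a position)
spark   data                             (2 moves: the two terms have swapped)
spark   -->   data                       (3 moves)
spark   -->   -->    data                (4 moves)
spark   -->   -->    -->   data          (5 moves)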
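The counts from Experiments 3 and 4 can be reproduced with a small position-based model that also copes with reversed order and with a term that occurs more than once (spark appears twice in this content). This is a sketch under assumptions: `min_slop` is a hypothetical helper, the regex tokenizer only approximates the analyzer, and the "shift each position by the term's offset in the query, then take the spread" rule is a simplified model that is only meant for two-term phrases like the ones used here.

```python
import re
from itertools import product


def tokenize(text):
    # Rough stand-in for an analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def min_slop(doc, phrase):
    """Smallest slop at which `phrase` can line up with `doc`.

    Returns None if some query term does not occur in the doc.
    """
    tokens = tokenize(doc)
    terms = tokenize(phrase)
    shifted = []
    for offset, term in enumerate(terms):
        # Doc positions of this term, minus the term's offset in the query.
        spots = [i - offset for i, tok in enumerate(tokens) if tok == term]
        if not spots:
            return None
        shifted.append(spots)
    # For an exact phrase all shifted positions coincide (spread 0);
    # otherwise the spread is the number of moves. Try every occurrence.
    return min(max(combo) - min(combo) for combo in product(*shifted))


content = ("spark is best big data solution based on scala ,an programming "
           "language similar to java spark")
print(min_slop(content, "spark data"))  # 3
print(min_slop(content, "data spark"))  # 5
print(min_slop(content, "java spark"))  # 0
```

Swapping the two terms raises the requirement from 3 to 5, which matches the two extra moves spent on the swap in the walkthrough above.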
In a slop search, the closer the terms are to each other, the higher the relevance score. The results below illustrate this.
First, a baseline result with 3 hits, including doc 1, whose content contains best but not java:

{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1.3728157,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 1.3728157,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [ "java" ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course",
          "author_first_name": "Smith",
          "author_last_name": "Williams",
          "new_author_last_name": "Williams",
          "new_author_first_name": "Smith"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.5753642,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2017-03-01",
          "tag": [ "elasticsearch" ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith",
          "new_author_last_name": "Peter Smith",
          "new_author_first_name": "Tonny"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.28582606,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [ "java", "hadoop" ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article",
          "sub_title": "learning more courses",
          "author_first_name": "Peter",
          "author_last_name": "Smith",
          "new_author_last_name": "Smith",
          "new_author_first_name": "Peter"
        }
      }
    ]
  }
}
Experiment
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "java best",
        "slop": 15
      }
    }
  }
}

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.65380025,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.65380025,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [ "java" ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course",
          "author_first_name": "Smith",
          "author_last_name": "Williams",
          "new_author_last_name": "Williams",
          "new_author_first_name": "Smith"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.07111243,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2017-03-01",
          "tag": [ "elasticsearch" ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith",
          "new_author_last_name": "Peter Smith",
          "new_author_first_name": "Tonny"
        }
      }
    ]
  }
}
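In fact, a phrase match with slop added is a proximity match.
1. java spark must appear in the doc as an exact phrase: phrase match
2. java spark may be some distance apart, but the closer the terms are, the higher the doc ranks: proximity match
In both cases the idea is the same: the search phrase is moved until it lines up with the document's content.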
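Why closer terms rank higher can be sketched with the weighting 1 / (1 + distance), which is what Lucene's default similarity is commonly described as using for sloppy phrase matches. Treat that formula as an assumption here: the experiments above do not prove it, and the real _score also depends on other factors such as term rarity and field length, so the numbers below are not expected to equal _score. The term positions are counted by hand from the two content fields, and `move_distance` and `sloppy_weight` are hypothetical helper names.

```python
def move_distance(pos_first, pos_second):
    # Moves needed for a two-term phrase, given each term's position in the doc.
    # The second query term sits one slot after the first, hence the "- 1".
    return abs((pos_second - 1) - pos_first)


def sloppy_weight(distance):
    # Assumed weighting: an exact phrase (distance 0) counts fully,
    # a match that needs more moves counts for less.
    return 1.0 / (1 + distance)


# Positions of (java, best) in each content field, counted from 0.
# doc 2: "i think java is the best programming language"
# doc 5: "spark is best big data solution ... similar to java spark"
positions = {"2": (2, 5), "5": (14, 2)}

weights = {
    doc_id: sloppy_weight(move_distance(java_pos, best_pos))
    for doc_id, (java_pos, best_pos) in positions.items()
}
ranked = sorted(weights, key=weights.get, reverse=True)

for doc_id in ranked:
    print(doc_id, round(weights[doc_id], 4))
```

Doc 2 needs 2 moves and doc 5 needs 13, so doc 2 gets the larger weight, which agrees with the order of the two hits in the slop 15 experiment.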