A first taste

GET /forum/article/_search

{

"query": {

"match_phrase": {

"title": {

"query": "java spark",

"slop":  1

}

}

}

}

Result:

{

"took": 1,

"timed_out": false,

"_shards": {

"total": 5,

"successful": 5,

"failed": 0

},

"hits": {

"total": 0,

"max_score": null,

"hits": []

}

}
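When the request is issued from code rather than a console, the body can be assembled as a plain dictionary. The sketch below only builds and prints the JSON and does not contact a cluster. The helper name `match_phrase_body` is hypothetical and not part of any client library.

```python
import json


def match_phrase_body(field, phrase, slop=0):
    """Build the JSON body of a match_phrase query (hypothetical helper)."""
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


# Same request as above: phrase "java spark" on the title field, slop 1.
body = match_phrase_body("title", "java spark", slop=1)
print(json.dumps(body, indent=2))
```

Keeping slop as a parameter makes it easy to rerun the same phrase with different values, which is what the experiments later in this post do.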

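What does slop (moves) mean?

slop is the number of moves that the terms of the query string (the search text) need before they line up with a document.

A worked example of slop moves

A worked example: count how many moves a query string needs before it matches a document, then set slop accordingly.

hello world, java is very good, spark is also very good.

A match_phrase query for java spark finds nothing, because java and spark are not adjacent in this document.

If slop is specified, the terms of java spark are allowed to move in an attempt to match the doc.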

doc:     java   is     very   good   spark  is
query:   java   spark
move 1:  java   -->    spark
move 2:  java   -->    -->    spark
move 3:  java   -->    -->    -->    spark

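Here the slop is 3: within the phrase java spark, the term spark has to move 3 times before the phrase matches the doc.

slop is not the exact number of moves a query needs in order to match a doc. It is the maximum number of moves the query terms are allowed to make while trying to match a doc.

With slop set to 3, the match succeeds: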

GET /forum/article/_search

{

"query": {

"match_phrase": {

"title": {

"query": "java spark",

"slop":  3

}

}

}

}

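This matches the doc above, and the doc is returned as a result.

With slop set to 2, however, spark in java spark can move at most 2 times, which is not enough to match, so the doc is not returned.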
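The move counting above can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch's implementation: the `\w+` tokenizer is a rough stand-in for an analyzer, the function name `required_slop` is hypothetical, and repeated query terms are not handled specially. For each term it measures how far the document position is from where the exact phrase would put it, and takes the spread of those shifts as the number of moves.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase and split on non-word characters (rough analyzer stand-in)."""
    return re.findall(r"\w+", text.lower())


def required_slop(query, document):
    """Fewest moves that line the query terms up with the document.

    Returns None if some query term does not occur in the document.
    """
    query_terms = tokenize(query)
    doc_terms = tokenize(document)
    # Every position at which each query term occurs in the document.
    occurrences = [
        [pos for pos, term in enumerate(doc_terms) if term == wanted]
        for wanted in query_terms
    ]
    if any(not positions for positions in occurrences):
        return None
    best = None
    for combo in product(*occurrences):
        # Shift of each term relative to its slot in the exact phrase.
        shifts = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        moves = max(shifts) - min(shifts)
        if best is None or moves < best:
            best = moves
    return best


doc = "hello world, java is very good, spark is also very good."
needed = required_slop("java spark", doc)
print("moves needed:", needed)
for slop in (2, 3):
    print("slop", slop, "->", "match" if needed <= slop else "no match")
```

Under this model the example document needs 3 moves, so a slop of 2 rejects it and a slop of 3 accepts it, in line with the explanation above.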

Experiments to verify what slop means

Experiment 1

GET /forum/article/_search

{

"query": {

"match_phrase": {

"content": {

"query": "spark data",

"slop": 1

}

}

}

}

Result:

{

"took": 1,

"timed_out": false,

"_shards": {

"total": 5,

"successful": 5,

"failed": 0

},

"hits": {

"total": 0,

"max_score": null,

"hits": []

}

}

Experiment 2

GET /forum/article/_search

{

"query": {

"match_phrase": {

"content": {

"query": "spark data",

"slop": 2

}

}

}

}

Result:

{

"took": 1,

"timed_out": false,

"_shards": {

"total": 5,

"successful": 5,

"failed": 0

},

"hits": {

"total": 0,

"max_score": null,

"hits": []

}

}

Experiment 3

GET /forum/article/_search

{

"query": {

"match_phrase": {

"content": {

"query": "spark data",

"slop": 3

}

}

}

}

Result:

{

"took": 1,

"timed_out": false,

"_shards": {

"total": 5,

"successful": 5,

"failed": 0

},

"hits": {

"total": 1,

"max_score": 0.21824157,

"hits": [

{

"_index": "forum",

"_type": "article",

"_id": "5",

"_score": 0.21824157,

"_source": {

"articleID": "DHJK-B-1395-#Ky5",

"userID": 3,

"hidden": false,

"postDate": "2017-03-01",

"tag": [

"elasticsearch"

],

"tag_cnt": 1,

"view_cnt": 10,

"title": "this is spark blog",

"content": "spark is best big data solution based on scala ,an programming language similar to java spark",

"sub_title": "haha, hello world",

"author_first_name": "Tonny",

"author_last_name": "Peter Smith"

}

}

]

}

}

spark is best big data solution based on scala ,an programming language similar to java spark

doc:     spark  is     best   big    data
query:   spark  data
move 1:  spark  -->    data
move 2:  spark  -->    -->    data
move 3:  spark  -->    -->    -->    data

Experiment 4 (extended)

GET /forum/article/_search

{

"query": {

"match_phrase": {

"content": {

"query": "data spark",

"slop": 5

}

}

}

}

Result:

{

"took": 1,

"timed_out": false,

"_shards": {

"total": 5,

"successful": 5,

"failed": 0

},

"hits": {

"total": 1,

"max_score": 0.154366,

"hits": [

{

"_index": "forum",

"_type": "article",

"_id": "5",

"_score": 0.154366,

"_source": {

"articleID": "DHJK-B-1395-#Ky5",

"userID": 3,

"hidden": false,

"postDate": "2017-03-01",

"tag": [

"elasticsearch"

],

"tag_cnt": 1,

"view_cnt": 10,

"title": "this is spark blog",

"content": "spark is best big data solution based on scala ,an programming language similar to java spark",

"sub_title": "haha, hello world",

"author_first_name": "Tonny",

"author_last_name": "Peter Smith"

}

}

]

}

}

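doc:     spark  is     best   big    data
query:   data   spark
move 1:         data/spark                   (data moves onto spark's position)
move 2:  spark  data                         (the two terms have now swapped)
move 3:  spark  -->    data
move 4:  spark  -->    -->    data
move 5:  spark  -->    -->    -->    data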

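In a search with slop, the closer the query terms are to each other in a document, the higher its relevance score. The experiments below illustrate this. For comparison, the first result set has three hits, including document 1, whose content contains best but not java.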

{

"took": 4,

"timed_out": false,

"_shards": {

"total": 5,

"successful": 5,

"failed": 0

},

"hits": {

"total": 3,

"max_score": 1.3728157,

"hits": [

{

"_index": "forum",

"_type": "article",

"_id": "2",

"_score": 1.3728157,

"_source": {

"articleID": "KDKE-B-9947-#kL5",

"userID": 1,

"hidden": false,

"postDate": "2017-01-02",

"tag": [

"java"

],

"tag_cnt": 1,

"view_cnt": 50,

"title": "this is java blog",

"content": "i think java is the best programming language",

"sub_title": "learned a lot of course",

"author_first_name": "Smith",

"author_last_name": "Williams",

"new_author_last_name": "Williams",

"new_author_first_name": "Smith"

}

},

{

"_index": "forum",

"_type": "article",

"_id": "5",

"_score": 0.5753642,

"_source": {

"articleID": "DHJK-B-1395-#Ky5",

"userID": 3,

"hidden": false,

"postDate": "2017-03-01",

"tag": [

"elasticsearch"

],

"tag_cnt": 1,

"view_cnt": 10,

"title": "this is spark blog",

"content": "spark is best big data solution based on scala ,an programming language similar to java spark",

"sub_title": "haha, hello world",

"author_first_name": "Tonny",

"author_last_name": "Peter Smith",

"new_author_last_name": "Peter Smith",

"new_author_first_name": "Tonny"

}

},

{

"_index": "forum",

"_type": "article",

"_id": "1",

"_score": 0.28582606,

"_source": {

"articleID": "XHDK-A-1293-#fJ3",

"userID": 1,

"hidden": false,

"postDate": "2017-01-01",

"tag": [

"java",

"hadoop"

],

"tag_cnt": 2,

"view_cnt": 30,

"title": "this is java and elasticsearch blog",

"content": "i like to write best elasticsearch article",

"sub_title": "learning more courses",

"author_first_name": "Peter",

"author_last_name": "Smith",

"new_author_last_name": "Smith",

"new_author_first_name": "Peter"

}

}

]

}

}

Experiment

GET /forum/article/_search

{

"query": {

"match_phrase": {

"content": {

"query": "java best",

"slop": 15

}

}

}

}

Result:

{

"took": 3,

"timed_out": false,

"_shards": {

"total": 5,

"successful": 5,

"failed": 0

},

"hits": {

"total": 2,

"max_score": 0.65380025,

"hits": [

{

"_index": "forum",

"_type": "article",

"_id": "2",

"_score": 0.65380025,

"_source": {

"articleID": "KDKE-B-9947-#kL5",

"userID": 1,

"hidden": false,

"postDate": "2017-01-02",

"tag": [

"java"

],

"tag_cnt": 1,

"view_cnt": 50,

"title": "this is java blog",

"content": "i think java is the best programming language",

"sub_title": "learned a lot of course",

"author_first_name": "Smith",

"author_last_name": "Williams",

"new_author_last_name": "Williams",

"new_author_first_name": "Smith"

}

},

{

"_index": "forum",

"_type": "article",

"_id": "5",

"_score": 0.07111243,

"_source": {

"articleID": "DHJK-B-1395-#Ky5",

"userID": 3,

"hidden": false,

"postDate": "2017-03-01",

"tag": [

"elasticsearch"

],

"tag_cnt": 1,

"view_cnt": 10,

"title": "this is spark blog",

"content": "spark is best big data solution based on scala ,an programming language similar to java spark",

"sub_title": "haha, hello world",

"author_first_name": "Tonny",

"author_last_name": "Peter Smith",

"new_author_last_name": "Peter Smith",

"new_author_first_name": "Tonny"

}

}

]

}

}

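In fact, a phrase match with slop is a proximity match.

1. java spark must appear in the doc as an exact phrase: phrase match

2. java and spark may be some distance apart, but the closer together they are, the higher the doc ranks: proximity match

The terms of the query phrase are moved until they line up with the document's content.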
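The ranking effect can be illustrated with a small Python sketch over the two documents returned for java best. It is an illustration under stated assumptions, not the real scoring: the `1 / (1 + distance)` weight is modeled on the weighting that Lucene's classic similarity applies to sloppy phrases, the function names are hypothetical, and the actual `_score` also depends on term statistics and field length.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase and split on non-word characters (rough analyzer stand-in)."""
    return re.findall(r"\w+", text.lower())


def phrase_distance(query, document):
    """Fewest moves needed to line the query terms up with the document."""
    query_terms = tokenize(query)
    doc_terms = tokenize(document)
    occurrences = [
        [pos for pos, term in enumerate(doc_terms) if term == wanted]
        for wanted in query_terms
    ]
    if any(not positions for positions in occurrences):
        return None
    return min(
        max(shifts) - min(shifts)
        for shifts in (
            [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
            for combo in product(*occurrences)
        )
    )


def proximity_weight(distance):
    """Assumed weight that shrinks as the terms drift apart."""
    return 1.0 / (1.0 + distance)


# Content fields of documents 2 and 5 from the results above.
docs = {
    "2": "i think java is the best programming language",
    "5": "spark is best big data solution based on scala ,an programming "
         "language similar to java spark",
}
slop = 15
ranked = []
for doc_id, content in docs.items():
    distance = phrase_distance("java best", content)
    if distance is not None and distance <= slop:
        ranked.append((proximity_weight(distance), doc_id, distance))
ranked.sort(reverse=True)
for weight, doc_id, distance in ranked:
    print(f"doc {doc_id}: distance={distance}, weight={weight:.3f}")
```

In this model document 2 needs far fewer moves than document 5, so it receives the larger weight and ranks first, the same order as in the experiment.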
