接着上一篇文章;我们继续来研究es的复杂检索
文章目录
- (1) bool用来做复合查询
- (2)Filter【结果过滤】
- (3)term
- (4)Aggregation(执行聚合)
(1) bool用来做复合查询
复合语句可以合并,任何其他查询语句,包括符合语句。这也就意味着,复合语句之间
可以互相嵌套,可以表达非常复杂的逻辑。
must:必须达到must所列举的所有条件
GET bank/_search
{
"query":{
"bool":{
"must":[
{"match":{"address":"mill"}},
{"match":{"gender":"M"}}
]
}
}
}
must_not,必须不匹配must_not所列举的所有条件。
should,应该满足should所列举的条件。
实例:查询gender=m,并且address=mill的数据
GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"gender": "M"
}
},
{
"match": {
"address": "mill"
}
}
]
}
}
}
查询结果:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 6.0824604,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 6.0824604,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "forbeswallace@pheast.com",
"city" : "Lopezo",
"state" : "AK"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "136",
"_score" : 6.0824604,
"_source" : {
"account_number" : 136,
"balance" : 45801,
"firstname" : "Winnie",
"lastname" : "Holland",
"age" : 38,
"gender" : "M",
"address" : "198 Mill Lane",
"employer" : "Neteria",
"email" : "winnieholland@neteria.com",
"city" : "Urie",
"state" : "IL"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "345",
"_score" : 6.0824604,
"_source" : {
"account_number" : 345,
"balance" : 9812,
"firstname" : "Parker",
"lastname" : "Hines",
"age" : 38,
"gender" : "M",
"address" : "715 Mill Avenue",
"employer" : "Baluba",
"email" : "parkerhines@baluba.com",
"city" : "Blackgum",
"state" : "KY"
}
}
]
}
}
must_not:必须不是指定的情况
实例:查询gender=m,并且address=mill的数据,但是age不等于38的
GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"gender": "M"
}
},
{
"match": {
"address": "mill"
}
}
],
"must_not": [
{
"match": {
"age": "38"
}
}
]
}
}
查询结果:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 6.0824604,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 6.0824604,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "forbeswallace@pheast.com",
"city" : "Lopezo",
"state" : "AK"
}
}
]
}
}
should:应该达到should列举的条件,如果到达会增加相关文档的评分,并不会改变查询的结果。如果query中只有should且只有一种匹配规则,那么should的条件就会被作为默认匹配条件二区改变查询结果。
实例:匹配lastName应该等于Wallace的数据
GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"gender": "M"
}
},
{
"match": {
"address": "mill"
}
}
],
"must_not": [
{
"match": {
"age": "18"
}
}
],
"should": [
{
"match": {
"lastname": "Wallace"
}
}
]
}
}
}
查询结果:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 12.585751,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 12.585751,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "forbeswallace@pheast.com",
"city" : "Lopezo",
"state" : "AK"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "136",
"_score" : 6.0824604,
"_source" : {
"account_number" : 136,
"balance" : 45801,
"firstname" : "Winnie",
"lastname" : "Holland",
"age" : 38,
"gender" : "M",
"address" : "198 Mill Lane",
"employer" : "Neteria",
"email" : "winnieholland@neteria.com",
"city" : "Urie",
"state" : "IL"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "345",
"_score" : 6.0824604,
"_source" : {
"account_number" : 345,
"balance" : 9812,
"firstname" : "Parker",
"lastname" : "Hines",
"age" : 38,
"gender" : "M",
"address" : "715 Mill Avenue",
"employer" : "Baluba",
"email" : "parkerhines@baluba.com",
"city" : "Blackgum",
"state" : "KY"
}
}
]
}
}
能够看到相关度越高,得分也越高。
(2)Filter【结果过滤】
并不是所有的查询都需要产生分数,特别是哪些仅用于filtering过滤的文档。为了不计算分数,elasticsearch会自动检查场景并且优化查询的执行。
GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"address": "mill"
}
}
],
"filter": {
"range": {
"balance": {
"gte": "10000",
"lte": "20000"
}
}
}
}
}
}
这里先是查询所有匹配address=mill的文档,然后再根据10000<=balance<=20000进行过滤查询结果
查询结果:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 5.4032025,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 5.4032025,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "forbeswallace@pheast.com",
"city" : "Lopezo",
"state" : "AK"
}
}
]
}
}
Each must
, should
, and must_not
element in a Boolean query is referred to as a query clause. How well a document meets the criteria in each must
or should
clause contributes to the document’s relevance score. The higher the score, the better the document matches your search criteria. By default, Elasticsearch returns documents ranked by these relevance scores.
在boolean查询中,must
, should
和must_not
元素都被称为查询子句 。 文档是否符合每个“must”或“should”子句中的标准,决定了文档的“相关性得分”。 得分越高,文档越符合您的搜索条件。 默认情况下,Elasticsearch返回根据这些相关性得分排序的文档。
The criteria in a must_not
clause is treated as a filter. It affects whether or not the document is included in the results, but does not contribute to how documents are scored. You can also explicitly specify arbitrary filters to include or exclude documents based on structured data.
“must_not”子句中的条件被视为“过滤器”。
它影响文档是否包含在结果中, 但不影响文档的评分方式。 还可以显式地指定任意过滤器来包含或排除基于结构化数据的文档。
filter在使用过程中,并不会计算相关性得分_score:
GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"address": "mill"
}
}
],
"filter": {
"range": {
"balance": {
"gte": "10000",
"lte": "20000"
}
}
}
}
}
}
//gte:>= lte:<=
查询结果:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 213,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "20",
"_score" : 0.0,
"_source" : {
"account_number" : 20,
"balance" : 16418,
"firstname" : "Elinor",
"lastname" : "Ratliff",
"age" : 36,
"gender" : "M",
"address" : "282 Kings Place",
"employer" : "Scentric",
"email" : "elinorratliff@scentric.com",
"city" : "Ribera",
"state" : "WA"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "37",
"_score" : 0.0,
"_source" : {
"account_number" : 37,
"balance" : 18612,
"firstname" : "Mcgee",
"lastname" : "Mooney",
"age" : 39,
"gender" : "M",
"address" : "826 Fillmore Place",
"employer" : "Reversus",
"email" : "mcgeemooney@reversus.com",
"city" : "Tooleville",
"state" : "OK"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "51",
"_score" : 0.0,
"_source" : {
"account_number" : 51,
"balance" : 14097,
"firstname" : "Burton",
"lastname" : "Meyers",
"age" : 31,
"gender" : "F",
"address" : "334 River Street",
"employer" : "Bezal",
"email" : "burtonmeyers@bezal.com",
"city" : "Jacksonburg",
"state" : "MO"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "56",
"_score" : 0.0,
"_source" : {
"account_number" : 56,
"balance" : 14992,
"firstname" : "Josie",
"lastname" : "Nelson",
"age" : 32,
"gender" : "M",
"address" : "857 Tabor Court",
"employer" : "Emtrac",
"email" : "josienelson@emtrac.com",
"city" : "Sunnyside",
"state" : "UT"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "121",
"_score" : 0.0,
"_source" : {
"account_number" : 121,
"balance" : 19594,
"firstname" : "Acevedo",
"lastname" : "Dorsey",
"age" : 32,
"gender" : "M",
"address" : "479 Nova Court",
"employer" : "Netropic",
"email" : "acevedodorsey@netropic.com",
"city" : "Islandia",
"state" : "CT"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "176",
"_score" : 0.0,
"_source" : {
"account_number" : 176,
"balance" : 18607,
"firstname" : "Kemp",
"lastname" : "Walters",
"age" : 28,
"gender" : "F",
"address" : "906 Howard Avenue",
"employer" : "Eyewax",
"email" : "kempwalters@eyewax.com",
"city" : "Why",
"state" : "KY"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "183",
"_score" : 0.0,
"_source" : {
"account_number" : 183,
"balance" : 14223,
"firstname" : "Hudson",
"lastname" : "English",
"age" : 26,
"gender" : "F",
"address" : "823 Herkimer Place",
"employer" : "Xinware",
"email" : "hudsonenglish@xinware.com",
"city" : "Robbins",
"state" : "ND"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "222",
"_score" : 0.0,
"_source" : {
"account_number" : 222,
"balance" : 14764,
"firstname" : "Rachelle",
"lastname" : "Rice",
"age" : 36,
"gender" : "M",
"address" : "333 Narrows Avenue",
"employer" : "Enaut",
"email" : "rachellerice@enaut.com",
"city" : "Wright",
"state" : "AZ"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "227",
"_score" : 0.0,
"_source" : {
"account_number" : 227,
"balance" : 19780,
"firstname" : "Coleman",
"lastname" : "Berg",
"age" : 22,
"gender" : "M",
"address" : "776 Little Street",
"employer" : "Exoteric",
"email" : "colemanberg@exoteric.com",
"city" : "Eagleville",
"state" : "WV"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "272",
"_score" : 0.0,
"_source" : {
"account_number" : 272,
"balance" : 19253,
"firstname" : "Lilly",
"lastname" : "Morgan",
"age" : 25,
"gender" : "F",
"address" : "689 Fleet Street",
"employer" : "Biolive",
"email" : "lillymorgan@biolive.com",
"city" : "Sunbury",
"state" : "OH"
}
}
]
}
}
能看到所有文档的 “_score” : 0.0。
(3)term
和match一样。匹配某个属性的值。全文检索字段用match,其他非text字段匹配用term。
Avoid using the
term
query fortext
fields.避免对文本字段使用“term”查询
By default, Elasticsearch changes the values of
text
fields as part of analysis. This can make finding exact matches fortext
field values difficult.默认情况下,Elasticsearch作为analysis的一部分更改’ text '字段的值。这使得为“text”字段值寻找精确匹配变得困难。
To search
text
field values, use the match.要搜索“text”字段值,请使用匹配。
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/query-dsl-term-query.html
使用term匹配查询
GET bank/_search
{
"query": {
"term": {
"age": "28"
}
}
}
如果是text则查不到:
GET bank/_search
{
"query": {
"term": {
"gender" : "F"
}
}
}
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
一条也没有匹配到
而更换为match匹配时,能够匹配到32个文档
也就是说,全文检索字段用match,其他非text字段匹配用term。
(4)Aggregation(执行聚合)
聚合提供了从数据中分组和提取数据的能力。最简单的聚合方法大致等于SQL Group by和SQL聚合函数。在elasticsearch中,执行搜索返回this(命中结果),并且同时返回聚合结果,把以响应中的所有hits(命中结果)分隔开的能力。这是非常强大且有效的,你可以执行查询和多个聚合,并且在一次使用中得到各自的(任何一个的)返回结果,使用一次简洁和简化的API啦避免网络往返。
“size”:0
size:0不显示搜索数据
aggs:执行聚合。聚合语法如下:
"aggs":{
"aggs_name这次聚合的名字,方便展示在结果集中":{
"AGG_TYPE聚合的类型(avg,term,terms)":{}
}
},
搜索address中包含mill的所有人的年龄分布以及平均年龄,但不显示这些人的详情
GET bank/_search
{
"query": {
"match": {
"address": "Mill"
}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 10
}
},
"ageAvg": {
"avg": {
"field": "age"
}
},
"balanceAvg": {
"avg": {
"field": "balance"
}
}
},
"size": 0
}
//ageAgg:聚合名字 terms:聚合类型 "field": "age":按照age字段聚合 size:10:取出前十种age
//avg:平均值聚合类型
//不显示这些人的详情,只看聚合结果
查询结果:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 38,
"doc_count" : 2
},
{
"key" : 28,
"doc_count" : 1
},
{
"key" : 32,
"doc_count" : 1
}
]
},
"ageAvg" : {
"value" : 34.0
},
"balanceAvg" : {
"value" : 25208.0
}
}
}
复杂:
按照年龄聚合,并且求这些年龄段的这些人的平均薪资
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 100
},
"aggs": {
"ageAvg": {
"avg": {
"field": "balance"
}
}
}
}
},
"size": 0
}
输出结果:
{
"took" : 49,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 31,
"doc_count" : 61,
"ageAvg" : {
"value" : 28312.918032786885
}
},
{
"key" : 39,
"doc_count" : 60,
"ageAvg" : {
"value" : 25269.583333333332
}
},
{
"key" : 26,
"doc_count" : 59,
"ageAvg" : {
"value" : 23194.813559322032
}
},
{
"key" : 32,
"doc_count" : 52,
"ageAvg" : {
"value" : 23951.346153846152
}
},
{
"key" : 35,
"doc_count" : 52,
"ageAvg" : {
"value" : 22136.69230769231
}
},
{
"key" : 36,
"doc_count" : 52,
"ageAvg" : {
"value" : 22174.71153846154
}
},
{
"key" : 22,
"doc_count" : 51,
"ageAvg" : {
"value" : 24731.07843137255
}
},
{
"key" : 28,
"doc_count" : 51,
"ageAvg" : {
"value" : 28273.882352941175
}
},
{
"key" : 33,
"doc_count" : 50,
"ageAvg" : {
"value" : 25093.94
}
},
{
"key" : 34,
"doc_count" : 49,
"ageAvg" : {
"value" : 26809.95918367347
}
},
{
"key" : 30,
"doc_count" : 47,
"ageAvg" : {
"value" : 22841.106382978724
}
},
{
"key" : 21,
"doc_count" : 46,
"ageAvg" : {
"value" : 26981.434782608696
}
},
{
"key" : 40,
"doc_count" : 45,
"ageAvg" : {
"value" : 27183.17777777778
}
},
{
"key" : 20,
"doc_count" : 44,
"ageAvg" : {
"value" : 27741.227272727272
}
},
{
"key" : 23,
"doc_count" : 42,
"ageAvg" : {
"value" : 27314.214285714286
}
},
{
"key" : 24,
"doc_count" : 42,
"ageAvg" : {
"value" : 28519.04761904762
}
},
{
"key" : 25,
"doc_count" : 42,
"ageAvg" : {
"value" : 27445.214285714286
}
},
{
"key" : 37,
"doc_count" : 42,
"ageAvg" : {
"value" : 27022.261904761905
}
},
{
"key" : 27,
"doc_count" : 39,
"ageAvg" : {
"value" : 21471.871794871793
}
},
{
"key" : 38,
"doc_count" : 39,
"ageAvg" : {
"value" : 26187.17948717949
}
},
{
"key" : 29,
"doc_count" : 35,
"ageAvg" : {
"value" : 29483.14285714286
}
}
]
}
}
}
查出所有年龄分布,并且这些年龄段中M的平均薪资和F的平均薪资以及这个年龄段的总体平均薪资
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 100
},
"aggs": {
"genderAgg": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
},
"ageBalanceAvg": {
"avg": {
"field": "balance"
}
}
}
}
},
"size": 0
}
//"field": "gender.keyword" gender是txt没法聚合 必须加.keyword精确替代
输出结果:
{
"took" : 119,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 31,
"doc_count" : 61,
"genderAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "M",
"doc_count" : 35,
"balanceAvg" : {
"value" : 29565.628571428573
}
},
{
"key" : "F",
"doc_count" : 26,
"balanceAvg" : {
"value" : 26626.576923076922
}
}
]
},
"ageBalanceAvg" : {
"value" : 28312.918032786885
}
}
]
.......//省略其他
}
}
}