How can I sort the buckets of a complex ES aggregation (sorting by a nested bucket)?
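I'm new to ES and want to use it in a real data analysis project, so I tried out the kind of complex aggregation that project commonly needs. I'm stuck on the sorting, though, and searching Baidu and Google has not turned up a solution.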
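The scenario is as follows:
The index visited_data stores user visit data
visitor_id is the visitor ID
session_id is the visit session ID
request_id is the ID of a single request within a session
For each visitor, I want to find the session with the largest number of requests, and then sort the visitors in ascending order by that maximum per-session request count.
Based on this requirement I wrote the following query (it does not include the sorting yet):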
{
  "query": {
    "range": {
      "visitor_id": {
        "gte": 35,
        "lte": 36
      }
    }
  },
  "size": 0,
  "aggs": {
    "visitor": {
      "terms": {
        "size": 10,
        "field": "visitor_id"
      },
      "aggs": {
        "sessions": {
          "terms": {
            "field": "session_id",
            "size": 1,
            "order": {
              "_count": "desc"
            }
          }
        },
        "maxc": {
          "max_bucket": {
            "buckets_path": "sessions>_count"
          }
        }
      }
    }
  }
}
The query returns:
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 212,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "visitor": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 35,
          "doc_count": 203,
          "sessions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 184,
            "buckets": [
              {
                "key": 2364742,
                "doc_count": 19
              }
            ]
          },
          "maxc": {
            "value": 19,
            "keys": [
              "2364742"
            ]
          }
        },
        {
          "key": 36,
          "doc_count": 9,
          "sessions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 39,
                "doc_count": 9
              }
            ]
          },
          "maxc": {
            "value": 9,
            "keys": [
              "39"
            ]
          }
        }
      ]
    }
  }
}
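However, I can't work the sort condition into this query. sessions is a multi-bucket result produced by a terms aggregation, so ordering the outer terms aggregation by it like this does not give me the sort I want: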
"terms": {
  "size": 10,
  "field": "visitor_id",
  "order": {
    "sessions": "asc"
  }
},
I also tried max_bucket, but the value it produces cannot be used for sorting:
"maxc": {
  "max_bucket": {
    "buckets_path": "sessions>_count"
  }
}
This is how I wrote the ordering:
"terms": {
  "size": 10,
  "field": "visitor_id",
  "order": {
    "maxc": "asc"
  }
},
Running it fails with this error:
{
  "error": {
    "root_cause": [
      {
        "type": "aggregation_execution_exception",
        "reason": "Invalid aggregator order path [maxc]. The provided aggregation [maxc] either does not exist, or is a pipeline aggregation and cannot be used to sort the buckets."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "visited_data",
        "node": "xSFlOYFwRNuOQYv7iUrK3g",
        "reason": {
          "type": "aggregation_execution_exception",
          "reason": "Invalid aggregator order path [maxc]. The provided aggregation [maxc] either does not exist, or is a pipeline aggregation and cannot be used to sort the buckets."
        }
      }
    ]
  },
  "status": 500
}
I haven't found a workable way to solve this yet. I'd appreciate it if anyone could suggest a solution or an approach.
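I bucket on three levels: level 1 is the user, level 2 is the session, and level 3 is the click count. With your approach this can't be done.
My idea is to add a max_request field (the maximum click count) at the second level and have the first level sort by that field.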
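One possible way to get the ordering described above is sketched here in Python, since the error message says a terms "order" cannot point at a pipeline aggregation such as maxc. The sketch keeps maxc as it is and adds a bucket_sort pipeline aggregation next to it to reorder the visitor buckets; as a fallback, it also sorts the returned buckets on the client. The names build_sorted_query, sort_visitor_buckets and sort_by_maxc are made up for illustration, and the bucket_sort variant is an assumption that has not been run against this index. Note that the outer terms "size" still picks visitors by doc_count before any reordering, so it would need to be large enough to cover every visitor of interest.

```python
import json


def build_sorted_query(visitor_size=10):
    """Return the request body from the question plus a bucket_sort step.

    Assumption: bucket_sort (a parent pipeline aggregation) may sort the
    visitor buckets by the sibling max_bucket result named "maxc".
    """
    return {
        "query": {"range": {"visitor_id": {"gte": 35, "lte": 36}}},
        "size": 0,
        "aggs": {
            "visitor": {
                # No "order" here: terms ordering rejects pipeline aggregations.
                "terms": {"field": "visitor_id", "size": visitor_size},
                "aggs": {
                    "sessions": {
                        "terms": {
                            "field": "session_id",
                            "size": 1,
                            "order": {"_count": "desc"},
                        }
                    },
                    "maxc": {"max_bucket": {"buckets_path": "sessions>_count"}},
                    # "sort_by_maxc" is a hypothetical name chosen for this sketch.
                    "sort_by_maxc": {
                        "bucket_sort": {"sort": [{"maxc": {"order": "asc"}}]}
                    },
                },
            }
        },
    }


def sort_visitor_buckets(response):
    """Client-side fallback: sort visitor buckets by maxc.value, ascending."""
    buckets = response["aggregations"]["visitor"]["buckets"]
    return sorted(buckets, key=lambda bucket: bucket["maxc"]["value"])


if __name__ == "__main__":
    # Request body that could be sent to visited_data/_search.
    print(json.dumps(build_sorted_query(), indent=2))

    # Trimmed copy of the response shown in the question.
    sample = {
        "aggregations": {
            "visitor": {
                "buckets": [
                    {"key": 35, "doc_count": 203, "maxc": {"value": 19}},
                    {"key": 36, "doc_count": 9, "maxc": {"value": 9}},
                ]
            }
        }
    }
    print([b["key"] for b in sort_visitor_buckets(sample)])  # prints [36, 35]
```

If the bucket_sort variant turns out not to fit the cluster's version, the client-side function gives the same ordering for whatever buckets the original query returns.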