How do I solve a complex aggregation sorting problem in ES (sorting by nested buckets)?

Q&A / 127 / 3 / Created 4 years ago / Updated 4 years ago

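I am new to ES and want to use it in a real-world data analysis project, so I have been trying out the complex aggregations such projects typically need. I am stuck on sorting, and searching Baidu and Google has not turned up a solution.
The scenario is as follows:
The index visited_data stores user visit data
visitor_id is the visitor ID
session_id is the visit session ID
request_id is the ID of a single request within a session
For each visitor, I want to find the session with the largest number of requests, and then sort the visitors in ascending order by that per-session maximum request count.
Based on this requirement I wrote the following query (it does not yet include the sorting):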

{
    "query": {
        "range": {
            "visitor_id": {
                "gte": 35,
                "lte": 36
            }
        }
    },
    "size": 0,
    "aggs": {
        "visitor": {
            "terms": {
                "size": 10,
                "field": "visitor_id"
            },
            "aggs": {
                "sessions": {
                    "terms": {
                        "field": "session_id",
                        "size": 1,
                        "order": {
                            "_count": "desc"
                        }
                    }
                },
                "maxc": {
                    "max_bucket": {
                        "buckets_path": "sessions>_count"
                    }
                }
            }
        }
    }
}

The query result is as follows:

{
    "took": 10,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 212,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "visitor": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": 35,
                    "doc_count": 203,
                    "sessions": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 184,
                        "buckets": [
                            {
                                "key": 2364742,
                                "doc_count": 19
                            }
                        ]
                    },
                    "maxc": {
                        "value": 19,
                        "keys": [
                            "2364742"
                        ]
                    }
                },
                {
                    "key": 36,
                    "doc_count": 9,
                    "sessions": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 0,
                        "buckets": [
                            {
                                "key": 39,
                                "doc_count": 9
                            }
                        ]
                    },
                    "maxc": {
                        "value": 9,
                        "keys": [
                            "39"
                        ]
                    }
                }
            ]
        }
    }
}

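However, I cannot add the sort condition. Because sessions is a terms aggregation that produces multiple buckets, an order clause like the following does not give me the sorted result I want: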

"terms": {
    "size": 10,
    "field": "visitor_id",
    "order": {
        "sessions": "asc"
    }
},

I also tried max_bucket, but the value it produces cannot be used for sorting:

"maxc": {
    "max_bucket": {
        "buckets_path": "sessions>_count"
    }
}

This is what I wrote:

"terms": {
    "size": 10,
    "field": "visitor_id",
    "order": {
        "maxc": "asc"
    }
},

Running it produces this error:

{
    "error": {
        "root_cause": [
            {
                "type": "aggregation_execution_exception",
                "reason": "Invalid aggregator order path [maxc]. The provided aggregation [maxc] either does not exist, or is a pipeline aggregation and cannot be used to sort the buckets."
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "visited_data",
                "node": "xSFlOYFwRNuOQYv7iUrK3g",
                "reason": {
                    "type": "aggregation_execution_exception",
                    "reason": "Invalid aggregator order path [maxc]. The provided aggregation [maxc] either does not exist, or is a pipeline aggregation and cannot be used to sort the buckets."
                }
            }
        ]
    },
    "status": 500
}

I have not yet found a working way to solve this, so I would appreciate any solution or ideas.

i-am-king

Course reader, 63 reputation

CrazyZard

Moderator, 1.3k reputation / Programmer @ 西湖心辰

Best answer

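I split the buckets into three levels: level 1 is the user, level 2 is the session, and level 3 is the click count. It cannot be done the way you have in mind.

My idea is to add a field max_request (the maximum click count) at the second level, and have the first level sort by that field.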
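
The screenshot that accompanied this answer is not preserved, so the following is only one possible reading of the suggestion, sketched in Python. It assumes a hypothetical session-level index in which each document holds `visitor_id`, `session_id`, and a precomputed `request_count`; those names and the helper functions are illustrative and do not come from the thread. With such documents, `max_request` becomes an ordinary `max` metric sub-aggregation, which a `terms` aggregation can use in `order`.

```python
from collections import Counter


def build_session_docs(rows):
    """Collapse raw request rows into one document per (visitor, session).

    rows: iterable of (visitor_id, request_id, session_id, visitor_rid),
    the same column order as the sample data posted in this thread.
    """
    counts = Counter(
        (visitor_id, session_id)
        for visitor_id, _request_id, session_id, _visitor_rid in rows
    )
    return [
        {"visitor_id": v, "session_id": s, "request_count": n}
        for (v, s), n in sorted(counts.items())
    ]


def build_sorted_query(size=10):
    """Request body for the hypothetical session-level index.

    max_request is a metric aggregation (not a pipeline aggregation),
    so it is a valid path for the terms "order" clause.
    """
    return {
        "size": 0,
        "aggs": {
            "visitor": {
                "terms": {
                    "field": "visitor_id",
                    "size": size,
                    "order": {"max_request": "asc"},
                },
                "aggs": {
                    "max_request": {"max": {"field": "request_count"}}
                },
            }
        },
    }


# Sample rows taken from the thread: two sessions for visitor 35, one for 972.
SAMPLE_ROWS = [
    (35, 10209738, 2364741, "d8284e3126eebbdbbe666e1546c9a2d7"),
    (35, 10209739, 2364742, "f0424a304951af5ffba88ed3732678f6"),
    (972, 226910, 49501, "7efedcde9c2d9cac29610a794ef749e6"),
    (972, 226911, 49501, "0af00cc3a02f09fbfcec69bfadc52271"),
]
session_docs = build_session_docs(SAMPLE_ROWS)
```

The trade-off is that the per-session count has to be maintained at write time or by a periodic job, which is the extra application-side work discussed in the replies.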

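Commented 4 years ago

i-am-king (original poster)

So this still needs further handling in the application code, adding a max_request field at the second level to record the maximum click count separately, right?

i-am-king (original poster)

Thanks for the idea. I'll think it over along these lines and redesign my approach. :rose: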
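
If the sorting is left to application code, as discussed here, a minimal sketch is to run the original query unchanged and reorder the `visitor` buckets by `maxc.value` after the response arrives. The function name is hypothetical, and the approach assumes the `terms` size is large enough to return every visitor of interest, since only returned buckets can be sorted.

```python
def sort_visitors_by_max_requests(response):
    """Return the visitor buckets sorted ascending by maxc.value.

    Buckets whose max_bucket value is null (no session buckets) go last.
    """
    buckets = response["aggregations"]["visitor"]["buckets"]

    def sort_key(bucket):
        value = bucket["maxc"]["value"]
        # (True, ...) sorts after (False, ...), pushing null values to the end.
        return (value is None, value if value is not None else 0)

    return sorted(buckets, key=sort_key)


# Trimmed copy of the response shown in the question.
sample_response = {
    "aggregations": {
        "visitor": {
            "buckets": [
                {"key": 35, "doc_count": 203, "maxc": {"value": 19, "keys": ["2364742"]}},
                {"key": 36, "doc_count": 9, "maxc": {"value": 9, "keys": ["39"]}},
            ]
        }
    }
}
sorted_buckets = sort_visitors_by_max_requests(sample_response)
```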

Discussion count: 3

i-am-king

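This question does not show up in the Elasticsearch community's default list, so I am manually pinging @CrazyZard here. Sorry to bother you.

Commented 4 years ago

CrazyZard

Could you share your mapping, or a few rows of the raw data?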

i-am-king

"mappings": {
    "_doc": {
        "properties": {
            "visitor_id": {
                "type": "long"
            },
            "request_id": {
                "type": "long"
            },
            "session_id": {
                "type": "long"
            },
            "visitor_rid": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "ignore_above": 256,
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

[
    [
        35,
        10209738,
        2364741,
        "d8284e3126eebbdbbe666e1546c9a2d7"
    ],
    [
        35,
        10209739,
        2364742,
        "f0424a304951af5ffba88ed3732678f6"
    ],
    [
        972,
        226910,
        49501,
        "7efedcde9c2d9cac29610a794ef749e6"
    ],
    [
        972,
        226911,
        49501,
        "0af00cc3a02f09fbfcec69bfadc52271"
    ]
]

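@CrazyZard Sure, the above is the information with the fields trimmed down.

Commented 4 years ago

CrazyZard

That is a pipeline aggregation, and pipeline aggregations cannot be used to sort buckets. I'll see later whether I can write something up for you.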
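
Since the `order` clause of `terms` rejects pipeline aggregations, one alternative that is not proposed in this thread is the `bucket_sort` pipeline aggregation (available in Elasticsearch 6.1 and later), which reorders the parent's buckets after the other pipeline aggregations have run. The sketch below only builds the request body as a Python dict; the name `sort_by_maxc` and the helper function are hypothetical, and the behavior should be verified against the Elasticsearch version in use.

```python
import json


def build_bucket_sort_query(visitor_size=10, order="asc"):
    """Original aggregation plus a bucket_sort that sorts visitors by maxc.

    bucket_sort only reorders buckets that the visitor terms aggregation
    already returned, so visitor_size must cover every visitor of interest.
    """
    return {
        "size": 0,
        "aggs": {
            "visitor": {
                "terms": {"field": "visitor_id", "size": visitor_size},
                "aggs": {
                    # Largest session per visitor, as in the question.
                    "sessions": {
                        "terms": {
                            "field": "session_id",
                            "size": 1,
                            "order": {"_count": "desc"},
                        }
                    },
                    "maxc": {
                        "max_bucket": {"buckets_path": "sessions>_count"}
                    },
                    # Assumed addition: sort visitor buckets by maxc.
                    "sort_by_maxc": {
                        "bucket_sort": {"sort": [{"maxc": {"order": order}}]}
                    },
                },
            }
        },
    }


request_body = json.dumps(build_bucket_sort_query(), indent=4)
```

The `range` filter on `visitor_id` from the question is left out here for brevity and can be added back under a top-level `query` key.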
