Note 38: Bucket & Metric Aggregations and Nested Aggregations

Bucket & Metric Aggregation

  • Metric: a set of statistical methods computed over documents
  • Bucket: a group of documents that satisfy a condition

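Aggregation Syntax

  • An aggregation is part of a Search request. In most cases, set the request's size to 0 so that only the aggregation results are returned, not the hits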

Example
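
The general shape of an aggregation request can be sketched as follows. The index name `my-index`, the aggregation name `my_agg_name`, and the field `my_numeric_field` are placeholders chosen for illustration, not names from this note.

```console
# my-index, my_agg_name and my_numeric_field are placeholder names
POST my-index/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "my_agg_name": {
      "avg": { "field": "my_numeric_field" }
    }
  }
}
```

The `aggs` block sits next to `query` in the search body, so the aggregation runs over the documents the query matches. Each entry pairs a name you choose with an aggregation type (`avg` here) and that type's parameters.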

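Metric Aggregation

  • Single-value analysis: outputs a single result
    • min, max, avg, sum
    • Cardinality (similar to distinct count)
  • Multi-value analysis: outputs multiple results
    • stats, extended stats
    • percentile, percentile rank
    • top hits (the top-ranked sample documents)

Metric Aggregation Demos

  • Find the lowest salary
  • Find the highest salary
  • One aggregation that outputs multiple values
  • Multiple aggregations in a single query
    • Get the lowest, highest, and average salary at the same time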
      PUT /employees/
      {
        "mappings" : {
          "properties" : {
            "age" : {
              "type" : "integer"
            },
            "gender" : {
              "type" : "keyword"
            },
            "job" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 50
                }
              }
            },
            "name" : {
              "type" : "keyword"
            },
            "salary" : {
              "type" : "integer"
            }
          }
        }
      }
      PUT /employees/_bulk
      { "index" : {  "_id" : "1" } }
      { "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
      { "index" : {  "_id" : "2" } }
      { "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
      { "index" : {  "_id" : "3" } }
      { "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
      { "index" : {  "_id" : "4" } }
      { "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
      { "index" : {  "_id" : "5" } }
      { "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
      { "index" : {  "_id" : "6" } }
      { "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
      { "index" : {  "_id" : "7" } }
      { "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
      { "index" : {  "_id" : "8" } }
      { "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
      { "index" : {  "_id" : "9" } }
      { "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
      { "index" : {  "_id" : "10" } }
      { "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
      { "index" : {  "_id" : "11" } }
      { "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
      { "index" : {  "_id" : "12" } }
      { "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
      { "index" : {  "_id" : "13" } }
      { "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
      { "index" : {  "_id" : "14" } }
      { "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
      { "index" : {  "_id" : "15" } }
      { "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
      { "index" : {  "_id" : "16" } }
      { "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
      { "index" : {  "_id" : "17" } }
      { "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
      { "index" : {  "_id" : "18" } }
      { "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
      { "index" : {  "_id" : "19" } }
      { "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
      { "index" : {  "_id" : "20" } }
      { "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}
      // Query
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "min": {
            "min": {
              "field": "salary"
            }
          },
          "max": {
            "max": {
              "field": "salary"
            }
          },
          "avg": {
            "avg": {
              "field": "salary"
            }
          }
        }
      }
      // Response
      {
        "took" : 111,
        "timed_out" : false,
        "_shards" : {
          "total" : 1,
          "successful" : 1,
          "skipped" : 0,
          "failed" : 0
        },
        "hits" : {
          "total" : {
            "value" : 20,
            "relation" : "eq"
          },
          "max_score" : null,
          "hits" : [ ]
        },
        "aggregations" : {
          "avg" : {
            "value" : 24700.0
          },
          "min" : {
            "value" : 9000.0
          },
          "max" : {
            "value" : 50000.0
          }
        }
      }
      # One aggregation that outputs multiple values
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "stats_salary": {
            "stats": {
              "field": "salary"
            }
          }
        }
      }
      // Response (aggregations section only)
      "aggregations" : {
        "stats_salary" : {
          "count" : 20,
          "min" : 9000.0,
          "max" : 50000.0,
          "avg" : 24700.0,
          "sum" : 494000.0
        }
      }
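
The multi-value metrics listed earlier also include extended stats, percentiles, and percentile ranks. A minimal sketch against the same `employees` index follows; the aggregation names and the chosen percents and values are illustrative assumptions.

```console
# Aggregation names, percents and values below are illustrative choices
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_extended": {
      "extended_stats": { "field": "salary" }
    },
    "salary_percentiles": {
      "percentiles": { "field": "salary", "percents": [50, 95, 99] }
    },
    "salary_ranks": {
      "percentile_ranks": { "field": "salary", "values": [20000, 30000] }
    }
  }
}
```

`extended_stats` adds values such as variance and standard deviation to the `stats` output, `percentiles` reports the salary at each requested percentile, and `percentile_ranks` reports roughly what percentage of salaries fall at or below each given value. Percentile results are approximate in general.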

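Bucket

  • Documents are assigned to different buckets according to a rule, which classifies them. Some common Bucket Aggregations provided by ES:
    • Terms
    • Numeric types
      • Range, Date Range
      • Histogram / Date Histogram
  • Nesting is supported: buckets can be split into further buckets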

Terms Aggregation

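  • A field needs doc_values or fielddata before it can be used in a Terms Aggregation
    • Keyword fields support doc_values by default
    • Text fields need fielddata enabled in the mapping; buckets are then built from the analyzed tokens
  • Demo
    • Aggregate on job and job.keyword
    • Run a Terms aggregation on gender
    • Specify the bucket size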
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "jobs": {
            "terms": {
              "field": "job.keyword"
            }
          }
        }
      }
      // Response (aggregations section only)
      "aggregations" : {
        "jobs" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "Java Programmer",
              "doc_count" : 7
            },
            {
              "key" : "Javascript Programmer",
              "doc_count" : 4
            },
            {
              "key" : "QA",
              "doc_count" : 3
            },
            {
              "key" : "DBA",
              "doc_count" : 2
            },
            {
              "key" : "Web Designer",
              "doc_count" : 2
            },
            {
              "key" : "Dev Manager",
              "doc_count" : 1
            },
            {
              "key" : "Product Manager",
              "doc_count" : 1
            }
          ]
        }
      }
      # Enable fielddata on the text field so that it supports terms aggregation
      PUT employees/_mapping
      {
        "properties" : {
          "job" : {
            "type" : "text",
            "fielddata" : true
          }
        }
      }
      # Terms aggregation on the text field: the buckets are the analyzed tokens
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "jobs": {
            "terms": {
              "field": "job"
            }
          }
        }
      }
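      # Terms aggregations on job.keyword and job produce different numbers of buckets.
      # Cardinality returns the distinct count for a field, here job.keyword
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "cardinate": {
            "cardinality": {
              "field": "job.keyword"
            }
          }
        }
      }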
      # Terms aggregation on the gender keyword field
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "gender": {
            "terms": {
              "field": "gender"
            }
          }
        }
      }

Cardinality

  • Similar to COUNT(DISTINCT ...) in SQL; the result is an approximate count
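
To make the difference between `job` and `job.keyword` concrete, both distinct counts can be requested in one search. This is a sketch that assumes fielddata has already been enabled on `job` as in the mapping update earlier; the aggregation names are made up for illustration.

```console
# Assumes "fielddata": true on the job text field; aggregation names are illustrative
POST employees/_search
{
  "size": 0,
  "aggs": {
    "distinct_job_keyword": {
      "cardinality": { "field": "job.keyword" }
    },
    "distinct_job_tokens": {
      "cardinality": { "field": "job" }
    }
  }
}
```

With the sample data, `job.keyword` has 7 distinct values (matching the 7 buckets in the terms response), while `job` counts distinct tokens such as `java` and `programmer`, so the two numbers should differ.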

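Bucket Size & Top Hits Demo

  • Use case: after bucketing, fetch the list of top matching documents within each bucket
  • Size: bucket by age and return only the specified number of buckets
  • Top Hits: for each job, show the 3 oldest employees
    # Specify the bucket size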
    POST employees/_search
    {
        "size": 0,
        "aggs": {
          "ages_5": {
            "terms": {
              "field":"age",
              "size":3
            }
          }
        }
    }
    # With size specified: details of the 3 oldest employees in each job
    POST employees/_search
    {
      "size": 0,
      "aggs": {
        "jobs": {
          "terms": {
            "field": "job.keyword"
          },
          "aggs": {
            "old_employee": {
              "top_hits": {
                "size": 3,
                "sort": [{
                  "age": {
                    "order": "desc"
                  }
                }]
              }
            }
          }
        }
      }
    }

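Optimizing Terms Aggregation Performance

  • Worth considering when aggregations run frequently, performance requirements are high, and the index is continuously being written to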
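
One mapping option often used in this situation is `eager_global_ordinals`, which builds the global ordinals used by terms aggregations when a shard is refreshed rather than on the first aggregation request. A minimal sketch, assuming a new index with the hypothetical name `employees_eager`:

```console
# employees_eager is a hypothetical index name used only for this sketch
PUT employees_eager
{
  "mappings": {
    "properties": {
      "job": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }
}
```

This moves the cost from search time to refresh time, so it tends to pay off only when the field is aggregated on frequently while new documents keep arriving.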

Range & Histogram

  • Bucket documents by numeric range

  • In a Range Aggregation, you can define custom keys

  • Demo:

    • Bucket by salary range

    • Bucket by salary interval (Histogram)

      // Bucket by salary ranges; the key can be user-defined
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "salary_range": {
            "range": {
              "field": "salary",
              "ranges": [
                {
                  "to": 10000
                },
                {
                  "from": 10000,
                  "to": 20000
                },
                {
                  "key": ">20000",
                  "from": 20000
                }
              ]
            }
          }
        }
      }
      // Response (aggregations section only)
      "aggregations" : {
        "salary_range" : {
          "buckets" : [
            {
              "key" : "*-10000.0",
              "to" : 10000.0,
              "doc_count" : 1
            },
            {
              "key" : "10000.0-20000.0",
              "from" : 10000.0,
              "to" : 20000.0,
              "doc_count" : 4
            },
            {
              "key" : ">20000",
              "from" : 20000.0,
              "doc_count" : 15
            }
          ]
        }
      }
      // Salary histogram: salaries from 0 to 100,000, bucketed in intervals of 10000
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "salary_histogram": {
            "histogram": {
              "field": "salary",
              "interval": 10000,
              "extended_bounds": {
                "min": 0,
                "max": 100000
              }
            }
          }
        }
      }

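Bucket + Metric Aggregation

  • A Bucket aggregation can be analyzed further by adding sub-aggregations, which can be

    • Bucket
    • Metric
  • Demo

    • Bucket by job type and compute salary statistics
    • Bucket by job type, then by gender, and compute salary statistics
      # Nested aggregation 1: bucket by job type and compute salary statistics
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "Job_salary_stats": {
            "terms": {
              "field": "job.keyword"
            },
            "aggs": {
              "salary": {
                "stats": {
                  "field": "salary"
                }
              }
            }
          }
        }
      }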
      # Multi-level nesting: bucket by job type, then by gender, and compute salary statistics
      POST employees/_search
      {
      "size": 0,
      "aggs": {
        "Job_gender_stats": {
          "terms": {
            "field": "job.keyword"
          },
          "aggs": {
            "gender_stats": {
              "terms": {
                "field": "gender"
              },
              "aggs": {
                "salary_stats": {
                  "stats": {
                    "field": "salary"
                  }
                }
              }
            }
          }
        }
      }
      }
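
A metric sub-aggregation can also drive the order of its parent buckets. The following sketch sorts job buckets by average salary; the names `jobs_by_avg_salary` and `avg_salary` are illustrative.

```console
# Aggregation names are illustrative; order refers to the sub-aggregation by name
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs_by_avg_salary": {
      "terms": {
        "field": "job.keyword",
        "order": { "avg_salary": "desc" }
      },
      "aggs": {
        "avg_salary": {
          "avg": { "field": "salary" }
        }
      }
    }
  }
}
```

The `order` key must name a single-value metric sub-aggregation, or a single value of a multi-value one, such as `salary_stats.avg`.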

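Summary

  • Aggregation syntax

    • A single aggregation request can contain multiple aggregations, and each Bucket aggregation can contain multiple sub-aggregations
  • Metric

    • Single-value output & multi-value output
  • Bucket

    • Terms & numeric ranges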
This work is licensed under a CC license; reposts must credit the author and link to this article.
CrazyZard