ES Note 39: Pipeline Aggregation Analysis

An example of a pipeline aggregation: min_bucket

  • Among the job types with the most employees, find the one with the lowest average salary


Pipeline

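  • The pipeline concept: run a further aggregation over the results of an existing aggregation
  • A pipeline aggregation writes its result into the original aggregation results; depending on where the result is placed, there are two kinds
    • Sibling - the result sits at the same level as the existing aggregation's result
      • Max, Min, Avg & Sum Bucket
      • Stats, Extended Stats Bucket
      • Percentiles Bucket
    • Parent - the result is embedded inside the existing aggregation's results
      • Derivative
      • Cumulative Sum
      • Moving Function (sliding window)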
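To make the sibling/parent distinction concrete, here is a rough Python sketch of where each kind of result ends up in the response. The helper names (`attach_sibling`, `attach_parent`) and the two-bucket input are illustrative assumptions; the sketch only imitates the response shape and says nothing about how Elasticsearch computes it internally.

```python
import copy

# A toy response fragment: one bucket aggregation ("age") with an
# avg_salary metric in each bucket (hypothetical, trimmed to two buckets).
response = {
    "age": {
        "buckets": [
            {"key": 20, "avg_salary": {"value": 9000.0}},
            {"key": 21, "avg_salary": {"value": 16000.0}},
        ]
    }
}

def attach_sibling(aggs, bucket_agg, metric, name):
    """Sibling style: one summary value placed next to the bucket aggregation."""
    out = copy.deepcopy(aggs)
    values = [b[metric]["value"] for b in out[bucket_agg]["buckets"]]
    out[name] = {"value": min(values)}  # e.g. what a min_bucket would report
    return out

def attach_parent(aggs, bucket_agg, metric, name):
    """Parent style: one value per bucket, written inside each bucket."""
    out = copy.deepcopy(aggs)
    buckets = out[bucket_agg]["buckets"]
    # The first bucket has no predecessor, so it gets no value.
    for prev, cur in zip(buckets, buckets[1:]):
        cur[name] = {"value": cur[metric]["value"] - prev[metric]["value"]}
    return out

sibling = attach_sibling(response, "age", "avg_salary", "min_avg_salary")
parent = attach_parent(response, "age", "avg_salary", "derivative_avg_salary")
print(sorted(sibling))              # top-level keys of the sibling-style result
print(parent["age"]["buckets"][1])  # second bucket of the parent-style result
```

In the sibling case the new key appears beside `age`; in the parent case it appears inside the individual buckets.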

Sibling Pipeline examples

  • For the average salary of each job type, compute
    • the minimum / maximum
    • the average
    • summary statistics
    • percentiles
      # Insert data
      DELETE employees
      PUT /employees/_bulk
      { "index" : {  "_id" : "1" } }
      { "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
      { "index" : {  "_id" : "2" } }
      { "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
      { "index" : {  "_id" : "3" } }
      { "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
      { "index" : {  "_id" : "4" } }
      { "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
      { "index" : {  "_id" : "5" } }
      { "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
      { "index" : {  "_id" : "6" } }
      { "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
      { "index" : {  "_id" : "7" } }
      { "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
      { "index" : {  "_id" : "8" } }
      { "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
      { "index" : {  "_id" : "9" } }
      { "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
      { "index" : {  "_id" : "10" } }
      { "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
      { "index" : {  "_id" : "11" } }
      { "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
      { "index" : {  "_id" : "12" } }
      { "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
      { "index" : {  "_id" : "13" } }
      { "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
      { "index" : {  "_id" : "14" } }
      { "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
      { "index" : {  "_id" : "15" } }
      { "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
      { "index" : {  "_id" : "16" } }
      { "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
      { "index" : {  "_id" : "17" } }
      { "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
      { "index" : {  "_id" : "18" } }
      { "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
      { "index" : {  "_id" : "19" } }
      { "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
      { "index" : {  "_id" : "20" } }
      { "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}
      # Job type with the lowest average salary
      POST employees/_search
      {
      "size": 0,
      "aggs": {
        "jobs": {
          "terms": {
            "field": "job.keyword",
            "size": 10
          },
          "aggs": {
            "avg_salary": {
              "avg": {
                "field": "salary"
              }
            }
          }
        },
        "min_salary_by_job": {
          "min_bucket": {
            "buckets_path": "jobs>avg_salary"
          }
        }
      }
      }
      # Job type with the highest average salary
      POST employees/_search
      {
      "size": 0,
      "aggs": {
        "jobs": {
          "terms": {
            "field": "job.keyword",
            "size": 10
          },
          "aggs": {
            "avg_salary": {
              "avg": {
                "field": "salary"
              }
            }
          }
        },
        "max_salary_by_job":{
          "max_bucket": {
            "buckets_path": "jobs>avg_salary"
          }
        }
      }
      }
      # Average of the per-job average salaries
      POST employees/_search
      {
      "size": 0,
      "aggs": {
        "jobs": {
          "terms": {
            "field": "job.keyword",
            "size": 10
          },
          "aggs": {
            "avg_salary": {
              "avg": {
                "field": "salary"
              }
            }
          }
        },
        "avg_salary_by_job":{
          "avg_bucket": {
            "buckets_path": "jobs>avg_salary"
          }
        }
      }
      }
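As a cross-check on the three requests above, the same numbers can be reproduced offline. The following is a minimal Python sketch, not Elasticsearch code: it groups the sample employees by job (the `jobs` terms aggregation; there are only 7 distinct jobs, so `size: 10` keeps them all), averages each group (`avg_salary`), and then summarizes those averages the way `min_bucket`, `max_bucket`, `avg_bucket`, and a `stats_bucket`-style summary would. Variable names are my own.

```python
from collections import defaultdict
from statistics import mean

# (job, salary) pairs copied from the bulk request above.
EMPLOYEES = [
    ("Product Manager", 35000), ("Dev Manager", 50000),
    ("Web Designer", 18000), ("Web Designer", 22000),
    ("QA", 18000), ("QA", 25000), ("QA", 20000),
    ("Java Programmer", 20000), ("Java Programmer", 22000),
    ("Java Programmer", 9000), ("Java Programmer", 38000),
    ("Java Programmer", 32000), ("Java Programmer", 30000),
    ("Javascript Programmer", 25000), ("Java Programmer", 28000),
    ("Javascript Programmer", 16000), ("Javascript Programmer", 16000),
    ("Javascript Programmer", 20000), ("DBA", 30000), ("DBA", 20000),
]

# Step 1: the "jobs" terms aggregation with an "avg_salary" sub-aggregation.
salaries_by_job = defaultdict(list)
for job, salary in EMPLOYEES:
    salaries_by_job[job].append(salary)
avg_salary = {job: mean(vals) for job, vals in salaries_by_job.items()}

# Step 2: sibling pipelines read every bucket's avg_salary ("jobs>avg_salary").
values = list(avg_salary.values())
min_job = min(avg_salary, key=avg_salary.get)  # min_bucket
max_job = max(avg_salary, key=avg_salary.get)  # max_bucket
stats = {                                      # stats_bucket-style summary
    "count": len(values),
    "min": min(values),
    "max": max(values),
    "avg": mean(values),                       # also what avg_bucket reports
    "sum": sum(values),
}

print("lowest :", min_job, avg_salary[min_job])
print("highest:", max_job, avg_salary[max_job])
print("average of averages:", round(stats["avg"], 2))
```

Note that `avg_bucket` averages the per-job averages, which is not the same as the average salary over all employees.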

Parent Pipeline: Derivative

  • Take the derivative of salary with respect to age (to see how salary trends)
    # Derivative of the average salary by age
    POST employees/_search
    {
    "size": 0,
    "aggs": {
      "age": {
        "histogram": {
          "field": "age",
          "min_doc_count": 1,
          "interval": 1
        },
        "aggs": {
          "avg_salary": {
            "avg": {
              "field": "salary"
            }
          },
          "derivative_avg_salary":{
            "derivative": {
              "buckets_path": "avg_salary"
            }
          }
        }
      }
    }
    }
    # Response (excerpt)
    "aggregations" : {
      "age" : {
        "buckets" : [
          {
            "key" : 20.0,
            "doc_count" : 1,
            "avg_salary" : {
              "value" : 9000.0
            }
          },
          {
            "key" : 21.0,
            "doc_count" : 1,
            "avg_salary" : {
              "value" : 16000.0
            },
            "derivative_avg_salary" : {
              "value" : 7000.0
            }
          }
        ]
      }
    }
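The 7000.0 in the response excerpt is simply 16000 - 9000. Below is a small Python sketch of the same calculation over the whole sample data set. It assumes the derivative is the difference between adjacent returned buckets, which matches the excerpt; how Elasticsearch treats the missing ages (22 to 24, dropped by `min_doc_count: 1`) is not modelled here. Function names are my own.

```python
from collections import defaultdict
from statistics import mean

# (age, salary) pairs copied from the bulk request in this note.
EMPLOYEES = [
    (32, 35000), (41, 50000), (25, 18000), (26, 22000), (25, 18000),
    (31, 25000), (27, 20000), (27, 20000), (32, 22000), (20, 9000),
    (36, 38000), (31, 32000), (30, 30000), (32, 25000), (33, 28000),
    (21, 16000), (25, 16000), (29, 20000), (30, 30000), (29, 20000),
]

def avg_salary_by_age(rows):
    """Histogram on age (interval 1, empty buckets dropped) with an avg metric."""
    grouped = defaultdict(list)
    for age, salary in rows:
        grouped[age].append(salary)
    return [(age, mean(grouped[age])) for age in sorted(grouped)]

def derivative(buckets):
    """Difference between each bucket's value and the previous bucket's value."""
    return {
        cur_age: cur - prev
        for (_, prev), (cur_age, cur) in zip(buckets, buckets[1:])
    }

buckets = avg_salary_by_age(EMPLOYEES)
deriv = derivative(buckets)
for age, avg in buckets:
    # The first bucket has no predecessor, so it prints None.
    print(age, round(avg, 2), deriv.get(age))
```

A positive value means the average salary rose compared with the previous age bucket; a negative value means it fell.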


Parent Pipeline

  • Average salary per age-histogram bucket
    • Cumulative Sum
    • Moving Function
# Cumulative sum
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "min_doc_count": 1,
        "interval": 1
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "cumulative_salary":{
          "cumulative_sum": {
            "buckets_path": "avg_salary"
          }
        }
      }
    }
  }
}
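What `cumulative_sum` adds to each bucket is a running total of the `avg_salary` values up to and including that bucket. A minimal Python sketch of that running total on the sample data follows; the variable names are my own and the code is an offline approximation, not Elasticsearch itself.

```python
from collections import defaultdict
from itertools import accumulate
from statistics import mean

# (age, salary) pairs copied from the bulk request in this note.
EMPLOYEES = [
    (32, 35000), (41, 50000), (25, 18000), (26, 22000), (25, 18000),
    (31, 25000), (27, 20000), (27, 20000), (32, 22000), (20, 9000),
    (36, 38000), (31, 32000), (30, 30000), (32, 25000), (33, 28000),
    (21, 16000), (25, 16000), (29, 20000), (30, 30000), (29, 20000),
]

# Histogram on age with an avg_salary metric per bucket.
grouped = defaultdict(list)
for age, salary in EMPLOYEES:
    grouped[age].append(salary)
ages = sorted(grouped)
avg_salary = [mean(grouped[age]) for age in ages]

# cumulative_sum: each bucket gets the sum of avg_salary over all buckets so far.
cumulative_salary = list(accumulate(avg_salary))

for age, avg, total in zip(ages, avg_salary, cumulative_salary):
    print(age, round(avg, 2), round(total, 2))
```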

# Moving function (here MovingFunctions.min over a window of 10 buckets)
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "min_doc_count": 1,
        "interval": 1
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "moving_avg_salary":{
          "moving_fn": {
            "buckets_path": "avg_salary",
            "window":10,
            "script": "MovingFunctions.min(values)"
          }
        }
      }
    }
  }
}
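To see what the `moving_fn` request with `MovingFunctions.min` and `window: 10` produces, here is a rough Python sketch. It assumes the default behaviour in which the window covers the preceding buckets and excludes the current one, and it represents an empty window as `None` (Elasticsearch reports a null value there). The helper name `moving_min` is my own.

```python
from collections import defaultdict
from statistics import mean

# (age, salary) pairs copied from the bulk request in this note.
EMPLOYEES = [
    (32, 35000), (41, 50000), (25, 18000), (26, 22000), (25, 18000),
    (31, 25000), (27, 20000), (27, 20000), (32, 22000), (20, 9000),
    (36, 38000), (31, 32000), (30, 30000), (32, 25000), (33, 28000),
    (21, 16000), (25, 16000), (29, 20000), (30, 30000), (29, 20000),
]

def moving_min(values, window):
    """Minimum over the previous `window` values, excluding the current one."""
    result = []
    for i in range(len(values)):
        previous = values[max(0, i - window):i]
        # The first bucket has nothing before it, so there is no value.
        result.append(min(previous) if previous else None)
    return result

# Histogram on age with an avg_salary metric per bucket.
grouped = defaultdict(list)
for age, salary in EMPLOYEES:
    grouped[age].append(salary)
ages = sorted(grouped)
avg_salary = [mean(grouped[age]) for age in ages]

moving = moving_min(avg_salary, window=10)
for age, avg, low in zip(ages, avg_salary, moving):
    print(age, round(avg, 2), low)
```

With 12 buckets and a window of 10, only the last bucket (age 41) no longer sees the 9000 from age 20, so its moving minimum moves up to the next-lowest value in range.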
This work is licensed under a CC license; reposts must credit the author and link to this article
CrazyZard