Note 39: Pipeline Aggregations
An example: Pipeline: min_bucket
- Among the job types with the most employees, find the one with the lowest average salary
Pipeline
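- The pipeline concept: run a second round of aggregation on the results of an existing aggregation
- The output of a pipeline aggregation is written into the original aggregation results; depending on where it is placed, there are two types
- Sibling - the result sits at the same level as the existing aggregation results
  - Max, Min, Avg & Sum Bucket
  - Stats, Extended Stats Bucket
  - Percentiles Bucket
- Parent - the result is embedded inside the existing aggregation results
  - Derivative
  - Cumulative Sum
  - Moving Function (sliding window)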
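To make the difference in placement concrete, here is a minimal Python sketch that mimics where each type writes its output. The dictionary is a simplified stand-in for the "aggregations" part of a response, and the helper names (`add_sibling_max`, `add_parent_derivative`, `max_avg_salary`) are hypothetical, chosen only for illustration. It is not how Elasticsearch implements this internally.

```python
# Simplified stand-in for the "aggregations" section of a search response.
# Two buckets only; the values match the sample data used in this note.
aggs = {
    "age": {
        "buckets": [
            {"key": 20.0, "avg_salary": {"value": 9000.0}},
            {"key": 21.0, "avg_salary": {"value": 16000.0}},
        ]
    }
}

def add_sibling_max(aggs, bucket_agg, metric, name):
    """Sibling style: add ONE new entry next to the bucket aggregation."""
    values = [b[metric]["value"] for b in aggs[bucket_agg]["buckets"]]
    aggs[name] = {"value": max(values)}

def add_parent_derivative(aggs, bucket_agg, metric, name):
    """Parent style: add a new entry INSIDE each bucket (from the second on)."""
    buckets = aggs[bucket_agg]["buckets"]
    for prev, cur in zip(buckets, buckets[1:]):
        cur[name] = {"value": cur[metric]["value"] - prev[metric]["value"]}

add_sibling_max(aggs, "age", "avg_salary", "max_avg_salary")
add_parent_derivative(aggs, "age", "avg_salary", "derivative_avg_salary")

print(sorted(aggs.keys()))        # the sibling result sits beside "age"
print(aggs["age"]["buckets"][1])  # the parent result sits inside the bucket
```

The sibling result appears as a new top-level key, while the parent result only appears inside the buckets of the aggregation it is nested in.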
Sibling Pipeline examples
- Over the average salary of each job type, compute
  - Maximum
  - Average
  - Statistics
  - Percentiles
# Insert data
DELETE employees
PUT /employees/_bulk
{ "index" : { "_id" : "1" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "index" : { "_id" : "2" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "index" : { "_id" : "3" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "index" : { "_id" : "4" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "index" : { "_id" : "5" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "index" : { "_id" : "6" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "index" : { "_id" : "7" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "index" : { "_id" : "8" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "index" : { "_id" : "9" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "index" : { "_id" : "10" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "index" : { "_id" : "11" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "index" : { "_id" : "12" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "index" : { "_id" : "13" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "index" : { "_id" : "14" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "index" : { "_id" : "15" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "index" : { "_id" : "16" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : { "_id" : "17" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : { "_id" : "18" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "index" : { "_id" : "19" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "index" : { "_id" : "20" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}
# Job type with the lowest average salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "min_salary_by_job": {
      "min_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}
# Job type with the highest average salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "max_salary_by_job": {
      "max_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}
# Average of the per-job average salaries
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "avg_salary_by_job": {
      "avg_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}
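As a cross-check on what these three sibling requests compute, the following Python sketch reproduces the two steps locally: first the per-job average (the `jobs` terms aggregation with its `avg_salary` sub-aggregation), then a min, max, and average over those per-bucket values (the `jobs>avg_salary` path). The `(job, salary)` pairs are copied from the sample data; the variable names are only illustrative, and this is an approximation of the semantics rather than Elasticsearch's own implementation.

```python
from collections import defaultdict

# (job, salary) pairs copied from the sample employees data in this note.
employees = [
    ("Product Manager", 35000), ("Dev Manager", 50000),
    ("Web Designer", 18000), ("Web Designer", 22000),
    ("QA", 18000), ("QA", 25000), ("QA", 20000),
    ("Java Programmer", 20000), ("Java Programmer", 22000),
    ("Java Programmer", 9000), ("Java Programmer", 38000),
    ("Java Programmer", 32000), ("Java Programmer", 30000),
    ("Java Programmer", 28000),
    ("Javascript Programmer", 25000), ("Javascript Programmer", 16000),
    ("Javascript Programmer", 16000), ("Javascript Programmer", 20000),
    ("DBA", 30000), ("DBA", 20000),
]

# Step 1: group by job and average the salaries (terms + avg).
salaries_by_job = defaultdict(list)
for job, salary in employees:
    salaries_by_job[job].append(salary)
avg_salary = {job: sum(s) / len(s) for job, s in salaries_by_job.items()}

# Step 2: sibling pipeline step over the per-bucket averages.
min_job = min(avg_salary, key=avg_salary.get)              # like min_bucket
max_job = max(avg_salary, key=avg_salary.get)              # like max_bucket
avg_of_avgs = sum(avg_salary.values()) / len(avg_salary)   # like avg_bucket

print(min_job, avg_salary[min_job])
print(max_job, avg_salary[max_job])
print(round(avg_of_avgs, 2))
```

For this data set the lowest average belongs to Javascript Programmer (19250) and the highest to Dev Manager (50000). Note that the last figure is an average of seven bucket averages, not the average salary across all twenty employees.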
Parent Pipeline: Derivative
- Take the derivative of salary with respect to age (to see how salary trends as age increases)
# Derivative of the average salary by age
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "min_doc_count": 1,
        "interval": 1
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "derivative_avg_salary": {
          "derivative": {
            "buckets_path": "avg_salary"
          }
        }
      }
    }
  }
}
# Response (excerpt)
"aggregations" : {
  "age" : {
    "buckets" : [
      {
        "key" : 20.0,
        "doc_count" : 1,
        "avg_salary" : {
          "value" : 9000.0
        }
      },
      {
        "key" : 21.0,
        "doc_count" : 1,
        "avg_salary" : {
          "value" : 16000.0
        },
        "derivative_avg_salary" : {
          "value" : 7000.0
        }
      }
    ]
  }
}
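The derivative here is simply the difference between each bucket's value and the previous bucket's value. A minimal Python sketch of that calculation, using the two bucket values from the response above (the function name `derivative` and the use of `None` for the first bucket are choices made for this illustration):

```python
from typing import List, Optional

def derivative(values: List[float]) -> List[Optional[float]]:
    """First difference between consecutive bucket values.

    The first bucket has no predecessor, so it gets None here; in the
    response above the field is simply absent from the first bucket.
    """
    if not values:
        return []
    result: List[Optional[float]] = [None]
    for prev, cur in zip(values, values[1:]):
        result.append(cur - prev)
    return result

# avg_salary for the age-20 and age-21 buckets, taken from the response.
print(derivative([9000.0, 16000.0]))  # [None, 7000.0]
```

This matches the `derivative_avg_salary` value of 7000.0 reported for the age-21 bucket.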
Parent Pipeline
- Over the average salary of each age histogram bucket
  - Cumulative Sum
  - Moving Function
# Cumulative Sum
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "min_doc_count": 1,
        "interval": 1
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "cumulative_salary": {
          "cumulative_sum": {
            "buckets_path": "avg_salary"
          }
        }
      }
    }
  }
}
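What `cumulative_sum` adds to each bucket is a running total of the metric named in `buckets_path`. A short Python sketch of the same idea follows; the input list is illustrative (the first two values match the age-20 and age-21 averages in the sample data, the rest are made up for the example).

```python
from itertools import accumulate

# Per-bucket avg_salary values in histogram order (illustrative input).
bucket_avgs = [9000.0, 16000.0, 22000.0, 20000.0]

# Running total: each bucket gets the sum of its own value and all earlier ones.
cumulative_salary = list(accumulate(bucket_avgs))

print(cumulative_salary)  # [9000.0, 25000.0, 47000.0, 67000.0]
```

Because `buckets_path` points at `avg_salary`, the result is a running sum of bucket averages, not of total salaries. To accumulate actual salary totals one would presumably point it at a `sum` metric instead.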
# Moving Function
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "min_doc_count": 1,
        "interval": 1
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "moving_avg_salary": {
          "moving_fn": {
            "buckets_path": "avg_salary",
            "window": 10,
            "script": "MovingFunctions.min(values)"
          }
        }
      }
    }
  }
}
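The sliding-window behaviour of `moving_fn` with `MovingFunctions.min(values)` can be sketched in Python as follows. This assumes the default window placement as I understand it from the Elasticsearch documentation: the window holds up to `window` preceding values and excludes the current bucket, so the first bucket sees an empty window. The function name `moving_min`, the use of `None` for an empty window, and the input values are choices made for this illustration.

```python
from typing import List, Optional

def moving_min(values: List[float], window: int) -> List[Optional[float]]:
    """Minimum over the `window` values that precede each bucket.

    Assumption: the current bucket is NOT part of its own window, so the
    first bucket has nothing to look at and gets None.
    """
    out: List[Optional[float]] = []
    for i in range(len(values)):
        prior = values[max(0, i - window):i]  # up to `window` earlier values
        out.append(min(prior) if prior else None)
    return out

# Illustrative per-bucket averages; window of 2 to make the sliding visible.
print(moving_min([9000.0, 16000.0, 22000.0, 20000.0], window=2))
# [None, 9000.0, 9000.0, 16000.0]
```

With `window` set to 10, as in the request above, and only a dozen or so age buckets, most buckets would see nearly all earlier values. Despite its name, `moving_avg_salary` in that request holds a moving minimum, since the script calls `MovingFunctions.min`.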
This work is licensed under a CC license; reposts must credit the author and link to this article.