统计用户留存
统计用户留存
最初用msyql统计,数据稍微多点就很慢。所以把数据同步到clickhouse再统计,会快很多。
mysql 查询统计
# mysql统计
SELECT
*,
CONCAT(ROUND(100*次日留存用户/日新增用户数,2),'%') '次日留存率',
CONCAT(ROUND(100*三日留存用户/日新增用户数,2),'%') '三日留存率',
CONCAT(ROUND(100*四日留存用户/日新增用户数,2),'%') '四日留存率',
CONCAT(ROUND(100*五日留存用户/日新增用户数,2),'%') '五日留存率',
CONCAT(ROUND(100*六日留存用户/日新增用户数,2),'%') '六日留存率',
CONCAT(ROUND(100*七日留存用户/日新增用户数,2),'%') '七日留存率',
CONCAT(ROUND(100*月留存用户/日新增用户数,2),'%') '月留存率'
FROM
(
SELECT
t.`dt` '日期',
t.cid '渠道',
COUNT(DISTINCT(t.id)) '日新增用户数',
COUNT(DISTINCT(t1.uid)) '次日留存用户',
COUNT(DISTINCT(t2.uid)) '三日留存用户',
COUNT(DISTINCT(t3.uid)) '四日留存用户',
COUNT(DISTINCT(t4.uid)) '五日留存用户',
COUNT(DISTINCT(t5.uid)) '六日留存用户',
COUNT(DISTINCT(t6.uid)) '七日留存用户',
COUNT(DISTINCT(t7.uid)) '月留存用户'
FROM
(
SELECT min(date(create_time)) as dt, uid, cid FROM login_record group by date(create_time),uid,cid
) t
LEFT JOIN login_record t1 ON t.id = t1.uid AND DATEDIFF(date(t1.create_time),t.`dt`) = 1
LEFT JOIN login_record t2 ON t.id = t2.uid AND DATEDIFF(date(t2.create_time),t.`dt`) = 2
LEFT JOIN login_record t3 ON t.id = t3.uid AND DATEDIFF(date(t3.create_time),t.`dt`) = 3
LEFT JOIN login_record t4 ON t.id = t4.uid AND DATEDIFF(date(t4.create_time),t.`dt`) = 4
LEFT JOIN login_record t5 ON t.id = t5.uid AND DATEDIFF(date(t5.create_time),t.`dt`) = 5
LEFT JOIN login_record t6 ON t.id = t6.uid AND DATEDIFF(date(t6.create_time),t.`dt`) = 6
LEFT JOIN login_record t7 ON t.id = t7.uid AND DATEDIFF(date(t7.create_time),t.`dt`) = 30
GROUP BY t.dt, t.cid
) p WHERE 日期 BETWEEN '2022-07-1' AND '2022-07-25' AND 渠道 = 100;
clickhouse 查询统计
统计活跃用户留存和留存率
只查询login_record表, 先统计用户最开始登录的时间(基础日期)
左连接自己统计登录时间(相对日期)
连接条件是用户id相等
并且基础日期小于等于相对日期
相对日期-基础日期=天数
留存数:
判断天数=0, 则是当天登录数. =1, 则是次登数
次登留存率 round(次登数/当天登录数, 保留位数)
# clickhouse统计
select
base_date,
cid,
day_0,
day_1,
day_2,
day_3,
day_4,
day_5,
day_6,
day_29,
-- 保留4位小数,用于百分比展示
round(day_1/day_0, 4) retention_2,
round(day_2/day_0, 4) retention_3,
round(day_3/day_0, 4) retention_4,
round(day_4/day_0, 4) retention_5,
round(day_5/day_0, 4) retention_6,
round(day_6/day_0, 4) retention_7,
round(day_29/day_0, 4) retention_30
from
(
select
base_date,
cid,
-- 留存数,day_0: 当天登录数
sum(case when datediff = 0 then 1 else 0 end) day_0,
sum(case when datediff = 1 then 1 else 0 end) day_1,
sum(case when datediff = 2 then 1 else 0 end) day_2,
sum(case when datediff = 3 then 1 else 0 end) day_3,
sum(case when datediff = 4 then 1 else 0 end) day_4,
sum(case when datediff = 5 then 1 else 0 end) day_5,
sum(case when datediff = 6 then 1 else 0 end) day_6,
sum(case when datediff = 29 then 1 else 0 end) day_29
from
(
select
uid,
cid,
base_date,
retention_date,
-- 后续登录时间 - 首日登录时间 = 天数
retention_date - base_date datediff
from
(
select
uid,
cid,
base_date,
retention_date
from
-- 表a,首日登录的用户
(
select
distinct
date(create_time) base_date,
uid
from
login_record
where
-- 需要计算留存的日期区间
base_date between '2022-07-01' and '2022-07-25'
group by
base_date, uid
) a
left join
-- 表b,表连接匹配后续登录的情况,计算留存
(
select
distinct
date(create_time) retention_date,
uid,
cid
from
login_record
where
-- 此处不应限制最大日期,否则可能导致后续留存率为0
retention_date >= '2022-07-01'
group by
retention_date, uid, cid
) b
on
a.uid = b.uid
where
-- 保证每条记录不会连接到在当前记录时间之前的数据,如昨天的登录
base_date <= retention_date
order by
base_date
) c
order by base_date
) d
group by base_date, cid
order by base_date
)
统计新增用户留存和留存率
与统计活跃用户不同,这里使用
user
表记录用户创建时间,与login_record
表连接后计算时间差
# clickhouse 查询
select
base_date,
cid,
day_0,
day_1,
day_2,
day_3,
day_4,
day_5,
day_6,
day_29,
-- 保留4位小数,用于百分比展示
round(day_1/day_0, 4) retention_2,
round(day_2/day_0, 4) retention_3,
round(day_3/day_0, 4) retention_4,
round(day_4/day_0, 4) retention_5,
round(day_5/day_0, 4) retention_6,
round(day_6/day_0, 4) retention_7,
round(day_29/day_0, 4) retention_30
from
(
select
base_date,
cid,
-- 留存数,day_0: 当天登录数
sum(case when datediff = 0 then 1 else 0 end) day_0,
sum(case when datediff = 1 then 1 else 0 end) day_1,
sum(case when datediff = 2 then 1 else 0 end) day_2,
sum(case when datediff = 3 then 1 else 0 end) day_3,
sum(case when datediff = 4 then 1 else 0 end) day_4,
sum(case when datediff = 5 then 1 else 0 end) day_5,
sum(case when datediff = 6 then 1 else 0 end) day_6,
sum(case when datediff = 29 then 1 else 0 end) day_29
from
(
select
uid,
cid,
base_date,
retention_date,
-- 后续登录时间 - 首日登录时间 = 天数
retention_date - base_date datediff
from
(
select
uid,
cid,
base_date,
retention_date
from
-- 表a,首日登录的用户
(
select
DISTINCT date(create_time) base_date,
cid,
id as uid
from
user
where
-- 需要计算留存的日期区间
base_date between '2022-07-01' and '2022-07-25'
) a
left join
-- 表b,表连接匹配后续登录的情况,计算留存
(
select
DISTINCT date(create_time) retention_date,
uid,
cid
from
login_record
where
-- 此处不应限制最大日期,否则可能导致后续留存率为0
retention_date > '2022-07-01'
group by
retention_date, uid, cid
) b
on
a.uid = b.uid
where
-- 保证每条记录不会连接到在当前记录时间之前的数据,如昨天的登录
base_date <= retention_date
order by
base_date
) c
order by base_date
) d
group by base_date, cid
order by base_date
)
结果如下:
本作品采用《CC 协议》,转载必须注明作者和本文链接