Note: reference article:
"SQL: counting the user pairs who definitely know each other, HQL interview question 36 [Kuaishou data warehouse interview question]", CSDN blog, https://blog.csdn.net/godlovedaniel/article/details/119155757
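0 Problem description
The table table21 holds internet café visit records for a city. Its fields are: café ID (wid), visitor ID (uid, the national ID number), login time (ontime), and logout time (offtime).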
Rule 1: if two users' login times or logout times at the same café are within 10 minutes of each other, the two may know each other;
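Rule 2: if Rule 1 holds for the same two users at three or more cafés, the two definitely know each other.
Task: count the pairs of users in the city who definitely know each other.
1 Data preparation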
create table table21(
wid string,
uid string,
ontime string,
offtime string
)
row format delimited fields terminated by '\t';
insert overwrite table table21 values
(1,110001,'2020-01-01 11:10:00','2020-01-01 11:15:00')
,(1,110001,'2020-01-01 11:18:00','2020-01-01 11:23:00')
,(1,110002,'2020-01-01 12:10:00','2020-01-01 13:15:00')
,(1,110001,'2020-01-01 12:11:00','2020-01-01 13:10:00')
,(1,110003,'2020-01-01 12:15:00','2020-01-01 13:15:00')
,(1,110004,'2020-01-01 12:16:00','2020-01-01 13:18:00')
,(2,110001,'2020-01-02 12:10:00','2020-01-02 12:30:00')
,(2,110001,'2020-01-02 12:50:00','2020-01-02 13:05:00')
,(2,110002,'2020-01-02 12:52:00','2020-01-02 12:55:00')
,(2,110003,'2020-01-02 12:58:00','2020-01-02 13:20:00')
,(2,110004,'2020-01-02 13:00:00','2020-01-02 13:10:00')
,(3,110001,'2020-01-03 12:10:00','2020-01-03 12:30:00')
,(3,110003,'2020-01-03 12:55:00','2020-01-03 13:02:00')
,(3,110001,'2020-01-03 12:50:00','2020-01-03 12:55:00')
,(3,110002,'2020-01-03 13:00:00','2020-01-03 13:01:00')
,(3,110004,'2020-01-03 12:58:00','2020-01-03 13:03:00')
,(3,110002,'2020-01-03 13:20:00','2020-01-03 13:25:00');
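2 Data analysis
Rules 1 and 2 together ask for the number of user pairs who definitely know each other. Pairwise-combination problems like this are usually solved with a self-join: the self-join enumerates every possible pairing, and the conditions then filter the rows.
step1: self-join the table to enumerate every possible pairing (a Cartesian product):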
select *
from table21 as t0
cross join table21 as t1;  -- explicit Cartesian product
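The size of the step 1 result can be checked without a cluster. The sketch below is plain Python rather than HQL, and it only mirrors the wid column of the sample data (6, 5, and 6 rows for cafés 1, 2, and 3); the variable names are illustrative.

```python
from itertools import product

# wid column of the 17 sample rows: 6 rows for café 1, 5 for café 2, 6 for café 3
wids = [1] * 6 + [2] * 5 + [3] * 6

# Unrestricted self-join: every row paired with every row
all_pairs = list(product(wids, repeat=2))

# Self-join restricted to rows from the same café, as step 2 requires
same_cafe_pairs = [(a, b) for a, b in all_pairs if a == b]

print(len(all_pairs))        # 289
print(len(same_cafe_pairs))  # 97
```

Restricting the join to the same café already removes about two thirds of the rows before any time comparison is made.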
step2: apply Rule 1 to get the candidate pairs:
select
    t0.wid as t0_wid,
    t0.uid as t0_uid,
    t1.wid as t1_wid,
    t1.uid as t1_uid
from table21 as t0
join table21 as t1
    on t0.wid = t1.wid  -- same café
where (abs(unix_timestamp(t0.ontime, 'yyyy-MM-dd HH:mm:ss')
         - unix_timestamp(t1.ontime, 'yyyy-MM-dd HH:mm:ss')) < 600 or
       abs(unix_timestamp(t0.offtime, 'yyyy-MM-dd HH:mm:ss')
         - unix_timestamp(t1.offtime, 'yyyy-MM-dd HH:mm:ss')) < 600)
  and t0.uid > t1.uid;  -- keep each unordered pair once
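Functions used in the code above:
unix_timestamp (converts a date string to a Unix timestamp)
Syntax: unix_timestamp(string date), unix_timestamp(string date, string pattern)
Return type: bigint
Description: the one-argument form converts a string in "yyyy-MM-dd HH:mm:ss" format to a Unix timestamp in seconds; the two-argument form parses the string with the given pattern. The Hive documentation states that a failed conversion returns 0, but some versions return NULL instead, so check the behaviour of your version.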
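Example: select unix_timestamp('20240201 20:17:11','yyyyMMdd HH:mm:ss') --> 1706789831 when the session time zone is UTC+8 (1706818631 in UTC); the result depends on the time zone.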
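The expression abs(unix_timestamp(t0.ontime, 'yyyy-MM-dd HH:mm:ss') - unix_timestamp(t1.ontime, 'yyyy-MM-dd HH:mm:ss')) < 600 means that the two users logged in at the same café within 10 minutes of each other (10 minutes is 600 seconds). The query applies the same check to offtime.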
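The conversion and the 600-second comparison can be tried outside Hive. The following is a minimal Python sketch, not Hive code: the helper to_epoch is a hypothetical stand-in for unix_timestamp, and the two time zones are assumptions chosen to show that the epoch value depends on the zone while the gap between two timestamps does not.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"


def to_epoch(text, tz):
    """Parse a 'yyyy-MM-dd HH:mm:ss' string as wall-clock time in tz; return epoch seconds."""
    return int(datetime.strptime(text, FMT).replace(tzinfo=tz).timestamp())


utc = timezone.utc
utc_plus_8 = timezone(timedelta(hours=8))  # assumed session time zone

# The same wall-clock string maps to different epochs in different zones
print(to_epoch("2020-01-01 12:10:00", utc))         # 1577880600
print(to_epoch("2020-01-01 12:10:00", utc_plus_8))  # 1577851800

# Rule 1 check for two logins from the sample data (café 1, users 110002 and 110001)
gap = abs(to_epoch("2020-01-01 12:10:00", utc_plus_8)
          - to_epoch("2020-01-01 12:11:00", utc_plus_8))
print(gap, gap < 600)  # 60 True
```

Because both sides of the subtraction use the same zone, the offset cancels and the comparison with 600 is unaffected by the session time zone.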
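ps: the goal is to pick out the pairs who may know each other within one café. [user A, user B] and [user B, user A] are the same pair, so keeping only rows with t0.uid > t1.uid retains each pair once. The same condition also drops rows in which a user is paired with themselves.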
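The effect of Rule 1 and of the t0.uid > t1.uid filter can be cross-checked on the sample data. This is a plain Python sketch of the same logic, not the HQL itself; ROWS is copied from the insert statement, and the helper names seconds_apart and rule1 are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (wid, uid, ontime, offtime) rows copied from the sample data
ROWS = [
    ("1", "110001", "2020-01-01 11:10:00", "2020-01-01 11:15:00"),
    ("1", "110001", "2020-01-01 11:18:00", "2020-01-01 11:23:00"),
    ("1", "110002", "2020-01-01 12:10:00", "2020-01-01 13:15:00"),
    ("1", "110001", "2020-01-01 12:11:00", "2020-01-01 13:10:00"),
    ("1", "110003", "2020-01-01 12:15:00", "2020-01-01 13:15:00"),
    ("1", "110004", "2020-01-01 12:16:00", "2020-01-01 13:18:00"),
    ("2", "110001", "2020-01-02 12:10:00", "2020-01-02 12:30:00"),
    ("2", "110001", "2020-01-02 12:50:00", "2020-01-02 13:05:00"),
    ("2", "110002", "2020-01-02 12:52:00", "2020-01-02 12:55:00"),
    ("2", "110003", "2020-01-02 12:58:00", "2020-01-02 13:20:00"),
    ("2", "110004", "2020-01-02 13:00:00", "2020-01-02 13:10:00"),
    ("3", "110001", "2020-01-03 12:10:00", "2020-01-03 12:30:00"),
    ("3", "110003", "2020-01-03 12:55:00", "2020-01-03 13:02:00"),
    ("3", "110001", "2020-01-03 12:50:00", "2020-01-03 12:55:00"),
    ("3", "110002", "2020-01-03 13:00:00", "2020-01-03 13:01:00"),
    ("3", "110004", "2020-01-03 12:58:00", "2020-01-03 13:03:00"),
    ("3", "110002", "2020-01-03 13:20:00", "2020-01-03 13:25:00"),
]


def seconds_apart(a, b):
    """Absolute gap in seconds between two timestamp strings."""
    return abs((datetime.strptime(a, FMT) - datetime.strptime(b, FMT)).total_seconds())


def rule1(r0, r1):
    """Same café, and login or logout times less than 600 seconds apart."""
    return r0[0] == r1[0] and (
        seconds_apart(r0[2], r1[2]) < 600 or seconds_apart(r0[3], r1[3]) < 600
    )


# Matches between different users, counted in both orders: (A, B) and (B, A)
symmetric = [(r0, r1) for r0 in ROWS for r1 in ROWS
             if rule1(r0, r1) and r0[1] != r1[1]]

# Keeping only uid0 > uid1 leaves each unordered pair once
deduped = [(r0, r1) for r0, r1 in symmetric if r0[1] > r1[1]]

print(len(symmetric), len(deduped))  # 36 18
```

The filter halves the matches, leaving 18 rows: six user pairs, each matched once in each of the three cafés.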
step3: step 2 yields the pairs who may know each other within a single café. Concatenate the two uids into one key so that Rule 2 can be checked per pair. The SQL is as follows:
select
    t0_wid,
    -- concatenate the two uids into a single pair key (uuid)
    concat_ws('~', t0_uid, t1_uid) as uuid
from (
    select
        t0.wid as t0_wid,
        t0.uid as t0_uid,
        t1.wid as t1_wid,
        t1.uid as t1_uid
    from table21 as t0
    join table21 as t1
        on t0.wid = t1.wid
    where (abs(unix_timestamp(t0.ontime, 'yyyy-MM-dd HH:mm:ss')
             - unix_timestamp(t1.ontime, 'yyyy-MM-dd HH:mm:ss')) < 600 or
           abs(unix_timestamp(t0.offtime, 'yyyy-MM-dd HH:mm:ss')
             - unix_timestamp(t1.offtime, 'yyyy-MM-dd HH:mm:ss')) < 600)
      and t0.uid > t1.uid
) t2;
step4: flag each pair that definitely knows each other with 1:
select
    uuid,
    -- flag = 1 when the pair satisfied Rule 1 in at least three different cafés
    if(count(distinct t0_wid) >= 3, 1, 0) as flag
from (
    select
        t0_wid,
        -- concatenate the two uids into a single pair key (uuid)
        concat_ws('~', t0_uid, t1_uid) as uuid
    from (
        select
            t0.wid as t0_wid,
            t0.uid as t0_uid,
            t1.wid as t1_wid,
            t1.uid as t1_uid
        from table21 as t0
        join table21 as t1
            on t0.wid = t1.wid
        where (abs(unix_timestamp(t0.ontime, 'yyyy-MM-dd HH:mm:ss')
                 - unix_timestamp(t1.ontime, 'yyyy-MM-dd HH:mm:ss')) < 600 or
               abs(unix_timestamp(t0.offtime, 'yyyy-MM-dd HH:mm:ss')
                 - unix_timestamp(t1.offtime, 'yyyy-MM-dd HH:mm:ss')) < 600)
          and t0.uid > t1.uid
    ) t2
) t3
group by uuid;
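Rule 2 counts cafés rather than matching rows, so it matters whether the aggregate counts rows, as count(t0_wid) does, or distinct café IDs, as count(distinct t0_wid) does. The Python sketch below contrasts the two on hypothetical data that is not part of the sample: one pair matches three times in a single café, the other once in each of three cafés.

```python
from collections import defaultdict

# Hypothetical step 3 output: (café id, pair key). Pair "B~A" matches three
# times, always in café "1"; pair "D~C" matches once in each of three cafés.
step3_rows = [
    ("1", "B~A"), ("1", "B~A"), ("1", "B~A"),
    ("1", "D~C"), ("2", "D~C"), ("3", "D~C"),
]

row_count = defaultdict(int)  # analogue of count(t0_wid)
cafes = defaultdict(set)      # analogue of count(distinct t0_wid)
for wid, key in step3_rows:
    row_count[key] += 1
    cafes[key].add(wid)

flag_by_rows = {k: int(n >= 3) for k, n in row_count.items()}
flag_by_cafes = {k: int(len(s) >= 3) for k, s in cafes.items()}

print(flag_by_rows)   # {'B~A': 1, 'D~C': 1}
print(flag_by_cafes)  # {'B~A': 0, 'D~C': 1}
```

On the sample data the two counts agree, because every pair matches exactly once per café; they diverge as soon as a pair matches more than once in the same café.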
step5: count the pairs that satisfy both Rule 1 and Rule 2. On the sample data the result is 6:
select
    count(1) as cnt
from (
    select
        uuid,
        -- flag = 1 when the pair satisfied Rule 1 in at least three different cafés
        if(count(distinct t0_wid) >= 3, 1, 0) as flag
    from (
        select
            t0_wid,
            -- concatenate the two uids into a single pair key (uuid)
            concat_ws('~', t0_uid, t1_uid) as uuid
        from (
            select
                t0.wid as t0_wid,
                t0.uid as t0_uid,
                t1.wid as t1_wid,
                t1.uid as t1_uid
            from table21 as t0
            join table21 as t1
                on t0.wid = t1.wid
            where (abs(unix_timestamp(t0.ontime, 'yyyy-MM-dd HH:mm:ss')
                     - unix_timestamp(t1.ontime, 'yyyy-MM-dd HH:mm:ss')) < 600 or
                   abs(unix_timestamp(t0.offtime, 'yyyy-MM-dd HH:mm:ss')
                     - unix_timestamp(t1.offtime, 'yyyy-MM-dd HH:mm:ss')) < 600)
              and t0.uid > t1.uid
        ) t2
    ) t3
    group by uuid
) t4
where flag = 1;  -- count only the pairs that satisfy Rule 2
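The final count can be cross-checked end to end without Hive. The sketch below is a plain Python re-implementation of Rules 1 and 2 over the sample rows, intended only as a sanity check; it counts distinct cafés per pair, and the helper name gap is illustrative.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (wid, uid, ontime, offtime) rows copied from the sample data
ROWS = [
    ("1", "110001", "2020-01-01 11:10:00", "2020-01-01 11:15:00"),
    ("1", "110001", "2020-01-01 11:18:00", "2020-01-01 11:23:00"),
    ("1", "110002", "2020-01-01 12:10:00", "2020-01-01 13:15:00"),
    ("1", "110001", "2020-01-01 12:11:00", "2020-01-01 13:10:00"),
    ("1", "110003", "2020-01-01 12:15:00", "2020-01-01 13:15:00"),
    ("1", "110004", "2020-01-01 12:16:00", "2020-01-01 13:18:00"),
    ("2", "110001", "2020-01-02 12:10:00", "2020-01-02 12:30:00"),
    ("2", "110001", "2020-01-02 12:50:00", "2020-01-02 13:05:00"),
    ("2", "110002", "2020-01-02 12:52:00", "2020-01-02 12:55:00"),
    ("2", "110003", "2020-01-02 12:58:00", "2020-01-02 13:20:00"),
    ("2", "110004", "2020-01-02 13:00:00", "2020-01-02 13:10:00"),
    ("3", "110001", "2020-01-03 12:10:00", "2020-01-03 12:30:00"),
    ("3", "110003", "2020-01-03 12:55:00", "2020-01-03 13:02:00"),
    ("3", "110001", "2020-01-03 12:50:00", "2020-01-03 12:55:00"),
    ("3", "110002", "2020-01-03 13:00:00", "2020-01-03 13:01:00"),
    ("3", "110004", "2020-01-03 12:58:00", "2020-01-03 13:03:00"),
    ("3", "110002", "2020-01-03 13:20:00", "2020-01-03 13:25:00"),
]


def gap(a, b):
    """Absolute gap in seconds between two timestamp strings."""
    return abs((datetime.strptime(a, FMT) - datetime.strptime(b, FMT)).total_seconds())


# Rule 1: collect, per pair key, the cafés where the pair matched
cafes_by_pair = defaultdict(set)
for w0, u0, on0, off0 in ROWS:
    for w1, u1, on1, off1 in ROWS:
        if w0 == w1 and u0 > u1 and (gap(on0, on1) < 600 or gap(off0, off1) < 600):
            cafes_by_pair[f"{u0}~{u1}"].add(w0)

# Rule 2: keep the pairs that matched in at least three different cafés
certain = sorted(k for k, cafes in cafes_by_pair.items() if len(cafes) >= 3)

print(len(certain))  # 6
print(certain)
```

With four users in the sample there are only six possible pairs, and all six match in each of the three cafés, which is why the count equals the total number of pair keys here.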
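3 Summary
This problem belongs to the "shared xx" family: mutual friends, mutual acquaintances, shared usage, and so on. When these keywords appear, a self-join is often the way in. (The self-join produces a Cartesian product, with "one-to-many" or "many-to-one" matches.) The usual procedure is to generate all combinations through the self-join and then keep only the rows that meet the conditions.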