Note: reference article:
SQL: average score and rank of students with more than 2 failed courses - HQL interview question 47 [Pinduoduo], CSDN blog: https://blog.csdn.net/godlovedaniel/article/details/119858725
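0 Problem description
Find the average score of each student who has failed more than 2 courses, together with the rank of that average score. (A score below 60 counts as a fail.)
1 Data preparation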
create table scores
(
    sid   int,
    score int,
    cid   int
) row format delimited
    fields terminated by '\t';
insert into scores values
(1, 90, 1),
(1, 59, 2),
(1, 67, 3),
(2, 20, 1),
(2, 30, 2),
(2, 40, 3),
(3, 14, 1),
(3, 13, 2),
(3, 15, 3),
(4, 90, 1),
(4, 90, 2),
(4, 87, 3);
2 Data analysis
The complete code is as follows:
select
    t3.sid,
    t3.avg_score,
    t3.dr
from (select distinct
          sid,
          avg_score,
          dense_rank() over (order by avg_score desc) dr
      from (select
                sid,
                score,
                avg(score) over (partition by sid) as avg_score
            from scores) t2) t3
join (select
          sid
      from scores
      group by sid
      -- "more than 2" failed courses, as in the problem statement
      having sum(if(score < 60, 1, 0)) > 2) t1
on t3.sid = t1.sid;
Code walkthrough:
step1: Flag the data, marking every score below 60 with 1; then select the students who have more than 2 failed courses
select
    sid
from (select *,
             if(score < 60, 1, 0) as flag
      from scores) t1
group by sid
having sum(flag) > 2;
Simplified version:
select sid
from scores
group by sid
having sum(if(score < 60, 1, 0)) > 2;
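To see why only some students pass this filter, it can help to list each student's failed-course count before applying having. A minimal sketch against the scores table above; the alias fail_cnt is introduced here only for illustration:

```sql
-- Count failed courses (score < 60) per student, without filtering yet.
select
    sid,
    sum(if(score < 60, 1, 0)) as fail_cnt
from scores
group by sid;
```

On the sample data this should give fail_cnt = 1 for sid 1, 3 for sid 2 and sid 3, and 0 for sid 4, so only sid 2 and sid 3 have more than 2 failed courses.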
step2: Compute each student's average score and its rank
select distinct
    sid,
    avg_score,
    dense_rank() over (order by avg_score desc) dr
from (select
          sid,
          score,
          avg(score) over (partition by sid) as avg_score
      from scores) t2;
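ps: The ranking here is on avg_score. Because avg(score) over (partition by sid) repeats the same avg_score on every row of a given sid, the ranking has to use dense_rank: rank() would count the repeated rows and leave gaps between students, and row_number() would give the rows of one student different numbers. distinct then collapses the repeated rows to one row per student.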
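As an alternative sketch (not the approach used in this article), the repeated rows can be avoided altogether by aggregating to one row per student with group by first and ranking afterwards, so that distinct is no longer needed. The alias t is an assumption made here:

```sql
-- One row per student first, then rank the per-student averages.
select
    sid,
    avg_score,
    dense_rank() over (order by avg_score desc) as dr
from (select
          sid,
          avg(score) as avg_score
      from scores
      group by sid) t;
```

On the sample data the averages are 89, 72, 30 and 14 for sid 4, 1, 2 and 3, so the ranks should be 1 to 4 in that order. dense_rank is kept so that tied averages do not leave gaps in the ranking.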
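step3: Join the result of step2 with the result of step1 to filter out the final result. Note that the rank is assigned over all students before the join filters them. The final SQL is as follows: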
select
    t3.sid,
    t3.avg_score,
    t3.dr
from (select distinct
          sid,
          avg_score,
          dense_rank() over (order by avg_score desc) dr
      from (select
                sid,
                score,
                avg(score) over (partition by sid) as avg_score
            from scores) t2) t3
join (select
          sid
      from scores
      group by sid
      -- "more than 2" failed courses, as in the problem statement
      having sum(if(score < 60, 1, 0)) > 2) t1
on t3.sid = t1.sid;
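The same result can probably be reached without the join, by computing the average and the failed-course count together and filtering only after the rank has been assigned. This is a sketch under the assumption that the rank should still be computed over all students, as in the SQL above; the aliases fail_cnt, a and b are introduced here for illustration:

```sql
-- Inner query: one row per student with average and failed-course count.
-- Middle query: rank all students by average.
-- Outer query: keep only students with more than 2 failed courses.
select
    sid,
    avg_score,
    dr
from (select
          sid,
          avg_score,
          fail_cnt,
          dense_rank() over (order by avg_score desc) as dr
      from (select
                sid,
                avg(score)                as avg_score,
                sum(if(score < 60, 1, 0)) as fail_cnt
            from scores
            group by sid) a) b
where fail_cnt > 2;
```

On the sample data this should return sid 2 (average 30, rank 3) and sid 3 (average 14, rank 4). Filtering in the outermost query matters: moving the condition into the inner query would rank only the failing students.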
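3 Summary
This case mainly involves window functions and multi-table joins. Note that older Hive versions (before 0.13) do not support IN subqueries in the WHERE clause, so a join is used here to do the filtering instead.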
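For this kind of existence filter Hive also provides left semi join, which keeps the rows of the left table that have a match on the right and may reference right-side columns only in the on clause. A hedged sketch of step3 rewritten that way, reusing the t3 and t1 subqueries from this article with the "more than 2" threshold of the problem statement:

```sql
-- Same filter as the inner join in step3, expressed as a semi join.
select
    t3.sid,
    t3.avg_score,
    t3.dr
from (select distinct
          sid,
          avg_score,
          dense_rank() over (order by avg_score desc) as dr
      from (select
                sid,
                score,
                avg(score) over (partition by sid) as avg_score
            from scores) t2) t3
left semi join (select
                    sid
                from scores
                group by sid
                having sum(if(score < 60, 1, 0)) > 2) t1
on t3.sid = t1.sid;
```

Because t1 has at most one row per sid, the inner join and the semi join should return the same rows here; the semi join mainly makes the intent (filter, not combine) explicit.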