文章目录
- distinct 关键字
- 聚合函数
- 常见的聚合函数
- group by和having 分组过滤
- 窗口函数
- with as
- 窗口聚合函数
- 排名窗口函数
- 值窗口函数
distinct 关键字
distinct 去重数据,ps:null值也会查出来
select distinct column from table;
聚合函数
常见的聚合函数
select count(1) from table;
注意 count(*) 包括null值 count(expression) 会忽略null值
select max(score) from table; 忽略null值
select min(score) from table;忽略null值
select sum(score) from table;忽略null值
select avg(score) from table;忽略null值
select count(distinct score) from table;忽略null值
group by和having 分组过滤
select count(score) from table group by score having count(score) >2;
having 指的是对分组后的表进行过滤
窗口函数
MySQL8.0后支持的新特性,针对数据分析需要,又称OLAP函数,方便大数据分析
每行数据都会得到一个结果,会保留原有的数据列
使用场景:数据报表,大数据分析
窗口里的几个概念:
单个窗口里当前行
select count(version) from product group by version;
--
select id,sn,version,date,picture,count(version) over(partition by version) as count_version from product order by id ;
窗口函数语法
-- window_function over(partition by column
-- order by column
-- window_expression ) 语法
只影响单个分区里的数据 分区字段,区内排序,窗口大小
window_function 可以是三种函数
aggregate functions 聚合函数
sort functions 排序函数
analytics functions 统计和比较函数
重点是在于window_expression 就是这个窗口子句
unbounded preceding 分区第一行
m preceding 当前行的前m行
current row 当前行
n following 当前行的后n行
unbounded following 分区最后一行
如果要指定 window_expression
则语法为 function over (partition by column order by column
rows between A and B)
如果没有order by 则 默认 rows between unbounded preceding and unbounded following
如果有order by 则默认 rows between unbounded preceding and current row
CREATE TABLE `product` (
`id` int NOT NULL AUTO_INCREMENT,
`sn` varchar(255) DEFAULT NULL,
`version` varchar(255) DEFAULT NULL,
`date` datetime DEFAULT CURRENT_TIMESTAMP,
`picture` blob,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `demo`.`product`(`id`, `sn`, `version`, `date`, `picture`) VALUES (1, '1', '2', '2024-05-12 00:07:59', NULL);
INSERT INTO `demo`.`product`(`id`, `sn`, `version`, `date`, `picture`) VALUES (2, '2', '_4', '2024-05-11 00:08:23', NULL);
INSERT INTO `demo`.`product`(`id`, `sn`, `version`, `date`, `picture`) VALUES (3, '1A', '1.02', '2024-01-05 12:12:32', NULL);
INSERT INTO `demo`.`product`(`id`, `sn`, `version`, `date`, `picture`) VALUES (4, '1A', '1.02', '2024-01-05 12:12:32', NULL);
INSERT INTO `demo`.`product`(`id`, `sn`, `version`, `date`, `picture`) VALUES (5, '1A', '1.02', '2024-01-05 12:12:32', NULL);
INSERT INTO `demo`.`product`(`id`, `sn`, `version`, `date`, `picture`) VALUES (6, '1A', '1.02', '2024-01-05 12:12:32', NULL);
select id,sn,version,date,picture,count(version) over(partition by version) as count_version from product order by id ;
select id,sn,version,date,picture,count(version) over(partition by version order by id ) as count_version from product order by id ;
with as
WITH xm_gl AS ( SELECT * FROM products WHERE pname IN ( '小米电视机', '格力空调' ) )
SELECT avg( price ) FROM xm_gl;
-- with as 创建多个临时表
WITH a AS ( SELECT * FROM category WHERE cname = '家电' ),
b AS ( SELECT * FROM products WHERE pname IN ( '小米电视机', '格力空调' ) )
SELECT * FROM a LEFT JOIN b ON a.cid = b.category_id;
窗口聚合函数
count(*),count(expression)
avg()
max()
min()
sum()
和常规聚合函数一模一样
select name,count(sales) over(partition by product order by name ) from table;
排名窗口函数
select *,
ROW_NUMBER() over(partition by dealer_id order by sales desc) rk01,
RANK() over(partition by dealer_id order by sales desc) rk02,
DENSE_RANK() over(partition by dealer_id order by sales desc) rk03,
PERCENT_RANK() over(partition by dealer_id order by sales desc) rk04
from q1_sales;
-- row_number() 根据分组和排序 生成一个初始值为1的唯一连续序列数
-- rank() 对组中的数据进行排名,如果名次相同,则排名也相同,但是下一个名次的排名序号会出现不连续。
-- dense_rank() dense_rank函数的功能与rank函数类似,dense_rank函数在生成序号时是连续的,而rank函数生成的序号有可能不连续。当出现名次相同时,则排名序号也相同。而下一个排名的序号与上一个排名序号是连续的。
-- percent_rank() 计算给定行的百分比排名。可以用来计算超过了百分之多少的人;排名计算公式为:(当前行的rank值-1)/(分组内的总行数-1)
值窗口函数
lag()函数 用于统计窗口内往上第n行值。第一个参数为列名,第二个参数为往上第n行(可选,默认为1),第三个参数为默认值(当往上第n行为NULL时候,取默认值,如不指定,则为NULL.
lead()函数 统计窗口内往下第n行值。第一个参数为列名,第二个参数为往下第n行(可选,默认为1),第三个参数为默认值(当往下第n行为NULL时候,取默认值,如不指定,则为NULL.
first_value()函数 取分组内排序后,截止到当前行,第一个值
last_value()函数 取分组内排序后,截止到当前行,最后一个值
select emp_name, dealer_id, sales, first_value(sales) over (partition by dealer_id order by sales) as dealer_low from q1_sales;
SELECT month
,shop
,MONEY
,LAG(MONEY, 1, 1) OVER ( --取分组内上一行的营业额,如果没有上一行则取1
PARTITION BY shop ORDER BY month --按商铺分组,按月份排序
) AS before_money
参考文章:【必看】最全开窗函数讲解和实战指南_窗口函数-CSDN博客