数据准备
Create table If Not Exists Scores (id int, score DECIMAL(3,2))
Truncate table Scores
insert into Scores (id, score) values ('1', '3')
insert into Scores (id, score) values ('2', '3')
insert into Scores (id, score) values ('3', '4.0')
insert into Scores (id, score) values ('4', '4.0')
insert into Scores (id, score) values ('5', '5')
insert into Scores (id, score) values ('6', '6')
按照分数进行排名
MySQL 8.0 之后的版本可以使用开窗函数:
dense_rank():不间断排名
SELECT
score,
DENSE_RANK() OVER (
ORDER BY
score DESC
) AS 'rank'
FROM
Scores;
MySQL8.0 之前的版本,可以使用 count(distinct column)的方式实现:
SELECT
S1.score,
(
SELECT
COUNT(DISTINCT S2.score)
FROM
Scores S2
WHERE
S2.score >= S1.score
) AS 'rank'
FROM
Scores S1
ORDER BY
S1.score DESC;
检验是否有重复数据
我们可以使用duplicated
来验证指定列是否有重复数据。
源码:
def duplicated(
self,
subset: Hashable | Sequence[Hashable] | None = None,
keep: Literal["first", "last", False] = "first",
) -> Series:
subset:列标识或者列的序列,可选。默认值是None,即使用所有的列进行检测,如果 subset 指定了部分列,则只考虑 subset 指定列是否重复。
keep:默认是 ‘first’。这个域决定了如何去标识重复行。
'first':重复的行中,除了第一行,其余的都标识为 True
'last':重复的行中,除了最后一行,其余的都标识为 True
False:把所有重复的行中都标识为 True
import pandas as pd
data = [[1, 'a@b.com'], [2, 'c@d.com'], [3, 'a@b.com']]
person = pd.DataFrame(data, columns=['id', 'email']).astype({'id':'Int64', 'email':'object'})
print(f'原始数据:{person}')
print(f'筛选结果:{person.email.duplicated()}')