一、每日一题
返回结果示例如下:
示例 1:
输入: Ads 表: +-------+---------+---------+ | ad_id | user_id | action | +-------+---------+---------+ | 1 | 1 | Clicked | | 2 | 2 | Clicked | | 3 | 3 | Viewed | | 5 | 5 | Ignored | | 1 | 7 | Ignored | | 2 | 7 | Viewed | | 3 | 5 | Clicked | | 1 | 4 | Viewed | | 2 | 11 | Viewed | | 1 | 2 | Clicked | +-------+---------+---------+ 输出: +-------+-------+ | ad_id | ctr | +-------+-------+ | 1 | 66.67 | | 3 | 50.00 | | 2 | 33.33 | | 5 | 0.00 | +-------+-------+ 解释: 对于 ad_id = 1, ctr = (2/(2+1)) * 100 = 66.67 对于 ad_id = 2, ctr = (1/(1+2)) * 100 = 33.33 对于 ad_id = 3, ctr = (1/(1+1)) * 100 = 50.00 对于 ad_id = 5, ctr = 0.00, 注意 ad_id = 5 没有被点击 (Clicked) 或查看 (Viewed) 过 注意我们不关心 action 为 Ingnored 的广告
解答:
import pandas as pd
def ads_performance(ads_df: pd.DataFrame) -> pd.DataFrame:
# 过滤掉 Ignored 行
filtered_df = ads_df[ads_df['action'] != 'Ignored']
# 计算每个 ad_id 的点击次数和查看次数
summary = filtered_df.groupby(['ad_id', 'action']).size().unstack(fill_value=0)
# 确保有 Clicked 和 Viewed 列,如果不存在则创建并填充为0
summary['Clicked'] = summary.get('Clicked', 0)
summary['Viewed'] = summary.get('Viewed', 0)
# 计算CTR
summary['CTR'] = (summary['Clicked'] / (summary['Clicked'] + summary['Viewed'])).fillna(0) * 100
# 处理没有被点击或查看的广告,赋值为0
full_ad_ids = ads_df['ad_id'].unique()
ctr_df = summary.reindex(full_ad_ids, fill_value = 0).reset_index()
ctr_df['CTR'] = ctr_df['CTR'].fillna(0).round(2)
# 排序,按CTR降序和ad_id升序排序
ctr_df = ctr_df.sort_values(by = ['CTR', 'ad_id'], ascending = [False, True])[['ad_id', 'CTR']]
return ctr_df
# 测试代码
data = {
'ad_id': [1, 2, 3, 5, 1, 2, 3, 1, 2, 1],
'user_id': [1, 2, 3, 5, 7, 7, 5, 4, 11, 2],
'action': ['Clicked', 'Clicked', 'Viewed', 'Ignored', 'Ignored', 'Viewed', 'Clicked', 'Viewed', 'Viewed', 'Clicked']
}
ads_df = pd.DataFrame(data)
result = ads_performance(ads_df)
print(result)
二、总结
注意分母为0的情况时如何处理。