Data and code: see the profile information on my homepage.
Keyword: "horizon plot"
1. Reading and preparing the data
First, read the data from a TSV file; it is cleaned and reshaped in the next step.
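# Start from a clean workspace and load the required packages
rm(list = ls())
pacman::p_load(tidyverse, ggalt, ggHoriPlot, hrbrthemes)
# Columns used below: activity, time (in minutes, judging by the later time/60), p
sports <- read_tsv("activity.tsv")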
2. Cleaning the data
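sports <- sports %>%
  group_by(activity) %>%
  # Keep activities whose maximum p exceeds 3e-04 and drop "n.e.c." categories
  filter(max(p) > 3e-04,
         !grepl('n\\.e\\.c', activity)) %>%
  arrange(time) %>%
  # Scale each activity to its own peak, then apply a 3-point moving average;
  # the first and last points lack a neighbor, so they fall back to p_peak
  mutate(p_peak = p / max(p),
         p_smooth = (lag(p_peak) + p_peak + lead(p_peak)) / 3,
         p_smooth = coalesce(p_smooth, p_peak)) %>%
  ungroup() %>%
  # Append a copy of the midnight rows at time = 24 * 60
  do({
    rbind(.,
          filter(., time == 0) %>%
            mutate(time = 24 * 60))
  }) %>%
  # Shift times before 03:00 past midnight so the axis runs from 03:00 to 27:00
  mutate(time = ifelse(time < 3 * 60, time + 24 * 60, time)) %>%
  # Order activities by the row position of their peak
  mutate(activity = reorder(activity, p_peak, FUN = which.max)) %>%
  arrange(activity) %>%
  # Reversed factor order, used for the facet rows
  mutate(activity.f = reorder(as.character(activity), desc(activity)))
# Convert minutes to hours for the x axis
sports <- mutate(sports, time2 = time / 60)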
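To see what the peak scaling and smoothing steps do, here is a minimal sketch on a made-up series. The `toy` data frame and its values are hypothetical and only stand in for a single activity; the three `mutate()` expressions are the same as in the cleaning code above.

```r
library(dplyr)

# Hypothetical toy series: five time points for one activity
toy <- tibble(time = c(0, 60, 120, 180, 240),
              p    = c(1, 2, 4, 2, 1))

toy_smooth <- toy %>%
  arrange(time) %>%
  mutate(
    # Scale to the series' own peak, so the maximum becomes 1
    p_peak = p / max(p),
    # 3-point moving average; NA at both ends, where a neighbor is missing
    p_smooth = (lag(p_peak) + p_peak + lead(p_peak)) / 3,
    # Fall back to the unsmoothed value at the two end points
    p_smooth = coalesce(p_smooth, p_peak)
  )

print(toy_smooth)
```

Note that smoothing lowers the peak itself: the middle value of `p_smooth` is 2/3 rather than 1, because it is averaged with its two smaller neighbors.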
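3. Drawing a first version of the chart
Using the processed data, draw a first version of the chart showing how each sports activity is distributed over the day.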
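ggplot(sports, aes(time2, p_smooth)) +
  # bandwidth is not an argument of ggHoriPlot::geom_horizon(), which masks ggalt's here;
  # this assumes the development version of ggalt, whose geom_horizon() accepts it
  ggalt::geom_horizon(bandwidth = 0.1) +
  # One row per activity
  facet_grid(activity.f ~ .) +
  # Ticks every 3 hours from 03:00 to 27:00, labelled on a 24-hour clock
  scale_x_continuous(expand = c(0, 0), breaks = seq(from = 3, to = 27, by = 3),
                     labels = function(x) {sprintf("%02d:00", as.integer(x %% 24))}) +
  viridis::scale_fill_viridis(name = "Activity relative to peak", discrete = TRUE,
                              labels = scales::percent(seq(0, 1, 0.1) + 0.1))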
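The x axis works because times before 03:00 were shifted past midnight during cleaning, and the label function folds them back onto a 24-hour clock. The following base R sketch shows both halves; `wrap_time` and `hour_label` are illustrative names, not part of the original code, but their bodies are copied from it.

```r
# Same shift as in the cleaning step: minutes before 03:00 move to the next day
wrap_time <- function(time) ifelse(time < 3 * 60, time + 24 * 60, time)

# Same labelling function as in scale_x_continuous()
hour_label <- function(x) sprintf("%02d:00", as.integer(x %% 24))

# 00:00, 01:30, 03:00 and 12:00, in minutes, converted to hours on the plot axis
wrapped <- wrap_time(c(0, 90, 180, 720)) / 60
print(wrapped)

# Axis breaks and the labels they receive
breaks <- seq(from = 3, to = 27, by = 3)
labels <- hour_label(breaks)
print(labels)
```

The last two breaks, 24 and 27, are labelled "00:00" and "03:00", so the axis reads as one continuous day that starts and ends at 03:00.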
4. Polishing the chart
Next, apply a theme and adjust the facet and axis text so the chart is easier to read.
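ggplot(sports, aes(time2, p_smooth)) +
  # bandwidth is not an argument of ggHoriPlot::geom_horizon(), which masks ggalt's here;
  # this assumes the development version of ggalt, whose geom_horizon() accepts it
  ggalt::geom_horizon(bandwidth = 0.1) +
  facet_grid(activity.f ~ .) +
  scale_x_continuous(expand = c(0, 0), breaks = seq(from = 3, to = 27, by = 3),
                     labels = function(x) {sprintf("%02d:00", as.integer(x %% 24))}) +
  viridis::scale_fill_viridis(name = "Activity relative to peak", discrete = TRUE,
                              labels = scales::percent(seq(0, 1, 0.1) + 0.1)) +
  # hrbrthemes theme without grid lines
  theme_ipsum_rc(grid = "") +
  # Slightly overlap the rows, print facet labels horizontally and left-aligned,
  # hide the y-axis text, and tilt the x-axis labels
  theme(panel.spacing.y = unit(-0.05, "lines"),
        strip.text.y = element_text(hjust = 0, angle = 0),
        axis.text.y = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
# Save the last plot with a white background
ggsave('pic.png', bg = 'white', width = 8, height = 6)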
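5. The result
The chart shows when each sports activity peaks during the day. Fill color encodes the activity level relative to that activity's own peak, so rows can be compared by shape and timing but not by absolute level. Reading along a row shows the hours in which an activity is concentrated; reading down the rows shows how the peaks shift from one activity to the next.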
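The color bands come from the general idea behind a horizon chart: each value is cut into fixed-height bands, the band index chooses the color, and only the remainder within the band is drawn as height. The sketch below illustrates that idea in base R with a band height of 0.1; `fold_bands` is a hypothetical helper written for this explanation and does not reproduce the internals of any plotting package.

```r
# Split values into bands of a fixed height (illustration only)
fold_bands <- function(y, bandwidth = 0.1) {
  band <- floor(y / bandwidth)          # which band the value reaches (0-based)
  data.frame(
    y      = y,
    band   = band,                      # drives the fill color
    height = y - band * bandwidth       # drawn height inside that band
  )
}

# Three relative activity levels: low, medium, near the peak
folded <- fold_bands(c(0.05, 0.37, 0.92))
print(folded)
```

With these inputs the values fall into bands 0, 3 and 9, which is why a row near its peak shows the color from the far end of the palette while quiet hours stay in the first color.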