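R Plotting Series: Box Plot with Jittered Points
(Part 2) Scientific Plotting 1: Box Plot (with Jittered Points)
Contents
- R Plotting Series: Box Plot with Jittered Points
- (Part 2) Scientific Plotting 1: Box Plot (with Jittered Points)
- Preface
- I. Box Plots
- Notes:
- II. Plotting in R
- 1. Loading R Packages and Building Simulated Data
- 2. Drawing the Box Plot with ggplot
- 3. Saving
- III. Complete Code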
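Preface
When each group contains relatively few data points but you still need to show how the data in each group are distributed, a box plot overlaid with a scatter of the individual points is a good way to present the data.
I. Box Plots
A box plot, also called a box-and-whisker plot, is a statistical chart that shows how a set of data is spread out; it takes its name from its box-like shape. It is used in many fields, commonly in quality control, and makes it quick to spot outliers. Its main strength is that the box is built from the median and quartiles, which are barely affected by outliers, so it gives a stable picture of the spread of the data and is also useful for data cleaning. The box plot in this example is drawn with ggplot().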
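To make the robustness claim concrete, here is a minimal sketch in base R. The toy vectors and the helper `box_summary()` are made up for illustration; the 1.5 × IQR fence is the default whisker rule of `geom_boxplot()`.

```r
# Toy data: nine values, then the same values with the largest one made extreme.
x_clean   <- 1:9
x_outlier <- c(1:8, 100)

# Hypothetical helper: the statistics a box plot is built from, plus the mean.
box_summary <- function(x) {
  q   <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  list(q1 = q[1], median = q[2], q3 = q[3],
       lower_fence = q[1] - 1.5 * iqr,
       upper_fence = q[3] + 1.5 * iqr,
       mean = mean(x))
}

clean <- box_summary(x_clean)
dirty <- box_summary(x_outlier)

# Quartiles and median are identical for both vectors; only the mean moves.
same_box <- isTRUE(all.equal(clean[c("q1", "median", "q3")],
                             dirty[c("q1", "median", "q3")]))

# Values above the upper fence would be drawn as outlier points.
flagged <- x_outlier[x_outlier > dirty$upper_fence]
```

In this toy case the box (3, 5, 7) is the same with and without the extreme value, while the mean shifts from 5 to about 15.1, and only the value 100 falls outside the upper fence of 13.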
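Notes:
- The order in which layers are added affects the result. In this example the layers, from the bottom up, are: box plot, scatter points, arrow. The box plot is the bottom layer.
- The scatter points must be jittered to avoid overlap. Here the horizontal axis is the grouping variable and the vertical axis holds the values of interest, so the jitter must not change the vertical values: apply horizontal jitter only by setting width in geom_jitter. If height is left unset, ggplot2 falls back to 40% of the resolution of the data, which is negligible for continuous values like these but visible for integer-valued data, so height = 0 is the safest way to guarantee that the y values are not moved.
- If the data contain outliers, the box plot draws them, and the scatter layer draws every point as well, so each outlier is drawn twice. To avoid this, hide the outliers in the box plot layer, for example by setting their color to the background color or, as in the code of this example, with outlier.shape = NA.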
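The jitter note above can be checked directly. The following is a small sketch, assuming ggplot2 is installed; the data frame `df` and its values are invented for illustration. It uses `layer_data()` to read back the coordinates ggplot2 actually draws for the jitter layer.

```r
library(ggplot2)

# Hypothetical toy data: two groups, one obvious outlier (10) in group A.
df <- data.frame(
  Group  = factor(rep(c("A", "B"), each = 5)),
  values = c(1, 2, 2, 3, 10, 4, 5, 5, 6, 7)
)

p <- ggplot(df, aes(Group, values)) +
  # Bottom layer: box plot with its own outlier points suppressed.
  geom_boxplot(outlier.shape = NA, width = 0.6) +
  # Top layer: every observation, jittered horizontally only.
  geom_jitter(width = 0.15, height = 0, size = 1)

# Coordinates of the second layer (the jittered points) as they will be drawn.
pts <- layer_data(p, 2)

# The y values are untouched by the jitter.
y_unchanged <- isTRUE(all.equal(sort(as.numeric(pts$y)), sort(df$values)))

# The x values move by at most `width` around the group positions 1 and 2.
max_x_shift <- max(abs(as.numeric(pts$x) - round(as.numeric(pts$x))))
```

Because the box plot is added first, it sits underneath the points, and the outlier 10 appears once (from the jitter layer) rather than twice.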
II. Plotting in R
1. Loading R Packages and Building Simulated Data
# Load R packages:
library(ggplot2)
library(latex2exp)
# Build simulated data (one uniform sample per group):
G1 <- runif(100, min = 0, max = 7)
G2 <- runif(20, min = 5, max = 7)
G3 <- runif(10, min = 1, max = 6)
G4 <- runif(15, min = 2, max = 6)
G5 <- runif(20, min = 2.2, max = 6.5)
G6 <- runif(10, min = 3.5, max = 5)
G7 <- runif(80, min = 1, max = 6)
G8 <- runif(70, min = 1, max = 5.5)
G9 <- runif(60, min = 1.5, max = 6)
G10 <- runif(200, min = 1, max = 7.2)
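# Combine into one data frame:
data <- data.frame(Group = rep(paste0("G", 1:10),
                               c(100, 20, 10, 15,
                                 20, 10, 80, 70,
                                 60, 200)),
                   values = c(G1, G2, G3, G4, G5, G6, G7, G8, G9, G10))
# Fix the level order so the groups plot as G1, G2, ..., G10 (not G1, G10, G2, ...):
data$Group <- factor(data$Group, levels = paste0("G", 1:10))
head(data)
# Example output (values differ between runs because no seed is set):
#   Group     values
# 1    G1 2.20054387
# 2    G1 1.90207512
# 3    G1 2.74224843
# 4    G1 2.17059052
# 5    G1 4.14728737
# 6    G1 0.01258516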
Sample of the simulated data:
Row | Group | values |
---|---|---|
1 | G1 | 2.20054387 |
2 | G1 | 1.90207512 |
3 | G1 | 2.74224843 |
4 | G1 | 2.17059052 |
5 | G1 | 4.14728737 |
6 | G1 | 0.01258516 |
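Since `runif()` draws new numbers on every run, the values above will not be reproduced exactly. As a hedged alternative, the same ten groups can be built more compactly and reproducibly. This is only a sketch: the seed value is arbitrary, and the vectors `sizes`, `mins`, and `maxs` are names introduced here to hold the group sizes and ranges used in the code above.

```r
# Arbitrary seed (an assumption, not from the original post) for reproducibility.
set.seed(2024)

# Group sizes and uniform ranges, copied from the G1..G10 definitions.
sizes  <- c(100, 20, 10, 15, 20, 10, 80, 70, 60, 200)
mins   <- c(0, 5, 1, 2, 2.2, 3.5, 1, 1, 1.5, 1)
maxs   <- c(7, 7, 6, 6, 6.5, 5, 6, 5.5, 6, 7.2)
groups <- paste0("G", 1:10)

# One runif() call per group, then flatten into a single vector.
values <- unlist(Map(runif, sizes, mins, maxs))

data <- data.frame(
  Group  = factor(rep(groups, sizes), levels = groups),
  values = values
)

# Quick sanity check: observations per group, in level order.
group_counts <- table(data$Group)
```

Checking `group_counts` against `sizes` is a cheap way to confirm that the labels and values were lined up correctly before plotting.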
2. Drawing the Box Plot with ggplot
ggplot(data, aes(Group, values)) +
  # Box plot (outliers hidden; the jitter layer already draws every point):
  geom_boxplot(outlier.shape = NA, width = 0.6) +
  # Jittered points (horizontal jitter only, so the y values stay exact):
  geom_jitter(aes(color = Group), width = 0.15, height = 0, size = 1) +
  # Horizontal reference line:
  geom_hline(yintercept = 4, linetype = "dashed") +
  # Arrow (annotate() draws it once instead of once per data row):
  annotate("segment", x = 2, xend = 2, y = 7.5, yend = 7.2,
           arrow = arrow(length = unit(1, "mm"))) +
  scale_color_manual(name = "Subtype",
                     values = c("#fd6ab0", "#aa5700", "#f48326", "#ffd711",
                                "#9bd53f", "#00ae4c", "#00c1e3", "#007ddb",
                                "#8538d1", "#d01910")) +
  # Text annotation:
  annotate("text", label = TeX("$\\textit{P} = 1.6e-06$"),
           size = 3, x = 2, y = 8) +
  # Axis labels:
  xlab("") +
  ylab(TeX("$Log_{2}(FPKM+1)$")) +
  # Title:
  ggtitle(TeX("$\\textit{GATA3}$ gene expression in T-ALL")) +
  # Theme:
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5)) +
  guides(color = guide_legend(override.aes = list(size = 2),
                              title.theme = element_text(face = "bold")))
3. Saving
ggsave("box_plot.pdf", height = 5, width = 7)
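When `ggsave()` is called without a `plot` argument it saves the most recent plot. A slightly more explicit variant, sketched below with a throwaway plot and a temporary file path (both invented for illustration), stores the plot in a variable and passes it in, which avoids saving the wrong figure in longer scripts.

```r
library(ggplot2)

# Hypothetical minimal plot, only to demonstrate saving.
demo_data <- data.frame(Group  = c("G1", "G1", "G1", "G2", "G2", "G2"),
                        values = c(1, 2, 3, 4, 5, 6))
p <- ggplot(demo_data, aes(Group, values)) +
  geom_boxplot(outlier.shape = NA, width = 0.6) +
  geom_jitter(width = 0.15, height = 0, size = 1)

# Write to a temporary location; replace with a real path such as "box_plot.pdf".
out_file <- file.path(tempdir(), "box_plot_demo.pdf")

# Pass the plot explicitly and state the units of width and height.
ggsave(out_file, plot = p, width = 7, height = 5, units = "in")

saved_ok <- file.exists(out_file)
```

The output format is chosen from the file extension, so changing `.pdf` to `.png` is enough to produce a raster image instead.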
III. Complete Code
# Load R packages:
library(ggplot2)
library(latex2exp)
# Build simulated data (one uniform sample per group):
G1 <- runif(100, min = 0, max = 7)
G2 <- runif(20, min = 5, max = 7)
G3 <- runif(10, min = 1, max = 6)
G4 <- runif(15, min = 2, max = 6)
G5 <- runif(20, min = 2.2, max = 6.5)
G6 <- runif(10, min = 3.5, max = 5)
G7 <- runif(80, min = 1, max = 6)
G8 <- runif(70, min = 1, max = 5.5)
G9 <- runif(60, min = 1.5, max = 6)
G10 <- runif(200, min = 1, max = 7.2)
# Combine into one data frame:
data <- data.frame(Group = rep(paste0("G", 1:10),
                               c(100, 20, 10, 15,
                                 20, 10, 80, 70,
                                 60, 200)),
                   values = c(G1, G2, G3, G4, G5, G6, G7, G8, G9, G10))
# Fix the level order so the groups plot as G1, G2, ..., G10:
data$Group <- factor(data$Group, levels = paste0("G", 1:10))
head(data)
ggplot(data, aes(Group, values)) +
  # Box plot (outliers hidden; the jitter layer already draws every point):
  geom_boxplot(outlier.shape = NA, width = 0.6) +
  # Jittered points (horizontal jitter only, so the y values stay exact):
  geom_jitter(aes(color = Group), width = 0.15, height = 0, size = 1) +
  # Horizontal reference line:
  geom_hline(yintercept = 4, linetype = "dashed") +
  # Arrow (annotate() draws it once instead of once per data row):
  annotate("segment", x = 2, xend = 2, y = 7.5, yend = 7.2,
           arrow = arrow(length = unit(1, "mm"))) +
  scale_color_manual(name = "Subtype",
                     values = c("#fd6ab0", "#aa5700", "#f48326", "#ffd711",
                                "#9bd53f", "#00ae4c", "#00c1e3", "#007ddb",
                                "#8538d1", "#d01910")) +
  # Text annotation:
  annotate("text", label = TeX("$\\textit{P} = 1.6e-06$"),
           size = 3, x = 2, y = 8) +
  # Axis labels:
  xlab("") +
  ylab(TeX("$Log_{2}(FPKM+1)$")) +
  # Title:
  ggtitle(TeX("$\\textit{GATA3}$ gene expression in T-ALL")) +
  # Theme:
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5)) +
  guides(color = guide_legend(override.aes = list(size = 2),
                              title.theme = element_text(face = "bold")))
# Save the plot:
ggsave("box_plot.pdf", height = 5, width = 7)