R语言ggplot2 | 绘制随机森林重要性+相关性热图

📋文章目录

  • 原图
  • 复现
    • 准备数据集及数据处理
    • 构建不同分类随机森林模型的并行计算
    • 绘制随机森林变量重要性柱状图
    • 计算数据集的相关性
    • 热图可视化
    • 合并随机森林重要性和热图
  • 附上所有代码

   在文献中,我们经常遇到随机森林和相关性热图的组合图片(下图),它由一幅叠加变量重要性圆圈的相关性热图和一幅说明因变量被解释程度的条形图组成。今天,我们将试着用自己的数据在R里面去复现这类图。

原图

   先看看所需复现的随机森林变量重要性+相关性热图:

在这里插入图片描述
这类图片由两部分组成。其中容易理解的就是下面的相关性热图,它是核心微生物(y轴)与土壤养分变量(x轴)的相关性分析。剩下的圆圈和上面的条形图都可以归类到随机森林分析部分。它将核心微生物作为自变量,分别将各个土壤养分变量作为因变量,进行随机森林分析。每一次随机森林分析得到的变量重要性指数则映射为热图里的圆圈大小(只收集p<0.05的变量的重要性),而模型对目标土壤养分变量的解释程度则作为上面条形图柱子的高度。接下来,我们来模仿复现这张图。

复现

准备数据集及数据处理

rm(list = ls())
# 在R中绘制随机森林模型中变量重要性和相关性的热图
library(vegan)
library(randomForest)
library(rfUtilities)
library(rfPermute)
library(ggplot2)
library(tidyverse)
library(patchwork)
library(export)
library(reshape2)

Micro <- openxlsx::read.xlsx("OTU1.xlsx", 1)
ENV <- openxlsx::read.xlsx("ENV1.xlsx", 1);head(ENV)
# Sample Ecosystem       SOC       pH       CEC        TP       TK       AK        TN        NO3        NH4      DOC
# 1     A1 Grassland  27.05671 7.904906 15.323426 0.9808826 14.75433 180.5585 2.1198671 125.335557  0.3171696 118.5123
# 2     A2 Grassland  25.76547 8.573666 18.104304 0.8527650 14.69093 166.8692 2.4470941 107.296478  2.4062394 124.6120
# 3     A3    Forest  12.33270 8.277955  8.297831 0.4298910 12.94322 121.5713 0.7090047   6.961546  2.1797286 129.7396
# 4     A4    Forest  16.63684 7.150481 15.632651 0.3711634 13.53909 194.0584 1.0868895  23.967231 29.2321159 129.3390
# 5     A5  Cropland 131.72950 7.548969 30.291618 0.6691908 14.19519 144.4154 5.4975148  38.032274 17.6463368 106.5602
# 6     A6  Cropland 123.75436 7.840254 34.357811 0.5828902 14.51699 387.8284 7.9689639  59.413622 12.7478292 105.8368
# MAT       SD       Pl
# 1 2.0162632 1.855115 274.0981
# 2 2.1286678 2.370189 369.1070
# 3 3.5848455 2.037255 263.5001
# 4 2.6846039 2.026378 346.7367
# 5 0.5787488 2.380261 333.7921
# 6 0.6756787 1.590068 240.1557
# 计算香浓指数,并把它合并在环境变量数据中
ENV$shannon <- diversity(t(Micro), index = "shannon")

# 生成一个总的与先前的不同生态系统合并成一个新的数据集
ENV_total <- ENV
ENV_total$Ecosystem <- "Whole"

ENV_new <- rbind(ENV_total, ENV)

# 确定不同生态系统类型的数量
unique(ENV_new$Ecosystem)
# [1] "Whole"     "Grassland" "Forest"    "Cropland"  "Wetland"   "Desert" 

自变量数据是一组虚拟的土壤理化指标数据,它来自5个生态系统:草地、森林、农田、湿地、荒漠,和一个总体的。随后将采样点微生物的香浓指数合并到环境变量中。我们计划将数据按生态系统包括总体共6类,计算每一类生态系统中土壤环境变量对微生物香浓指数的潜在贡献。这里的思路是通过并行运算每类的随机森林模型,提高运行速度。

构建不同分类随机森林模型的并行计算

# 构建自定义函数进行并行运算
library(foreach)
library(doParallel)

# 设置核心数
n_cores <- detectCores() - 1
cl <- makeCluster(n_cores)
registerDoParallel(cl)

# 获取生态系统列表
ecosystems <- unique(ENV_new$Ecosystem)

# 设置输出结果名称
options <- list()
options$results <- ecosystems

# 定义一个函数,对每个生态系统运行随机森林分析
run_single_rf <- function(data, eco) {
  if (!"Ecosystem" %in% names(data)) {
    stop("Error: 'Ecosystem' variable not found in data.")
  }
  if (!(eco %in% unique(data$Ecosystem))) {
    stop("Error: Invalid Ecosystem value.")
  }
  # 对当前生态系统的数据进行子集
  eco_data <- subset(data, Ecosystem == eco)[,-1:-2]
  # 设置种子
  set.seed(1)
  # 进行随机森林分析和置换检验
  RF <- randomForest(shannon~., eco_data, importance = T)
  RF_r2 <- rf.significance(RF, eco_data, nperm=99, ntree=501)$Rsquare
  RF_permuted <- rfPermute(shannon~., eco_data, nperm=99, ntree=501)
  # 返回结果
  list(ecosystem = eco, RF = RF, RF_r2 = RF_r2, RF_permuted = RF_permuted)
}

results <- foreach(i = ecosystems, .combine = rbind) %dopar% {
  library(randomForest)
  library(rfUtilities)
  library(rfPermute)
  # 运行单个随机森林分析
  run_single_rf(data = ENV_new, eco = i)
}

# 停止并行处理
stopCluster(cl)
registerDoSEQ()

# 确认结果
head(results)
# ecosystem   RF                      RF_r2     RF_permuted
# result.1 "Whole"     randomForest.formula,18 0.5722844 rfPermute,6
# result.2 "Grassland" randomForest.formula,18 0.6213995 rfPermute,6
# result.3 "Forest"    randomForest.formula,18 0.4347413 rfPermute,6
# result.4 "Cropland"  randomForest.formula,18 0.4695682 rfPermute,6
# result.5 "Wetland"   randomForest.formula,18 0.1822965 rfPermute,6
# result.6 "Desert"    randomForest.formula,18 0.5186782 rfPermute,6

results[, "RF_permuted"][[1]]
# An rfPermute model
# 
# Type of random forest: regression 
# Number of trees: 501 
# No. of variables tried at each split: 4 
# No. of permutation replicates: 100 
# Start time: 2023-04-18 22:56:38 
# End time: 2023-04-18 22:57:05 
# Run time: 26.8 secs 

这个results结果看起来想data.frame,实际上是一个Large list。
使用随机森林计算整体数据中环境生物因子对微生物香浓指数的重要性。

  1. 这里的%Var explained就是我们画柱状图时柱子的高度,大家可以抄录下来。
  2. importance函数调取了每个变量对shannon指数的重要性,用它们定义热图中空心圆的大小,但我们只需要调取p值小于0.05的变量。(将在后文统一调用)

绘制随机森林变量重要性柱状图

# 绘制柱状图
names <- c("ecosystem", "RF_r2")
bar <- as.data.frame(results[,names])
bar$ecosystem <- unlist(bar$ecosystem)
bar$RF_r2 <- unlist(bar$RF_r2)
str(bar)
# 'data.frame':	6 obs. of  2 variables:
# $ ecosystem: chr  "Whole" "Grassland" "Forest" "Cropland" ...
# $ RF_r2    : num  0.572 0.621 0.435 0.47 0.182 ...
colnames(bar) <- c("variable", "Exp")
bar$variable<- factor(bar$variable)
barplot<- ggplot(bar, aes(variable, Exp))+
  geom_bar(stat = "identity", fill = "steelblue")+
  scale_y_continuous(expand = c(0,0))+
  theme_classic(base_line_size = 0.75)+
  theme(panel.background = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_text(color = "black", size = 12))+
  labs(title = "Explained variation (%)", size = 12)
barplot

在这里插入图片描述

六组随机森林分析的解释量条形图已经完成,我们可以计算并绘制相关性热图部分。

计算数据集的相关性

使用cor()函数直接计算矩阵的相关性,相关矩阵的第14行,1到13列正是我们需要的香浓指数与13个环境因子的相关性分析结果。把它整理在一个表格内,并整理成绘图用的tidy数据,并对环境因子进行排序

注: cor()默认使用Pearson算法,如果对数据正态性不自信的话,建议修改为Spearman算法。

# 打包6个分类的随机森林模型结果
rf_list <- results[, "RF_permuted"]
# 修改6个分类模型的名称,改为对应的分类
names(rf_list) <- ecosystems
# 生成变量重要性的值,如果p值小于0.05则表示对应变量解释量
important_vars <- do.call(rbind, lapply(seq_along(results[, "RF_permuted"]), function(i) data.frame(ENV = row.names(importance(rf_list[[i]])), 
                                                            Important = ifelse(importance(rf_list[[i]])[, 2] < 0.05, 
                                                                               importance(rf_list[[i]])[, 1], NA),
                                                            Eco = rep(names(rf_list[i]), each = nrow(importance(rf_list[[i]]))))))
# 合并数据
circle <- data.frame(ENV = colnames(ENV)[3:15]) %>%
  left_join(important_vars) %>% 
  melt(id = c("ENV", "Eco"), value.name = "Importance"); circle
# Joining, by = "ENV"
# ENV       Eco  variable Importance
# 1  SOC     Whole Important         NA
# 2  SOC Grassland Important         NA
# 3  SOC    Forest Important  13.227866
# 4  SOC  Cropland Important         NA
# 5  SOC   Wetland Important         NA
# 6  SOC    Desert Important         NA
# 7   pH     Whole Important         NA
# 8   pH Grassland Important         NA
# 9   pH    Forest Important         NA
# 10  pH  Cropland Important         NA
# 11  pH   Wetland Important         NA
# 12  pH    Desert Important         NA
# 13 CEC     Whole Important         NA
# 14 CEC Grassland Important         NA
# 15 CEC    Forest Important         NA
# 16 CEC  Cropland Important   6.204274
# 17 CEC   Wetland Important         NA
# 18 CEC    Desert Important         NA
# 19  TP     Whole Important         NA
# 20  TP Grassland Important         NA
# 21  TP    Forest Important         NA
# 22  TP  Cropland Important         NA
# 23  TP   Wetland Important         NA
# 24  TP    Desert Important         NA
# 25  TK     Whole Important         NA
# 26  TK Grassland Important         NA
# 27  TK    Forest Important         NA
# 28  TK  Cropland Important   6.969350
# 29  TK   Wetland Important         NA
# 30  TK    Desert Important   6.783699
# 31  AK     Whole Important         NA
# 32  AK Grassland Important         NA
# 33  AK    Forest Important         NA
# 34  AK  Cropland Important   7.491691
# 35  AK   Wetland Important   7.208659
# 36  AK    Desert Important   6.601446
# 37  TN     Whole Important         NA
# 38  TN Grassland Important         NA
# 39  TN    Forest Important   6.953556
# 40  TN  Cropland Important         NA
# 41  TN   Wetland Important         NA
# 42  TN    Desert Important         NA
# 43 NO3     Whole Important   5.075570
# 44 NO3 Grassland Important         NA
# 45 NO3    Forest Important         NA
# 46 NO3  Cropland Important         NA
# 47 NO3   Wetland Important         NA
# 48 NO3    Desert Important         NA
# 49 NH4     Whole Important   4.594003
# 50 NH4 Grassland Important         NA
# 51 NH4    Forest Important   6.911031
# 52 NH4  Cropland Important         NA
# 53 NH4   Wetland Important         NA
# 54 NH4    Desert Important         NA
# 55 DOC     Whole Important  14.319739
# 56 DOC Grassland Important  18.475196
# 57 DOC    Forest Important  13.514009
# 58 DOC  Cropland Important         NA
# 59 DOC   Wetland Important         NA
# 60 DOC    Desert Important   5.634245
# 61 MAT     Whole Important         NA
# 62 MAT Grassland Important         NA
# 63 MAT    Forest Important         NA
# 64 MAT  Cropland Important         NA
# 65 MAT   Wetland Important         NA
# 66 MAT    Desert Important         NA
# 67  SD     Whole Important         NA
# 68  SD Grassland Important         NA
# 69  SD    Forest Important   6.683399
# 70  SD  Cropland Important   5.815041
# 71  SD   Wetland Important         NA
# 72  SD    Desert Important         NA
# 73  Pl     Whole Important  26.440467
# 74  Pl Grassland Important  12.635259
# 75  Pl    Forest Important         NA
# 76  Pl  Cropland Important  12.873442
# 77  Pl   Wetland Important   7.492399
# 78  Pl    Desert Important  15.139708

circle$ENV<- factor(circle$ENV, rev(colnames(ENV)[3:15]))

# 划分不同生态系统的数据集
df_eco <- split(ENV[, -1:-2], ENV$Ecosystem)
names(df_eco)

# 计算不同生态系统的相关性
r <- data.frame(ENV = colnames(ENV)[3:15],
               Whole = (cor(ENV[, -1:-2]))[14, 1:13],
               Grassland = (cor(df_eco$Grassland))[14, 1:13],
               Forest = (cor(df_eco$Forest))[14, 1:13],
               Cropland = (cor(df_eco$Cropland))[14, 1:13],
               Wetland = (cor(df_eco$Wetland))[14, 1:13],
               Desert = (cor(df_eco$Desert))[14, 1:13])%>%
  melt(id = "ENV", value.name = "Correlation");r
# ENV  variable  Correlation
# 1  SOC     Whole  0.190880826
# 2   pH     Whole -0.047988531
# 3  CEC     Whole  0.375623831
# 4   TP     Whole  0.204011032
# 5   TK     Whole  0.282682786
# 6   AK     Whole -0.238701400
# 7   TN     Whole  0.205713695
# 8  NO3     Whole -0.213326858
# 9  NH4     Whole -0.076868083
# 10 DOC     Whole  0.417261347
# 11 MAT     Whole -0.336919922
# 12  SD     Whole -0.272103426
# 13  Pl     Whole  0.590955861
# 14 SOC Grassland  0.389609962
# 15  pH Grassland -0.292803759
# 16 CEC Grassland  0.466440818
# 17  TP Grassland  0.240020091
# 18  TK Grassland  0.250093957
# 19  AK Grassland  0.295479703
# 20  TN Grassland  0.286305242
# 21 NO3 Grassland  0.060433644
# 22 NH4 Grassland -0.179338677
# 23 DOC Grassland  0.449023819
# 24 MAT Grassland -0.445949001
# 25  SD Grassland -0.511629899
# 26  Pl Grassland  0.657737041
# 27 SOC    Forest  0.419911131
# 28  pH    Forest -0.259555477
# 29 CEC    Forest  0.441782003
# 30  TP    Forest -0.043325378
# 31  TK    Forest  0.184701577
# 32  AK    Forest  0.314240260
# 33  TN    Forest  0.394795298
# 34 NO3    Forest  0.191051827
# 35 NH4    Forest -0.062053889
# 36 DOC    Forest  0.472024615
# 37 MAT    Forest -0.246051709
# 38  SD    Forest -0.427179574
# 39  Pl    Forest  0.445715297
# 40 SOC  Cropland  0.190504600
# 41  pH  Cropland  0.079739140
# 42 CEC  Cropland  0.372432772
# 43  TP  Cropland  0.173902720
# 44  TK  Cropland  0.339013659
# 45  AK  Cropland -0.567700216
# 46  TN  Cropland  0.179234867
# 47 NO3  Cropland -0.426516867
# 48 NH4  Cropland  0.047660104
# 49 DOC  Cropland  0.431862312
# 50 MAT  Cropland -0.388846551
# 51  SD  Cropland -0.129670246
# 52  Pl  Cropland  0.693855669
# 53 SOC   Wetland  0.062759256
# 54  pH   Wetland -0.230794196
# 55 CEC   Wetland  0.292344133
# 56  TP   Wetland  0.170356678
# 57  TK   Wetland  0.265791237
# 58  AK   Wetland  0.323268430
# 59  TN   Wetland -0.004741411
# 60 NO3   Wetland  0.172574329
# 61 NH4   Wetland -0.160782185
# 62 DOC   Wetland  0.436793024
# 63 MAT   Wetland -0.386445255
# 64  SD   Wetland -0.189241202
# 65  Pl   Wetland  0.343211982
# 66 SOC    Desert  0.008834840
# 67  pH    Desert  0.298885289
# 68 CEC    Desert  0.155105965
# 69  TP    Desert  0.253336024
# 70  TK    Desert  0.485088124
# 71  AK    Desert -0.554681227
# 72  TN    Desert  0.173607814
# 73 NO3    Desert -0.256845001
# 74 NH4    Desert -0.144874726
# 75 DOC    Desert  0.443678302
# 76 MAT    Desert -0.263961749
# 77  SD    Desert -0.160875488
# 78  Pl    Desert  0.648853903

r$ENV <- factor(r$ENV, levels = rev(colnames(ENV)[3:15]))

# 合并相关性与随机森林重要性的结果
r <- r %>% left_join(circle[c("ENV", "Importance")], by = "ENV");head(r)
# ENV variable Correlation Importance
# 1 SOC    Whole   0.1908808         NA
# 2 SOC    Whole   0.1908808         NA
# 3 SOC    Whole   0.1908808   13.22787
# 4 SOC    Whole   0.1908808         NA
# 5 SOC    Whole   0.1908808         NA
# 6 SOC    Whole   0.1908808         NA

热图可视化

heatmap <- ggplot()+
  geom_tile(data = r, aes(x = variable, y = ENV, fill = Correlation))+
  scale_fill_gradientn(colors = c('#2D6DB1', 'white', '#DC1623'),
                       limit = c(-1, 1))+
  geom_point(data = circle, aes(x = Eco, y = ENV,
                                size = Importance), shape = 21)+
  theme_bw()+
  theme(panel.background = element_blank(),
        panel.grid = element_blank(),
        axis.text.x = element_text(angle = 45, color = "black",
                                   size = 12, vjust = 0.6),
        axis.text.y = element_text(color = 'black', size = 12),
        legend.title = element_text(size = 10))+
  labs(y = '', x = '')
heatmap

在这里插入图片描述

合并随机森林重要性和热图

# barplot + heatmap + 
#   plot_layout(ncol = 1, heights = c(1, 6))
barplot %>%
  aplot::insert_bottom(heatmap, height = 6)
# graph2ppt(file = "ppt.ppt", height = 8, width = 5, append = T)

在这里插入图片描述
从结果来看,可以说是完美复刻!

附上所有代码

rm(list = ls())
# 在R中绘制随机森林模型中变量重要性和相关性的热图

library(vegan)
library(randomForest)
library(rfUtilities)
library(rfPermute)
library(ggplot2)
library(tidyverse)
library(patchwork)
library(export)
library(reshape2)

Micro <- openxlsx::read.xlsx("OTU1.xlsx", 1)
ENV <- openxlsx::read.xlsx("ENV1.xlsx", 1);head(ENV)
# Sample Ecosystem       SOC       pH       CEC        TP       TK       AK        TN        NO3        NH4      DOC
# 1     A1 Grassland  27.05671 7.904906 15.323426 0.9808826 14.75433 180.5585 2.1198671 125.335557  0.3171696 118.5123
# 2     A2 Grassland  25.76547 8.573666 18.104304 0.8527650 14.69093 166.8692 2.4470941 107.296478  2.4062394 124.6120
# 3     A3    Forest  12.33270 8.277955  8.297831 0.4298910 12.94322 121.5713 0.7090047   6.961546  2.1797286 129.7396
# 4     A4    Forest  16.63684 7.150481 15.632651 0.3711634 13.53909 194.0584 1.0868895  23.967231 29.2321159 129.3390
# 5     A5  Cropland 131.72950 7.548969 30.291618 0.6691908 14.19519 144.4154 5.4975148  38.032274 17.6463368 106.5602
# 6     A6  Cropland 123.75436 7.840254 34.357811 0.5828902 14.51699 387.8284 7.9689639  59.413622 12.7478292 105.8368
# MAT       SD       Pl
# 1 2.0162632 1.855115 274.0981
# 2 2.1286678 2.370189 369.1070
# 3 3.5848455 2.037255 263.5001
# 4 2.6846039 2.026378 346.7367
# 5 0.5787488 2.380261 333.7921
# 6 0.6756787 1.590068 240.1557
# 计算香浓指数,并把它合并在环境变量数据中
ENV$shannon <- diversity(t(Micro), index = "shannon")

# 生成一个总的与先前的不同生态系统合并成一个新的数据集
ENV_total <- ENV
ENV_total$Ecosystem <- "Whole"

ENV_new <- rbind(ENV_total, ENV)

# 确定不同生态系统类型的数量
unique(ENV_new$Ecosystem)
# [1] "Whole"     "Grassland" "Forest"    "Cropland"  "Wetland"   "Desert" 
# # 使用随机森林计算各生态系统中环境生物因子对微生物香浓指数的重要性
# set.seed(1)#设置种子,确保结果一致
# RF_total <- randomForest(shannon ~ . , ENV[, -1:-2], importance = T)
# RFs_total <- rfPermute(shannon ~ . , ENV[, -1:-2])
# RF_total;importance(RFs_total)[,1:2]
# # 这里的%Var explained就是我们画柱状图时柱子的高度,大家可以整理下来
# # importance函数调取了每个变量对shannon指数的重要性,用它们定义热图中空心圆的大小

# 构建自定义函数进行并行运算
library(foreach)
library(doParallel)

# 设置核心数
n_cores <- detectCores() - 1
cl <- makeCluster(n_cores)
registerDoParallel(cl)

# 获取生态系统列表
ecosystems <- unique(ENV_new$Ecosystem)

# 设置输出结果名称
options <- list()
options$results <- ecosystems

# 定义一个函数,对每个生态系统运行随机森林分析
run_single_rf <- function(data, eco) {
  if (!"Ecosystem" %in% names(data)) {
    stop("Error: 'Ecosystem' variable not found in data.")
  }
  if (!(eco %in% unique(data$Ecosystem))) {
    stop("Error: Invalid Ecosystem value.")
  }
  # 对当前生态系统的数据进行子集
  eco_data <- subset(data, Ecosystem == eco)[,-1:-2]
  
  # 设置种子
  set.seed(1)
  
  # 进行随机森林分析和置换检验
  RF <- randomForest(shannon~., eco_data, importance = T)
  RF_r2 <- rf.significance(RF, eco_data, nperm=99, ntree=501)$Rsquare
  RF_permuted <- rfPermute(shannon~., eco_data, nperm=99, ntree=501)
  
  # 返回结果
  list(ecosystem = eco, RF = RF, RF_r2 = RF_r2, RF_permuted = RF_permuted)
}

results <- foreach(i = ecosystems, .combine = rbind) %dopar% {
  
  library(randomForest)
  library(rfUtilities)
  library(rfPermute)
  
  # 运行单个随机森林分析
  run_single_rf(data = ENV_new, eco = i)
  
}

# 停止并行处理
stopCluster(cl)
registerDoSEQ()

# 确认结果
head(results)
# ecosystem   RF                      RF_r2     RF_permuted
# result.1 "Whole"     randomForest.formula,18 0.5722844 rfPermute,6
# result.2 "Grassland" randomForest.formula,18 0.6213995 rfPermute,6
# result.3 "Forest"    randomForest.formula,18 0.4347413 rfPermute,6
# result.4 "Cropland"  randomForest.formula,18 0.4695682 rfPermute,6
# result.5 "Wetland"   randomForest.formula,18 0.1822965 rfPermute,6
# result.6 "Desert"    randomForest.formula,18 0.5186782 rfPermute,6

results[, "RF_permuted"][[1]]
# An rfPermute model
# 
# Type of random forest: regression 
# Number of trees: 501 
# No. of variables tried at each split: 4 
# No. of permutation replicates: 100 
# Start time: 2023-04-18 22:56:38 
# End time: 2023-04-18 22:57:05 
# Run time: 26.8 secs 

# 绘制柱状图
names <- c("ecosystem", "RF_r2")
bar <- as.data.frame(results[,names])
bar$ecosystem <- unlist(bar$ecosystem)
bar$RF_r2 <- unlist(bar$RF_r2)
str(bar)
# 'data.frame':	6 obs. of  2 variables:
# $ ecosystem: chr  "Whole" "Grassland" "Forest" "Cropland" ...
# $ RF_r2    : num  0.572 0.621 0.435 0.47 0.182 ...
colnames(bar) <- c("variable", "Exp")
bar$variable<- factor(bar$variable)
barplot<- ggplot(bar, aes(variable, Exp))+
  geom_bar(stat = "identity", fill = "steelblue")+
  scale_y_continuous(expand = c(0,0))+
  theme_classic(base_line_size = 0.75)+
  theme(panel.background = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_text(color = "black", size = 12))+
  labs(title = "Explained variation (%)", size = 12)
barplot

# 打包6个分类的随机森林模型结果
rf_list <- results[, "RF_permuted"]
# 修改6个分类模型的名称,改为对应的分类
names(rf_list) <- ecosystems
# 生成变量重要性的值,如果p值小于0.05则表示对应变量解释量
important_vars <- do.call(rbind, lapply(seq_along(results[, "RF_permuted"]), function(i) data.frame(ENV = row.names(importance(rf_list[[i]])), 
                                                            Important = ifelse(importance(rf_list[[i]])[, 2] < 0.05, 
                                                                               importance(rf_list[[i]])[, 1], NA),
                                                            Eco = rep(names(rf_list[i]), each = nrow(importance(rf_list[[i]]))))))
# 合并数据
circle <- data.frame(ENV = colnames(ENV)[3:15]) %>%
  left_join(important_vars) %>% 
  melt(id = c("ENV", "Eco"), value.name = "Importance"); circle
# Joining, by = "ENV"
# ENV       Eco  variable Importance
# 1  SOC     Whole Important         NA
# 2  SOC Grassland Important         NA
# 3  SOC    Forest Important  13.227866
# 4  SOC  Cropland Important         NA
# 5  SOC   Wetland Important         NA
# 6  SOC    Desert Important         NA
# 7   pH     Whole Important         NA
# 8   pH Grassland Important         NA
# 9   pH    Forest Important         NA
# 10  pH  Cropland Important         NA
# 11  pH   Wetland Important         NA
# 12  pH    Desert Important         NA
# 13 CEC     Whole Important         NA
# 14 CEC Grassland Important         NA
# 15 CEC    Forest Important         NA
# 16 CEC  Cropland Important   6.204274
# 17 CEC   Wetland Important         NA
# 18 CEC    Desert Important         NA
# 19  TP     Whole Important         NA
# 20  TP Grassland Important         NA
# 21  TP    Forest Important         NA
# 22  TP  Cropland Important         NA
# 23  TP   Wetland Important         NA
# 24  TP    Desert Important         NA
# 25  TK     Whole Important         NA
# 26  TK Grassland Important         NA
# 27  TK    Forest Important         NA
# 28  TK  Cropland Important   6.969350
# 29  TK   Wetland Important         NA
# 30  TK    Desert Important   6.783699
# 31  AK     Whole Important         NA
# 32  AK Grassland Important         NA
# 33  AK    Forest Important         NA
# 34  AK  Cropland Important   7.491691
# 35  AK   Wetland Important   7.208659
# 36  AK    Desert Important   6.601446
# 37  TN     Whole Important         NA
# 38  TN Grassland Important         NA
# 39  TN    Forest Important   6.953556
# 40  TN  Cropland Important         NA
# 41  TN   Wetland Important         NA
# 42  TN    Desert Important         NA
# 43 NO3     Whole Important   5.075570
# 44 NO3 Grassland Important         NA
# 45 NO3    Forest Important         NA
# 46 NO3  Cropland Important         NA
# 47 NO3   Wetland Important         NA
# 48 NO3    Desert Important         NA
# 49 NH4     Whole Important   4.594003
# 50 NH4 Grassland Important         NA
# 51 NH4    Forest Important   6.911031
# 52 NH4  Cropland Important         NA
# 53 NH4   Wetland Important         NA
# 54 NH4    Desert Important         NA
# 55 DOC     Whole Important  14.319739
# 56 DOC Grassland Important  18.475196
# 57 DOC    Forest Important  13.514009
# 58 DOC  Cropland Important         NA
# 59 DOC   Wetland Important         NA
# 60 DOC    Desert Important   5.634245
# 61 MAT     Whole Important         NA
# 62 MAT Grassland Important         NA
# 63 MAT    Forest Important         NA
# 64 MAT  Cropland Important         NA
# 65 MAT   Wetland Important         NA
# 66 MAT    Desert Important         NA
# 67  SD     Whole Important         NA
# 68  SD Grassland Important         NA
# 69  SD    Forest Important   6.683399
# 70  SD  Cropland Important   5.815041
# 71  SD   Wetland Important         NA
# 72  SD    Desert Important         NA
# 73  Pl     Whole Important  26.440467
# 74  Pl Grassland Important  12.635259
# 75  Pl    Forest Important         NA
# 76  Pl  Cropland Important  12.873442
# 77  Pl   Wetland Important   7.492399
# 78  Pl    Desert Important  15.139708

circle$ENV<- factor(circle$ENV, rev(colnames(ENV)[3:15]))

# 划分不同生态系统的数据集
df_eco <- split(ENV[, -1:-2], ENV$Ecosystem)
names(df_eco)

# 计算不同生态系统的相关性
r <- NULL
r <- data.frame(ENV = colnames(ENV)[3:15],
               Whole = (cor(ENV[, -1:-2]))[14, 1:13],
               Grassland = (cor(df_eco$Grassland))[14, 1:13],
               Forest = (cor(df_eco$Forest))[14, 1:13],
               Cropland = (cor(df_eco$Cropland))[14, 1:13],
               Wetland = (cor(df_eco$Wetland))[14, 1:13],
               Desert = (cor(df_eco$Desert))[14, 1:13])%>%
  melt(id = "ENV", value.name = "Correlation");r
# ENV  variable  Correlation
# 1  SOC     Whole  0.190880826
# 2   pH     Whole -0.047988531
# 3  CEC     Whole  0.375623831
# 4   TP     Whole  0.204011032
# 5   TK     Whole  0.282682786
# 6   AK     Whole -0.238701400
# 7   TN     Whole  0.205713695
# 8  NO3     Whole -0.213326858
# 9  NH4     Whole -0.076868083
# 10 DOC     Whole  0.417261347
# 11 MAT     Whole -0.336919922
# 12  SD     Whole -0.272103426
# 13  Pl     Whole  0.590955861
# 14 SOC Grassland  0.389609962
# 15  pH Grassland -0.292803759
# 16 CEC Grassland  0.466440818
# 17  TP Grassland  0.240020091
# 18  TK Grassland  0.250093957
# 19  AK Grassland  0.295479703
# 20  TN Grassland  0.286305242
# 21 NO3 Grassland  0.060433644
# 22 NH4 Grassland -0.179338677
# 23 DOC Grassland  0.449023819
# 24 MAT Grassland -0.445949001
# 25  SD Grassland -0.511629899
# 26  Pl Grassland  0.657737041
# 27 SOC    Forest  0.419911131
# 28  pH    Forest -0.259555477
# 29 CEC    Forest  0.441782003
# 30  TP    Forest -0.043325378
# 31  TK    Forest  0.184701577
# 32  AK    Forest  0.314240260
# 33  TN    Forest  0.394795298
# 34 NO3    Forest  0.191051827
# 35 NH4    Forest -0.062053889
# 36 DOC    Forest  0.472024615
# 37 MAT    Forest -0.246051709
# 38  SD    Forest -0.427179574
# 39  Pl    Forest  0.445715297
# 40 SOC  Cropland  0.190504600
# 41  pH  Cropland  0.079739140
# 42 CEC  Cropland  0.372432772
# 43  TP  Cropland  0.173902720
# 44  TK  Cropland  0.339013659
# 45  AK  Cropland -0.567700216
# 46  TN  Cropland  0.179234867
# 47 NO3  Cropland -0.426516867
# 48 NH4  Cropland  0.047660104
# 49 DOC  Cropland  0.431862312
# 50 MAT  Cropland -0.388846551
# 51  SD  Cropland -0.129670246
# 52  Pl  Cropland  0.693855669
# 53 SOC   Wetland  0.062759256
# 54  pH   Wetland -0.230794196
# 55 CEC   Wetland  0.292344133
# 56  TP   Wetland  0.170356678
# 57  TK   Wetland  0.265791237
# 58  AK   Wetland  0.323268430
# 59  TN   Wetland -0.004741411
# 60 NO3   Wetland  0.172574329
# 61 NH4   Wetland -0.160782185
# 62 DOC   Wetland  0.436793024
# 63 MAT   Wetland -0.386445255
# 64  SD   Wetland -0.189241202
# 65  Pl   Wetland  0.343211982
# 66 SOC    Desert  0.008834840
# 67  pH    Desert  0.298885289
# 68 CEC    Desert  0.155105965
# 69  TP    Desert  0.253336024
# 70  TK    Desert  0.485088124
# 71  AK    Desert -0.554681227
# 72  TN    Desert  0.173607814
# 73 NO3    Desert -0.256845001
# 74 NH4    Desert -0.144874726
# 75 DOC    Desert  0.443678302
# 76 MAT    Desert -0.263961749
# 77  SD    Desert -0.160875488
# 78  Pl    Desert  0.648853903
r$ENV <- factor(r$ENV, levels = rev(colnames(ENV)[3:15]))

# 合并相关性与随机森林重要性的结果
r <- r %>% left_join(circle[c("ENV", "Importance")], by = "ENV");head(r)
# ENV variable Correlation Importance
# 1 SOC    Whole   0.1908808         NA
# 2 SOC    Whole   0.1908808         NA
# 3 SOC    Whole   0.1908808   13.22787
# 4 SOC    Whole   0.1908808         NA
# 5 SOC    Whole   0.1908808         NA
# 6 SOC    Whole   0.1908808         NA

heatmap <- ggplot()+
  geom_tile(data = r, aes(x = variable, y = ENV, fill = Correlation))+
  scale_fill_gradientn(colors = c('#2D6DB1', 'white', '#DC1623'),
                       limit = c(-1, 1))+
  geom_point(data = circle, aes(x = Eco, y = ENV,
                                size = Importance), shape = 21)+
  theme_bw()+
  theme(panel.background = element_blank(),
        panel.grid = element_blank(),
        axis.text.x = element_text(angle = 45, color = "black",
                                   size = 12, vjust = 0.6),
        axis.text.y = element_text(color = 'black', size = 12),
        legend.title = element_text(size = 10))+
  labs(y = '', x = '')
heatmap

# barplot + heatmap +
#   plot_layout(ncol = 1, heights = c(1, 6))
# graph2ppt(file = "ppt.ppt", height = 8, width = 5, append = T)


# barplot + heatmap + 
#   plot_layout(ncol = 1, heights = c(1, 6))
barplot %>%
  aplot::insert_bottom(heatmap, height = 6)
graph2ppt(file = "ppt.ppt", height = 8, width = 5, append = T)

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:/a/12343.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

Vue3——一文入门Vue3

Vue3的优势 1. 性能的提升 打包大小减少41% 初次渲染快55%&#xff0c;更新渲染快133% 内存减少54% … 2. 源码的升级 使用Proxy代替defineProperty实现响应式 重写虚拟DOM的实现和Tree-Shaking … 3. 拥抱TypeScript Vue3可以更好的支持TypeScript 4. 新的特性 1.C…

什么是文件共享软件?文件传输软件如何共享?

它是一个文件共享软件应用程序&#xff0c;可让强大的数据保护层下将任何大小的文件发送到世界上的任何地方。以光速发送和共享无限数量的文件。可以提交门户并使用语言&#xff0c;品牌&#xff0c;存储等自定义门户。可以选择一个存储点&#xff0c;例如文件传输软件&#xf…

零基础可以学习数据分析吗,有没有好的培训机构推荐?

数据分析从沿海火到了中西部的软件园&#xff0c;从传统互联网企业火到了新经济领域&#xff0c;火到了第一二产业。数字化成为这个时代的标签&#xff0c;而数据也成为了最有价值的资源&#xff0c;更多企业重视数据&#xff1b;因为有了真实数据的支撑&#xff0c;所有的决策…

【软考备战·希赛网每日一练】2023年4月19日

文章目录 一、今日成绩二、错题总结第一题第二题第三题 三、知识查缺 题目及解析来源&#xff1a;2023年04月19日软件设计师每日一练 一、今日成绩 二、错题总结 第一题 解析&#xff1a; 第二题 解析&#xff1a; server-side n.服务器端 enterprise n.企业 client n.客户 d…

常见排序算法

目录 一、插入排序 1、直接插入排序 2、希尔排序(缩小增量插入排序&#xff09; 二、选择排序 三、堆排序 四、冒泡排序 五、快速排序&#xff08;递归&#xff09; 1、交换法 2、挖坑法 3、前后指针法&#xff08;推荐&#xff09; 4、快排再优化 六、快速排序&…

树上差分(点差分/边差分)

树上差分一般有两种类型的题目&#xff0c;一种是对边进行差分&#xff0c;另一种就是对点进行差分。 对应的操作也有两种&#xff0c;对边进行差分的对应操作就是给定一对节点(u,v)&#xff0c;让我们把u到v之间路径上的边权都加val&#xff0c;对点进行差分的对应操作就是给…

MYSQL数据库

目录 SQL SQL-DDL 操作数据库 查询&#xff08;show&#xff09;&#xff08;select&#xff09; 创建&#xff08;create&#xff09; 删除&#xff08;drop&#xff09; 操作表 查询当前数据库所有表 修改表 删除 SQL-DML 添加数据&#xff08;可以批量添加&…

课程简介:.Net Core从零学习搭建权限管理系统

课程简介目录 &#x1f680;前言一、课程背景二、课程目的三、系统功能四、系统技术架构五、课程特点六、课程适合人员七、课程规划的章节八、最后 &#x1f680;前言 本文是《.Net Core从零学习搭建权限管理系统》教程专栏的导航站&#xff08;点击链接&#xff0c;跳转到专栏…

做好Python工程师,首先你需要做好的几件事

做好Python工程师&#xff0c;需要做好的几件事&#xff0c;我想分享给大家。首先千万不要做事周折。在你提问之前&#xff0c;先好好想一想&#xff0c;这个问题自己能不能解决。如果能解决&#xff0c;尽量自己解决&#xff1b;如果解决不了&#xff0c;那就要把你的问题描述…

亿发软件:传统食品饮料批发行业如何通过信息化管理系统降本增效?

传统食品饮料批发行业信息化水平较低&#xff0c;存在多重管理难题&#xff0c;例如&#xff1a; 手动数据输入和管理&#xff0c;导致错误和效率低下&#xff1b; 数据缺乏实时可见性&#xff0c;无法实时了解企业仓库存量、销售额和其他关键业务指标&#xff1b; 低效的供应链…

索引:索引知识重复习,什么是索引、索引的类型、建立索引及【最左匹配原则】、Explain查看sql的执行计划

文章目录 什么是索引索引的类型主键索引&#xff08;primary key&#xff09;普通索引&#xff08;index&#xff09;复合索引全文索引&#xff08;fulltext&#xff09;空间索引唯一索引索引修改及删除 Explain一、using filesort(减慢查询效率)二、Using temporary三、using …

前端UI框架有哪些|20个优秀免费开源的WEB前端UI框架提高网站开发效率

最近准备学习一下前端UI我也是在网上找了很久最终整理出来了20个不错的前端UI框架网站,大家都知道很多成熟的前端框架可以直接引,学习框架可以提升我们网站的开发速度。有些大型公司的前端或者后端框架都是用自己开发的,对于大部分用户和公司来讲,我们可以用开源免费的前端…

Python 中 SyntaxError: ‘yield‘ outside function 错误

当我们在函数外部使用 yield 关键字时&#xff0c;会出现 Python “SyntaxError: ‘yield’ outside function”。 要解决该错误&#xff0c;如果我们需要对每个元素执行一些运算符&#xff0c;请使用列表理解&#xff0c;或者缩进函数内部使用 yield 的代码。 下面是一个产生…

毕业2年,跳槽到下一个公司就25K了,厉害了···

本人本科就读于某普通院校&#xff0c;毕业后通过同学的原因加入软件测试这个行业&#xff0c;角色也从测试小白到了目前的资深工程师&#xff0c;从功能测试转变为测试开发&#xff0c;并顺利拿下了某二线城市互联网企业的Offer&#xff0c;年薪 30W 。 选择和努力哪个重要&a…

写博客8年与人生第一个502万

题记&#xff1a;我们并非生来强大&#xff0c;但依然可以不负青春。 原本想好好写一下如何制定一个目标并通过一点一滴的努力去实现&#xff0c;这三年反思发现其实写自己的经历并不重要。 很多人都听过一句话&#xff1a;榜样的力量是无穷的。 更现实和实际的情况是&#x…

mysql聚合函数

文章目录 前言一、常见的聚合函数1.avg和sum函数2.max和min函数3.count函数 二、group by的使用1.基本使用方法2.with rollup 求平均值 三、having关键字的使用四、多表连接聚合函数1.sql92语法总结2.sql99语法总结 总结 前言 聚合函数&#xff1a;他是对一组数据进行汇总的函…

3 个自定义防抖 Hooks 的实现原理

前言— 本文通过实现 useDebounceFn、useDebounce、useDebounceEffect 3 种自定义防抖 Hooks&#xff0c;来介绍在日常开发过程中自定义 Hooks 的思路及实现&#xff0c;帮助大家完成通用 Hooks 来提高开发效率。 防抖— 防抖的概念已经司空见惯了&#xff0c;这里稍作简单介…

Golang每日一练(leetDay0034) 二叉树专题(3)

目录 100. 相同的树 Same Tree &#x1f31f; 101. 对称二叉树 Symmetric Tree &#x1f31f; 102. 二叉树的层序遍历 Binary Tree Level-order Traversal &#x1f31f;&#x1f31f; &#x1f31f; 每日一练刷题专栏 &#x1f31f; Golang每日一练 专栏 Python每日一…

程序设计方法学

体育竞技分析 问题分析 体育竞技分析 需求&#xff1a;毫厘是多少&#xff1f; 如何科学分析体育竞技比赛&#xff1f; 输入&#xff1a;球员的水平 输出&#xff1a;可预测的比赛成绩 体育竞技分析&#xff1a;模拟N场比赛 计算思维&#xff1a;抽象 自动化 模拟&am…

【算法题】2583. 二叉树中的第 K 大层和

题目&#xff1a; 给你一棵二叉树的根节点 root 和一个正整数 k 。 树中的 层和 是指 同一层 上节点值的总和。 返回树中第 k 大的层和&#xff08;不一定不同&#xff09;。如果树少于 k 层&#xff0c;则返回 -1 。 注意&#xff0c;如果两个节点与根节点的距离相同&…