<~生~信~交~流~与~合~作~请~关~注~公~众~号@生信探索>
monocle3与PAGA有点类似,在UMAP图上显示轨迹图,没有了树状的结构。
原理、图的理解,可以参考Reference中的链接
安装
-
ubuntu
sudo apt install libudunits2-dev libgdal-dev
-
R
speedglm包违反CRAN规定被删除了,所以要从rstudio下载
using(pak)
pkgs <- c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats',
'limma', 'lme4', 'S4Vectors', 'SingleCellExperiment',
'SummarizedExperiment', 'batchelor', 'HDF5Array',
'terra', 'ggrastr',"hhoeflin/hdf5r","mojaveazure/loomR@develop",
"url::https://packagemanager.rstudio.com/cran/2023-03-31/src/contrib/speedglm_0.3-4.tar.gz")
pak::pkg_install(pkgs)
0. 加载R包、定义函数
using(monocle3, tidyverse, magrittr, Matrix, Seurat, SeuratObject, sp)
ps <- function(filename, plot = FALSE, w = 9, h = 6) {
if (is.object(plot)) {
print(plot)
}
plot <- recordPlot()
pdf(file = paste0(filename,".pdf"), onefile = T, width = w, height = h)
replayPlot(plot)
dev.off()
}
n_jobs <- 8 # 使用的线程数
1.创建monocle对象
-
sobj:Seurat对象 -
cell_type:已经注释好了细胞类型 -
orig.ident:批次信息 -
sobj_embed:UMAP降维信息,是数据框,行名是细胞,有两列分别对应两个维度 -
hvg:3000个高变基因用于后续降维 -
expression_matrix
, a numeric matrix of expression values, where rows are genes, and columns are cells -
cell_metadata
, a data frame, where rows are cells, and columns are cell attributes (such as cell type, culture condition, day captured, etc.) -
gene_metadata
, an data frame, where rows are features (e.g. genes), and columns are gene attributes, such as biotype, gc content, etc. one of the columns of thegene_metadata
should be named "gene_short_name", which represents the gene symbol or simple name (generally used for plotting) for each gene.
sobj <- readRDS("final.rds")
gene_metadata <- sobj@assays$RNA@meta.features
gene_metadata$gene_short_name <- rownames(gene_metadata)
cds <- new_cell_data_set(as(sobj@assays$RNA@counts,"sparseMatrix"),
cell_metadata = sobj@meta.data,
gene_metadata = gene_metadata
)
hvg <- seob@assays$integrated@var.features
sobj_embed <- Seurat::Embeddings(seob, reduction = "umap")
rm(sobj, gene_metadata)
2.pre-process
标准化、预处理、取批次
## Normalize and pre-process the data
cds %<>% preprocess_cds(num_dim = 30,method="PCA",norm_method="log",use_genes=hvg)
## Remove batch effects with cell alignment
cds %<>% align_cds(alignment_group = "orig.ident",preprocess_method="PCA")
3.Reduce dimensions and Cluster cells
降维、聚类、分群、分partition
这里使用UMAP作为降维算法,再使用轨迹分区算法,把所有细胞分为两个partitio,不同分区的细胞会进行单独的轨迹分析。
plot_pc_variance_explained(cds)
ps("PCA.pdf", w = 9, h = 6)
## Reduce the dimensions using UMAP
cds %<>% reduce_dimension(umap.fast_sgd = TRUE, cores = n_jobs, reduction_method = "UMAP", preprocess_method = "PCA")
## 从seurat导入整合过的umap坐标
cds_embed <- cds@int_colData$reducedDims$UMAP
cds@int_colData$reducedDims$UMAP <- sobj_embed[rownames(cds_embed),]
plot_cells(cds, reduction_method = "UMAP", color_cells_by = "cell_type",group_label_size=6)
ps("UMAP_CellType.pdf")
## Cluster the cells
cds %<>% rcluster_cells(reduction_method = "UMAP", cluster_method = "leiden",random_seed=1314)
plot_cells(cds, color_cells_by = "partition",group_label_size=6)
ps("UMAP_partition.pdf")
4.Order cells in pseudotime along a trajectory
手动选择root需要根据自己的生物学背景知识
## Learn a graph
cds %<>% learn_graph()
plot_cells(cds,trajectory_graph_segment_size = 1.5,
group_label_size = 6,group_cells_by="cluster",
color_cells_by = "cell_type", label_groups_by_cluster = FALSE,
label_leaves = FALSE, label_branch_points = FALSE
)
ps("UMAP_graph.pdf")
cds <- order_cells(cds)
plot_cells(cds,
group_label_size = 6,
color_cells_by = "pseudotime",
label_cell_groups = FALSE,
label_leaves = FALSE,
label_branch_points = FALSE,
graph_label_size = 1.5
)
ps("Trajectory_Pseudotime.pdf")
5.Perform differential expression analysis
这是空间差异分析常用的方法,在空间转录组中也有。
morans_I
column, which ranges from -1 to +1. A value of 0 indicates no effect, while +1 indicates perfect positive autocorrelation and suggests that nearby cells have very similar values of a gene's expression. Significant values much less than zero are generally rare.
dea_res <- graph_test(cds, neighbor_graph = "principal_graph", cores = n_jobs) %>%
dplyr::filter(q_value < 0.05)
fwrite(dea_res, "Trajectory_genes.csv")
6.Finding genes that change as a function of pseudotime
寻找拟时轨迹差异基因,使用graph_test分析最重要的结果是莫兰指数(morans_I),其值在-1至1之间,0代表此基因没有。空间共表达效应,1代表此基因在空间距离相近的细胞中表达值高度相似。根据莫兰指数挑选前10个基因用于可视化。
degs <- dea_res$gene_short_name
top_genes <- dea_res %>%
dplyr::top_n(n = 10, morans_I) %>%
dplyr::pull(gene_short_name) %>%
as.character()
plot_genes_in_pseudotime(cds[top_genes, ],
color_cells_by = "cell_type",
min_expr = 0.5, ncol = 2
)
ps("Top_Genes_Jitterplot.pdf", w = 9, h = 6)
p <- plot_cells(cds,
genes = top_genes, show_trajectory_graph = FALSE,
label_cell_groups = FALSE, label_leaves = FALSE
)
p$facet$params$ncol <- 5
ps("Top_Genes_Featureplot.pdf", plot = p)
7.Finding modules of co-regulated genes
寻找共表达基因模块,根据上边的差异分析结果,按照UMAP和Louvain 聚类,将这些基因分在不同的模块中,有些模块在某些细胞中特异高表达。
Once we have a set of genes that vary in some interesting way across the clusters, Monocle provides a means of grouping them into modules. We can call find_gene_modules(), which essentially runs UMAP on the genes (as opposed to the cells) and then groups them into modules using Louvain community analysis
gene_module_df <- find_gene_modules(cds[degs, ], resolution = 1e-2, cores = n_jobs)
fwrite(gene_module_df, "Genes_Module.csv")
cell_group_df <- tibble::tibble(
cell = row.names(colData(cds)),
cell_group = colData(cds)$cell_type
)
agg_mat <- aggregate_gene_expression(cds, gene_module_df, cell_group_df)
row.names(agg_mat) <- stringr::str_c("Module ", row.names(agg_mat))
pheatmap::pheatmap(agg_mat,
cluster_rows = TRUE, cluster_cols = TRUE,
scale = "column", clustering_method = "ward.D2",
fontsize = 6
)
ps("heatmap.pdf", w = 5, h = 16)
plot_cells(cds,
genes = gene_module_df %>% dplyr::filter(module %in% c(200, 169, 138, 22,17,91)),
group_cells_by = "partition",
color_cells_by = "partition",
show_trajectory_graph = FALSE
)
ps("module.pdf")
saveRDS(cds,file="cds.rds")
Reference
https://zhuanlan.zhihu.com/p/451727080
https://cole-trapnell-lab.github.io/monocle3/docs
https://www.jianshu.com/p/c402b6588e17
本文由 mdnice 多平台发布