cellphoneDB进行CCI以及可视化

除了cellchat，在单细胞转录组或者空间组的分析中，cellphoneDB也是一个常用的细胞通讯软件，这个数据库更注重配受体关系，对于有明确先验知识的配受体研究比较友好。

但值得注意的是，它的数据库只包括人的基因名称信息，所以在开始前，可以参考biomaRt进行同源基因转换-CSDN博客转换物种的同源基因名称

导入环境

import numpy as np
import pandas as pd
import scanpy as sc
import os
import sys
from scipy import sparse

from cellphonedb.src.core.methods import cpdb_statistical_analysis_method
import os
import anndata as ad
import pandas as pd
import ktplotspy as kpy
import matplotlib.pyplot as plt
%matplotlib inline

ktplotspy是利用cellphoneDB结果进行后续可视化的一个R包，安装可见GitHub - zktuong/ktplotspy: Python library for plotting Cellphonedb results. Ported from ktplots R package.

同源基因转换

adata = sc.read('/data/work/your_scRNA.h5ad')

gene_names = adata.var_names.tolist()
gene_names_df = pd.DataFrame(gene_names, columns=['gene_name'])
gene_names_df.to_csv('adata_gene_names.csv', index=False)

#biomart转换后得到新的csv

replacement_df = pd.read_csv('mouse_homologous_gene_conversion.csv')
replacement_df.columns = ['Gene.name', 'Gene.name.1','Chromosome.scaffold.name']
gene_mapping = dict(zip(replacement_df['Gene.name'], replacement_df['Gene.name.1']))

new_gene_names = [gene_mapping.get(gene, None) for gene in gene_names]

# 仅保留有对应人类基因名的基因
valid_indices = [i for i, gene in enumerate(new_gene_names) if gene is not None]
filtered_adata = adata[:, valid_indices]
filtered_adata.var_names = [new_gene_names[i] for i in valid_indices]

filtered_adata.obs.index = filtered_adata.obs.index.astype(str)
filtered_adata.var.index = filtered_adata.var.index.astype(str)
filtered_adata.write_h5ad('your_new_scRNA.h5ad')

保存替换完同源基因的anndata对象后可以重新读取，进行后续的分析，这里没有对应的同源基因被我过滤掉了（如果有更全面的分析需求可以注意一下这一点）

提取注释标签

#重新读取同源替换后的数据
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

adata.shape#检查数据大小
adata.obs['annotation'].values.describe()#检查注释信息

adata.obs['barcode_sample']=list(adata.obs_names)
adata.obs['annotation']=adata.obs['annotation'].astype(str)
adata.obs['cell_type']=list(adata.obs['annotation'])
meta=adata.obs.loc[:,['barcode_sample','cell_type']]
meta.to_csv('bin50_meta.tsv', sep = '\t')
#检查对应
list(adata.obs.index).sort() == list(df_meta['barcode_sample']).sort()

提取后续分析需要的注释信息表，其中'barcode_sample'是细胞唯一标识符，'annotation'是我的注释信息存放的列，导出为bin50_meta.tsv

运行cellphoneDB

设置好数据库路径，注释信息表路径，标准化后的anndata对象路径以及输出路径。

我是直接将v5数据库下载使用，因为我的环境不能联网，在线下载可能会方便很多，网上有很多教程。直接下载的路径见GitHub - ventolab/cellphonedb-data

cpdb_file_path = '/data/work/CCI_cellphoneDB/cellphonedb.zip'
meta_file_path = '/data/work/CCI_cellphoneDB/bin50_meta.tsv'
counts_file_path = '/data/work/CCI_cellphoneDB/bin50_annotated_normlog.h5ad'
out_path = '/data/work/CCI_cellphoneDB/bin50'

cpdb_results = cpdb_statistical_analysis_method.call(
    cpdb_file_path = cpdb_file_path,                 # mandatory: CellphoneDB database zip file.
    meta_file_path = meta_file_path,                 # mandatory: tsv file defining barcodes to cell label.
    counts_file_path = counts_file_path,             # mandatory: normalized count matrix - a path to the counts file, or an in-memory AnnData object
    counts_data = 'hgnc_symbol',                     # defines the gene annotation in counts matrix.
 #   active_tfs_file_path = active_tf_path,           # optional: defines cell types and their active TFs.
 #  microenvs_file_path = microenvs_file_path,       # optional (default: None): defines cells per microenvironment.
    score_interactions = True,                       # optional: whether to score interactions or not. 
    iterations = 1000,                               # denotes the number of shufflings performed in the analysis.
    threshold = 0.05,                                 # defines the min % of cells expressing a gene for this to be employed in the analysis.
    threads = 5,                                     # number of threads to use in the analysis.
    debug_seed = 42,                                 # debug randome seed. To disable >=0.
    result_precision = 3,                            # Sets the rounding for the mean values in significan_means.
    pvalue = 0.1,                                   # P-value threshold to employ for significance.
    subsampling = False,                             # To enable subsampling the data (geometri sketching).
    subsampling_log = False,                         # (mandatory) enable subsampling log1p for non log-transformed data inputs.
    subsampling_num_pc = 100,                        # Number of componets to subsample via geometric skectching (dafault: 100).
    subsampling_num_cells = 1000,                    # Number of cells to subsample (integer) (default: 1/3 of the dataset).
    separator = '|',                                 # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
    debug = False,                                   # Saves all intermediate tables employed during the analysis in pkl format.
    output_path = out_path,                          # Path to save results.
    output_suffix = None                             # Replaces the timestamp in the output files by a user defined string in the  (default: None).
    )

运行日志以及输出的文件如下

读取结果可视化

相比cellchat，这个软件自带的可视化功能会少很多

自带可视化热图

kpy.plot_cpdb_heatmap(pvals = cpdb_results['pvalues'],
                      degs_analysis = False,
                      figsize = (5, 5),
                      title = "Sum of significant interactions")

R可视化点图

但是利用输出文件也可以在R环境下进行很多想要的可视化方式，首先导入R环境和包

library(ktplots)
library(Seurat)
library(tidyverse)
library(data.table)
library(dplyr)
library(readr)

导入需要的结果表和RDS格式数据

#cellphoneDB结果文件
mypvals <- read.table("/data/work/CCI_cellphoneDB/bin50/statistical_analysis_pvalues_12_26_2024_220659.txt",header = T,sep = "\t",stringsAsFactors = F,check.names = F)
mymeans <- read.table("/data/work/CCI_cellphoneDB/bin50/statistical_analysis_means_12_26_2024_220659.txt",header = T,sep = "\t",stringsAsFactors = F,check.names = F) 
mydecon <- read.table("/data/work/CCI_cellphoneDB/bin50/statistical_analysis_deconvoluted_12_26_2024_220659.txt",header = T,sep = "\t",stringsAsFactors = F,check.names = F)

#单细胞数据
scRNA = read_rds("your_scRNA.rds")

绘制全部点图

p <- plot_cpdb(cell_type1 = "", cell_type2 = "", scdata = scRNA, means = mymeans, pvals = mypvals,
               celltype_key = "annotation",
               highlight_col = "blue",
               keep_significant_only=T) +
  theme(axis.text = element_text(size = 5, color = 'black'))

ggsave("plot_cpdb_output.png", p, width = 50, height = 40, units = "in", limitsize = FALSE)

如果这里cell_type空白则默认绘制所有分群的点图，也可以设置一个或者两个绘制一对多和一对一的配受体点图，效果如下

分类展示基因家族

这里参数还有一个family，可以分类展示不同家族的配受体关系

gene_families <- c("chemokines", "th1", "th2", "th17", "treg", "costimulatory", "coinhibitory")

for(family in gene_families) {
  tryCatch({
    p <- plot_cpdb(cell_type1 = "PT-S1", cell_type2 = "", scdata = scRNA, means = mymeans, pvals = mypvals,
                   celltype_key = "annotation_res1_1",
                   gene_family = family, # 指定基因家族
                   highlight_col = "blue",
                   keep_significant_only=T) +
      theme(axis.text = element_text(size = 10, color = 'black'))
    
    ggsave(paste0("plot_cpdb_PTS1_", family, ".png"), p, width = 20, height = 30, units = "in", limitsize = FALSE)
  }, error = function(e) {
    message(paste("No significant genes found for", family, "and plotting will not proceed."))
  })
}

本文来自互联网用户投稿，该文观点仅代表作者本人，不代表本站立场。本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如若转载，请注明出处：/a/944620.html

如若内容造成侵权/违法违规/事实不符，请联系我们进行投诉反馈qq邮箱809451989@qq.com，一经查实，立即删除！