In this tutorial we show how to use SSpMosaic to annotate single-cell data.
Before SSpMosaic annotation, the generate_module step should be executed on the
reference dataset. If the reference dataset has multiple batches, the
generate_module step needs to be run on each batch separately, followed by
running the network-propagation step on all batches together.
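library(SSpMosaic)
library(clusterProfiler)
library(org.Hs.eg.db)
#Seurat provides DimPlot and ggplot2 provides labs, both used for the UMAP plot at the end
library(Seurat)
library(ggplot2)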
Set the working directory. Users can replace this directory with an appropriate path.
setwd('./')
current_dir <- getwd()
result_dir <- paste0(current_dir,'/result/')
data_dir <- paste0(current_dir,'/data/')
The input data for SSpMosaic single-cell annotation is a Seurat object containing unsupervised clustering results.
The example data required to run this tutorial can be downloaded from https://drive.google.com/drive/folders/1QOKvUgN0tE068-BmedRsdqa_mz8tzwDF.
seurat_object <- readRDS(paste0(data_dir,'/Inh_for_annotation.rds'))
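The annotation steps rely on the gene identifiers of the object matching those of the metaprogram gene sets, so it can help to check which identifier type the object uses right after loading it. The following is a minimal sketch in base R, not part of SSpMosaic: the helper name ensembl_fraction and the toy feature names are assumptions made for illustration.

```r
# Hypothetical helper (not an SSpMosaic function): share of feature names that
# look like human Ensembl gene IDs, i.e. "ENSG" followed by 11 digits,
# optionally with a version suffix such as ".12".
ensembl_fraction <- function(feature_names) {
  mean(grepl("^ENSG[0-9]{11}(\\.[0-9]+)?$", feature_names))
}

# Toy feature names standing in for rownames(seurat_object)
toy_features <- c("ENSG00000157005", "ENSG00000100362",
                  "ENSG00000136750.12", "GAD1")
toy_fraction <- ensembl_fraction(toy_features)
print(toy_fraction)
```

On the real data this could be called as ensembl_fraction(rownames(seurat_object)); a value close to 1 suggests the features are Ensembl IDs, while a value close to 0 suggests gene symbols or another identifier type.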
Load SSpMosaic programs generated by filter_module on a single reference
dataset, or follow the network-propagation tutorial to generate the gene sets
corresponding to the SSpMosaic metaprograms. Here we load SSpMosaic
metaprograms generated from multiple batches, as shown in the
network-propagation tutorial. The example metaprograms required to run this
tutorial can be downloaded from https://drive.google.com/drive/folders/1QOKvUgN0tE068-BmedRsdqa_mz8tzwDF.
meta_module_genes <- gene_list(paste0(data_dir,'/Inh_meta_module_genes.txt'))
Because the genes in the data to be annotated are identified by Ensembl IDs, the gene symbols in the SSpMosaic metaprograms are converted to Ensembl IDs.
#Initialize a list to store the SSpMosaic metaprograms as Ensembl IDs
meta_module_genes_ENSEMBL <- list()
#Iterate over each SSpMosaic metaprogram (seq_along is safe even for an empty list)
for(m in seq_along(meta_module_genes))
{
  #Extract the name of the metaprogram
  n <- names(meta_module_genes)[m]
  print(n)
  #Extract the corresponding gene set of the metaprogram
  gene_names <- meta_module_genes[[n]]
  #Convert gene symbols to Ensembl IDs
  bitr_results <- bitr(gene_names, fromType="SYMBOL", toType="ENSEMBL", OrgDb="org.Hs.eg.db")
  #Generate the mapping vector from symbol to Ensembl ID
  gene_to_ensembl <- setNames(bitr_results$ENSEMBL, bitr_results$SYMBOL)
  #Generate the Ensembl ID gene vector (the first match is used for each symbol)
  ensembl_ids <- sapply(gene_names, function(x) gene_to_ensembl[x])
  #Remove symbols without an Ensembl ID mapping (NA)
  ensembl_ids <- ensembl_ids[!is.na(ensembl_ids)]
  #Save the Ensembl ID gene vector into the list
  meta_module_genes_ENSEMBL[[n]] <- ensembl_ids
}
meta_module_genes <- meta_module_genes_ENSEMBL
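The conversion above silently drops symbols without a mapping and keeps only the first Ensembl ID when a symbol maps to several. The sketch below reproduces the same name-based lookup on a toy data frame so that both effects are visible. The gene names and IDs are made up, and toy_bitr only imitates the two-column shape of a bitr() result.

```r
# Toy stand-in for the data frame returned by bitr(); names and IDs are made up
toy_bitr <- data.frame(
  SYMBOL  = c("GENE_A", "GENE_B", "GENE_B"),
  ENSEMBL = c("ENSG_TOY_1", "ENSG_TOY_2", "ENSG_TOY_3"),
  stringsAsFactors = FALSE
)
# Toy gene set: GENE_A occurs twice, GENE_C has no mapping
toy_symbols <- c("GENE_A", "GENE_B", "GENE_C", "GENE_A")

# Same lookup construction as in the loop above
toy_map <- setNames(toy_bitr$ENSEMBL, toy_bitr$SYMBOL)
toy_ids <- unname(toy_map[toy_symbols])   # name lookup returns the first match
toy_ids <- toy_ids[!is.na(toy_ids)]       # unmapped symbols are dropped
print(toy_ids)

# Symbols lost in the conversion, and symbols with more than one Ensembl ID
unmapped <- setdiff(unique(toy_symbols), toy_bitr$SYMBOL)
multi_mapped <- unique(toy_bitr$SYMBOL[duplicated(toy_bitr$SYMBOL)])
print(unmapped)
print(multi_mapped)
```

Repeated genes are preserved by the lookup, which matters if the scoring depends on how often a gene occurs in a gene set. Reporting unmapped and multi_mapped per metaprogram is one way to judge how much of each gene set survives the conversion.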
The function weighted_score computes a weighted average score for each
metaprogram, with weights based on how frequently each gene occurs in the gene set.
The parameters of weighted_score are as follows:
object: A Seurat object of single-cell data.
meta_module_genes: A list where each item is a character vector storing the gene set corresponding to an SSpMosaic (meta)program.
normalize_method: A string representing the normalization method, which should be one of 'none', 'zscore', or 'min-max'.
assay: A string indicating the name of the assay to pull the gene expression data from.
layer: A string indicating the name of the layer to fetch data from; for Seurat v5 only.
slot: A string indicating the name of the slot to fetch data from; for Seurat v4 only.
anno: A boolean that specifies whether to annotate each cell with its highest-scoring meta-module; FALSE by default.
#Calculate the weighted average score for each metaprogram
seurat_object <- weighted_score(object = seurat_object, meta_module_genes = meta_module_genes, assay = 'RNA', layer = 'data', slot = 'data', normalize_method = 'none')
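To make the idea of frequency-weighted scoring concrete, here is a minimal sketch in base R on a toy matrix. It is an illustration under stated assumptions, not the SSpMosaic implementation: it assumes that a gene's weight is its number of occurrences in the gene set divided by the total, and that normalization is applied across cells. The helper names frequency_weighted_score and normalize_scores are hypothetical.

```r
# Toy expression matrix: genes in rows, cells in columns
toy_expr <- matrix(
  c(1, 2, 3,
    4, 0, 2,
    0, 6, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("cell1", "cell2", "cell3"))
)

# A gene set in which g1 occurs twice, so it receives twice the weight of g2
toy_gene_set <- c("g1", "g1", "g2")

# Hypothetical helper: frequency-weighted mean expression per cell
frequency_weighted_score <- function(expr, gene_set) {
  # Count occurrences of each gene that is present in the matrix
  counts <- table(gene_set[gene_set %in% rownames(expr)])
  weights <- as.numeric(counts) / sum(counts)
  # Each row is multiplied by its gene's weight, then summed per cell
  colSums(expr[names(counts), , drop = FALSE] * weights)
}

# Hypothetical helper mirroring the three normalize_method options
normalize_scores <- function(scores, method = c("none", "zscore", "min-max")) {
  method <- match.arg(method)
  switch(method,
    "none"    = scores,
    "zscore"  = (scores - mean(scores)) / sd(scores),
    "min-max" = (scores - min(scores)) / (max(scores) - min(scores))
  )
}

toy_scores <- frequency_weighted_score(toy_expr, toy_gene_set)
toy_scaled <- normalize_scores(toy_scores, "min-max")
print(toy_scores)
print(toy_scaled)
```

For cell1 the score is (2 * 1 + 1 * 4) / 3 = 2, because g1 counts twice and g2 once. Normalization only changes the scale of the scores, which can matter when scores of different metaprograms are compared against each other.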
The function SSpMosaic_sc_annotation annotates the query single-cell data
based on the weighted average scores.
The parameters of SSpMosaic_sc_annotation are as follows:
object: A Seurat object, the return value of the weighted_score function.
meta_module_genes: A list where each item is a character vector storing the gene set corresponding to an SSpMosaic (meta)program; it must match the meta_module_genes parameter used in weighted_score.
cluster_col: A string indicating the column name in the Seurat object metadata corresponding to the cell cluster assignment.
sd_thres: A non-negative double that specifies the threshold for the difference between two standard deviations.
mean_thres: A non-negative double that specifies the threshold for the difference between two mean values.
annotation_name: A string indicating the name of the metadata column in which to store the annotation.
#Run SSpMosaic annotation
res <- SSpMosaic_sc_annotation(object = seurat_object, meta_module_genes = meta_module_genes, cluster_col = "seurat_clusters")
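A quick way to review the outcome is to cross-tabulate the cluster assignment against the assigned labels. The sketch below does this on a toy data frame that stands in for res[["object"]]@meta.data. The column name annotation is taken from the plotting code in this tutorial, and the toy values are made up.

```r
# Toy metadata standing in for res[["object"]]@meta.data (values are made up)
toy_meta <- data.frame(
  seurat_clusters = factor(c("0", "0", "1", "1", "1", "2")),
  annotation = c("Inh.SST", "Inh.SST", "Inh.VIP", "Inh.VIP",
                 "Inh.VIP_2", "Inh.PVALB"),
  stringsAsFactors = FALSE
)

# Rows are clusters, columns are annotation labels, entries are cell counts
cluster_by_label <- table(toy_meta$seurat_clusters, toy_meta$annotation)
print(cluster_by_label)

# Most frequent label within each cluster
top_label <- colnames(cluster_by_label)[apply(cluster_by_label, 1, which.max)]
names(top_label) <- rownames(cluster_by_label)
print(top_label)
```

On the real result, the same table could be built from res[["object"]]@meta.data with the column passed as cluster_col and the annotation column. Clusters whose cells are spread over several labels, or labels that differ only by a suffix, are worth inspecting before plotting.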
Draw a UMAP plot to show the SSpMosaic annotation results.
#Merge meta modules that represent the same cell type by removing '_2' from their labels
res[["object"]]@meta.data[["annotation"]] <- gsub('_2', '', res[["object"]]@meta.data[["annotation"]])
#Specify the color for each cell type
cols <- c('#926E6D', '#F1CB56', '#E7DDFF', '#40664C', '#EF4243', '#053450')
names(cols) <- c('Inh.SST', 'Inh.PVALB', 'Inh.LAMP5', 'Inh.CHANDELIER', 'Inh.VIP', 'Inh.SNCG')
#Draw the UMAP plot
DimPlot(res[["object"]], group.by = 'annotation', reduction = 'umap', cols = cols) + labs(title = "SSpMosaic: Cell type")
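Two small checks can make the plotting step more robust; both are sketched below in base R on toy labels. The first anchors the suffix pattern with "$" so that "_2" is removed only at the end of a label, whereas an unanchored pattern would also remove "_2" in the middle of a label. The second lists labels that have no entry in the named color vector. The labels and the shortened color vector are illustrative assumptions.

```r
# Toy annotation labels; the "_2" suffix marks a second meta module of a type
toy_labels <- c("Inh.SST", "Inh.SST_2", "Inh.VIP", "Inh.LAMP5_2")

# Remove "_2" only when it is the final part of the label
merged_labels <- sub("_2$", "", toy_labels)
print(merged_labels)

# Shortened named color vector (a subset, for illustration only)
toy_cols <- c(Inh.SST = "#926E6D", Inh.VIP = "#EF4243")

# Labels that would not receive an explicit color in the plot
missing_cols <- setdiff(unique(merged_labels), names(toy_cols))
print(missing_cols)
```

If missing_cols is not empty on the real data, either the color vector needs more entries or some labels were not merged as intended.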