SSpMosaic annotation

In this tutorial we show how to use SSpMosaic to annotate single-cell data. Before annotation, run the generate_module step on the reference dataset. If the reference dataset contains multiple batches, run generate_module on each batch separately, and then run the network-propagation step on all batches together.

Package loading

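library(SSpMosaic)
library(Seurat)          #DimPlot() for the UMAP plot
library(ggplot2)         #labs() for the plot title
library(clusterProfiler) #bitr() for gene ID conversion
library(org.Hs.eg.db)    #Human gene annotation database used by bitr()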

Set the working directory

Replace './' below with the path of your own working directory.

setwd('./')
current_dir <- getwd()
result_dir <- paste0(current_dir,'/result/')
data_dir <- paste0(current_dir,'/data/')

Data loading

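The input for SSpMosaic single-cell annotation is a Seurat object that already contains unsupervised clustering results (in this example, the seurat_clusters metadata column). The plotting step at the end also uses a UMAP reduction stored in the object.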

The example data required to run this tutorial can be downloaded from https://drive.google.com/drive/folders/1QOKvUgN0tE068-BmedRsdqa_mz8tzwDF.

seurat_object <- readRDS(paste0(data_dir,'Inh_for_annotation.rds'))

SSpMosaic metaprogram loading

Load the SSpMosaic programs generated by filter_module on a single reference dataset, or follow the network-propagation tutorial to generate the gene sets corresponding to the SSpMosaic metaprograms. Here we load SSpMosaic metaprograms generated from multiple batches, as shown in the network-propagation tutorial. The example metaprograms required to run this tutorial can be downloaded from https://drive.google.com/drive/folders/1QOKvUgN0tE068-BmedRsdqa_mz8tzwDF.

meta_module_genes <- gene_list(paste0(data_dir,'Inh_meta_module_genes.txt'))

Convert to Ensembl IDs

Because the genes in the query data are identified by Ensembl IDs, the gene symbols in each SSpMosaic metaprogram are converted to Ensembl IDs.

#Initialize a list to store the SSpMosaic metaprograms as Ensembl IDs
meta_module_genes_ENSEMBL <- list()
#Iterate over each SSpMosaic metaprogram by name
for(n in names(meta_module_genes))
{
  print(n)
  #Extract the gene set (gene symbols) of the metaprogram
  gene_names <- meta_module_genes[[n]]
  #Convert gene symbols to Ensembl IDs; bitr() omits symbols it cannot map
  bitr_results <- bitr(gene_names, fromType = "SYMBOL", toType = "ENSEMBL", OrgDb = "org.Hs.eg.db")
  #Build a symbol -> Ensembl ID lookup vector
  gene_to_ensembl <- setNames(bitr_results$ENSEMBL, bitr_results$SYMBOL)
  #Look up every symbol in its original order: duplicated symbols are kept,
  #and a symbol with several Ensembl IDs keeps only its first ID
  ensembl_ids <- gene_to_ensembl[gene_names]
  #Remove symbols without an Ensembl ID (NA)
  ensembl_ids <- ensembl_ids[!is.na(ensembl_ids)]
  #Store the Ensembl ID vector in the list
  meta_module_genes_ENSEMBL[[n]] <- ensembl_ids
}
meta_module_genes <- meta_module_genes_ENSEMBL
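The conversion looks up each symbol in a named vector, which has two consequences worth knowing: a symbol that bitr maps to several Ensembl IDs keeps only its first ID, and a symbol that occurs several times in a metaprogram keeps all of its occurrences (relevant if occurrence frequency is later used as a weight). The self-contained sketch below reproduces this behavior with a small hand-made table standing in for bitr output; the symbols and IDs are hypothetical placeholders, not real annotations.

```r
#Hypothetical stand-in for bitr() output: GENE1 maps to two Ensembl IDs,
#GENE3 has no mapping at all (bitr would simply omit it)
bitr_results <- data.frame(
  SYMBOL  = c("GENE1", "GENE1", "GENE2"),
  ENSEMBL = c("ENSG_A1", "ENSG_A2", "ENSG_B1"),
  stringsAsFactors = FALSE
)
#Hypothetical metaprogram gene set; GENE2 occurs twice
gene_names <- c("GENE1", "GENE2", "GENE2", "GENE3")

#Named lookup vector: indexing by name returns the FIRST match for a duplicated name
gene_to_ensembl <- setNames(bitr_results$ENSEMBL, bitr_results$SYMBOL)
ensembl_ids <- gene_to_ensembl[gene_names]
#Unmapped symbols come back as NA and are dropped
ensembl_ids <- ensembl_ids[!is.na(ensembl_ids)]

print(unname(ensembl_ids))
# [1] "ENSG_A1" "ENSG_B1" "ENSG_B1"
```

If a different rule is wanted for one-to-many symbols (for example keeping every Ensembl ID), the lookup vector has to be built differently; the sketch only shows what the named-vector approach does.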

Weighted average score

The function weighted_score computes a weighted average score for each metaprogram, in which each gene is weighted by the frequency with which it occurs in the gene set.

The parameters of weighted_score are as follows:

#Calculate weighted average score for each metaprogram
seurat_object <- weighted_score(object = seurat_object,
                                meta_module_genes = meta_module_genes,
                                assay = 'RNA',
                                layer = 'data',
                                slot = 'data',
                                normalize_method = 'none')
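The exact formula inside weighted_score is not spelled out here. One plausible reading, assumed purely for illustration, is that a cell's score for a metaprogram is the sum over genes g of n_g times x_g, divided by the sum of n_g, where x_g is the gene's expression in that cell and n_g is the number of times g occurs in the gene set (counting only genes present in the data). The base-R sketch below implements that assumed formula on a toy matrix; toy_weighted_score is a hypothetical helper, not part of SSpMosaic, and the package may additionally normalize or scale its scores (see its normalize_method parameter).

```r
#Hypothetical helper illustrating an ASSUMED frequency-weighted average score;
#this is not the SSpMosaic implementation
toy_weighted_score <- function(expr, gene_set) {
  #Weight of each gene = number of times it occurs in the gene set
  weights <- table(gene_set)
  #Keep only genes that are present in the expression matrix
  weights <- weights[names(weights) %in% rownames(expr)]
  w <- as.numeric(weights)
  genes <- names(weights)
  #Weighted average over genes, computed for every cell (column);
  #multiplying by w scales row i of the sub-matrix by w[i]
  colSums(expr[genes, , drop = FALSE] * w) / sum(w)
}

#Toy genes x cells matrix (e.g. log-normalized values)
expr <- matrix(c(1, 0, 2,
                 3, 1, 0,
                 0, 2, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("G1", "G2", "G3"), c("cell1", "cell2", "cell3")))

#G1 occurs twice in the gene set, so it counts double; G4 is absent from expr
gene_set <- c("G1", "G1", "G2", "G4")
scores <- toy_weighted_score(expr, gene_set)
print(scores)
```

Here cell1 scores (2 * 1 + 1 * 3) / 3 = 5/3, cell2 scores 1/3 and cell3 scores 4/3.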

SSpMosaic annotation

The function SSpMosaic_sc_annotation annotates the query single-cell data based on the weighted average scores, using the clustering stored in the metadata column given by cluster_col.

The parameters of SSpMosaic_sc_annotation are as follows:

#Run SSpMosaic annotation
res <- SSpMosaic_sc_annotation(object = seurat_object,
                               meta_module_genes = meta_module_genes,
                               cluster_col = "seurat_clusters")
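How SSpMosaic_sc_annotation turns the scores into labels is not described in this tutorial. A common cluster-level strategy, assumed in this sketch only for illustration, is to average each metaprogram's score within every cluster and label the cluster with the metaprogram that has the highest mean (an argmax rule). The helper toy_cluster_annotation and the score values below are hypothetical.

```r
#Hypothetical helper: ASSUMED cluster-level argmax rule, not the SSpMosaic algorithm
toy_cluster_annotation <- function(scores, clusters) {
  #scores: cells x metaprograms matrix; clusters: one cluster label per cell
  #Mean score of every metaprogram within each cluster (clusters x metaprograms)
  cluster_means <- apply(scores, 2, function(s) tapply(s, clusters, mean))
  #For each cluster, pick the metaprogram with the highest mean score
  cluster_label <- colnames(scores)[apply(cluster_means, 1, which.max)]
  names(cluster_label) <- rownames(cluster_means)
  #Propagate the cluster label back to every cell
  unname(cluster_label[as.character(clusters)])
}

#Toy scores for four cells and two metaprograms
scores <- matrix(c(0.9, 0.1,
                   0.7, 0.2,
                   0.1, 0.8,
                   0.3, 0.6),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("cell", 1:4), c("Inh.SST", "Inh.VIP")))
clusters <- c("0", "0", "1", "1")
labels <- toy_cluster_annotation(scores, clusters)
print(labels)
# [1] "Inh.SST" "Inh.SST" "Inh.VIP" "Inh.VIP"
```

Labeling whole clusters rather than single cells smooths out noisy per-cell scores, at the cost of forcing every cell in a cluster to share one label.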

View annotation results

Draw a UMAP plot to show the SSpMosaic annotation results.

#Merge metaprograms that represent the same cell type by removing the trailing '_2' from their labels
res[["object"]]@meta.data[["annotation"]] <- sub('_2$', '', res[["object"]]@meta.data[["annotation"]])
#Specify the color for each cell type
cols <- c('#926E6D', '#F1CB56', '#E7DDFF', '#40664C', '#EF4243', '#053450')
names(cols) <- c('Inh.SST','Inh.PVALB','Inh.LAMP5','Inh.CHANDELIER','Inh.VIP','Inh.SNCG')
#Draw the UMAP plot (labs() comes from ggplot2)
DimPlot(res[["object"]], group.by = 'annotation', reduction = 'umap', cols = cols) + labs(title = "SSpMosaic: Cell type")
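Before plotting, it can help to confirm that every label left after merging has an entry in cols; a label without one may not be colored as intended. The self-contained sketch below uses toy labels: "Inh.Unknown" is a hypothetical label invented to show the check, and the anchored pattern '_2$' only strips the suffix at the end of a label.

```r
#Toy annotation labels; a trailing "_2" marks a second metaprogram of the same cell type
annotation <- c("Inh.SST", "Inh.SST_2", "Inh.VIP", "Inh.LAMP5_2", "Inh.Unknown")
#Strip "_2" only when it is the last part of the label (anchored regex)
merged <- sub("_2$", "", annotation)

cols <- c(Inh.SST = '#926E6D', Inh.PVALB = '#F1CB56', Inh.LAMP5 = '#E7DDFF',
          Inh.CHANDELIER = '#40664C', Inh.VIP = '#EF4243', Inh.SNCG = '#053450')
#Labels present in the data that have no assigned color
missing_cols <- setdiff(unique(merged), names(cols))
print(missing_cols)
# [1] "Inh.Unknown"
```

On real data the same check can be run on res[["object"]]@meta.data[["annotation"]] before calling DimPlot.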