Annotation in single-cell data using SSpMosaic

SSpMosaic annotation

This tutorial shows how to use SSpMosaic to annotate single-cell data. Before annotation, the generate_module step must be run on the reference dataset. If the reference dataset contains multiple batches, run generate_module on each batch separately and then run the network-propagation step once on all batches together.

Package loading

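library(SSpMosaic)
library(Seurat)          # DimPlot, used for the UMAP plot
library(ggplot2)         # labs, used for the plot title
library(clusterProfiler) # bitr, used for gene ID conversion
library(org.Hs.eg.db)    # human gene annotation database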

Set the working directory

Replace this directory with a path that suits your own setup.

setwd('./')
current_dir <- getwd()
result_dir <- paste0(current_dir,'/result/')
data_dir <- paste0(current_dir,'/data/')
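The paths above are only strings, so the folders themselves may not exist yet. The following is a minimal sketch, assuming you want the folders created up front; the `demo_` variable names and the temporary base folder are illustrative choices so the snippet runs anywhere without touching the tutorial's own variables.

```r
# Illustrative only: use a temporary folder as the base directory
demo_base_dir <- tempdir()
demo_result_dir <- file.path(demo_base_dir, "result")
demo_data_dir <- file.path(demo_base_dir, "data")

# Create the folders if they are missing (no warning if they already exist)
for (d in c(demo_result_dir, demo_data_dir)) {
  dir.create(d, showWarnings = FALSE, recursive = TRUE)
}

# Confirm that both folders are available
print(dir.exists(c(demo_result_dir, demo_data_dir)))
```

With real data, the same `dir.create` call can be applied to `result_dir` before any output is written.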

Data loading

The input data for SSpMosaic single-cell annotation is a Seurat object containing unsupervised clustering results.

The example data required to run this tutorial can be downloaded from https://drive.google.com/drive/folders/1QOKvUgN0tE068-BmedRsdqa_mz8tzwDF.

#data_dir already ends with '/', so the file name is appended directly
seurat_object <- readRDS(paste0(data_dir,'Inh_for_annotation.rds'))
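Because the annotation relies on existing clustering results, it can help to confirm that the clustering column is present before going further. The sketch below uses a toy data frame in place of `seurat_object@meta.data`; the helper `check_cluster_column` and the toy values are hypothetical and not part of SSpMosaic.

```r
# Toy metadata standing in for seurat_object@meta.data (hypothetical values)
toy_meta <- data.frame(
  nCount_RNA = c(1200, 950, 1830, 760),
  seurat_clusters = factor(c("0", "1", "0", "2")),
  row.names = paste0("cell", 1:4)
)

# Hypothetical helper: stop early if the clustering column is absent
check_cluster_column <- function(meta, cluster_col = "seurat_clusters") {
  if (!cluster_col %in% colnames(meta)) {
    stop("Column '", cluster_col, "' not found; run clustering first.")
  }
  # Number of cells per cluster
  table(meta[[cluster_col]])
}

cluster_sizes <- check_cluster_column(toy_meta)
print(cluster_sizes)
```

With the real object, the same check could be run as `check_cluster_column(seurat_object@meta.data)`.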

SSpMosaic metaprogram loading

Load the SSpMosaic programs generated by filter_module on a single reference dataset, or follow the network-propagation tutorial to generate the gene sets for the SSpMosaic metaprograms. Here we load metaprograms generated from multiple batches, as shown in the network-propagation tutorial. The example metaprograms required to run this tutorial can be downloaded from https://drive.google.com/drive/folders/1QOKvUgN0tE068-BmedRsdqa_mz8tzwDF.

#data_dir already ends with '/', so the file name is appended directly
meta_module_genes <- gene_list(paste0(data_dir,'Inh_meta_module_genes.txt'))

Transfer to Ensembl ID

Because the genes in the data to be annotated are identified by Ensembl IDs, the gene symbols in each SSpMosaic metaprogram are converted to Ensembl IDs.
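Whether a conversion is needed depends on how the features of the query data are named. As a rough check, the sketch below tests how many feature names look like human Ensembl gene IDs ("ENSG" followed by 11 digits). The `toy_features` vector stands in for `rownames(seurat_object)`, and its IDs are made-up placeholders.

```r
# Toy feature names standing in for rownames(seurat_object);
# the IDs are placeholders, not real gene identifiers
toy_features <- c("ENSG00000000001", "ENSG00000000002", "ENSG00000000003", "SST")

# Human Ensembl gene IDs are "ENSG" followed by 11 digits,
# optionally with a version suffix such as ".5"
is_ensembl <- grepl("^ENSG[0-9]{11}(\\.[0-9]+)?$", toy_features)

# Fraction of features that look like Ensembl gene IDs
ensembl_fraction <- mean(is_ensembl)
print(ensembl_fraction)
```

A fraction close to 1 suggests the features are Ensembl IDs, while a fraction close to 0 suggests gene symbols, in which case the conversion step can be skipped.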

#Initialize a list to store the SSpMosaic metaprograms as Ensembl IDs
meta_module_genes_ENSEMBL <- list()
#Iterate over each SSpMosaic metaprogram by name
for(n in names(meta_module_genes))
{
  print(n)
  #Extract the gene set of the metaprogram
  gene_names <- meta_module_genes[[n]]
  #Convert gene symbols to Ensembl IDs
  bitr_results <- bitr(gene_names, fromType="SYMBOL", toType="ENSEMBL", OrgDb="org.Hs.eg.db")
  #Build a symbol-to-Ensembl lookup; if a symbol maps to several IDs, the first one is used
  gene_to_ensembl <- setNames(bitr_results$ENSEMBL, bitr_results$SYMBOL)
  #Look up every gene, keeping repeated genes so that their frequency is preserved
  ensembl_ids <- unname(gene_to_ensembl[gene_names])
  #Drop genes without an Ensembl ID
  ensembl_ids <- ensembl_ids[!is.na(ensembl_ids)]
  #Save the Ensembl ID vector into the list
  meta_module_genes_ENSEMBL[[n]] <- ensembl_ids
}
#Replace the symbol-based gene sets with the Ensembl-based ones
meta_module_genes <- meta_module_genes_ENSEMBL
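Symbols without an Ensembl match are dropped during conversion, so it is worth checking how much of each metaprogram survives. The sketch below computes the fraction of retained genes per metaprogram; the helper `mapping_rate`, the toy gene sets, and the placeholder IDs are all hypothetical.

```r
# Toy gene sets before conversion (symbols) and after conversion (IDs);
# the IDs are placeholders, not real gene identifiers
toy_symbols <- list(
  Inh.SST = c("SST", "GAD1", "GAD1", "NOT_A_GENE"),
  Inh.VIP = c("VIP", "CALB2")
)
toy_ensembl <- list(
  Inh.SST = c("ENSG00000000001", "ENSG00000000002", "ENSG00000000002"),
  Inh.VIP = c("ENSG00000000003", "ENSG00000000004")
)

# Hypothetical helper: fraction of genes retained in each metaprogram
mapping_rate <- function(before, after) {
  vapply(
    names(before),
    function(n) length(after[[n]]) / length(before[[n]]),
    numeric(1)
  )
}

rates <- mapping_rate(toy_symbols, toy_ensembl)
print(rates)
```

To apply this to the tutorial data, keep a copy of the symbol-based list before it is overwritten and pass both lists to the helper.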

Weighted average score

The weighted_score function computes a weighted average score for each metaprogram, weighting each gene by how often it occurs in the gene set.
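To make the idea of frequency weighting concrete, the sketch below scores cells on a toy matrix. It assumes that a gene's weight is its number of occurrences in the gene set divided by the total; this illustrates the principle only and is not the SSpMosaic implementation. The helper `toy_weighted_score` and all values are hypothetical.

```r
# Toy expression matrix: genes in rows, cells in columns (hypothetical values)
toy_expr <- matrix(
  c(2, 0, 4,
    1, 3, 0,
    0, 5, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2", "cell3"))
)

# A gene set in which geneA occurs twice, so it gets twice the weight of geneB
toy_gene_set <- c("geneA", "geneA", "geneB")

# Hypothetical helper: frequency-weighted mean expression per cell
toy_weighted_score <- function(expr, gene_set) {
  # Count occurrences of each gene that is present in the matrix
  freq <- table(gene_set[gene_set %in% rownames(expr)])
  weights <- as.numeric(freq) / sum(freq)
  # Multiply each gene's row by its weight, then sum over genes per cell
  colSums(expr[names(freq), , drop = FALSE] * weights)
}

toy_scores <- toy_weighted_score(toy_expr, toy_gene_set)
print(toy_scores)
```

In this toy case geneA contributes two thirds of each cell's score and geneB one third, while geneC is ignored because it is not in the gene set.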

The parameters of weighted_score are as follows:

object: A Seurat object containing the single-cell data.
meta_module_genes: A list in which each item is a character vector storing the gene set of one SSpMosaic (meta)program.
normalize_method: A string giving the normalization method: one of 'none', 'zscore', or 'min-max'.
assay: A string giving the name of the assay to pull the gene expression data from.
layer: A string giving the name of the layer to fetch data from (Seurat v5 only).
slot: A string giving the name of the slot to fetch data from (Seurat v4 only).
anno: A boolean specifying whether to annotate each cell with its highest-scoring meta-module. Defaults to FALSE.
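The three normalize_method options correspond to standard rescaling schemes. The sketch below shows what each does to a vector of scores; the helper `normalize_scores` is hypothetical, and whether SSpMosaic applies the rescaling across cells for each metaprogram is an assumption made here for illustration.

```r
# Hypothetical helper illustrating the three normalize_method options
normalize_scores <- function(x, method = c("none", "zscore", "min-max")) {
  method <- match.arg(method)
  switch(method,
    "none" = x,                                       # leave scores unchanged
    "zscore" = (x - mean(x)) / sd(x),                 # mean 0, standard deviation 1
    "min-max" = (x - min(x)) / (max(x) - min(x))      # rescale to the range [0, 1]
  )
}

# Toy scores for five cells (hypothetical values)
raw_scores <- c(1, 2, 3, 4, 5)
z_scores <- normalize_scores(raw_scores, "zscore")
minmax_scores <- normalize_scores(raw_scores, "min-max")
print(z_scores)
print(minmax_scores)
```

Both rescaled versions preserve the ranking of cells, so they mainly matter when scores of different metaprograms are compared with each other.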

#Calculate the weighted average score for each metaprogram
#(layer applies to Seurat v5, slot to Seurat v4)
seurat_object <- weighted_score(object = seurat_object, meta_module_genes = meta_module_genes,
                                assay = 'RNA', layer = 'data', slot = 'data',
                                normalize_method = 'none')

SSpMosaic annotation

The SSpMosaic_sc_annotation function annotates the query single-cell data based on the weighted average scores.

The parameters of SSpMosaic_sc_annotation are as follows:

object: A Seurat object, the return value of the weighted_score function.
meta_module_genes: A list in which each item is a character vector storing the gene set of one SSpMosaic (meta)program; it must match the meta_module_genes argument passed to weighted_score.
cluster_col: A string giving the name of the column in the Seurat object metadata that holds the cell cluster assignment.
sd_thres: A non-negative double specifying the threshold for the difference between two standard deviations.
mean_thres: A non-negative double specifying the threshold for the difference between two mean values.
annotation_name: A string naming the metadata column in which the annotation is stored.

#Run SSpMosaic annotation
res <- SSpMosaic_sc_annotation(object = seurat_object,meta_module_genes = meta_module_genes,cluster_col = "seurat_clusters")
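As a rough intuition for cluster-level annotation from per-cell scores, the sketch below labels each cluster with the metaprogram that has the highest mean score in it. This is a deliberately simplified stand-in: it ignores sd_thres and mean_thres and is not the algorithm used by SSpMosaic_sc_annotation. The toy data frame and all values are hypothetical.

```r
# Toy per-cell scores for two metaprograms plus cluster labels (hypothetical)
toy_cells <- data.frame(
  cluster = c("0", "0", "1", "1", "1"),
  Inh.SST = c(0.9, 0.8, 0.1, 0.2, 0.1),
  Inh.VIP = c(0.1, 0.2, 0.7, 0.9, 0.8),
  stringsAsFactors = FALSE
)
program_cols <- c("Inh.SST", "Inh.VIP")

# Mean score of every metaprogram within every cluster
cluster_means <- aggregate(
  toy_cells[program_cols],
  by = list(cluster = toy_cells$cluster),
  FUN = mean
)

# Label each cluster with its highest-scoring metaprogram
best_idx <- max.col(as.matrix(cluster_means[program_cols]), ties.method = "first")
cluster_labels <- setNames(program_cols[best_idx], as.character(cluster_means$cluster))

# Propagate the cluster label back to the individual cells
toy_cells$annotation <- unname(cluster_labels[as.character(toy_cells$cluster)])
print(toy_cells[c("cluster", "annotation")])
```

Annotating at the cluster level rather than per cell smooths out noisy scores of individual cells, which is why the clustering column is required as input.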

View annotation results

Draw a UMAP plot to show the SSpMosaic annotation results.

#Merge meta modules that represent the same cell type by removing the '_2' suffix
res[["object"]]@meta.data[["annotation"]] <- gsub('_2','',res[["object"]]@meta.data[["annotation"]])
#Specify the color for each cell type
cols <- c('#926E6D', '#F1CB56', '#E7DDFF', '#40664C', '#EF4243', '#053450')
names(cols) <- c('Inh.SST','Inh.PVALB','Inh.LAMP5','Inh.CHANDELIER','Inh.VIP','Inh.SNCG')
#Draw the UMAP plot colored by annotation
DimPlot(res[["object"]],group.by = 'annotation',reduction = 'umap',cols = cols) + labs(title = "SSpMosaic: Cell type")
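Alongside the plot, a simple count of cells per annotated type gives a quick numerical summary. The sketch below repeats the suffix-merging step on a toy label vector and tabulates the result. It uses a pattern anchored with `$`, assuming the "_2" suffix only ever appears at the end of a label; the labels themselves are hypothetical.

```r
# Toy annotation labels; "_2" marks a second meta module of the same cell type
toy_annotation <- c("Inh.SST", "Inh.SST_2", "Inh.VIP", "Inh.VIP_2", "Inh.VIP")

# Anchor the pattern with "$" so that only a trailing "_2" is removed
collapsed <- sub("_2$", "", toy_annotation)

# Count cells per merged cell type
type_counts <- table(collapsed)
print(type_counts)
```

With the real result, the same table can be built from `res[["object"]]@meta.data[["annotation"]]`.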


2025-07-30
