SSpMosaic generate program

This tutorial demonstrates the process of constructing differential co-expression gene programs.

Loading package

library(SSpMosaic)

Set the working directory

Users can customize this directory by specifying an appropriate path as needed.

setwd('./')
current_dir <- getwd()
result_dir <- paste0(current_dir,'/result/')
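
The result directory may not exist yet on a fresh checkout. A minimal sketch, assuming generate_module does not create out_dir on its own (the tutorial does not say either way), is to create it up front with base R. The name out_path is introduced here only for illustration.

```r
# Build the same path as above, using file.path for portability
out_path <- file.path(getwd(), "result")

# Assumption: the directory must exist before results are written to it
if (!dir.exists(out_path)) {
  dir.create(out_path, recursive = TRUE)
}

# Keep the trailing slash used elsewhere in this tutorial
result_dir <- paste0(out_path, "/")
```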

Loading data

The input for SSpMosaic is a Seurat object containing dimensionality reduction data (e.g., PCA results) and cell labels (e.g., from unsupervised clustering or cell type annotations). The example data required to run this tutorial can be downloaded from https://drive.google.com/drive/folders/12abksyeOY0xTHCo25c-jugRn_MlZzDK1

We recommend providing a predefined set of genes of interest (e.g., marker genes) for each cluster; the algorithm then extracts SSpMosaic programs from these genes. If no genes of interest are specified, the algorithm automatically identifies marker genes for each cluster and uses them as candidate genes for SSpMosaic programs.

seurat_object <- readRDS(paste0(current_dir, "/data/human_brain4_preprocessed.rds"))
markers <- readRDS(paste0(current_dir, "/data/Inh_markers.rds"))

Generate candidate programs

The function generate_module produces several candidate programs for each cell cluster in the Seurat object. These programs are potential representatives of the cell clusters, but further screening and filtering are required to identify the final representative programs. The parameters of generate_module are as follows:

# Choose to normalize the metacell counts for every cluster
norm <- rep(TRUE, length(unique(seurat_object$celltype)))
names(norm) <- unique(seurat_object$celltype)
# Choose to use the expression values from the 'data' layer for every cluster
layer <- rep('data', length(unique(seurat_object$celltype)))
names(layer) <- unique(seurat_object$celltype)
# Run candidate program generation
generate_module(
  object = seurat_object, sample_name = 'human_brain4',
  cluster_col = 'celltype', out_dir = result_dir,
  normalize_metacell = norm, min_cell_number = 100,
  gene_use = markers, layer = layer, verbose = FALSE
)
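
Because normalize_metacell and layer are named vectors with one entry per cluster, individual clusters can in principle be given different settings. The sketch below uses a toy label vector in place of seurat_object$celltype; the cluster sizes and the choice to turn normalization off for one cluster are hypothetical, and how generate_module handles mixed settings should be checked against the package documentation. It also tabulates cluster sizes, on the assumption that min_cell_number is a per-cluster minimum cell count.

```r
# Toy stand-in for seurat_object$celltype (hypothetical labels and sizes)
celltype <- rep(c("Inh.PVALB", "Inh.SST", "Inh.VIP"), times = c(150, 120, 40))
clusters <- unique(celltype)

# One named entry per cluster, as in the tutorial
norm <- setNames(rep(TRUE, length(clusters)), clusters)
layer <- setNames(rep("data", length(clusters)), clusters)

# Hypothetical override: skip metacell normalization for a single cluster
norm["Inh.VIP"] <- FALSE

# Assumption: clusters below min_cell_number may be skipped, so list them
min_cell_number <- 100
cluster_sizes <- table(celltype)
small_clusters <- names(cluster_sizes)[as.vector(cluster_sizes) < min_cell_number]
```

Checking small_clusters before the run makes it easier to tell why a cluster might end up without candidate programs.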

Read candidate programs

The function get_module reads the candidate programs generated by generate_module. The parameters of get_module are as follows:

# Read the generated candidate programs
m <- get_module(sample_name = 'human_brain4', read_dir = result_dir)

Calculating candidate program scores on the dataset

The function score_module calculates the score of each candidate program generated by generate_module on the dataset. The parameters of score_module are as follows:

# Calculate the candidate program scores on the dataset
score_module(module = m, sample_name = 'human_brain4', read_dir = result_dir, cluster_col = 'celltype', nbin = 20)
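
The nbin argument shares its name with a parameter of Seurat's AddModuleScore, where genes are binned by average expression and control genes are drawn from the same bins as the program genes. Whether score_module uses nbin in the same way is not stated in this tutorial, so the base R sketch below only illustrates that general technique on toy data. The matrix, the program genes, and n_ctrl are all hypothetical.

```r
set.seed(1)

# Toy expression matrix: 200 genes x 50 cells (hypothetical data)
expr <- matrix(rexp(200 * 50), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("cell", 1:50)))
program <- paste0("gene", 1:10)  # hypothetical program genes
nbin <- 20                       # number of expression bins
n_ctrl <- 5                      # control genes drawn per program gene

# Assign every gene to one of nbin bins by its rank of average expression
avg <- rowMeans(expr)
bins <- cut(rank(avg, ties.method = "first"), breaks = nbin, labels = FALSE)
names(bins) <- rownames(expr)

# For each program gene, sample control genes from the same expression bin
ctrl <- unique(unlist(lapply(program, function(g) {
  pool <- setdiff(names(bins)[bins == bins[[g]]], program)
  sample(pool, min(n_ctrl, length(pool)))
})))

# Per-cell score: mean program expression minus mean control expression
score <- colMeans(expr[program, , drop = FALSE]) -
  colMeans(expr[ctrl, , drop = FALSE])
```

Matching controls by expression bin keeps highly expressed programs from scoring high simply because their genes are abundant everywhere.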

Program selection

The function filter_module screens and filters the candidate programs, using the scores calculated by score_module, to identify the final representative programs. The parameters of filter_module are as follows:

# Filter the candidate programs to generate the final SSpMosaic programs for this dataset
res <- filter_module(object = seurat_object, module = m, sample_name = 'human_brain4', read_dir = result_dir, cluster_col = 'celltype')
## Reading module score
## Filtering module
## [1] "human.brain4.Inh.PVALB.1"
## [1] "human.brain4.Inh.SNCG.1"
## [1] "human.brain4.Inh.SNCG.2"
## [1] "human.brain4.Inh.SNCG.3"
## [1] "human.brain4.Inh.VIP.1"
## [1] "human.brain4.Inh.LAMP5.1"
## [1] "human.brain4.Inh.LAMP5.2"
## [1] "human.brain4.Inh.LAMP5.3"
## [1] "human.brain4.Inh.LAMP5.4"
## [1] "human.brain4.Inh.LAMP5.5"
## [1] "human.brain4.Inh.SST.1"
## [1] "human.brain4.Inh.SST.2"
## [1] "human.brain4.Inh.SST.3"
## [1] "human.brain4.Inh.SST.4"
## [1] "human.brain4.Inh.CHANDELIER.1"
## [1] "human.brain4.Inh.CHANDELIER.2"
## Saving module
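
The names printed during filtering appear to follow the pattern sample, cluster, index, with the underscore in the sample name shown as a dot. The tutorial does not state whether the printed names are the retained programs or all candidates that were processed. As a small sketch, the printed names can be tallied per cluster with base R; the vector below is copied from the output above, and the parsing rule is an assumption based on that pattern.

```r
# Program names copied from the filter_module output above
program_names <- c(
  "human.brain4.Inh.PVALB.1",
  paste0("human.brain4.Inh.SNCG.", 1:3),
  "human.brain4.Inh.VIP.1",
  paste0("human.brain4.Inh.LAMP5.", 1:5),
  paste0("human.brain4.Inh.SST.", 1:4),
  paste0("human.brain4.Inh.CHANDELIER.", 1:2)
)

# Strip the sample prefix and the trailing index to recover the cluster label
cluster <- sub("^human\\.brain4\\.", "", program_names)
cluster <- sub("\\.[0-9]+$", "", cluster)

# Number of printed programs per cluster
programs_per_cluster <- table(cluster)
```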