SSpMosaic integration

This tutorial shows how to use SSpMosaic for data integration. Before running the integration, run the generate_module step on each batch of data separately, then run the network-propagation step on all batches together.

Package loading

library(SSpMosaic)
library(Seurat)
library(ggplot2)

Set the working directory

Replace this directory with a path appropriate for your own system.

setwd('./')
current_dir <- getwd()
result_dir <- paste0(current_dir,'/result/')
data_dir <- paste0(current_dir,'/data/')

Data loading

The input for SSpMosaic integration is a Seurat object that contains a dimensionality reduction (e.g., PCA results) and a metadata column storing the batch of each cell. The metadata column that was passed as the cluster_col parameter when running generate_module on each batch must also be retained.

The example data required to run this tutorial can be downloaded from https://drive.google.com/drive/folders/1Gz1oy30BcDSrKWqWWG8GPHcSXYsYWK6P.

seurat_object <- readRDS(paste0(data_dir,'PBMC_merged_data.rds'))
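Before going further, it can help to confirm that the loaded object meets the requirements above. The helper below is a minimal sketch and not part of SSpMosaic: check_integration_input is a hypothetical name, and it takes a plain metadata data frame and a vector of reduction names so that it can be tried without a Seurat object. It also assumes the reduction is stored under the name 'pca', which may differ in your data.

```r
# Hypothetical helper (not an SSpMosaic function): returns a character vector
# describing what is missing; an empty vector means nothing is missing.
check_integration_input <- function(meta, reductions, col_batch, col_cluster,
                                    reduction = 'pca') {
  problems <- character(0)
  missing_cols <- setdiff(c(col_batch, col_cluster), colnames(meta))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste0('missing metadata column: ', missing_cols))
  }
  if (!(reduction %in% reductions)) {
    problems <- c(problems, paste0('missing reduction: ', reduction))
  }
  problems
}

# Toy metadata standing in for the metadata of the Seurat object
toy_meta <- data.frame(sample = c('scRNA1', 'scPro3'),
                       cluster = c('NK', 'Bcell'))
check_integration_input(toy_meta, reductions = 'pca',
                        col_batch = 'sample', col_cluster = 'cluster')
```

With a real object, the two inputs would be seurat_object@meta.data and names(seurat_object@reductions).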

Program label assignment

Combine the batch column in the Seurat object's metadata with the column that was passed as the cluster_col parameter to generate_module. The combination gives the program label of each cell.

#Combine the batch column (here 'sample') with the cluster column used in the generate_module function (here 'cluster')
seurat_object$program <- paste(seurat_object$sample, seurat_object$cluster, sep = '_')
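To make the labelling concrete, here is a small sketch on toy metadata. The data frame is invented for illustration; only the column names 'sample' and 'cluster' and the '_' separator are taken from this tutorial.

```r
# Toy metadata; 'sample' and 'cluster' mirror the columns used in this tutorial
toy_meta <- data.frame(
  sample  = c('scRNA1', 'scRNA1', 'scRNA2', 'scPro3'),
  cluster = c('NK', 'Bcell', 'NK', 'NK'),
  stringsAsFactors = FALSE
)
toy_meta$program <- paste(toy_meta$sample, toy_meta$cluster, sep = '_')

# One program per batch/cluster combination
unique(toy_meta$program)

# Number of cells per program
table(toy_meta$program)
```

These labels are later matched against the row names of the program distance matrix, so the separator and column order should produce names in the same form as those row names.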

Load program distance matrix

Load the distance matrix generated by the function gene_program_network_propagation_dist.

The example distance matrix required to run this tutorial can be downloaded from https://drive.google.com/drive/folders/1Gz1oy30BcDSrKWqWWG8GPHcSXYsYWK6P.

#Use the first column as row names and keep the column names exactly as stored in the file
distance_matrix <- read.csv(paste0(data_dir,'module_distance_matrix.csv'), row.names = 1, check.names = FALSE)
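Because later steps index the distance matrix by program name on both rows and columns, a quick sanity check can catch problems early. The function below is a hedged sketch: check_distance_matrix is a hypothetical helper, not an SSpMosaic function, and the toy matrix is invented for illustration.

```r
# Hypothetical helper (not an SSpMosaic function): stops on a malformed matrix
# and returns the programs that have no row in the matrix.
check_distance_matrix <- function(d, programs = character(0)) {
  m <- as.matrix(d)
  if (nrow(m) != ncol(m)) stop('distance matrix is not square')
  if (!identical(rownames(m), colnames(m))) {
    stop('row names and column names differ')
  }
  if (anyNA(m)) stop('distance matrix contains missing values')
  setdiff(unique(programs), rownames(m))
}

# Toy distance matrix between three programs
toy_names <- c('scRNA1_NK', 'scRNA2_NK', 'scPro3_NK')
toy_dist <- matrix(c(0,   0.2, 0.3,
                     0.2, 0,   0.4,
                     0.3, 0.4, 0),
                   nrow = 3, dimnames = list(toy_names, toy_names))

# 'scATAC4_NK' has no row in the toy matrix, so it is reported
check_distance_matrix(toy_dist, programs = c('scRNA1_NK', 'scATAC4_NK'))
```

A mismatch between row and column names can occur when read.csv alters the column names, since its check.names argument defaults to TRUE.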

Metaprogram label assignment

Run hierarchical clustering on the program distance matrix and cut the resulting tree to assign metaprogram labels.

#Run hierarchical clustering on the program similarity matrix (1 - distance)
p <- pheatmap::pheatmap(1-distance_matrix,fontsize_row = 12,fontsize_col = 12,cellwidth = 12,cellheight = 12,clustering_method = 'mcquitty',silent = TRUE)
#set seed for reproducibility
set.seed(123)
#Cut the hierarchical clustering tree into 3 clusters
clu <- cutree(p$tree_row,3)
#Set cluster 2 aside for sub-clustering and renumber cluster 3 as cluster 2
sub_clu <- clu[clu == 2]
clu <- clu[clu != 2]
clu[clu == 3] <- 2
#Run hierarchical clustering again, restricted to the programs set aside
p2 <- pheatmap::pheatmap(1-distance_matrix[names(sub_clu),names(sub_clu)],fontsize_row = 12,fontsize_col = 12,cellwidth = 12,cellheight = 12,clustering_method = 'mcquitty',silent = TRUE)
#set seed for reproducibility
set.seed(123)
#Cut the sub-tree into 2 clusters and shift their labels to 3 and 4
sub_clu <- cutree(p2$tree_row,2)
sub_clu <- sub_clu + 2
#The correspondence between SSpMosaic metaprograms and programs
clu <- c(clu,sub_clu)
#Assign a metaprogram label to each cell from its program label; cells whose program is not in the distance matrix keep their program label
meta_label <- ifelse(seurat_object$program %in% names(clu),
                     clu[seurat_object$program],
                     seurat_object$program)
#Save the metaprogram labels as a column in the Seurat object's metadata
seurat_object$meta_label <- meta_label
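The two-level cut above can be reproduced on a toy example with base R only. This is a simplified sketch under stated assumptions: the six program names and their distances are invented, cut_programs is a hypothetical helper, and hclust on dist() is used in place of pheatmap, which with its default settings clusters rows by the Euclidean distance between them. The toy tree is cut into 2 clusters first (rather than 3), and the second cluster is then split again.

```r
# Toy distance matrix between six hypothetical programs from two batches
prog  <- c('b1_A', 'b2_A', 'b1_B', 'b2_B', 'b1_C', 'b2_C')
group <- sub('.*_', '', prog)
toy_dist <- outer(group, group, function(x, y) {
  ifelse(x == y, 0.1, ifelse(x == 'A' | y == 'A', 0.9, 0.5))
})
diag(toy_dist) <- 0
dimnames(toy_dist) <- list(prog, prog)

# Hypothetical helper: cluster the rows of the similarity matrix (1 - distance)
# by Euclidean distance with the 'mcquitty' method, then cut into k groups
cut_programs <- function(d, k) {
  tree <- hclust(dist(1 - d), method = 'mcquitty')
  cutree(tree, k)
}

# First level: two clusters; cluster 2 is set aside for sub-clustering
top <- cut_programs(toy_dist, 2)
sub_names <- names(top)[top == 2]

# Second level: split the programs set aside and shift their labels by 1
sub <- cut_programs(toy_dist[sub_names, sub_names], 2) + 1L
clu <- c(top[top != 2], sub)

# Map cells to metaprograms; 'b3_D' has no entry and keeps its program label
cells_program <- c('b1_A', 'b2_C', 'b1_B', 'b3_D')
meta_label <- ifelse(cells_program %in% names(clu),
                     clu[cells_program],
                     cells_program)
meta_label
```

Because the fallback value is a character string, ifelse returns a character vector, so the numeric metaprogram labels are stored as strings such as '1' and '2'.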

SSpMosaic integration

The function SSpMosaic_meta_integration.Seurat runs SSpMosaic integration on a Seurat object, using the batch labels stored in its metadata and the metaprogram labels generated above.

The parameters of SSpMosaic_meta_integration.Seurat are set as in the following call:

#Specify the modality that each batch belongs to
sample_group_by_modality <- list('RNA' = c('scRNA1','scRNA2'),'Pro' = 'scPro3','ATAC' = 'scATAC4')
#Run SSpMosaic integration
res <- SSpMosaic_meta_integration.Seurat(object = seurat_object,
                                         col_batch = 'sample',
                                         col_SSpMosaic_cluster = 'meta_label',
                                         n_meta_neighbors = 1,
                                         neighbors_within_modality = 3,
                                         neighbors_across_modality = 10,
                                         batch_group_by_modality = sample_group_by_modality,
                                         reduction_dimention = 3:30,
                                         calculate_umap = FALSE)
#Perform clustering using the results of SSpMosaic integration
res <- FindClusters(res,graph.name = 'SSpMosaic',resolution = 0.1)
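The modality list is typed by hand, so it is easy to leave a batch out or misspell one. The sketch below compares such a list with the batch labels found in the metadata. check_modality_groups is a hypothetical helper, not an SSpMosaic function, and the assumption that each batch should appear in exactly one modality is ours rather than a documented requirement.

```r
# Hypothetical helper (not an SSpMosaic function): reports batches that are
# unassigned, assigned but absent from the data, or assigned more than once
check_modality_groups <- function(batches, groups) {
  assigned <- unlist(groups, use.names = FALSE)
  list(
    unassigned = setdiff(unique(batches), assigned),
    unknown    = setdiff(assigned, unique(batches)),
    duplicated = unique(assigned[duplicated(assigned)])
  )
}

sample_group_by_modality <- list('RNA' = c('scRNA1', 'scRNA2'),
                                 'Pro' = 'scPro3',
                                 'ATAC' = 'scATAC4')

# Toy batch labels standing in for seurat_object$sample
toy_batches <- c('scRNA1', 'scRNA2', 'scPro3', 'scATAC4', 'scRNA1')
check_modality_groups(toy_batches, sample_group_by_modality)
```

Three empty entries indicate that the list and the batch labels agree.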

Integration result

Visualize the SSpMosaic integration result on the UMAP embedding, colored by modality, cell type and cluster.

#Specify the color for each modality
modalityCols <- c('#487b09', '#fc724f' ,'#ea69c9')
names(modalityCols) <- c("scATAC", "scRNA", "scPro")
#Specify the color for each celltype
celltypeCols <- c('#843C39', '#a1821c', '#d9c596', '#31A354')
names(celltypeCols) <- c('NK','Tcell','Bcell','Myeloid')
#Show integration result
DimPlot(res,reduction = 'umap',group.by = 'modality')+scale_color_manual(values = modalityCols) + labs(title = "SSpMosaic: Modality")

DimPlot(res,reduction = 'umap',group.by = 'cluster')+scale_color_manual(values = celltypeCols) + labs(title = "SSpMosaic: Cell type")

DimPlot(res,reduction = 'umap',group.by = 'seurat_clusters') + labs(title = "SSpMosaic: Cluster")
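A cross-tabulation can complement the plots with numbers. The sketch below uses an invented toy data frame in place of the metadata of res; the column names 'seurat_clusters', 'modality' and 'cluster' are the ones passed to group.by above, and the values carry no meaning beyond illustration.

```r
# Toy per-cell annotations standing in for the metadata of the integrated object
toy <- data.frame(
  seurat_clusters = c('0', '0', '0', '1', '1', '1'),
  modality = c('scRNA', 'scPro', 'scATAC', 'scRNA', 'scRNA', 'scPro'),
  cluster  = c('NK', 'NK', 'NK', 'Bcell', 'Bcell', 'Bcell')
)

# Fraction of each modality within every integrated cluster (rows sum to 1)
modality_mix <- prop.table(table(toy$seurat_clusters, toy$modality), margin = 1)
round(modality_mix, 2)

# Cell types found in each integrated cluster
table(toy$seurat_clusters, toy$cluster)
```

Clusters that contain cells from several modalities but mostly a single cell type would be consistent with what the UMAP plots are meant to show.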