In this tutorial we showcase how to use SSpMosaic for data integration. Before SSpMosaic integration, the generate_module step should be executed on each batch of data individually, followed by running the network-propagation step on all batches as a whole.
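library(SSpMosaic)
#Seurat provides FindClusters and DimPlot; ggplot2 provides scale_color_manual and labs
library(Seurat)
library(ggplot2)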
Set the working directory. Users can replace this directory with an appropriate path.
setwd('./')
current_dir <- getwd()
result_dir <- paste0(current_dir,'/result/')
data_dir <- paste0(current_dir,'/data/')
The input data for SSpMosaic integration should be a Seurat object containing dimensionality reduction data (e.g., PCA results), and its metadata should include a column that stores the batch information for each cell. In addition, the column passed as the cluster_col parameter when running the generate_module function on each batch separately also needs to be retained.
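The requirements above can be checked before integration. The following is a minimal sketch, not part of SSpMosaic: the helper name check_integration_input is hypothetical, and the column names 'sample' and 'cluster' and the reduction name 'pca' are assumptions taken from this tutorial's example data. It works on a plain metadata data frame and a vector of reduction names, so it runs without a Seurat object.

```r
# Hypothetical helper (not part of SSpMosaic): verify that the metadata and
# the available reductions contain what the integration step expects.
check_integration_input <- function(meta, reductions, batch_col, cluster_col, reduction_use) {
  # Both the batch column and the cluster column must be present in the metadata
  missing_cols <- setdiff(c(batch_col, cluster_col), colnames(meta))
  if (length(missing_cols) > 0) {
    stop("Missing metadata column(s): ", paste(missing_cols, collapse = ", "))
  }
  # The requested dimensionality reduction must exist in the object
  if (!reduction_use %in% reductions) {
    stop("Reduction '", reduction_use, "' not found in the object")
  }
  invisible(TRUE)
}

# Toy metadata standing in for the metadata of a real Seurat object
toy_meta <- data.frame(
  sample = c("scRNA1", "scRNA1", "scPro3"),
  cluster = c("NK", "Bcell", "NK"),
  row.names = c("cell1", "cell2", "cell3")
)
check_ok <- check_integration_input(toy_meta, c("pca", "umap"), "sample", "cluster", "pca")
```

With a real object, the metadata data frame can be obtained with seurat_object[[]] and the reduction names with Reductions(seurat_object).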
The example data required to run this tutorial can be downloaded from https://drive.google.com/drive/folders/1Gz1oy30BcDSrKWqWWG8GPHcSXYsYWK6P.
seurat_object <- readRDS(paste0(data_dir,'PBMC_merged_data.rds'))
Combine the column representing batch information in the Seurat object's metadata with the column passed as the cluster_col parameter of the generate_module function to generate the program label corresponding to each cell.
#Combine the column representing batch information (here column 'sample') and the cluster column used in the generate_module function (here column 'cluster')
seurat_object$program <- paste(seurat_object$sample, seurat_object$cluster, sep = '_')
Load the distance matrix generated by the function gene_program_network_propagation_dist.
The example distance matrix required to run this tutorial can be downloaded from https://drive.google.com/drive/folders/1Gz1oy30BcDSrKWqWWG8GPHcSXYsYWK6P.
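#Read the distance matrix; check.names = FALSE keeps the column names identical to the program names stored in the first column
distance_matrix <- read.csv(paste0(data_dir,'module_distance_matrix.csv'), check.names = FALSE)
#Use the first column (program names) as row names, then drop it
rownames(distance_matrix) <- distance_matrix[,1]
distance_matrix <- distance_matrix[,-1]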
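Because the later steps index the distance matrix by program name, it can help to confirm that the matrix is square, that its row and column names agree, and which program labels in the metadata have no entry in the matrix. The sketch below uses a toy matrix and toy labels (all values are made up for illustration) in place of distance_matrix and seurat_object$program.

```r
# Toy distance matrix standing in for the loaded distance_matrix
programs <- c("scRNA1_NK", "scRNA1_Bcell", "scPro3_NK")
toy_dist <- matrix(c(0.0, 0.9, 0.1,
                     0.9, 0.0, 0.8,
                     0.1, 0.8, 0.0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(programs, programs))

# Toy program labels built the same way as seurat_object$program
toy_program <- paste(c("scRNA1", "scRNA1", "scPro3", "scATAC4"),
                     c("NK", "Bcell", "NK", "NK"), sep = "_")

# The matrix should be square with identical row and column names
is_square <- nrow(toy_dist) == ncol(toy_dist)
names_match <- identical(rownames(toy_dist), colnames(toy_dist))

# Program labels present in the metadata but absent from the matrix
unmatched <- setdiff(unique(toy_program), rownames(toy_dist))
print(unmatched)
```

Programs listed in unmatched are not assigned to any metaprogram by the clustering step; in this tutorial's code such cells keep their program label as their metaprogram label.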
Run hierarchical clustering on the program distance matrix and cut the hierarchical clustering tree to assign metaprogram labels.
#Run hierarchical clustering on the similarity matrix (1 - distance); silent = TRUE suppresses the plot
p <- pheatmap::pheatmap(1-distance_matrix,fontsize_row = 12,fontsize_col = 12,cellwidth = 12,cellheight = 12,clustering_method = 'mcquitty',silent = TRUE)
#set seed for reproducibility
set.seed(123)
#Cut the hierarchical clustering tree into 3 clusters
clu <- cutree(p$tree_row,3)
#Set aside cluster 2 to be split further, and renumber cluster 3 as 2
sub_clu <- clu[clu == 2]
clu <- clu[clu != 2]
clu[clu == 3] <- 2
#Run hierarchical clustering again on the programs of the set-aside cluster only
p2 <- pheatmap::pheatmap(1-distance_matrix[names(sub_clu),names(sub_clu)],fontsize_row = 12,fontsize_col = 12,cellwidth = 12,cellheight = 12,clustering_method = 'mcquitty',silent = TRUE)
#set seed for reproducibility
set.seed(123)
#Cut the sub-tree into 2 clusters and number them 3 and 4
sub_clu <- cutree(p2$tree_row,2)
sub_clu <- sub_clu + 2
#The correspondence between SSpMosaic metaprograms and programs
clu <- c(clu,sub_clu)
#Assign metaprogram labels to each cell based on the program labels according to the correspondence between SSpMosaic metaprograms and programs
meta_label <- ifelse(seurat_object$program %in% names(clu),
                     clu[seurat_object$program],
                     seurat_object$program)
#Save the metaprogram labels as a column in the Seurat object's metadata
seurat_object$meta_label <- meta_label
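To make the two-stage cut and the label mapping above easier to follow, here is a sketch on a toy distance matrix. It uses base R hclust and dist instead of pheatmap, assuming pheatmap's default row clustering (Euclidean distance between the rows of the input matrix, followed by hclust with the chosen method). The program names A1 to C2, the cell program D1, and all distance values are made up for illustration.

```r
# Toy programs: group B has two sub-groups (B1/B2 and B3/B4)
programs <- c("A1", "A2", "B1", "B2", "B3", "B4", "C1", "C2")
group    <- c("A",  "A",  "B",  "B",  "B",  "B",  "C",  "C")
subgroup <- c("A",  "A",  "Ba", "Ba", "Bb", "Bb", "C",  "C")

# Distances: 0.9 across groups, 0.4 within group B across sub-groups,
# 0.1 within a sub-group, 0 on the diagonal
toy_dist <- matrix(0.9, 8, 8, dimnames = list(programs, programs))
toy_dist[outer(group, group, "==")] <- 0.4
toy_dist[outer(subgroup, subgroup, "==")] <- 0.1
diag(toy_dist) <- 0

# First stage: cluster the similarity matrix and cut into 3 clusters
tree <- hclust(dist(1 - toy_dist), method = "mcquitty")
clu <- cutree(tree, 3)

# Set aside cluster 2 (group B here) and renumber cluster 3 as 2
sub_clu <- clu[clu == 2]
clu <- clu[clu != 2]
clu[clu == 3] <- 2

# Second stage: re-cluster the set-aside programs and cut into 2 clusters,
# numbered 3 and 4
sub_names <- names(sub_clu)
sub_tree <- hclust(dist(1 - toy_dist[sub_names, sub_names]), method = "mcquitty")
sub_clu <- cutree(sub_tree, 2) + 2
clu <- c(clu, sub_clu)

# Map per-cell program labels to metaprogram labels; a program that is not
# in the matrix (D1) keeps its own label
cell_program <- c("A1", "B3", "C2", "D1")
meta_label <- ifelse(cell_program %in% names(clu), clu[cell_program], cell_program)
print(clu)
print(meta_label)
```

In this toy case the final metaprograms are 1 (A1, A2), 2 (C1, C2), 3 (B1, B2) and 4 (B3, B4). Note that the cluster numbers returned by cutree depend on the order of the rows, so the choice of which cluster to split (cluster 2 above) should be checked against the heatmap for each dataset.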
The function SSpMosaic_meta_integration.Seurat is used to run SSpMosaic integration on a Seurat object, based on the batch labels provided and the metaprogram labels generated above.
The parameters of SSpMosaic_meta_integration.Seurat are as follows:
object: A Seurat object containing the reduction result, batch labels, and metaprogram labels.
col_batch: A string indicating the column name in the Seurat object metadata corresponding to the batch label.
col_SSpMosaic_cluster: A string indicating the column name in the Seurat object metadata corresponding to the metaprogram label.
reduction_use: A string indicating the name of the reduction result used for calculating distances.
reduction_dimention: A vector of positive integers indicating the dimensions used for calculating distances.
batch_group_by_modality: A list named by modality/species names, each element containing the names of the batches belonging to that modality/species.
n_meta_neighbors: A positive integer indicating the number of meta neighbors.
neighbors_within_modality: A positive integer indicating the number of cell neighbors from the same modality/species.
neighbors_across_modality: A positive integer indicating the number of cell neighbors from different modalities/species.
graph_name: A string indicating the name used to save the SSpMosaic integration result.
calculate_umap: A boolean specifying whether to compute the UMAP dimensionality reduction after integration.
seed: An integer indicating the random seed.
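The batch_group_by_modality list is easy to get wrong when there are many batches. The sketch below checks a list against the batch names found in the metadata. The requirement that every batch appears in exactly one modality is our assumption rather than something stated by SSpMosaic, and the variable names are illustrative only.

```r
# Batch names as they would appear in the batch column of the metadata
batches <- c("scRNA1", "scRNA2", "scPro3", "scATAC4")

# Modality/species names mapped to the batches they contain
batch_group_by_modality <- list(RNA = c("scRNA1", "scRNA2"),
                                Pro = "scPro3",
                                ATAC = "scATAC4")

# All batches mentioned in the list, in order
assigned <- unlist(batch_group_by_modality, use.names = FALSE)

# Batches in the data that no modality claims
missing_batches <- setdiff(batches, assigned)
# Batches in the list that do not exist in the data (e.g. typos)
unknown_batches <- setdiff(assigned, batches)
# Batches listed under more than one modality
duplicated_batches <- assigned[duplicated(assigned)]

# Lookup vector from batch name to modality name
batch_to_modality <- setNames(
  rep(names(batch_group_by_modality), lengths(batch_group_by_modality)),
  assigned
)
print(batch_to_modality)
```

With a real object, batches can be taken from unique(seurat_object$sample), where 'sample' is the batch column used in this tutorial.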
#Specify the modality that each batch belongs to
sample_group_by_modality <- list('RNA' = c('scRNA1','scRNA2'),'Pro' = 'scPro3','ATAC' = 'scATAC4')
#Run SSpMosaic integration
res <- SSpMosaic_meta_integration.Seurat(object = seurat_object,
                                         col_batch = 'sample',
                                         col_SSpMosaic_cluster = 'meta_label',
                                         n_meta_neighbors = 1,
                                         neighbors_within_modality = 3,
                                         neighbors_across_modality = 10,
                                         batch_group_by_modality = sample_group_by_modality,
                                         reduction_dimention = 3:30,
                                         calculate_umap = FALSE)
#Perform clustering using the results of SSpMosaic integration
res <- FindClusters(res,graph.name = 'SSpMosaic',resolution = 0.1)
Show the SSpMosaic integration result.
#Specify the color for each modality
modalityCols <- c('#487b09', '#fc724f' ,'#ea69c9')
names(modalityCols) <- c("scATAC", "scRNA", "scPro")
#Specify the color for each celltype
celltypeCols <- c('#843C39', '#a1821c', '#d9c596', '#31A354')
names(celltypeCols) <- c('NK','Tcell','Bcell','Myeloid')
#Show integration result
DimPlot(res,reduction = 'umap',group.by = 'modality')+scale_color_manual(values = modalityCols) + labs(title = "SSpMosaic: Modality")
DimPlot(res,reduction = 'umap',group.by = 'cluster')+scale_color_manual(values = celltypeCols) + labs(title = "SSpMosaic: Cell type")
DimPlot(res,reduction = 'umap',group.by = 'seurat_clusters') + labs(title = "SSpMosaic: Cluster")
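Besides the plots, a simple table can indicate how well the modalities mix within each cluster. The sketch below is an illustration on made-up data, not an SSpMosaic function: it cross-tabulates cluster against modality and converts the counts to per-cluster proportions.

```r
# Toy metadata standing in for the metadata of the integrated object
toy <- data.frame(
  seurat_clusters = factor(c(0, 0, 0, 1, 1, 1)),
  modality = c("scRNA", "scPro", "scATAC", "scRNA", "scRNA", "scPro")
)

# Number of cells per cluster (rows) and modality (columns)
counts <- table(toy$seurat_clusters, toy$modality)

# Proportion of each modality within each cluster (each row sums to 1)
props <- prop.table(counts, margin = 1)
print(round(props, 2))
```

With the integrated object, the same table can be built with table(res$seurat_clusters, res$modality). A cluster dominated by a single modality may point to incomplete integration, although it can also reflect a cell population that was only captured by that modality.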