This tutorial demonstrates the process of constructing differential co-expression gene programs.
library(SSpMosaic)
The working directory is set below; replace './' with an appropriate path as needed. Results are written to the result/ subdirectory of this directory.
setwd('./')
current_dir <- getwd()
result_dir <- paste0(current_dir,'/result/')
The input for SSpMosaic is a Seurat object containing dimensionality reduction data (e.g., PCA results) and cell labels (e.g., from unsupervised clustering or cell type annotations). The example data required to run this tutorial can be downloaded from https://drive.google.com/drive/folders/12abksyeOY0xTHCo25c-jugRn_MlZzDK1 and should be placed in the data/ subdirectory of the working directory. We recommend providing a predefined set of genes of interest (e.g., marker genes) for each cluster, from which the algorithm extracts SSpMosaic programs. If no genes of interest are specified, the algorithm automatically identifies marker genes for each cluster and uses them as candidate genes for SSpMosaic programs.
seurat_object <- readRDS(paste0(current_dir, "/data/human_brain4_preprocessed.rds"))
markers <- readRDS(paste0(current_dir, "/data/Inh_markers.rds"))
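The gene_use parameter of generate_module expects a named list with one character vector of candidate genes per cluster. The contents of Inh_markers.rds are assumed to have that shape. If you start from a marker table instead, for example a subset of the columns returned by Seurat::FindAllMarkers(), a list of this shape can be built in base R. The sketch below uses a hypothetical table. The cluster names, genes and the 1.5 cut-off are illustrative only. In real use, the list names must match the cluster labels in your cluster_col column.

```r
# Hypothetical marker table: one row per (cluster, gene) pair
marker_table <- data.frame(
  cluster    = c("Inh_PVALB", "Inh_PVALB", "Inh_SST", "Inh_SST", "Inh_VIP"),
  gene       = c("PVALB", "TAC1", "SST", "NPY", "VIP"),
  avg_log2FC = c(3.1, 1.4, 2.8, 1.9, 3.5),
  stringsAsFactors = FALSE
)

# Keep markers above an illustrative fold-change cut-off (assumed value)
keep <- marker_table[marker_table$avg_log2FC >= 1.5, ]

# Named list: one character vector of candidate genes per cluster
gene_use <- split(keep$gene, keep$cluster)
str(gene_use)
```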
The function generate_module produces several candidate programs for each cell cluster in the Seurat object. These programs are potential representatives of the cell clusters, but further screening and filtering are required to identify the final representative programs. The parameters of generate_module are as follows:
object: A Seurat object containing reduction and cell cluster information.
cluster_col: A string indicating the column name in the Seurat object metadata corresponding to the cell cluster assignment.
meta_cell: A named integer vector, with names corresponding to cell clusters, specifying the number of meta cells in hdWGCNA for each cluster. If NULL, the default value is used (the ceiling of the number of cells in each cluster divided by 30).
max_share: A named integer vector, with names corresponding to cell clusters, specifying the maximum number of cells shared between meta cells in hdWGCNA for each cluster. If NULL, the default value is used.
soft_power: A named integer vector, with names corresponding to cell clusters, specifying the soft_power parameter in hdWGCNA for each cluster. If NULL, the default value is used (the lowest power achieving a 0.8 scale-free topology fit).
normalize_metacell: A named logical vector, with names corresponding to cell clusters, specifying whether to normalize metacell expression in hdWGCNA for each cluster. If NULL, the default value is used.
cluster_chosen: A character vector specifying the cell clusters for which SSpMosaic programs are generated. If NULL, all clusters are selected.
min_cell_number: A positive integer specifying the minimum number of cells required for a cell cluster.
sample_name: A string specifying the sample name.
out_dir: A string specifying the output directory path.
gene_use: A named list, with names corresponding to cell clusters, specifying the candidate genes for SSpMosaic. If NULL, the marker genes identified for each cluster are used.
log2FC_thres: A positive double specifying the log2 fold-change threshold for identifying cell cluster markers. Used only when no gene set of interest is provided for the cluster.
min.pct: A non-negative double specifying the minimum fraction of cells in the cluster in which a gene must be detected for it to be tested. Used only when no gene set of interest is provided for the cluster.
min_metacell: A positive integer specifying the minimum number of meta cells.
assay: A named vector, with names corresponding to cell clusters, specifying the assay used for hdWGCNA. If NULL, the default assay is used.
slot: A named vector, with names corresponding to cell clusters, specifying the slot used to extract data.
layer: A named vector, with names corresponding to cell clusters, specifying the layer used to extract data. Applicable only to Seurat v5.
verbose: A logical value indicating whether to display detailed information during program generation.
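The default meta_cell value described above is simple arithmetic and can be reproduced in base R. The sketch below applies it to hypothetical cluster labels standing in for seurat_object$celltype. It also shows which clusters would meet the min_cell_number = 100 used in this tutorial. Treating that check as ">=" is an assumption, since the exact comparison used by generate_module is not stated here.

```r
# Hypothetical cluster labels standing in for seurat_object$celltype
celltype <- c(rep("Inh_PVALB", 250), rep("Inh_SST", 95), rep("Inh_VIP", 61))

# Number of cells per cluster
n_cells <- table(celltype)

# Default meta_cell as described above: ceiling(cells per cluster / 30)
meta_cell <- setNames(as.integer(ceiling(n_cells / 30)), names(n_cells))
print(meta_cell)   # Inh_PVALB = 9, Inh_SST = 4, Inh_VIP = 3

# Clusters that would meet min_cell_number = 100 (assuming ">=" semantics)
passing <- names(n_cells)[n_cells >= 100]
print(passing)     # "Inh_PVALB"
```

A named vector of this form could be edited and passed as meta_cell to override the default for selected clusters.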
#Normalize the metacell counts for every cluster
clusters <- unique(seurat_object$celltype)
norm <- setNames(rep(TRUE, length(clusters)), clusters)
#Use the expression values from the 'data' layer for every cluster
layer <- setNames(rep('data', length(clusters)), clusters)
#Run candidate program generation
generate_module(object = seurat_object, sample_name = 'human_brain4',
                cluster_col = 'celltype', out_dir = result_dir,
                normalize_metacell = norm, min_cell_number = 100,
                gene_use = markers, layer = layer, verbose = FALSE)
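Because soft_power is not supplied above, each cluster uses the default: the lowest power achieving a 0.8 scale-free topology fit. The base-R sketch below illustrates that WGCNA-style criterion on an arbitrary genes x cells matrix. It makes several assumptions, and hdWGCNA's exact procedure may differ:

* the adjacency is signed, ((1 + r) / 2)^beta;
* connectivity is split into 10 bins;
* empty bins are dropped.

The helper names scale_free_fit and pick_soft_power are hypothetical.

```r
# Simplified scale-free topology fit (WGCNA-style), for illustration only.
# Assumptions: signed adjacency ((1 + r) / 2)^beta, 10 connectivity bins,
# empty bins dropped; hdWGCNA's exact procedure may differ.
scale_free_fit <- function(expr, beta, n_breaks = 10) {
  r <- cor(t(expr))                           # gene-gene correlation (genes in rows)
  adj <- ((1 + r) / 2)^beta                   # signed soft-threshold adjacency
  diag(adj) <- 0
  k <- rowSums(adj)                           # connectivity of each gene
  bins <- cut(k, n_breaks)
  dk <- tapply(k, bins, mean)                 # mean connectivity per bin
  p_dk <- as.vector(table(bins)) / length(k)  # fraction of genes per bin
  ok <- !is.na(dk) & p_dk > 0
  fit <- lm(log10(p_dk[ok]) ~ log10(dk[ok]))
  # Signed R^2: positive when p(k) decreases with k, as in a scale-free network
  -sign(coef(fit)[[2]]) * summary(fit)$r.squared
}

# Lowest power whose signed fit reaches the target (0.8, as stated above)
pick_soft_power <- function(expr, powers = 1:20, target = 0.8) {
  fits <- vapply(powers, function(b) scale_free_fit(expr, b), numeric(1))
  hit <- powers[which(fits >= target)]
  if (length(hit) > 0) min(hit) else NA_integer_
}
```

On real metacell expression the fit typically rises with the power. If no power reaches the target, this sketch returns NA rather than guessing a fallback.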
The function get_module reads the candidate programs generated by generate_module. The parameters of get_module are as follows:
sample_name: A string specifying the sample name; it must match the sample_name parameter used in generate_module.
read_dir: A string specifying the output directory path; it must match the out_dir parameter used in generate_module.
#Read the generated candidate programs
m <- get_module(sample_name = 'human_brain4',read_dir = result_dir)
The function score_module calculates the score of each candidate program generated by generate_module on the dataset. The parameters of score_module are as follows:
module: The return value of the get_module function.
sample_name: A string specifying the sample name; it must match the sample_name parameter used in generate_module.
read_dir: A string specifying the output directory path; it must match the out_dir parameter used in generate_module.
cluster_col: A string indicating the column name in the Seurat object metadata corresponding to the cell cluster assignment; it must match the cluster_col parameter used in generate_module.
nbin: An integer specifying the number of bins of aggregate expression levels for all analyzed features, used in Seurat::AddModuleScore.
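The nbin parameter matters because Seurat::AddModuleScore compares each program's genes against control genes of similar average expression. Genes are grouped into nbin expression bins, and controls are sampled from the same bins. The base-R sketch below illustrates that idea. It is a simplified stand-in, not Seurat's exact implementation; for example, Seurat samples ctrl = 100 control genes per feature by default. The helper module_score is hypothetical.

```r
# Simplified AddModuleScore-style scoring: mean expression of the program's
# genes minus the mean of control genes drawn from the same expression bins.
# Sketch only; Seurat::AddModuleScore differs in details.
module_score <- function(expr, genes, nbin = 20, ctrl = 10, seed = 1) {
  set.seed(seed)                                   # reproducible control sampling
  avg <- rowMeans(expr)                            # average expression per gene
  # Equal-count bins of genes ranked by average expression
  bin <- ceiling(rank(avg, ties.method = "first") / length(avg) * nbin)
  names(bin) <- rownames(expr)
  # For each program gene, sample control genes from its own expression bin
  ctrl_genes <- unique(unlist(lapply(genes, function(g) {
    pool <- names(bin)[bin == bin[[g]]]
    sample(pool, min(ctrl, length(pool)))
  })))
  colMeans(expr[genes, , drop = FALSE]) -
    colMeans(expr[ctrl_genes, , drop = FALSE])
}
```

With fewer bins, each bin spans a wider expression range, so control genes are matched to program genes less tightly. The tutorial uses nbin = 20.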
#Calculate the candidate program scores on the dataset
score_module(module = m,sample_name = 'human_brain4',read_dir = result_dir,cluster_col = 'celltype',nbin = 20)
The function filter_module screens and filters the candidate programs, based on the scores calculated by score_module, to identify the final representative programs. The parameters of filter_module are as follows:
object: A Seurat object; it must match the object parameter used in generate_module.
module: The return value of the get_module function; it must match the module parameter used in score_module.
sample_name: A string specifying the sample name; it must match the sample_name parameter used in generate_module.
read_dir: A string specifying the output directory path; it must match the out_dir parameter used in generate_module.
cluster_col: A string indicating the column name in the Seurat object metadata corresponding to the cell cluster assignment; it must match the cluster_col parameter used in generate_module.
sd_thres: A non-negative double specifying the threshold for the difference between two standard deviations; the default value is 0.01. Set it to NA to disable filtering based on this metric.
mean_thres: A non-negative double specifying the threshold for the difference between two mean values; the default value is NA, which disables filtering based on this metric.
pct_thres: A non-negative double specifying the threshold for the proportion of cells with a program score greater than zero; the default value is NA, which disables filtering based on this metric.
merge_module: A logical value specifying whether to merge modules generated from the same cluster; the default value is TRUE.
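The pct_thres metric, the proportion of cells with a program score greater than zero, can be illustrated in base R. The sketch below uses hypothetical scores and cluster labels and computes the proportion per cluster. Which cells filter_module uses for this proportion, and whether it compares with ">=" or ">", is not specified in this tutorial, so the threshold step is only an assumption.

```r
# Hypothetical program scores for eight cells in two clusters
scores   <- c(0.4, 0.1, -0.2, 0.3, -0.1, -0.3, 0.05, -0.4)
clusters <- c("A", "A", "A", "A", "B", "B", "B", "B")

# Proportion of cells with a program score greater than zero, per cluster
pct_pos <- tapply(scores > 0, clusters, mean)
print(pct_pos)   # A = 0.75, B = 0.25

# Hypothetical threshold applied with ">=" (an assumption)
pct_thres <- 0.5
names(pct_pos)[pct_pos >= pct_thres]   # "A"
```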
#Filter the candidate programs to generate the final SSpMosaic programs for this dataset
res <- filter_module(object = seurat_object, module = m, sample_name = 'human_brain4',
                     read_dir = result_dir, cluster_col = 'celltype')
## Reading module score
## Filtering module
## [1] "human.brain4.Inh.PVALB.1"
## [1] "human.brain4.Inh.SNCG.1"
## [1] "human.brain4.Inh.SNCG.2"
## [1] "human.brain4.Inh.SNCG.3"
## [1] "human.brain4.Inh.VIP.1"
## [1] "human.brain4.Inh.LAMP5.1"
## [1] "human.brain4.Inh.LAMP5.2"
## [1] "human.brain4.Inh.LAMP5.3"
## [1] "human.brain4.Inh.LAMP5.4"
## [1] "human.brain4.Inh.LAMP5.5"
## [1] "human.brain4.Inh.SST.1"
## [1] "human.brain4.Inh.SST.2"
## [1] "human.brain4.Inh.SST.3"
## [1] "human.brain4.Inh.SST.4"
## [1] "human.brain4.Inh.CHANDELIER.1"
## [1] "human.brain4.Inh.CHANDELIER.2"
## Saving module