R/calculateMMDPValue.R
calculateMMDPValue.Rd
This function performs a Maximum Mean Discrepancy (MMD) test to compare the distributions of two samples (reference and query cells) in PCA space. It uses a custom implementation with permutation-based p-values.
calculateMMDPValue(
  reference_data,
  query_data = NULL,
  ref_cell_type_col,
  query_cell_type_col = NULL,
  cell_types = NULL,
  pc_subset = seq_len(5),
  assay_name = "logcounts",
  n_permutation = 100,
  kernel_type = "gaussian",
  sigma = NULL
)
reference_data: A SingleCellExperiment object containing a numeric expression matrix for the reference cells.
query_data: A SingleCellExperiment object containing a numeric expression matrix for the query cells. If NULL, the PC scores are regressed against the cell types of the reference data.
ref_cell_type_col: The column name in the colData of reference_data that identifies the cell types.
query_cell_type_col: The column name in the colData of query_data that identifies the cell types.
cell_types: A character vector specifying the cell types to include in the test. If NULL, all cell types are included.
pc_subset: A numeric vector specifying which principal components to include in the test. Default is PC1 to PC5.
assay_name: Name of the assay on which to perform computations. Default is "logcounts".
n_permutation: Number of permutations used to calculate the p-value. Default is 100.
kernel_type: Type of kernel to use. Options are "gaussian" (default) or "linear".
sigma: Bandwidth parameter for the Gaussian kernel. If NULL, the median heuristic is used.
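As an illustration of the median heuristic, one common variant sets the bandwidth to the median of the pairwise Euclidean distances among the pooled observations. The sketch below assumes that variant; the exact rule used by calculateMMDPValue is not stated here and may differ (for example, some implementations use squared distances). The helper median_heuristic_sigma and the simulated matrices are hypothetical.

```r
# Hypothetical helper: median-heuristic bandwidth for a Gaussian kernel.
# x, y: numeric matrices (cells x PCs) holding the two samples.
median_heuristic_sigma <- function(x, y) {
  pooled <- rbind(x, y)
  # Pairwise Euclidean distances between all pooled observations
  d <- as.numeric(stats::dist(pooled))
  # Median of the non-zero distances (an assumed variant of the heuristic)
  stats::median(d[d > 0])
}

# Simulated PC scores standing in for reference and query cells
set.seed(1)
x <- matrix(rnorm(50 * 5), ncol = 5)
y <- matrix(rnorm(40 * 5, mean = 0.5), ncol = 5)
sigma <- median_heuristic_sigma(x, y)
```

A data-driven bandwidth of this kind scales the kernel to the typical distance between cells, so the same call behaves sensibly whether the selected PCs have large or small variance.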
A named vector of p-values from the MMD test for each cell type.
The function performs the following steps:
Projects the data into the PCA space.
Subsets the data to the specified cell types and principal components.
Performs a custom MMD test with permutation-based p-values for each cell type.
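The permutation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's internal code: it uses the biased (V-statistic) estimate of squared MMD with a Gaussian kernel and an add-one permutation p-value. The function names gaussian_kernel, mmd2_from_kernel and mmd_permutation_test are hypothetical, and the input matrices are simulated stand-ins for PC scores.

```r
# Gaussian kernel matrix between the rows of a and the rows of b
gaussian_kernel <- function(a, b, sigma) {
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  exp(-pmax(sq, 0) / (2 * sigma^2))
}

# Biased (V-statistic) estimate of squared MMD from a pooled kernel matrix;
# idx_x gives the rows assigned to the first sample
mmd2_from_kernel <- function(K, idx_x) {
  idx_y <- setdiff(seq_len(nrow(K)), idx_x)
  mean(K[idx_x, idx_x]) + mean(K[idx_y, idx_y]) - 2 * mean(K[idx_x, idx_y])
}

mmd_permutation_test <- function(x, y, sigma, n_permutation = 100) {
  pooled <- rbind(x, y)
  K <- gaussian_kernel(pooled, pooled, sigma)  # computed once, reused below
  n_x <- nrow(x)
  observed <- mmd2_from_kernel(K, seq_len(n_x))
  # Null distribution: reassign sample labels at random
  permuted <- replicate(n_permutation, {
    mmd2_from_kernel(K, sample(nrow(pooled), n_x))
  })
  # Add-one correction keeps the p-value strictly positive (an assumption)
  (sum(permuted >= observed) + 1) / (n_permutation + 1)
}

# Two simulated samples whose means differ in every dimension
set.seed(42)
x <- matrix(rnorm(60 * 5), ncol = 5)
y <- matrix(rnorm(60 * 5, mean = 1), ncol = 5)
p_shifted <- mmd_permutation_test(x, y, sigma = 2, n_permutation = 99)
```

Computing the pooled kernel matrix once and permuting only the row indices avoids recomputing kernels for every permutation, which is where most of the cost would otherwise lie.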
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). "A kernel two-sample test". Journal of Machine Learning Research, 13(1), 723-773.
# Load data
data("reference_data")
data("query_data")
# Calculate MMD p-values (with query data)
mmd_test <- calculateMMDPValue(reference_data = reference_data,
                               query_data = query_data,
                               ref_cell_type_col = "expert_annotation",
                               query_cell_type_col = "SingleR_annotation",
                               cell_types = c("CD4", "CD8"),
                               pc_subset = 1:5,
                               n_permutation = 30)
mmd_test
#>        CD4        CD8
#> 0.03225806 0.03225806
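
Both p-values above equal 1/31, which is consistent with an add-one permutation p-value of the form (b + 1) / (n_permutation + 1), where b counts the permuted statistics at least as large as the observed one and b = 0 here. The formula is an assumption inferred from the output rather than something stated in this page. Under that assumption, the smallest attainable p-value depends only on n_permutation; min_p_value is a hypothetical helper.

```r
# Smallest attainable p-value under the assumed add-one permutation formula
min_p_value <- function(n_permutation) 1 / (n_permutation + 1)

p_floor_30 <- min_p_value(30)    # 1/31, the value printed in the example above
p_floor_100 <- min_p_value(100)  # floor at the default n_permutation
```

If a finer p-value resolution is needed, for instance before a multiple-testing correction across many cell types, a larger n_permutation is the relevant setting.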