R/plot.regressPCObject.R, R/regressPC.R
regressPC.Rd
The S3 plot method generates plots to visualize the results of regression analyses performed on principal components (PCs) against cell types or dataset origin (query vs. reference).
This function performs linear regression of a covariate of interest onto one
or more principal components, based on the data in a SingleCellExperiment
object.
# S3 method for class 'regressPCObject'
plot(x, plot_type = c("r_squared", "p-value"), alpha = 0.05, ...)
regressPC(
  reference_data,
  query_data = NULL,
  ref_cell_type_col,
  query_cell_type_col = NULL,
  cell_types = NULL,
  pc_subset = 1:10,
  adjust_method = c("BH", "holm", "hochberg", "hommel", "bonferroni", "BY", "fdr",
    "none"),
  assay_name = "logcounts"
)
x
An object of class regressPCObject containing the output of the regressPC function.
plot_type
Type of plot to generate. Options are "r_squared" and "p-value". Default is "r_squared".
alpha
Significance threshold for the p-values of the coefficients. Default is 0.05.
...
Additional arguments to be passed to the plotting functions.
reference_data
A SingleCellExperiment object containing a numeric expression matrix for the reference cells.
query_data
A SingleCellExperiment object containing a numeric expression matrix for the query cells. If NULL, the PC scores are regressed against the cell types of the reference data.
ref_cell_type_col
The column name in the colData of reference_data that identifies the cell types.
query_cell_type_col
The column name in the colData of query_data that identifies the cell types.
cell_types
A character vector specifying the cell types to include in the plot. If NULL, all cell types are included.
pc_subset
A numeric vector specifying which principal components to include in the plot. Default is PC1 to PC10.
adjust_method
A character string specifying the method used to adjust the p-values. Options are "BH", "holm", "hochberg", "hommel", "bonferroni", "BY", "fdr", and "none". Default is "BH" (Benjamini-Hochberg).
assay_name
Name of the assay on which to perform computations. Default is "logcounts".
The S3 plot method returns a ggplot object representing the specified plot type.
regressPC returns a list containing summaries of the linear regression models for each specified principal component, the corresponding R-squared (R2) values, the variance contributions for each principal component, and the total variance explained.
Depending on the specified plot type, the S3 plot method displays either the R-squared values or the p-values resulting from the regression of principal components onto cell types or dataset origin (query vs. reference). For cell type regression, the plots show how well each PC correlates with the different cell types. For dataset regression, the plots compare the PCs between the query and reference datasets.
Principal component regression, derived from PCA, can be used to quantify the variance explained by a covariate of interest. Applications for single-cell analysis include quantification of batch effects, assessing clustering homogeneity, and evaluating alignment of query and reference datasets in cell type annotation settings.
Briefly, the \(R^2\) is calculated from a linear regression of the covariate \(B\) of interest onto each principal component. The variance contribution of the covariate effect per principal component is then calculated as the product of the variance explained by the i-th principal component (PC) and the corresponding \(R^2(PC_i | B)\). The sum across all variance contributions by the covariate effects in all principal components gives the total variance explained by the covariate as follows:
$$\text{Var}(C \mid B) = \sum_{i=1}^{G} \text{Var}(C \mid PC_i) \times R^2(PC_i \mid B)$$
where \(\text{Var}(C \mid PC_i)\) is the variance of the data matrix \(C\) explained by the i-th principal component and \(G\) is the number of principal components included in the sum. See references for details.
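The calculation above can be sketched in base R on simulated data. This is an illustration under stated assumptions, not the internal code of regressPC: the expression matrix, the two-level cell_type covariate, and all variable names are hypothetical, each PC score is used as the response of the linear model, the variance explained by each PC is expressed as a proportion of the total variance, and the sum is truncated to the first 10 PCs.

```r
# Minimal sketch with simulated data; not the package's internal code.
set.seed(1)
n_cells <- 200
n_genes <- 50

# Hypothetical covariate B: a two-level cell type label
cell_type <- factor(rep(c("A", "B"), each = n_cells / 2))

# Simulated data matrix C (cells x genes); the first 10 genes are
# shifted in cell type "B" so that the covariate explains some variance
expr <- matrix(rnorm(n_cells * n_genes), nrow = n_cells)
expr[cell_type == "B", 1:10] <- expr[cell_type == "B", 1:10] + 2

# PCA on the centered matrix
pca <- prcomp(expr, center = TRUE, scale. = FALSE)
n_pcs <- 10

# Var(C | PC_i): proportion of total variance explained by each PC
var_explained <- (pca$sdev^2 / sum(pca$sdev^2))[seq_len(n_pcs)]

# R^2(PC_i | B): fit PC_i ~ B and extract the R-squared of each model
r_squared <- vapply(seq_len(n_pcs), function(i) {
    summary(lm(pca$x[, i] ~ cell_type))$r.squared
}, numeric(1))

# Per-PC variance contributions and their sum, Var(C | B)
var_contributions <- var_explained * r_squared
total_var_explained <- sum(var_contributions)
```

Because each R-squared value lies between 0 and 1, the total can never exceed the variance explained by the PCs included in the sum.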
Luecken et al. Benchmarking atlas-level data integration in single-cell genomics. Nature Methods, 19:41-50, 2022.
regressPC
plot.regressPCObject
# Load data
data("reference_data")
data("query_data")
# Regress the PCs against cell types (reference data only)
regress_res <- regressPC(reference_data = reference_data,
                         ref_cell_type_col = "expert_annotation",
                         cell_types = c("CD4", "CD8", "B_and_plasma", "Myeloid"),
                         pc_subset = 1:15)
# Plot results
plot(regress_res, plot_type = "r_squared")
plot(regress_res, plot_type = "p-value")
# Regress the PCs with both reference and query data
regress_res <- regressPC(reference_data = reference_data,
                         query_data = query_data,
                         ref_cell_type_col = "expert_annotation",
                         query_cell_type_col = "SingleR_annotation",
                         cell_types = c("CD4", "CD8", "B_and_plasma", "Myeloid"),
                         pc_subset = 1:15)
# Plot results
plot(regress_res, plot_type = "r_squared")
plot(regress_res, plot_type = "p-value")