R/comparePCA.R, R/plot.comparePCAObject.R
comparePCA.Rd
This function compares the principal components (PCs) obtained from separate PCA on reference and query datasets for a single cell type using either cosine similarity or correlation.
The S3 plot method generates a heatmap to visualize the cosine similarities between principal components from the output of the comparePCA function.
A SingleCellExperiment object containing a numeric expression matrix for the reference cells.
A SingleCellExperiment object containing a numeric expression matrix for the query cells.
The column name in the colData of query_data that identifies the cell types.
The column name in the colData of reference_data that identifies the cell types.
A numeric vector specifying the subset of principal components (PCs) to compare. Default is the first five PCs.
An integer indicating the number of top loading variables to consider for each PC. Default is 50.
The similarity metric to use. It can be either "cosine" or "correlation". Default is "cosine".
The correlation method to use if metric is "correlation". It can be "spearman" or "pearson". Default is "spearman".
A numeric matrix output from the comparePCA function, representing cosine similarities between query and reference principal components.
Additional arguments passed to the plotting function.
A similarity matrix comparing the principal components of the reference and query datasets. Each element (i, j) in the matrix represents the similarity between the i-th principal component of the reference dataset and the j-th principal component of the query dataset.
The S3 plot method returns a ggplot object representing the heatmap of cosine similarities.
This function compares the PCA results of the reference and query datasets. It first extracts the PCA rotation matrices from both datasets and identifies the variables with the highest loadings for each PC. It then computes the cosine similarity or correlation between the loadings of these top variables for each pair of PCs. In the resulting similarity matrix, rows represent reference PCs and columns represent query PCs.
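The pairwise comparison described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helpers cosine_similarity and compare_loadings are hypothetical names, the toy data are simulated, and two details are assumptions because the text does not specify them: variables are ranked by absolute loading, and each pair of PCs is compared over the union of their top variables.

```r
# Hypothetical helper: cosine similarity between two numeric vectors
cosine_similarity <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Hypothetical sketch of the pairwise PC comparison.
# ref_rot, query_rot: loading matrices (variables x PCs) with shared row names
compare_loadings <- function(ref_rot, query_rot, pc_subset = 1:5, n_top_vars = 50) {
  sim <- matrix(NA_real_, nrow = length(pc_subset), ncol = length(pc_subset),
                dimnames = list(paste0("Ref PC", pc_subset),
                                paste0("Query PC", pc_subset)))
  for (i in seq_along(pc_subset)) {
    for (j in seq_along(pc_subset)) {
      ref_pc <- ref_rot[, pc_subset[i]]
      query_pc <- query_rot[, pc_subset[j]]
      # Assumption: rank variables by absolute loading within each PC
      top_ref <- names(sort(abs(ref_pc), decreasing = TRUE))[seq_len(n_top_vars)]
      top_query <- names(sort(abs(query_pc), decreasing = TRUE))[seq_len(n_top_vars)]
      # Assumption: compare the two PCs over the union of their top variables
      vars <- union(top_ref, top_query)
      sim[i, j] <- cosine_similarity(ref_pc[vars], query_pc[vars])
    }
  }
  sim
}

# Toy data: two simulated expression matrices (cells x genes) with shared genes
set.seed(1)
genes <- paste0("gene", 1:100)
ref_expr <- matrix(rnorm(200 * 100), nrow = 200, dimnames = list(NULL, genes))
query_expr <- matrix(rnorm(150 * 100), nrow = 150, dimnames = list(NULL, genes))

# Rotation (loading) matrices from separate PCAs
ref_rot <- prcomp(ref_expr)$rotation
query_rot <- prcomp(query_expr)$rotation

sim <- compare_loadings(ref_rot, query_rot, pc_subset = 1:5, n_top_vars = 20)
```

Because the sign of a principal component is arbitrary, a strongly negative value in such a matrix can indicate the same axis of variation as a strongly positive one.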
The S3 plot method converts the input matrix into a long-format data frame suitable for plotting with ggplot2. The rows in the heatmap are ordered in reverse to match the conventional display format. The heatmap uses a blue-white-red color gradient to represent cosine similarity values, where blue indicates negative similarity, white indicates zero similarity, and red indicates positive similarity.
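The plotting steps described above might look roughly like the following. This is a sketch under stated assumptions rather than the method's actual code: the toy matrix, the column names Ref, Query and value, and the styling choices (tile borders, theme, fixed limits of -1 to 1) are illustrative.

```r
library(ggplot2)

# Toy similarity matrix: rows are reference PCs, columns are query PCs
set.seed(1)
sim <- matrix(runif(9, -1, 1), nrow = 3,
              dimnames = list(paste0("Ref PC", 1:3), paste0("Query PC", 1:3)))

# Long format: one row per (reference PC, query PC) pair
sim_long <- as.data.frame(as.table(sim), responseName = "value")
names(sim_long) <- c("Ref", "Query", "value")  # illustrative column names

# Reverse the row order so the first reference PC appears at the top
sim_long$Ref <- factor(sim_long$Ref, levels = rev(rownames(sim)))

# Blue-white-red gradient centered at zero
heatmap_plot <- ggplot(sim_long, aes(x = Query, y = Ref, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1)) +
  theme_minimal()
```

Printing heatmap_plot draws the heatmap; fixing the midpoint at zero keeps white tied to zero similarity regardless of the range of values in the matrix.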
plot.comparePCAObject
comparePCA
# Load libraries
library(scran)
library(scater)
# Load data
data("reference_data")
data("query_data")
# Extract CD4 cells
ref_data_subset <- reference_data[, which(reference_data$expert_annotation == "CD4")]
query_data_subset <- query_data[, which(query_data$expert_annotation == "CD4")]
# Selecting highly variable genes (can be customized by the user)
ref_top_genes <- getTopHVGs(ref_data_subset, n = 500)
query_top_genes <- getTopHVGs(query_data_subset, n = 500)
# Intersect the gene symbols to obtain common genes
common_genes <- intersect(ref_top_genes, query_top_genes)
ref_data_subset <- ref_data_subset[common_genes,]
query_data_subset <- query_data_subset[common_genes,]
# Run PCA on datasets separately
ref_data_subset <- runPCA(ref_data_subset)
query_data_subset <- runPCA(query_data_subset)
# Call the PCA comparison function
similarity_mat <- comparePCA(query_data = query_data_subset,
                             reference_data = ref_data_subset,
                             query_cell_type_col = "expert_annotation",
                             ref_cell_type_col = "expert_annotation",
                             pc_subset = 1:5,
                             n_top_vars = 50,
                             metric = c("cosine", "correlation")[1],
                             correlation_method = c("spearman", "pearson")[1])
# Create the heatmap
plot(similarity_mat)