This function compares the principal components (PCs) obtained from separate PCAs of the reference and query datasets for a single cell type, using either cosine similarity or correlation.

The S3 plot method draws a heatmap of the similarities between principal components returned by the comparePCA function.

comparePCA(
  reference_data,
  query_data,
  query_cell_type_col,
  ref_cell_type_col,
  pc_subset = 1:5,
  n_top_vars = 50,
  metric = c("cosine", "correlation"),
  correlation_method = c("spearman", "pearson")
)

# S3 method for class 'comparePCAObject'
plot(x, ...)

Arguments

reference_data

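A SingleCellExperiment object containing a numeric expression matrix for the reference cells. PCA must already have been computed on this object (for example with runPCA, as in the examples).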

query_data

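A SingleCellExperiment object containing a numeric expression matrix for the query cells. PCA must already have been computed on this object (for example with runPCA, as in the examples).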

query_cell_type_col

The column name in the colData of query_data that identifies the cell types.

ref_cell_type_col

The column name in the colData of reference_data that identifies the cell types.

pc_subset

A numeric vector specifying the subset of principal components (PCs) to compare. Default is the first five PCs.

n_top_vars

An integer indicating the number of top loading variables to consider for each PC. Default is 50.

metric

The similarity metric to use. It can be either "cosine" or "correlation". Default is "cosine".

correlation_method

The correlation method to use if metric is "correlation". It can be "spearman" or "pearson". Default is "spearman".

x

An object of class comparePCAObject returned by the comparePCA function: a numeric matrix of similarities between reference and query principal components.

...

Additional arguments passed to the plotting function.

Value

A similarity matrix comparing the principal components of the reference and query datasets. Each element (i, j) in the matrix represents the similarity between the i-th principal component of the reference dataset and the j-th principal component of the query dataset.

The S3 plot method returns a ggplot object containing the heatmap of similarity values.
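To illustrate how the returned similarity matrix is read, the snippet below uses a made-up 3 x 3 matrix. The values and the "Ref PC"/"Query PC" dimnames are hypothetical and not necessarily what comparePCA produces. It looks up a single element (i, j) and, for each reference PC, finds the query PC with the largest absolute similarity.

```r
# Toy similarity matrix with made-up values and hypothetical labels;
# rows are reference PCs, columns are query PCs.
sim <- matrix(c(0.92, -0.10, 0.05,
                0.08, -0.85, 0.20,
                0.02,  0.15, 0.71),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("Ref PC", 1:3), paste0("Query PC", 1:3)))

# Element (i, j): similarity between reference PC i and query PC j
sim["Ref PC2", "Query PC3"]

# Query PC with the largest absolute similarity to each reference PC
# (absolute value, because the sign of a PC is arbitrary)
best_match <- colnames(sim)[apply(abs(sim), 1, which.max)]
names(best_match) <- rownames(sim)
best_match
```

Here reference PC 2 matches query PC 2 despite the negative value, since flipping the sign of a PC does not change the axis of variation it describes.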

Details

This function compares the PCA results of the reference and query datasets. It first extracts the PCA rotation matrices from both datasets and, for each PC, identifies the top variables with the highest loadings. It then computes the cosine similarity or correlation between the loadings of these top variables for each pair of reference and query PCs. In the resulting matrix, rows represent reference PCs and columns represent query PCs.
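The computation described above can be sketched in base R as follows. This is a hypothetical re-implementation for illustration, not the package's code: sketch_pc_similarity is an invented name, it takes rotation matrices directly (for an object processed with scater's runPCA these are typically stored in attr(reducedDim(sce, "PCA"), "rotation")), and it assumes that the top variables for a PC pair are the union of the n_top_vars genes with the largest absolute loadings in each of the two PCs.

```r
# Hypothetical helper, not part of the package. Assumption: the genes compared
# for a PC pair are the union of the top genes (by absolute loading) of both PCs.
sketch_pc_similarity <- function(ref_rotation, query_rotation,
                                 pc_subset = 1:5, n_top_vars = 50,
                                 metric = c("cosine", "correlation"),
                                 correlation_method = c("spearman", "pearson")) {
    metric <- match.arg(metric)
    correlation_method <- match.arg(correlation_method)

    # Restrict both rotation matrices to the genes they share
    genes <- intersect(rownames(ref_rotation), rownames(query_rotation))
    ref_rotation <- ref_rotation[genes, , drop = FALSE]
    query_rotation <- query_rotation[genes, , drop = FALSE]

    # Names of the genes with the largest absolute loadings for one PC
    top_genes <- function(loadings) {
        n <- min(n_top_vars, length(loadings))
        names(sort(abs(loadings), decreasing = TRUE))[seq_len(n)]
    }

    sim <- matrix(NA_real_, nrow = length(pc_subset), ncol = length(pc_subset),
                  dimnames = list(paste0("Ref PC", pc_subset),
                                  paste0("Query PC", pc_subset)))
    for (i in seq_along(pc_subset)) {
        for (j in seq_along(pc_subset)) {
            ref_pc <- ref_rotation[, pc_subset[i]]
            query_pc <- query_rotation[, pc_subset[j]]
            keep <- union(top_genes(ref_pc), top_genes(query_pc))
            a <- ref_pc[keep]
            b <- query_pc[keep]
            sim[i, j] <- if (metric == "cosine") {
                sum(a * b) / sqrt(sum(a^2) * sum(b^2))
            } else {
                cor(a, b, method = correlation_method)
            }
        }
    }
    sim  # rows: reference PCs, columns: query PCs
}

# Usage on simulated rotation matrices: the query loadings are a noisy copy
# of the reference loadings, so the diagonal should be close to 1.
set.seed(1)
genes <- paste0("gene", 1:200)
ref_rot <- matrix(rnorm(200 * 5), nrow = 200,
                  dimnames = list(genes, paste0("PC", 1:5)))
query_rot <- ref_rot + matrix(rnorm(200 * 5, sd = 0.1), nrow = 200)

sim_cos <- sketch_pc_similarity(ref_rot, query_rot, n_top_vars = 50)
sim_cor <- sketch_pc_similarity(ref_rot, query_rot, n_top_vars = 50,
                                metric = "correlation")
round(diag(sim_cos), 2)
```

Because the sign of a PC is arbitrary, a strongly negative value in such a matrix can point to the same axis of variation as a strongly positive one.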

The S3 plot method converts the input matrix into a long-format data frame suitable for plotting with ggplot2. The rows of the heatmap are displayed in reverse order so that the first reference PC appears at the top. The heatmap uses a blue-white-red color gradient to represent similarity values, where blue indicates negative similarity, white indicates zero similarity, and red indicates positive similarity.
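The plotting steps described above can be approximated as follows. This sketch is an illustration, not the method's source: it uses a toy matrix with hypothetical dimnames instead of real comparePCA output, builds the long-format data frame in base R, reverses the reference-PC factor levels (ggplot2 draws the first level of a discrete y axis at the bottom), and maps values onto a blue-white-red diverging scale with ggplot2's scale_fill_gradient2. The fixed color limits of -1 to 1 are an assumption.

```r
# Toy similarity matrix (made-up values, hypothetical labels)
sim <- matrix(c(0.9, -0.2, 0.1, 0.05, -0.8, 0.3, 0.0, 0.4, 0.7), nrow = 3,
              dimnames = list(paste0("Ref PC", 1:3), paste0("Query PC", 1:3)))

# Long format: one row per (reference PC, query PC) pair.
# as.vector() reads the matrix column by column, so rownames repeat and
# colnames are each repeated nrow(sim) times.
sim_long <- data.frame(
    Reference = rep(rownames(sim), times = ncol(sim)),
    Query = rep(colnames(sim), each = nrow(sim)),
    Similarity = as.vector(sim)
)

# Reverse the reference levels so that "Ref PC1" is drawn at the top
sim_long$Reference <- factor(sim_long$Reference, levels = rev(rownames(sim)))

if (requireNamespace("ggplot2", quietly = TRUE)) {
    p <- ggplot2::ggplot(sim_long,
                         ggplot2::aes(x = Query, y = Reference, fill = Similarity)) +
        ggplot2::geom_tile() +
        # Blue for negative, white for zero, red for positive values;
        # limits of -1 to 1 are an assumption (both metrics lie in that range)
        ggplot2::scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                                      midpoint = 0, limits = c(-1, 1)) +
        ggplot2::labs(x = "Query PCs", y = "Reference PCs") +
        ggplot2::theme_minimal()
}
```

Calling print(p) draws the heatmap; returning the ggplot object instead of drawing it lets the caller add further layers or themes.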

See also

plot.comparePCAObject

comparePCA

Examples

# Load libraries
library(scran)
library(scater)

# Load data
data("reference_data")
data("query_data")

# Extract CD4 cells
ref_data_subset <- reference_data[, which(reference_data$expert_annotation == "CD4")]
query_data_subset <- query_data[, which(query_data$expert_annotation == "CD4")]

# Selecting highly variable genes (can be customized by the user)
ref_top_genes <- getTopHVGs(ref_data_subset, n = 500)
query_top_genes <- getTopHVGs(query_data_subset, n = 500)

# Intersect the gene symbols to obtain common genes
common_genes <- intersect(ref_top_genes, query_top_genes)
ref_data_subset <- ref_data_subset[common_genes, ]
query_data_subset <- query_data_subset[common_genes, ]

# Run PCA on datasets separately
ref_data_subset <- runPCA(ref_data_subset)
query_data_subset <- runPCA(query_data_subset)

# Call the PCA comparison function
similarity_mat <- comparePCA(query_data = query_data_subset,
                             reference_data = ref_data_subset,
                             query_cell_type_col = "expert_annotation",
                             ref_cell_type_col = "expert_annotation",
                             pc_subset = 1:5,
                             n_top_vars = 50,
                             metric = "cosine",
                             correlation_method = "spearman")

# Create the heatmap
plot(similarity_mat)