R/calculateCellDistancesSimilarity.R
calculateCellDistancesSimilarity.Rd
This function computes Bhattacharyya coefficients and Hellinger distances to quantify the similarity of density distributions between query cells and reference data for each cell type.
calculateCellDistancesSimilarity(
query_data,
reference_data,
query_cell_type_col,
ref_cell_type_col,
cell_names,
pc_subset = 1:5,
assay_name = "logcounts"
)
query_data: A SingleCellExperiment object containing a numeric expression matrix for the query cells.
reference_data: A SingleCellExperiment object containing a numeric expression matrix for the reference cells.
query_cell_type_col: The column name in the colData of query_data that identifies the cell types.
ref_cell_type_col: The column name in the colData of reference_data that identifies the cell types.
cell_names: A character vector specifying the names of the query cells for which to compute distance measures.
pc_subset: A numeric vector specifying which principal components to use when computing distances. Default is 1:5.
assay_name: Name of the assay on which to perform computations. Default is "logcounts".
A list containing two data frames, each with a Cell column followed by one column per cell type and one row per query cell in cell_names:
bhattacharyya_coef: The Bhattacharyya coefficient between each query cell and the reference data for each cell type.
hellinger_dist: The Hellinger distance between each query cell and the reference data for each cell type.
This function first computes distance data using the calculateCellDistances function, which calculates pairwise distances between cells within the reference data and between query cells and reference cells in the PCA space.
Bhattacharyya coefficients and Hellinger distances are then calculated to quantify the similarity of density distributions between query cells and reference data for each cell type. The Bhattacharyya coefficient measures the similarity of two probability distributions, while the Hellinger distance measures the distance between two probability distributions.
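For reference, the textbook definitions for two densities p and q are given here. How the function estimates the densities is not stated on this page, so this is background rather than a description of the implementation. The example output further down is consistent with the second identity, for instance sqrt(1 - 0.4664148) is approximately 0.7304692.

```latex
\[
BC(p, q) = \int \sqrt{p(x)\, q(x)}\, dx,
\qquad
H(p, q) = \sqrt{1 - BC(p, q)}
\]
```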
Bhattacharyya coefficients range between 0 and 1. A value closer to 1 indicates higher similarity between distributions, while a value closer to 0 indicates lower similarity.
Hellinger distances range between 0 and 1. A value closer to 0 indicates higher similarity between distributions, while a value closer to 1 indicates lower similarity.
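The two measures can be illustrated with a minimal sketch that compares two samples of distances. It assumes kernel density estimates evaluated on a shared grid and normalised to sum to 1. The helper overlap_sketch and its n_grid argument are hypothetical names used only for illustration, and the package's internal computation may differ.

```r
# Sketch only: overlap_sketch() is a hypothetical helper, not part of the package.
# It estimates both densities on a common grid, then computes the
# Bhattacharyya coefficient and the Hellinger distance from them.
overlap_sketch <- function(x, y, n_grid = 512) {
  rng <- range(c(x, y))
  # Kernel density estimates evaluated at the same grid points
  dens_x <- density(x, from = rng[1], to = rng[2], n = n_grid)
  dens_y <- density(y, from = rng[1], to = rng[2], n = n_grid)
  # Normalise so that each discretised density sums to 1
  p <- dens_x$y / sum(dens_x$y)
  q <- dens_y$y / sum(dens_y$y)
  bc <- sum(sqrt(p * q))
  # Guard against tiny negative values caused by floating point error
  hd <- sqrt(max(0, 1 - bc))
  list(bhattacharyya_coef = bc, hellinger_dist = hd)
}

# Simulated distances: one sample resembling the reference, one far from it
set.seed(1)
ref_dist <- rnorm(200, mean = 5)
near <- overlap_sketch(ref_dist, rnorm(200, mean = 5))
far <- overlap_sketch(ref_dist, rnorm(200, mean = 50))
```

With these simulated inputs, near should have a coefficient close to 1 and a small Hellinger distance, while far should show the opposite pattern.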
# Load data
data("reference_data")
data("query_data")
# Compute distance data
distance_data <- calculateCellDistances(query_data = query_data,
    reference_data = reference_data,
    query_cell_type_col = "SingleR_annotation",
    ref_cell_type_col = "expert_annotation",
    pc_subset = 1:10)
# Identify outliers for CD4
cd4_anomalies <- detectAnomaly(reference_data = reference_data,
    query_data = query_data,
    query_cell_type_col = "SingleR_annotation",
    ref_cell_type_col = "expert_annotation",
    pc_subset = 1:10,
    n_tree = 500,
    anomaly_treshold = 0.5)
cd4_top6_anomalies <- names(sort(cd4_anomalies$CD4$query_anomaly_scores, decreasing = TRUE)[1:6])
# Get overlap measures
overlap_measures <- calculateCellDistancesSimilarity(query_data = query_data,
    reference_data = reference_data,
    cell_names = cd4_top6_anomalies,
    query_cell_type_col = "SingleR_annotation",
    ref_cell_type_col = "expert_annotation",
    pc_subset = 1:10)
overlap_measures
#> $bhattacharyya_coef
#> Cell CD4 CD8 B_and_plasma Myeloid
#> 1 TTGGAACTCGCTTAGA-1 0.4664148 0.9678914 0.3587941 0.2574115
#> 2 GATCGCGTCCCATTAT-1 0.6695226 0.9709480 0.3327168 0.2051765
#> 3 AGCATACGTTGAGTTC-1 0.6526016 0.9609405 0.3085990 0.2102890
#> 4 TAAGCGTGTAATCGTC-1 0.6049953 0.9677820 0.3331004 0.1622547
#> 5 CCATGTCTCGTCTGCT-1 0.6158507 0.9812964 0.3421234 0.2000697
#> 6 TGCCCTAAGCGGCTTC-1 0.6184935 0.9558004 0.3118239 0.1380546
#>
#> $hellinger_dist
#> Cell CD4 CD8 B_and_plasma Myeloid
#> 1 TTGGAACTCGCTTAGA-1 0.7304692 0.1791887 0.8007534 0.8617357
#> 2 GATCGCGTCCCATTAT-1 0.5748716 0.1704463 0.8168741 0.8915288
#> 3 AGCATACGTTGAGTTC-1 0.5894051 0.1976348 0.8315053 0.8886569
#> 4 TAAGCGTGTAATCGTC-1 0.6284940 0.1794938 0.8166392 0.9152843
#> 5 CCATGTCTCGTCTGCT-1 0.6197978 0.1367610 0.8110959 0.8943882
#> 6 TGCCCTAAGCGGCTTC-1 0.6176621 0.2102369 0.8295638 0.9284102
#>
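One way to read the output above is to pick, for each query cell, the cell type with the largest Bhattacharyya coefficient. The sketch below hard-codes the first two rows of the printed output so that it runs on its own. The variable names bc, scores and closest_type are illustrative and not part of the package.

```r
# First two rows of $bhattacharyya_coef, copied from the printed output
bc <- data.frame(
  Cell = c("TTGGAACTCGCTTAGA-1", "GATCGCGTCCCATTAT-1"),
  CD4 = c(0.4664148, 0.6695226),
  CD8 = c(0.9678914, 0.9709480),
  B_and_plasma = c(0.3587941, 0.3327168),
  Myeloid = c(0.2574115, 0.2051765)
)

# Drop the Cell column and find the column with the highest coefficient per row
scores <- as.matrix(bc[, -1])
closest_type <- colnames(scores)[max.col(scores, ties.method = "first")]
setNames(closest_type, bc$Cell)
```

For both cells the closest cell type is CD8, even though they were selected as the top anomalies among cells annotated as CD4.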