This function computes Bhattacharyya coefficients and Hellinger distances to quantify the similarity of density distributions between query cells and reference data for each cell type.

calculateCellDistancesSimilarity(
  query_data,
  reference_data,
  query_cell_type_col,
  ref_cell_type_col,
  cell_names,
  pc_subset = 1:5,
  assay_name = "logcounts"
)

Arguments

query_data

A SingleCellExperiment object containing a numeric expression matrix for the query cells.

reference_data

A SingleCellExperiment object containing a numeric expression matrix for the reference cells.

query_cell_type_col

The column name in the colData of query_data that identifies the cell types.

ref_cell_type_col

The column name in the colData of reference_data that identifies the cell types.

cell_names

A character vector specifying the names of the query cells for which to compute distance measures.

pc_subset

A numeric vector specifying which principal components to use when computing distances. Default is 1:5.

assay_name

Name of the assay on which to perform computations. Default is "logcounts".

Value

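A list containing one data frame per similarity measure. In each data frame, the Cell column holds the names of the query cells given in cell_names, and the remaining columns hold one value per cell type. The list contains:

bhattacharyya_coef

A data frame of Bhattacharyya coefficients between each query cell and the reference data for each cell type.

hellinger_dist

A data frame of Hellinger distances between each query cell and the reference data for each cell type.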

Details

This function first computes distance data using the calculateCellDistances function, which calculates, in the PCA space, the pairwise distances between cells within the reference data and the distances between query cells and reference cells. It then calculates Bhattacharyya coefficients and Hellinger distances to quantify the similarity of density distributions between query cells and reference data for each cell type. The Bhattacharyya coefficient measures the similarity (overlap) of two probability distributions, while the Hellinger distance measures how far apart two probability distributions are.

Bhattacharyya coefficients range between 0 and 1. A value closer to 1 indicates higher similarity between distributions, while a value closer to 0 indicates lower similarity.

Hellinger distances range between 0 and 1. A value closer to 0 indicates higher similarity between distributions, while a value closer to 1 indicates lower similarity.
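
To make the two measures concrete, here is a minimal sketch of how a Bhattacharyya coefficient and a Hellinger distance can be computed from two samples of distances. It is not the package's internal implementation: the simulated vectors ref_distances and query_distances, the kernel density estimate on a shared grid, and the normalization to discrete probabilities are all assumptions made for illustration. The Hellinger distance is obtained through the standard relation H = sqrt(1 - BC).

```r
# Illustrative sketch only; not the internals of calculateCellDistancesSimilarity.
set.seed(42)

# Hypothetical inputs: distances among reference cells of one cell type, and
# distances from a single query cell to those reference cells.
ref_distances   <- rnorm(500, mean = 10, sd = 2)
query_distances <- rnorm(200, mean = 12, sd = 2)

# Estimate both densities on the same grid so they can be compared pointwise.
grid_range    <- range(c(ref_distances, query_distances))
ref_density   <- density(ref_distances,
                         from = grid_range[1], to = grid_range[2], n = 512)
query_density <- density(query_distances,
                         from = grid_range[1], to = grid_range[2], n = 512)

# Normalize the density values into discrete probabilities over the grid.
p <- ref_density$y / sum(ref_density$y)
q <- query_density$y / sum(query_density$y)

# Bhattacharyya coefficient: 1 for identical distributions, 0 for no overlap.
bhattacharyya_coef <- sum(sqrt(p * q))

# Hellinger distance via H = sqrt(1 - BC); max() guards against tiny
# negative values caused by floating-point rounding.
hellinger_dist <- sqrt(max(0, 1 - bhattacharyya_coef))

c(bhattacharyya_coef = bhattacharyya_coef, hellinger_dist = hellinger_dist)
```

Because the two measures are tied together by H = sqrt(1 - BC), they always rank distributions in opposite order: the larger the coefficient, the smaller the distance.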

Examples

# Load data
data("reference_data")
data("query_data")

# Compute the distance data
distance_data <- calculateCellDistances(query_data = query_data, 
                                        reference_data = reference_data, 
                                        query_cell_type_col = "SingleR_annotation", 
                                        ref_cell_type_col = "expert_annotation",
                                        pc_subset = 1:10) 

# Identify outliers for CD4
cd4_anomalies <- detectAnomaly(reference_data = reference_data, 
                               query_data = query_data, 
                               query_cell_type_col = "SingleR_annotation", 
                               ref_cell_type_col = "expert_annotation",
                               pc_subset = 1:10,
                               n_tree = 500,
                               anomaly_treshold = 0.5)
cd4_top6_anomalies <- names(sort(cd4_anomalies$CD4$query_anomaly_scores, decreasing = TRUE)[1:6])

# Get overlap measures
overlap_measures <- calculateCellDistancesSimilarity(query_data = query_data, 
                                                     reference_data = reference_data, 
                                                     cell_names = cd4_top6_anomalies,
                                                     query_cell_type_col = "SingleR_annotation", 
                                                     ref_cell_type_col = "expert_annotation",
                                                     pc_subset = 1:10) 
overlap_measures
#> $bhattacharyya_coef
#>                 Cell       CD4       CD8 B_and_plasma   Myeloid
#> 1 TTGGAACTCGCTTAGA-1 0.4664148 0.9678914    0.3587941 0.2574115
#> 2 GATCGCGTCCCATTAT-1 0.6695226 0.9709480    0.3327168 0.2051765
#> 3 AGCATACGTTGAGTTC-1 0.6526016 0.9609405    0.3085990 0.2102890
#> 4 TAAGCGTGTAATCGTC-1 0.6049953 0.9677820    0.3331004 0.1622547
#> 5 CCATGTCTCGTCTGCT-1 0.6158507 0.9812964    0.3421234 0.2000697
#> 6 TGCCCTAAGCGGCTTC-1 0.6184935 0.9558004    0.3118239 0.1380546
#> 
#> $hellinger_dist
#>                 Cell       CD4       CD8 B_and_plasma   Myeloid
#> 1 TTGGAACTCGCTTAGA-1 0.7304692 0.1791887    0.8007534 0.8617357
#> 2 GATCGCGTCCCATTAT-1 0.5748716 0.1704463    0.8168741 0.8915288
#> 3 AGCATACGTTGAGTTC-1 0.5894051 0.1976348    0.8315053 0.8886569
#> 4 TAAGCGTGTAATCGTC-1 0.6284940 0.1794938    0.8166392 0.9152843
#> 5 CCATGTCTCGTCTGCT-1 0.6197978 0.1367610    0.8110959 0.8943882
#> 6 TGCCCTAAGCGGCTTC-1 0.6176621 0.2102369    0.8295638 0.9284102
#>
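
One way to read this output is to look, for each cell, at the cell type with the highest Bhattacharyya coefficient. The sketch below does this on a small data frame whose values are copied from the first three rows of the output above; the helper names coef_matrix, closest_type and implied_hellinger are introduced here for illustration and are not part of the package. It also checks that the printed Hellinger distances are consistent with the standard relation H = sqrt(1 - BC).

```r
# Values copied from the first three rows of the printed $bhattacharyya_coef.
bhattacharyya_coef <- data.frame(
  Cell         = c("TTGGAACTCGCTTAGA-1", "GATCGCGTCCCATTAT-1", "AGCATACGTTGAGTTC-1"),
  CD4          = c(0.4664148, 0.6695226, 0.6526016),
  CD8          = c(0.9678914, 0.9709480, 0.9609405),
  B_and_plasma = c(0.3587941, 0.3327168, 0.3085990),
  Myeloid      = c(0.2574115, 0.2051765, 0.2102890)
)

# Drop the Cell column to get a numeric matrix with one column per cell type.
coef_matrix <- as.matrix(bhattacharyya_coef[, -1])
rownames(coef_matrix) <- bhattacharyya_coef$Cell

# For each cell, pick the cell type with the largest coefficient.
closest_type <- colnames(coef_matrix)[max.col(coef_matrix, ties.method = "first")]
data.frame(Cell = bhattacharyya_coef$Cell, closest_type = closest_type)

# Hellinger distances implied by H = sqrt(1 - BC); these can be compared
# with the printed $hellinger_dist values.
implied_hellinger <- sqrt(1 - coef_matrix)
round(implied_hellinger, 4)
```

In the output above, the flagged cells carry the CD4 annotation yet show their highest coefficients (and lowest Hellinger distances) for CD8, which fits with these cells having been flagged as CD4 anomalies.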