R/calculateGraphIntegration.R
, R/plot.calculateGraphIntegrationObject.R
calculateGraphIntegration.Rd
This function performs graph-based community detection to identify annotation inconsistencies by detecting query-only communities, true cross-cell-type mixing patterns, and local annotation inconsistencies based on immediate neighborhood analysis.
The S3 plot method generates visualizations of annotation consistency diagnostics, including query-only communities, cross-cell-type mixing, and local annotation inconsistencies.
calculateGraphIntegration(
query_data,
reference_data,
query_cell_type_col,
ref_cell_type_col,
cell_types = NULL,
pc_subset = 1:10,
k_neighbors = 30,
assay_name = "logcounts",
resolution = 0.1,
min_cells_per_community = 10,
min_cells_per_celltype = 20,
high_query_prop_threshold = 0.9,
cross_type_threshold = 0.15,
local_consistency_threshold = 0.6,
local_confidence_threshold = 0.2
)
# S3 method for class 'calculateGraphIntegrationObject'
plot(
x,
plot_type = c("community_network", "cell_network", "community_data", "summary",
"local_issues", "annotation_issues"),
color_by = c("cell_type", "community_type"),
max_nodes = 2000,
point_size = 0.8,
exclude_reference_only = FALSE,
...
)
A SingleCellExperiment
object containing numeric expression matrix for the query cells.
A SingleCellExperiment
object containing numeric expression matrix for the reference cells.
A character string specifying the column name in the query dataset containing cell type annotations.
A character string specifying the column name in the reference dataset containing cell type annotations.
A character vector specifying the cell types to include in the analysis. If NULL, all cell types are included.
A vector specifying the subset of principal components to use in the analysis. Default is 1:10.
An integer specifying the number of nearest neighbors for graph construction. Default is 30.
Name of the assay on which to perform computations. Default is "logcounts".
Resolution parameter for Leiden clustering. Default is 0.15 for fewer, larger communities.
Minimum number of cells required for a community to be analyzed. Default is 10.
Minimum number of cells required per cell type for inclusion. Default is 20.
Minimum proportion of query cells to consider a community "query-only". Default is 0.9.
Minimum proportion needed to flag cross-cell-type mixing. Default is 0.1.
Minimum proportion of reference neighbors that should support a query cell's annotation. Default is 0.6.
Minimum confidence difference needed to suggest re-annotation. Default is 0.2.
An object of class calculateGraphIntegrationObject
containing the diagnostic results.
Character string specifying visualization type. Options: "community_network" (default), "cell_network", "community_data", "summary", "local_issues", or "annotation_issues".
Character string specifying the variable to use for coloring points/elements if `plot_type` is "community_network" or "cell_network". Default is "cell_type".
Maximum number of nodes to display for performance. Default is 2000.
Point size for graph nodes. Default is 0.8.
Logical indicating whether to exclude reference-only communities/cells from visualization. Default is FALSE.
Additional arguments passed to ggplot2 functions.
A list containing:
Analysis of communities with only query cells
Analysis of communities with true query-reference cross-cell-type mixing
Local neighborhood-based annotation inconsistencies
Summary of local inconsistencies by cell type
Detailed composition of each community
Summary of annotation consistency issues
Overall diagnostic metrics
Graph structure information for plotting
Analysis parameters used
The list is assigned the class "calculateGraphIntegration"
.
A ggplot
object showing integration diagnostics.
The function performs three types of analysis: (1) Communities containing only query cells, (2) Communities where query cells are mixed with reference cells of different cell types WITHOUT any reference cells of the same type, and (3) Local analysis of each query cell's immediate neighbors to detect annotation inconsistencies even within mixed communities.
The S3 plot method creates optimized visualizations showing different types of annotation issues including community-level and local neighborhood-level inconsistencies.
calculateGraphIntegration
# Load data
data("reference_data")
data("query_data")
# Remove a cell type (Myeloid)
library(scater)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: ‘generics’
#> The following objects are masked from ‘package:base’:
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#> mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#>
#> findMatches
#> The following objects are masked from ‘package:base’:
#>
#> I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#>
#> rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#>
#> anyMissing, rowMedians
#> Loading required package: scuttle
#> Loading required package: ggplot2
library(SingleR)
reference_data <- reference_data[, reference_data$expert_annotation != "Myeloid"]
reference_data <- runPCA(reference_data, ncomponents = 50)
SingleR_annotation <- SingleR(query_data, reference_data,
labels = reference_data$expert_annotation)
query_data$SingleR_annotation <- SingleR_annotation$labels
# Check annotation data
table(Expert = query_data$expert_annotation, SingleR = query_data$SingleR_annotation)
#> SingleR
#> Expert B_and_plasma CD4 CD8
#> B_and_plasma 98 3 1
#> CD4 0 133 5
#> CD8 0 44 186
#> Myeloid 21 12 0
# Run comprehensive annotation consistency diagnostics
graph_diagnostics <- calculateGraphIntegration(
query_data = query_data,
reference_data = reference_data,
query_cell_type_col = "SingleR_annotation",
ref_cell_type_col = "expert_annotation",
pc_subset = 1:10,
k_neighbors = 30,
resolution = 0.1,
high_query_prop_threshold = 0.9,
cross_type_threshold = 0.15,
local_consistency_threshold = 0.6,
local_confidence_threshold = 0.2
)
# Look at main output
graph_diagnostics$overall_metrics
#> $total_communities
#> [1] 28
#>
#> $high_query_prop_communities
#> [1] 1
#>
#> $true_cross_type_communities
#> [1] 2
#>
#> $total_high_query_prop_cells
#> [1] 34
#>
#> $total_true_cross_mixing_cells
#> [1] 40
#>
#> $total_locally_inconsistent_cells
#> [1] 47
#>
#> $modularity
#> [1] 0.5858466
#>
#> $mean_query_isolation_rate
#> [1] 0.08245798
#>
#> $mean_true_cross_mixing_rate
#> [1] 0.09500467
#>
#> $mean_local_inconsistency_rate
#> [1] 0.08798728
#>
# Network graph showing all issue types (color by cell type)
plot(graph_diagnostics, plot_type = "community_network", color_by = "cell_type")
# Network graph showing all issue types
plot(graph_diagnostics, plot_type = "cell_network",
max_nodes = 2000, color_by = "community_type")
# Network graph showing all issue types
plot(graph_diagnostics, plot_type = "community_data")
# Summary bar chart of all issues by cell type
plot(graph_diagnostics, plot_type = "summary")
# Focus on local annotation inconsistencies
plot(graph_diagnostics, plot_type = "local_issues")
# Overall annotation issues overview
plot(graph_diagnostics, plot_type = "annotation_issues")