vignettes/QCandAnnotationScores.Rmd
QCandAnnotationScores.Rmd
In this vignette, we will demonstrate how to use functions for visualizing various aspects of single-cell RNA sequencing data. We will cover:
We will use two datasets in this vignette: query_data
,
and qc_data.
Each dataset has been preprocessed to include
log-normalized counts, specific metadata columns, PCA, t-SNE, and UMAP
results.
The query_data
is another dataset from HeOrganAtlas, but
it serves as a query set for comparison. It contains:
SingleR_annotation
and
annotation_scores.
The qc_data
originates from the Bunis et al. study
focusing on haematopoietic stem and progenitor cells. This dataset
includes:
# Load library
library(scDiagnostics)
# Load datasets
data("qc_data")
data("query_data")
# Set seed for reproducibility
set.seed(0)
The scatter plot visualizes the relationship between QC statistics (e.g., total library size or percentage of mitochondrial genes) and cell type annotation scores. This plot helps in understanding how QC metrics influence or correlate with the predicted cell types.
# Generate scatter plot
p1 <- plotQCvsAnnotation(se_object = qc_data,
cell_type_col = "SingleR_annotation",
qc_col = "total",
score_col = "annotation_scores")
p1 + ggplot2::xlab("Library Size")
A scatter plot can reveal patterns such as whether cells with higher library sizes or mitochondrial content tend to be associated with specific annotations. For instance, cells with unusually high mitochondrial content might be identified as low-quality or stressed, potentially affecting their annotations.
Histograms provide a distribution view of QC metrics and annotation scores. They help in evaluating the range, central tendency, and spread of these variables across cells.
# Generate histograms
histQCvsAnnotation(se_object = query_data,
cell_type_col = "SingleR_annotation",
qc_col = "percent_mito",
score_col = "annotation_scores")
Histograms are useful for assessing the overall distribution of QC metrics and annotation scores. For example, if the majority of cells have high percent_mito, it might indicate that many cells are stressed or dying, which could impact the quality of the data.
Dimensional reduction plots (PCA, t-SNE, UMAP) are used to visualize the relationships between cells in reduced dimensions. Overlaying gene set scores on these plots provides insights into how specific gene activities are distributed across cell clusters.
# Plot gene set scores on PCA
plotGeneSetScores(se_object = query_data,
method = "PCA",
score_col = "gene_set_scores",
pc_subset = 1:3)
By visualizing gene set scores on PCA or UMAP plots, one can identify clusters of cells with high or low gene set activities. This can help in understanding the biological relevance of different gene sets or pathways in various cell states or types.
This vignette illustrates how to visualize and interpret QC statistics, cell type annotation scores, and gene set scores using single-cell RNA sequencing data. These visualizations are crucial for assessing data quality, understanding cell type annotations, and exploring gene activities, ultimately aiding in the comprehensive analysis of single-cell datasets.
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scDiagnostics_1.0.0 BiocStyle_2.32.1
loaded via a namespace (and not attached):
[1] SummarizedExperiment_1.34.0 gtable_0.3.5
[3] xfun_0.47 bslib_0.8.0
[5] ggplot2_3.5.1 htmlwidgets_1.6.4
[7] Biobase_2.64.0 lattice_0.22-6
[9] vctrs_0.6.5 tools_4.4.1
[11] generics_0.1.3 stats4_4.4.1
[13] tibble_3.2.1 fansi_1.0.6
[15] pkgconfig_2.0.3 Matrix_1.7-0
[17] desc_1.4.3 S4Vectors_0.42.1
[19] lifecycle_1.0.4 GenomeInfoDbData_1.2.12
[21] farver_2.1.2 compiler_4.4.1
[23] textshaping_0.4.0 munsell_0.5.1
[25] GenomeInfoDb_1.40.1 htmltools_0.5.8.1
[27] sass_0.4.9 yaml_2.3.10
[29] pkgdown_2.1.1 pillar_1.9.0
[31] crayon_1.5.3 jquerylib_0.1.4
[33] SingleCellExperiment_1.26.0 DelayedArray_0.30.1
[35] cachem_1.1.0 abind_1.4-8
[37] tidyselect_1.2.1 digest_0.6.37
[39] dplyr_1.1.4 bookdown_0.40
[41] labeling_0.4.3 fastmap_1.2.0
[43] grid_4.4.1 colorspace_2.1-1
[45] cli_3.6.3 SparseArray_1.4.8
[47] magrittr_2.0.3 S4Arrays_1.4.1
[49] utf8_1.2.4 withr_3.0.1
[51] UCSC.utils_1.0.0 scales_1.3.0
[53] rmarkdown_2.28 XVector_0.44.0
[55] httr_1.4.7 matrixStats_1.4.1
[57] ragg_1.3.3 evaluate_1.0.0
[59] knitr_1.48 GenomicRanges_1.56.1
[61] IRanges_2.38.1 rlang_1.1.4
[63] glue_1.7.0 BiocManager_1.30.25
[65] BiocGenerics_0.50.0 jsonlite_1.8.9
[67] R6_2.5.1 MatrixGenerics_1.16.0
[69] systemfonts_1.1.0 fs_1.6.4
[71] zlibbioc_1.50.0