Accessing Data

Quick Start: Download Data

All processed datasets with precomputed cell type annotations are available on Zenodo.

Manual Download

Visit the Zenodo repository and download:

  • covid_data_sce.rds — COVID-19 PBMC query (severe infection)
  • normal_data_sce.rds — Healthy PBMC reference
  • dss9_data.rds — MERFISH colitis query (day 9)
  • healthy_data.rds — Healthy colon MERFISH reference

Place the files in the following directory structure:

data/
├── covid/
│   ├── covid_data_sce.rds
│   └── normal_data_sce.rds
└── merfish/
    ├── dss9_data.rds
    └── healthy_data.rds
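If you prefer to script this setup, the base-R sketch below creates the two subfolders and reports which of the four expected files are present. The helper name `check_data_layout()` is hypothetical (it is not part of scDiagnostics), and the default root `"data"` assumes you run it from the project directory.

```r
# File paths relative to the data root, matching the layout above
expected_files <- c(
  "covid/covid_data_sce.rds",
  "covid/normal_data_sce.rds",
  "merfish/dss9_data.rds",
  "merfish/healthy_data.rds"
)

# Hypothetical helper (not part of scDiagnostics): create the subfolders if
# needed, then return a named logical vector saying which files are present
check_data_layout <- function(root = "data") {
  for (subdir in c("covid", "merfish")) {
    dir.create(file.path(root, subdir), recursive = TRUE, showWarnings = FALSE)
  }
  present <- file.exists(file.path(root, expected_files))
  names(present) <- expected_files
  if (!all(present)) {
    message("Missing files: ", paste(expected_files[!present], collapse = ", "))
  }
  present
}
```

Run `check_data_layout()` after copying the downloads; any `FALSE` entry points to a missing or misnamed file.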

Data Contents

All objects are in SingleCellExperiment or SpatialExperiment format with the following components:

  • Expression data in assays() (counts and logcounts)
  • Cell metadata in colData() including:
    • Ground truth cell type annotations (tier2_merged)
    • Azimuth predictions (azimuth_celltype_l1_merged)
    • SingleR predictions (singler_annotations_merged)
    • CellTypist predictions (celltypist_predicted_labels_merged)
    • scArches predictions (scvi_prediction_merged)
    • Method-specific confidence/uncertainty scores
  • Dimensionality reductions in reducedDims() (PCA, UMAP)
  • Spatial coordinates in spatialCoords() (MERFISH only)
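To confirm that a loaded object carries the annotation columns listed above, a check over `colnames()` is enough. The helper `annotation_columns_present()` is a hypothetical sketch: it checks only the five label columns named in this section, because the names of the method-specific score columns are not given here. It accepts the `DataFrame` returned by `colData()` or a plain `data.frame`.

```r
# Label columns named in the list above (score columns are not included)
annotation_cols <- c(
  ground_truth = "tier2_merged",
  azimuth      = "azimuth_celltype_l1_merged",
  singler      = "singler_annotations_merged",
  celltypist   = "celltypist_predicted_labels_merged",
  scarches     = "scvi_prediction_merged"
)

# Hypothetical helper: TRUE/FALSE per annotation source, based on column names
annotation_columns_present <- function(cell_meta) {
  setNames(annotation_cols %in% colnames(cell_meta), names(annotation_cols))
}

# Usage once the objects are loaded (requires SingleCellExperiment):
# annotation_columns_present(colData(covid_query))
# assayNames(covid_query)       # should list counts and logcounts
# reducedDimNames(covid_query)  # should include PCA and UMAP
```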

Loading Data

Once the files are in place, read them into R with:

```r
library(SingleCellExperiment)
library(SpatialExperiment)  # class definitions and spatialCoords() for the MERFISH objects

# COVID-19 PBMC data
covid_query <- readRDS("data/covid/covid_data_sce.rds")
covid_ref <- readRDS("data/covid/normal_data_sce.rds")

# MERFISH colitis data
merfish_query <- readRDS("data/merfish/dss9_data.rds")
merfish_ref <- readRDS("data/merfish/healthy_data.rds")

# Inspect the cell-level annotations of the COVID-19 query
head(colData(covid_query))
```
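`readRDS()` returns whatever object a file contains, so a misnamed or truncated download may only surface later in an analysis. One hedged alternative is to wrap it in a class check. `load_checked()` below is a hypothetical helper, not part of any package. Because `SpatialExperiment` extends `SingleCellExperiment`, the MERFISH objects pass either check; ask for `"SpatialExperiment"` when you need `spatialCoords()`.

```r
# Hypothetical helper: read an .rds file and fail early if it is missing or
# does not hold an object of the expected class
load_checked <- function(path, expected_class = "SingleCellExperiment") {
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  obj <- readRDS(path)
  # methods::is() also accepts subclasses, e.g. a SpatialExperiment
  # passes a check for "SingleCellExperiment"
  if (!methods::is(obj, expected_class)) {
    stop(sprintf("%s: expected %s, got %s", path, expected_class,
                 paste(class(obj), collapse = "/")), call. = FALSE)
  }
  obj
}

# Usage (load SpatialExperiment first so the class definition is available):
# library(SpatialExperiment)
# covid_query   <- load_checked("data/covid/covid_data_sce.rds")
# merfish_query <- load_checked("data/merfish/dss9_data.rds", "SpatialExperiment")
```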

Dataset Details

COVID-19 PBMC

MERFISH Mouse Colitis

  • Query: 29,040 cells from DSS-treated mouse colon (day 9, peak inflammation)
  • Reference: 27,140 cells from healthy mouse colon (day 0)
  • Total genes: 943 (MERFISH gene panel)
  • Source: MerfishData package (Geistlinger et al., 2024); original data from Cadinu et al., Cell 2024
  • Technology: MERFISH imaging-based spatial transcriptomics
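The cell and gene counts above can double as a quick integrity check after loading. The sketch below compares an object's dimensions (genes in rows, cells in columns, as in `SingleCellExperiment`) with the stated figures. `expect_dims()` is a hypothetical helper, and the check assumes the distributed objects keep all 943 panel genes and exactly the cell counts listed.

```r
# Hypothetical helper: compare nrow (genes) and ncol (cells) with expected values;
# warns and returns FALSE on a mismatch
expect_dims <- function(obj, n_genes, n_cells, label = "object") {
  observed <- c(genes = nrow(obj), cells = ncol(obj))
  expected <- c(genes = n_genes, cells = n_cells)
  ok <- observed == expected
  if (!all(ok)) {
    warning(sprintf("%s: expected %d genes x %d cells, found %d x %d",
                    label, n_genes, n_cells,
                    observed[["genes"]], observed[["cells"]]),
            call. = FALSE)
  }
  all(ok)
}

# Usage with the MERFISH objects loaded as shown under Loading Data:
# expect_dims(merfish_query, n_genes = 943, n_cells = 29040, label = "DSS day 9 query")
# expect_dims(merfish_ref,   n_genes = 943, n_cells = 27140, label = "healthy reference")
```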

Citation

If you use these datasets, please cite:

COVID-19 data:

Stephenson, E., et al. (2021). Single-cell multi-omics analysis of the immune response in COVID-19. Nature Medicine, 27, 904–916.

MERFISH data:

Cadinu, P., et al. (2024). Charting the cellular biogeography in colitis reveals fibroblast trajectories and coordinated spatial remodeling. Cell, 187(8), 2010–2028.

Geistlinger, L., Moffitt, J., & Gentleman, R. (2024). MerfishData: Collection of public MERFISH datasets. R package version 1.8.0, Bioconductor.

scDiagnostics package:

See the main citation on the home page.
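R can also print the citation metadata that installed packages ship with, which is useful for checking the version you actually used against the 1.8.0 cited above. The wrapper `installed_citations()` is hypothetical; it relies only on base R's `requireNamespace()`, `utils::citation()` and `utils::packageVersion()`, and silently skips packages that are not installed.

```r
# Hypothetical wrapper: collect citation() entries for the packages that are
# installed, skipping any that are not
installed_citations <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs <- pkgs[installed]
  setNames(lapply(pkgs, utils::citation), pkgs)
}

# Print the installed version and citation text for each available package
cites <- installed_citations(c("MerfishData", "scDiagnostics"))
for (pkg in names(cites)) {
  cat(pkg, "version", as.character(utils::packageVersion(pkg)), "\n")
  print(cites[[pkg]], style = "text")
}
```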