This page describes the data processing workflow for the MERFISH mouse colitis dataset. The raw data is obtained through the Bioconductor MerfishData package and undergoes filtering, quality control, and normalization to generate the fully processed, analysis-ready reference (healthy) and query (DSS-induced colitis) datasets.

    library(MerfishData)
    spe <- MouseColonIbdCadinu2024()
The raw dataset is available through the MerfishData R/Bioconductor package:
Package: MerfishData
Dataset: MouseColonIbdCadinu2024() — Mouse colon tissue profiled with MERFISH technology
Citation: Cadinu et al. (2024). Charting the cellular biogeography in colitis reveals fibroblast trajectories and coordinated spatial remodeling. Cell, 187(8), 2010-28.
Genes: 943 genes profiled across healthy and DSS-treated samples
The workflow is executed through a single R script:
File: R/merfish/Data_QC_and_Processing.R
Load the MERFISH dataset from MerfishData:
Then filter for:
Apply adaptive QC thresholds to remove low-quality cells:
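As an illustration of what "adaptive" can mean here, the base-R sketch below flags cells whose log library size falls more than a chosen number of median absolute deviations (MADs) below the median, so the cutoff is derived from the data rather than fixed in advance. This is an assumption about the general technique, not the script's actual criteria: the toy matrix, the helper `is_low_outlier()`, and the cutoff of 3 MADs are all hypothetical.

```r
set.seed(42)

# Synthetic count matrix: 50 genes x 200 cells (illustration only).
counts <- matrix(rpois(50 * 200, lambda = 5), nrow = 50)
# Make the first five cells deliberately poor (very few counts).
counts[, 1:5] <- rpois(50 * 5, lambda = 0.2)

# Per-cell library size (total counts).
lib_size <- colSums(counts)

# Hypothetical helper: TRUE for values more than `nmads` MADs below the median.
is_low_outlier <- function(x, nmads = 3) {
  x < median(x) - nmads * mad(x)
}

# The threshold adapts to the observed distribution of log library sizes.
low_quality <- is_low_outlier(log1p(lib_size))

# Keep only cells that pass QC.
counts_qc <- counts[, !low_quality, drop = FALSE]
```

The same pattern extends to other per-cell metrics, such as the number of detected genes, provided the metric has enough spread for the MAD to be non-zero.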
Reference dataset only: Using the addMergedCellTypes() function, remove any cells labeled as inflammation-associated from the healthy reference, including:
as those were considered artifacts of the manual, marker-based annotation that the authors performed on the overall clustering of cells across timepoints.
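A minimal sketch of this reference-only filter, assuming the inflammation-associated labels are the merged categories that start with "Inflamed" (an assumption based on the category names listed on this page; the exact labels removed by the script are not shown). The data frame `ref_cells` is a hypothetical stand-in for the cell-level metadata.

```r
# Hypothetical cell-level metadata for the healthy reference.
ref_cells <- data.frame(
  cell_id = paste0("cell", 1:6),
  tier2_merged = c("Fibroblast", "Inflamed Fibroblast", "Epithelial",
                   "Inflamed Epithelial", "Smooth Muscle", "Inflamed SMC"),
  stringsAsFactors = FALSE
)

# Assumption: inflammation-associated categories share the "Inflamed" prefix.
is_inflamed <- grepl("^Inflamed", ref_cells$tier2_merged)

# Drop those cells from the reference.
ref_clean <- ref_cells[!is_inflamed, , drop = FALSE]
```

On a SpatialExperiment object, the same logical vector would be used to subset columns, for example `spe[, !is_inflamed]`.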
Re-calculate log-normalized counts (logcounts) on the quality-filtered data using library size factors.
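The arithmetic behind library-size normalization can be sketched in base R as follows. Each cell's size factor is its total count scaled so that the factors average to 1, and the normalized values are log2-transformed with a pseudocount of 1. This mirrors the default behaviour of `scuttle::logNormCounts()`; whether the script calls that function is an assumption, and the toy matrix is made up.

```r
# Toy count matrix: 3 genes x 4 cells (filled column by column).
counts <- matrix(
  c(10, 0, 5,
    20, 4, 6,
     2, 2, 2,
     8, 0, 0),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Library size factors: per-cell totals, scaled to have mean 1.
lib_size <- colSums(counts)
size_factors <- lib_size / mean(lib_size)

# Divide each column (cell) by its size factor, then log2-transform
# with a pseudocount of 1 so that zero counts stay at zero.
logcounts <- log2(sweep(counts, 2, size_factors, "/") + 1)
```

Recomputing after filtering matters because the size factors are centred on the mean library size of the cells that remain.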
Uses the addReferencePCA() function to:
This ensures comparability between the two conditions while preserving signals unique to inflammation.
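One way to picture this step is the base-R sketch below: highly variable genes are chosen separately in each dataset, their union defines the feature set, a PCA is fitted on the reference, and the query is projected into the same space. The internals of addReferencePCA() are not shown on this page, so fitting on the reference and projecting the query is an assumption, as are the helper `top_var_genes()`, the synthetic matrices, and the gene and component counts.

```r
set.seed(1)

# Synthetic log-expression matrices (genes x cells), illustration only.
genes <- paste0("gene", 1:40)
ref <- matrix(rnorm(40 * 60), nrow = 40,
              dimnames = list(genes, paste0("ref", 1:60)))
qry <- matrix(rnorm(40 * 50), nrow = 40,
              dimnames = list(genes, paste0("qry", 1:50)))

# Hypothetical helper: names of the n genes with the highest variance.
top_var_genes <- function(x, n) {
  v <- apply(x, 1, var)
  names(sort(v, decreasing = TRUE))[seq_len(n)]
}

# Union of HVGs, so genes variable only in the query are retained.
hvg_union <- union(top_var_genes(ref, 10), top_var_genes(qry, 10))

# Fit PCA on the reference (cells as rows), keeping 5 components.
pca <- prcomp(t(ref[hvg_union, ]), center = TRUE, scale. = FALSE, rank. = 5)
ref_pcs <- pca$x

# Project query cells using the reference centring and loadings.
qry_pcs <- predict(pca, newdata = t(qry[hvg_union, ]))
```

Taking the union rather than the reference HVGs alone is what keeps genes that vary only under inflammation in the feature set.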
Uses the addMergedCellTypes() function with MERFISH-specific mapping rules to collapse fine-grained tissue annotations (tier2) into broader, unified cell type categories (tier2_merged). This enables consistent interpretation across analyses.
Cell type groups include: Fibroblast, Inflamed Fibroblast, Smooth Muscle, Inflamed SMC, Epithelial, Inflamed Epithelial, Stem/TA, Immune lineages (Neutrophil, Macrophage, etc.), Endothelial, and others.
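The mapping idea can be sketched with a named lookup vector in base R. The fine-grained labels below ("Fibro A", "SMC A", and so on) are hypothetical placeholders, not the dataset's actual tier2 values, and the real rules in addMergedCellTypes() may be pattern-based rather than a one-to-one table.

```r
# Hypothetical fine-grained annotations for five cells.
cells <- data.frame(
  tier2 = c("Fibro A", "Fibro B", "SMC A", "Neutrophil A", "Unknown X"),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from fine-grained labels to merged categories.
merge_map <- c(
  "Fibro A"      = "Fibroblast",
  "Fibro B"      = "Fibroblast",
  "SMC A"        = "Smooth Muscle",
  "Neutrophil A" = "Neutrophil"
)

# Look up each label; labels without a rule come back as NA.
merged <- unname(merge_map[cells$tier2])

# Fall back to the original label where no rule applies.
unmapped <- is.na(merged)
merged[unmapped] <- cells$tier2[unmapped]

cells$tier2_merged <- merged
```

Keeping unmapped labels unchanged, rather than dropping them, makes gaps in the mapping easy to spot with `table(cells$tier2_merged)`.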
Both output objects are in SpatialExperiment format with:
Assays: counts (raw) and logcounts (log-normalized)
Spatial data: X/Y coordinates retained
Metadata: Cell-level annotations including:
Reduced dimensions: PCA (50 components) computed on selected HVGs
Dimensions:
Rather than running this pipeline from scratch (slow), fully processed datasets can be obtained directly following the instructions on the Data retrieval page (fast).
For complete implementation details, see:
R/merfish/Data_QC_and_Processing.R
Supporting functions:
R/auxiliary/addReferencePCA.R: Diagnostic-aware PCA using union of reference and query HVGs
R/auxiliary/addMergedCellTypes.R: Cell type categorization with MERFISH-specific mappings