Setup & Installation

Overview

This page provides instructions for setting up the computational environment needed to run the preprocessing pipelines and analysis tutorials. The workflow requires both R and Python packages.

Quick Start

If you’re primarily interested in running the tutorials on pre-processed data (fast), you only need the R dependencies. If you also want to preprocess the data from scratch (slow), you will need to complete the full R + Python setup.

Part 1: R Packages

All R dependencies can be installed with a single setup script. The package requirements are largely identical for both COVID-19 and MERFISH analyses.

Installation Script

Run this R script once to install all required packages:

# For COVID-19 scRNA-seq analysis
source("R/covid/R_Package_Installation_Pipeline.R")

Or, equivalently, for the MERFISH colitis analysis:

# For MERFISH spatial transcriptomics analysis
source("R/merfish/R_Package_Installation_Pipeline.R")

Note: Both scripts install the same core packages with identical versions. Running either one will set you up for both analyses.

What Gets Installed

From CRAN (15 packages):

  • Data manipulation: dplyr, tidyr, tibble
  • Matrix operations: Matrix
  • Visualization: ggplot2, cowplot, patchwork, GGally, ggridges, pheatmap, circlize, viridis
  • Utilities: reticulate, remotes, here

From Bioconductor (14 packages):

  • Infrastructure: SingleCellExperiment, SpatialExperiment, HDF5Array, DelayedArray, BiocParallel, BiocSingular
  • Analysis: scran, scater, SingleR
  • Visualization: ComplexHeatmap
  • Utilities: biomaRt, zellkonverter, MerfishData
  • Diagnostics: scDiagnostics (devel v1.5.1)

From GitHub (2 packages):

  • Seurat (v5.3.1.0 - specific version for Azimuth compatibility)
  • Azimuth (reference-based annotation)

Checking Installation

After running the script, verify everything is installed correctly:

# Test loading key packages; library() stops with an error if one is missing
library(SingleCellExperiment)
library(scran)
library(Azimuth)
library(scDiagnostics)

cat("✓ Key R packages loaded successfully!\n")

Part 2: Python Environment (Optional)

This part is only needed if you want to run the CellTypist or scVI/scArches annotation from scratch.

Option A: scVI/scArches via Conda (GPU-Accelerated)

This is the recommended approach for the complete preprocessing pipeline.

Important: scVI/scArches is computationally intensive and is designed to run on GPU hardware. We performed all analyses on GPU nodes (NVIDIA L40S GPUs on the HMS O2 cluster). CPU-only execution will be significantly slower.

Step 1: Create the conda environment

conda env create -f environment-scvi.yml

Step 2: Activate it when running Python scripts

conda activate scvi-env

The environment-scvi.yml file is in the repository root and includes:

  • Python 3.9
  • scvi-tools and scarches (GPU-enabled annotation tools)
  • scanpy (data analysis)
  • JAX (with GPU support)
  • CUDA Toolkit integration

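GPU Access: Ensure your compute environment has GPU access. Example SLURM submission on O2 (sbatch expects a batch script, so your_script.py needs a shebang first line such as #!/usr/bin/env python, or should be called from a shell wrapper):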

sbatch --gres=gpu:l40s:1 --mem=40G your_script.py

Option B: CellTypist Setup

To use CellTypist on its own (CPU-compatible), run the R helper script that sets up the Python environment:

source("R/auxiliary/environmentSetupCellTypist.R")
environmentSetupCellTypist()

The environmentSetupCellTypist() function:

  • detects your Python installation
  • creates a celltypist_env conda environment
  • installs required packages: scanpy, pandas, numpy, celltypist
  • tests imports and provides manual installation instructions if needed

Manual Python Installation

If you prefer manual setup:

# For scVI/scArches (GPU-enabled)
conda create -n scvi-env python=3.9
conda activate scvi-env
# Note: a plain "pip install jax jaxlib" installs the CPU build of JAX;
# for GPU support, follow the JAX installation guide for your CUDA version.
pip install scvi-tools scarches scanpy leidenalg scikit-learn jax jaxlib

# For CellTypist (CPU-compatible)
conda create -n celltypist_env python=3.9
conda activate celltypist_env
conda install -c conda-forge scanpy pandas numpy
pip install celltypist
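
After either manual route, a quick import check inside the activated environment can confirm that the packages resolved. The following is a minimal sketch, not part of the repository: the helper name missing_modules and the two module lists are assumptions built from the install commands above (scvi-tools is imported as scvi and scikit-learn as sklearn).

```python
import importlib.util

# Import names for the packages installed above (illustrative grouping, not from the repo).
# scvi-tools is imported as "scvi" and scikit-learn as "sklearn".
SCVI_MODULES = ["scvi", "scarches", "scanpy", "leidenalg", "sklearn", "jax", "jaxlib"]
CELLTYPIST_MODULES = ["scanpy", "pandas", "numpy", "celltypist"]


def missing_modules(names):
    """Return the top-level module names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Run inside the activated environment you want to check.
    for label, modules in [("scvi-env", SCVI_MODULES), ("celltypist_env", CELLTYPIST_MODULES)]:
        missing = missing_modules(modules)
        status = "all found" if not missing else "missing " + ", ".join(missing)
        print(f"{label}: {status}")
```

Because find_spec only locates a module, this check is fast but does not prove that a package imports cleanly; a failing import at runtime would still need the usual traceback to diagnose.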

System Requirements

  • R: Version 4.2 or later
  • Python: Version 3.9 (recommended for compatibility)
  • Memory: 16+ GB RAM for R, 40+ GB RAM for GPU-enabled scVI/scArches
  • GPU: NVIDIA GPU with CUDA 11.8+ (needed in practice for scVI/scArches; CPU-only runs are significantly slower)
  • Disk: ~50 GB for raw data + processed outputs
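
The Python, memory, and disk thresholds above can be checked programmatically. The sketch below uses only the standard library; the function names and default thresholds are assumptions taken from the list above, sizes are treated as GiB for simplicity, and the R version and GPU are not checked here.

```python
import os
import shutil
import sys

GIB = 1024 ** 3


def total_ram_gib():
    """Total physical memory in GiB, or None where os.sysconf cannot report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows, which has no os.sysconf


def check_requirements(path=".", min_ram_gib=16, min_disk_gib=50):
    """Compare this machine against the thresholds listed above."""
    ram = total_ram_gib()
    free_disk = shutil.disk_usage(path).free / GIB
    return {
        "python_is_3_9": sys.version_info[:2] == (3, 9),
        "ram_ok": None if ram is None else ram >= min_ram_gib,
        "disk_ok": free_disk >= min_disk_gib,
    }


if __name__ == "__main__":
    # Use min_ram_gib=40 when checking a node intended for scVI/scArches.
    for key, value in check_requirements().items():
        print(f"{key}: {value}")
```

A value of None for ram_ok means the memory size could not be determined on this platform, not that the check failed.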

Troubleshooting

R package installation failure:

  • Ensure you have a C++ compiler toolchain installed (Rtools on Windows, build-essential on Debian/Ubuntu, Xcode Command Line Tools on macOS)
  • Update R to the latest release version
  • Try installing packages individually with BiocManager::install("package_name")

Python environment issues:

  • Ensure conda is installed: conda --version
  • Clear conda cache: conda clean --all
  • Reinstall environment: conda env remove -n scvi-env && conda env create -f environment-scvi.yml
  • Verify GPU access: nvidia-smi (if running on an HPC cluster, also check which modules are loaded with module list)
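
The GPU check can also be run from inside the Python environment, which helps distinguish "no GPU on this node" from "JAX was installed without GPU support". This is a minimal sketch with hypothetical helper names; it assumes nvidia-smi is the NVIDIA driver utility on PATH and treats JAX as optional.

```python
import shutil
import subprocess


def visible_gpus():
    """Return GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools not on PATH (e.g. a login node without GPUs)
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def jax_default_backend():
    """Name of JAX's default backend (such as "cpu" or "gpu"), or None if JAX is unavailable."""
    try:
        import jax
        return jax.default_backend()
    except Exception:
        return None


if __name__ == "__main__":
    print("GPUs seen by nvidia-smi:", visible_gpus() or "none")
    print("JAX default backend:", jax_default_backend())
```

If nvidia-smi lists a GPU but JAX reports a CPU backend, the likely cause is a CPU-only JAX build in the environment rather than a scheduling problem.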

reticulate can’t find Python:

In R, explicitly select the conda environment before any Python code runs in the session:

library(reticulate)
# required = TRUE raises an error if the environment cannot be found
use_condaenv("scvi-env", required = TRUE)  # or "celltypist_env"
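
To confirm from the Python side which environment is actually in use, a short check of the interpreter location can help. The sketch below is an illustration with hypothetical helper names; it assumes the usual conda layout in which sys.prefix ends in the environment's directory name (for example .../envs/scvi-env), which does not hold for the base environment.

```python
import os
import sys


def active_env_name():
    """Name of the directory this interpreter's environment lives in."""
    return os.path.basename(os.path.normpath(sys.prefix))


def active_env_matches(expected):
    """True if the running interpreter belongs to the expected conda environment."""
    return active_env_name() == expected


# Report which interpreter and environment are active.
print("interpreter:", sys.executable)
print("environment:", active_env_name())
print("is scvi-env:", active_env_matches("scvi-env"))
```

Running this with the interpreter that reticulate picked up shows at a glance whether it points at scvi-env, celltypist_env, or an unrelated system Python.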