PyViscel

Interactive visualization and analysis of single-cell transcriptomics data — Python port of the VisCello R/Bioconductor package.

No R runtime required. Native AnnData/h5ad format throughout.


Installation

Option A — pip from GitHub (recommended)

pip install git+https://github.com/Gartner-Lab/pyviscel.git

To upgrade to the latest version later:

pip install --upgrade git+https://github.com/Gartner-Lab/pyviscel.git

Option B — conda environment (recommended for new machines)

Some dependencies (leidenalg, igraph, umap-learn) can be tricky to build from source. Using conda avoids compiler issues:

conda create -n pyviscel python=3.12
conda activate pyviscel
conda install -c conda-forge leidenalg python-igraph umap-learn
pip install git+https://github.com/Gartner-Lab/pyviscel.git

Option C — development install (for contributors)

git clone https://github.com/Gartner-Lab/pyviscel.git
cd pyviscel
pip install -e ".[dev]"

Runtime dependencies (auto-installed): anndata, dash, dash-bootstrap-components, plotly, pandas, numpy, scipy, scikit-learn, umap-learn, openTSNE, leidenalg, igraph, statsmodels, gseapy, matplotlib, seaborn.


Quick Start

Step 1 — Get your data into h5ad

Option A — Convert from an existing VisCello R object:

# In R — requires the original VisCello R package
library(VisCello)
cc <- readRDS("my_cello.rds")
viscello_to_h5ad(cc, "my_data.h5ad")

Option B — Build an AnnData directly in Python:

from pyviscel import validate_adata, save_adata

# adata is an anndata.AnnData object that you have already built or loaded in memory
validate_adata(adata)          # checks required slots
save_adata(adata, "my_data.h5ad")

Step 2 — Explore programmatically

from pyviscel import load_adata, list_cellos, list_projections

adata = load_adata("my_data.h5ad")
print(list_cellos(adata))                       # ['All Cells', 'T cells', ...]
print(list_projections(adata, "All Cells"))     # ['UMAP_2D', 'UMAP_3D', ...]

Step 3 — Load data (optional: backed/memory-mapped mode)

For very large datasets (100k+ cells), open the file in backed mode to keep expression matrices on disk and reduce RAM usage:

from pyviscel import load_adata
adata = load_adata("my_data.h5ad", backed="r")   # read-only memory map

The file must already contain a norm_exprs layer — automatic layer aliasing is skipped in backed mode.

Step 4 — Launch the interactive web app

from pyviscel import run_app
run_app("my_data.h5ad", host="127.0.0.1", port=8050)

Or from the terminal:

pyviscel my_data.h5ad
pyviscel my_data.h5ad --host 0.0.0.0 --port 8050   # accessible on local network
pyviscel my_data.h5ad --no-validate                 # skip schema check for external h5ad files

Then open http://127.0.0.1:8050 in your browser.


Web App Features

Explorer Tab

| Control | Description |
| --- | --- |
| Cello dropdown | Select a named cell subset |
| Projection dropdown | Select a 2-D or 3-D embedding (PCA, t-SNE, UMAP) |
| Color By dropdown | Color cells by metadata column, Manual_Selection, or gene expression |
| Point size / Alpha | Adjust marker size and transparency |
| Legend | Toggle full / abbreviated / no legend |
| Download view | Save the current scatter as a PNG image |

Large cellos (many cells) are automatically spatially downsampled before rendering using a grid-based algorithm that preserves cluster structure; all cells are retained in the data.
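As an illustration of the idea, grid-based thinning can be sketched as follows. This is not PyViscel's actual implementation: the function name `grid_downsample` and the parameters `n_bins` and `max_per_bin` are hypothetical. The sketch keeps at most a few points per occupied grid cell, so dense regions are thinned while every occupied cell (including those covering small clusters) keeps at least one point.

```python
import numpy as np

def grid_downsample(coords, n_bins=50, max_per_bin=5, seed=0):
    """Return sorted indices of the points kept for rendering.

    coords: (n_points, 2) embedding coordinates.
    n_bins, max_per_bin: hypothetical tuning parameters for this sketch.
    """
    coords = np.asarray(coords, dtype=float)
    rng = np.random.default_rng(seed)
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on a flat axis
    # Map every point to an integer grid cell in [0, n_bins - 1] per axis.
    cell = np.minimum(((coords - lo) / span * n_bins).astype(int), n_bins - 1)
    cell_id = cell[:, 0] * n_bins + cell[:, 1]
    # Shuffle, then stable-sort by cell: the first points of each cell
    # are then a random sample of that cell.
    order = rng.permutation(len(coords))
    order = order[np.argsort(cell_id[order], kind="stable")]
    sorted_ids = cell_id[order]
    # Rank of each point inside its own grid cell.
    starts = np.flatnonzero(np.r_[True, sorted_ids[1:] != sorted_ids[:-1]])
    counts = np.diff(np.r_[starts, len(sorted_ids)])
    rank = np.arange(len(sorted_ids)) - np.repeat(starts, counts)
    return np.sort(order[rank < max_per_bin])

rng = np.random.default_rng(1)
dense = rng.normal(0.0, 0.2, size=(5000, 2))   # one large, dense cluster
sparse = rng.normal(5.0, 0.2, size=(50, 2))    # one small, rare cluster
points = np.vstack([dense, sparse])
keep = grid_downsample(points, n_bins=40, max_per_bin=3)
```

Only the indices in `keep` would be drawn; the full `points` array is untouched, which mirrors the statement that all cells are retained in the data.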

3-D camera controls: When a 3-D projection is selected, elevation, azimuth, and zoom sliders appear alongside the scatter. The current angle is shown in the readout beneath the sliders. Dragging the scatter directly also updates the sliders.
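One plausible way to link such sliders to a Plotly 3-D scene is to convert the angles to the camera `eye` vector and back. The sketch below assumes a spherical parameterisation with the z axis pointing up and a zoom that shrinks the camera distance; the helper names and the `base_distance` default are hypothetical, and the app may use a different convention.

```python
import math

def camera_eye(elevation_deg, azimuth_deg, zoom=1.0, base_distance=2.0):
    """Convert slider values to a Plotly-style camera eye dict (assumed convention)."""
    elev = math.radians(elevation_deg)
    azim = math.radians(azimuth_deg)
    r = base_distance / zoom  # larger zoom moves the camera closer
    return {
        "x": r * math.cos(elev) * math.cos(azim),
        "y": r * math.cos(elev) * math.sin(azim),
        "z": r * math.sin(elev),
    }

def eye_to_angles(eye):
    """Inverse mapping, e.g. to refresh the sliders after the user drags the plot."""
    r = math.sqrt(eye["x"] ** 2 + eye["y"] ** 2 + eye["z"] ** 2)
    elevation = math.degrees(math.asin(eye["z"] / r))
    azimuth = math.degrees(math.atan2(eye["y"], eye["x"]))
    return elevation, azimuth, r

eye = camera_eye(30, 45, zoom=1.0)
# With a plotly Figure `fig`, the eye could be applied via:
# fig.update_layout(scene_camera=dict(eye=eye))
```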

Cell Annotation (manual selection)

2-D projections:

  1. Switch the main scatter to the lasso or box-select tool (toolbar icon)
  2. Draw a selection on the plot — the status bar shows the cell count
  3. Click Confirm — the selection is saved as Group 1, Group 2, etc. in a new Manual_Selection column
  4. Repeat for more groups
  5. Select Manual_Selection in Color By to see all groups
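The bookkeeping behind the Confirm step can be pictured with a small pandas sketch. It assumes that unlabelled cells carry a placeholder value (`"Unassigned"` here, a hypothetical choice) and that group numbers simply count the groups already present; the function name `confirm_selection` is also hypothetical, and the app's own logic may differ.

```python
import pandas as pd

def confirm_selection(obs, selected_ids, column="Manual_Selection",
                      unassigned="Unassigned"):
    """Label the selected cells as the next 'Group N' in obs[column] (in place)."""
    if column not in obs.columns:
        obs[column] = unassigned
    existing = [v for v in obs[column].unique() if v != unassigned]
    label = f"Group {len(existing) + 1}"
    obs.loc[list(selected_ids), column] = label
    return label

# Toy stand-in for adata.obs with five cells.
obs = pd.DataFrame(index=["c1", "c2", "c3", "c4", "c5"])
first = confirm_selection(obs, ["c1", "c2"])   # first lasso selection
second = confirm_selection(obs, ["c4"])        # second lasso selection
```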

3-D projections:

  1. Rotate the 3-D scatter to any viewing angle (or use the elevation/azimuth/zoom sliders)
  2. Click Snapshot Current View — a 2-D projection of that camera angle appears below
  3. Draw a lasso on the 2-D projection — the cell count updates
  4. Click Confirm — same Group 1/2/3 workflow as 2-D
  5. Click Clear to reset the projection panel
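The snapshot step amounts to flattening the 3-D coordinates onto the screen plane of the current camera. A minimal sketch, assuming an orthographic projection with the camera looking at the origin and z as the up direction (the function name `snapshot_2d` is hypothetical, and the app may use a perspective projection instead):

```python
import numpy as np

def snapshot_2d(points, eye, up=(0.0, 0.0, 1.0)):
    """Orthographically project (n, 3) points onto the camera's screen plane."""
    points = np.asarray(points, dtype=float)
    d = np.asarray(eye, dtype=float)
    d = d / np.linalg.norm(d)            # unit vector from scene centre to camera
    right = np.cross(np.asarray(up, dtype=float), d)
    if np.linalg.norm(right) < 1e-9:     # looking straight down the up axis
        right = np.cross(np.array([0.0, 1.0, 0.0]), d)
    right /= np.linalg.norm(right)
    screen_up = np.cross(d, right)
    # Screen x is the component along `right`, screen y along `screen_up`.
    return np.column_stack([points @ right, points @ screen_up])

rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))
xy = snapshot_2d(pts, eye=(1.5, 1.5, 1.0))
# A box selection on the snapshot maps straight back to row indices of `pts`.
selected = np.flatnonzero((xy[:, 0] > 0) & (xy[:, 1] > 0))
```

Because the projection preserves row order, indices selected on the 2-D snapshot identify the same cells in the 3-D embedding.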

Cell Composition Tab

A sub-tab within the Explorer for cross-tabulating two metadata columns.

  • Select a Row variable and Column variable from adata.obs
  • Numeric columns are binned automatically (configurable bin count)
  • Results appear as an annotated heatmap (raw counts or row/column/total-normalised proportions) and a sortable data table
  • Download CSV exports the cross-tabulation matrix
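The same kind of table can be reproduced outside the app with pandas. The sketch below is an approximation of the behaviour described above, not the app's code: the helper name `composition_table` and the example column names are hypothetical, numeric columns are binned with equal-width `pd.cut` bins, and normalisation is delegated to `pd.crosstab`.

```python
import pandas as pd

def composition_table(obs, row, col, n_bins=5, normalize=False):
    """Cross-tabulate two obs columns, binning numeric ones into n_bins intervals.

    normalize: False (raw counts), "index" (row), "columns" or "all" (total).
    """
    def as_categories(series):
        if pd.api.types.is_numeric_dtype(series):
            return pd.cut(series, bins=n_bins)
        return series
    return pd.crosstab(as_categories(obs[row]), as_categories(obs[col]),
                       normalize=normalize)

# Toy stand-in for adata.obs.
obs = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "B", "C"],
    "sample": ["s1", "s2", "s1", "s1", "s2", "s2"],
    "n_genes": [200, 300, 1500, 1800, 1700, 900],
})
counts = composition_table(obs, "cluster", "sample")
row_props = composition_table(obs, "cluster", "sample", normalize="index")
binned = composition_table(obs, "cluster", "n_genes", n_bins=2)
# counts.to_csv("composition.csv") would write the matrix to disk.
```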

Differential Expression Tab

Compares a selected group of cells against a background using:

  • Chi-square — fast, good for detecting marker genes
  • Mann-Whitney U — non-parametric, robust
  • sSeq — negative-binomial model (closest to edgeR/DESeq2 behaviour)
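To make the Mann-Whitney option concrete, here is a minimal per-gene sketch built on `scipy.stats.mannwhitneyu` with a Benjamini-Hochberg correction. It is an illustration only: the function names, the pseudocount and the dense-matrix input are assumptions, and PyViscel's own test may differ in details such as filtering or sparse handling.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

def mwu_de(expr, in_group, pseudocount=1.0):
    """expr: (n_cells, n_genes) expression; in_group: boolean mask of selected cells."""
    expr = np.asarray(expr, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    g1, g2 = expr[in_group], expr[~in_group]
    pvals = np.array([
        mannwhitneyu(g1[:, j], g2[:, j], alternative="two-sided").pvalue
        for j in range(expr.shape[1])
    ])
    # Positive log2FC means higher mean expression in the selected group.
    log2fc = np.log2((g1.mean(axis=0) + pseudocount) /
                     (g2.mean(axis=0) + pseudocount))
    return log2fc, pvals, bh_adjust(pvals)

rng = np.random.default_rng(0)
expr = rng.poisson(2.0, size=(200, 5)).astype(float)
in_group = np.arange(200) < 100
expr[in_group, 0] += 10          # gene 0 is a simulated marker of the group
log2fc, pvals, padj = mwu_de(expr, in_group)
```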

The DE panel shows:

  • Group 1 DEGs — genes upregulated in the selected group (log2FC > 0)
  • Group 2 DEGs — genes upregulated in the background group (log2FC < 0, displayed as positive fold-change)
  • A scatter plot coloured by group membership on the selected projection
  • A gene expression scatter for any gene you search in the DE results
  • A heatmap of the top significant genes

Results are sortable and downloadable as CSV.
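The Group 1 / Group 2 split described above can be reproduced on an exported results table. The sketch assumes column names `gene`, `log2FC` and `padj` and a 0.05 significance cut-off; all of these are hypothetical and should be matched to the headers of the actual CSV.

```python
import pandas as pd

def split_degs(de, fc_col="log2FC", padj_col="padj", alpha=0.05):
    """Split significant genes by the sign of the fold-change."""
    sig = de[de[padj_col] < alpha]
    group1 = sig[sig[fc_col] > 0].sort_values(fc_col, ascending=False)
    group2 = sig[sig[fc_col] < 0].copy()
    group2[fc_col] = -group2[fc_col]   # show background genes as positive fold-change
    group2 = group2.sort_values(fc_col, ascending=False)
    return group1, group2

de = pd.DataFrame({
    "gene": ["CD3E", "MS4A1", "GAPDH", "LYZ"],
    "log2FC": [2.0, -3.0, 0.1, -1.5],
    "padj": [0.001, 0.0001, 0.9, 0.01],
})
group1, group2 = split_degs(de)
```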

Enrichment Tab

Full ORA and GSEA Prerank suite powered by the Enrichr API (gseapy).

Mode

  • ORA (Over-Representation Analysis) — tests which gene sets are enriched in the DE gene lists for Group 1 and Group 2 simultaneously; results shown side-by-side as dotplots and sortable tables.
  • GSEA Prerank — ranks all genes by signed log₂FC (Group 1 positive, Group 2 negative), runs GSEA Prerank on the full ranked list; mountain plots shown for top enriched terms.
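The ranked list used by GSEA Prerank can be sketched in a few lines of pandas. The column names and the helper `build_rank_list` are hypothetical, and keeping the first occurrence of a duplicated gene symbol is an arbitrary choice made for this sketch.

```python
import pandas as pd

def build_rank_list(de, gene_col="gene", fc_col="log2FC"):
    """Return a Series of signed log2FC indexed by gene, highest first."""
    return (
        de[[gene_col, fc_col]]
        .dropna()
        .drop_duplicates(subset=gene_col)   # keep the first entry per gene symbol
        .sort_values(fc_col, ascending=False)
        .set_index(gene_col)[fc_col]
    )

de = pd.DataFrame({
    "gene": ["a", "b", "c", "d", "a"],
    "log2FC": [1.0, -2.0, 0.5, float("nan"), 3.0],
})
ranked = build_rank_list(de)
```

Genes at the top of `ranked` are Group 1 markers and genes at the bottom are Group 2 markers. A list of this shape should be usable as the `rnk` argument of `gseapy.prerank`, together with `gene_sets` and `permutation_num`; check the gseapy documentation for the input types accepted by your installed version.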

Organisms supported: Human (hsa), Mouse (mmu), Fly (dme), Zebrafish (dre), Yeast (sce), Worm (cel).

Gene set types

| Type | Description | Availability |
| --- | --- | --- |
| BP | GO Biological Process | All organisms |
| MF | GO Molecular Function | All organisms |
| CC | GO Cellular Component | All organisms |
| All GO | BP + MF + CC combined | All organisms |
| KEGG | KEGG Pathways | All organisms |
| WikiPathways | WikiPathways | All organisms |
| MSigDB Hallmark | MSigDB Hallmark gene sets | Human & Mouse only |
| Reactome | Reactome Pathways | Human & Mouse only |
| All | All of the above | Human & Mouse only |

Mouse MSigDB/Reactome results are obtained by first converting mouse gene symbols to human orthologs via the mygene.info API, then querying Enrichr — no local file required.

Controls

  • Fast mode checkbox — runs GSEA with 100 permutations instead of 1000 for quick exploration
  • Run Enrichment / Run GSEA — results and any error messages appear immediately below
  • Download CSV — exports ORA results (both groups) or GSEA results as .csv

Modules

| Module | Description |
| --- | --- |
| pyviscel.io | Load/save .h5ad, validate schema, list cellos and projections |
| pyviscel.cello_class | Cello and CelloCollection: named cell subsets with projections |
| pyviscel.dim_reduction | PCA, t-SNE, UMAP (stored in adata.obsm) |
| pyviscel.clustering | k-NN graph construction, Leiden / Louvain / density clustering |
| pyviscel.differential_expression | Chi-square, Mann-Whitney U, sSeq NB DE tests |
| pyviscel.enrichment | ORA and GSEA Prerank via gseapy (Enrichr); mouse→human ortholog conversion via mygene.info |
| pyviscel.plotting | Plotly scatter plots, expression plots, enrichment dotplots, GSEA mountain plots, crosstab heatmaps |
| pyviscel.heatmap | Annotated gene expression heatmap (log → z-score → cluster) |
| pyviscel.ui_components | Reusable Dash/DBC layout components |
| pyviscel.app | Full Dash application with Explorer, Annotation, and DE tabs |
| pyviscel.convert | Utilities for converting R VisCello objects to AnnData |
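The pyviscel.heatmap entry describes a log → z-score → cluster pipeline. A generic version of that pipeline can be sketched as below. It is not the module's code: the function name `heatmap_matrix` is hypothetical, "log" is taken to mean `log1p`, z-scores are computed per gene, and rows are ordered by average-linkage hierarchical clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def heatmap_matrix(expr):
    """expr: (n_genes, n_cells) non-negative expression values.

    Returns the z-scored matrix with clustered row order, and that order.
    """
    logged = np.log1p(np.asarray(expr, dtype=float))     # log step
    mean = logged.mean(axis=1, keepdims=True)
    std = logged.std(axis=1, keepdims=True)
    std[std == 0] = 1.0                                  # constant genes stay at zero
    z = (logged - mean) / std                            # per-gene z-score
    # Cluster step: order genes by hierarchical clustering of their profiles.
    row_order = leaves_list(linkage(z, method="average", metric="euclidean"))
    return z[row_order], row_order

rng = np.random.default_rng(0)
expr = rng.poisson(3.0, size=(6, 20))
z, row_order = heatmap_matrix(expr)
```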

Development

pip install -e ".[dev]"
pytest tests/ -q          # 412 tests

Tests cover all analysis modules and the Dash app layout/callbacks.


Known Limitations / Upcoming Work

  • 3-D camera-angle selection: snapshot projection works; minor visual edge cases remain (being fixed)
  • dash_table.DataTable deprecation warning from Dash — no functional impact; migration to dash-ag-grid planned
  • Enrichr ORA uses the built-in Enrichr background, not a custom gene universe (Enrichr API limitation); use compute_go_offline() directly for custom-background ORA
  • GSEA Prerank with 1000 permutations can take several minutes; use Fast mode (100 permutations) for exploratory work
  • Mouse MSigDB/Reactome requires internet access (mygene.info ortholog lookup); offline runs will fail for these two types
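For context on the custom-background point above: over-representation against a user-defined gene universe reduces to a hypergeometric test. The sketch below shows that calculation with `scipy.stats.hypergeom`. It is a generic illustration, not the implementation or signature of compute_go_offline(), and the function name `ora_pvalue` is hypothetical.

```python
from scipy.stats import hypergeom

def ora_pvalue(query_genes, gene_set, universe):
    """One-sided hypergeometric p-value for overlap between query and gene set.

    Genes outside the universe are ignored on both sides.
    """
    universe = set(universe)
    query = set(query_genes) & universe
    members = set(gene_set) & universe
    overlap = len(query & members)
    # P(X >= overlap) when drawing len(query) genes from the universe.
    p = hypergeom.sf(overlap - 1, len(universe), len(members), len(query))
    return overlap, float(p)

universe = [f"g{i}" for i in range(100)]
gene_set = [f"g{i}" for i in range(10)]
hit_overlap, hit_p = ora_pvalue([f"g{i}" for i in range(5)], gene_set, universe)
miss_overlap, miss_p = ora_pvalue([f"g{i}" for i in range(50, 55)], gene_set, universe)
```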