CLAUDE.md — Developer Guide for AI Assistants

This file is for Claude (and future AI assistants) working in this repository. Read this before making any changes.

Project Overview

PyViscel is a Python port of the VisCello R/Bioconductor single-cell explorer. It provides a Dash web application for interactive visualization and annotation of single-cell transcriptomics data stored in AnnData/h5ad format.

Stack: Python 3.12, Dash 4.x, Plotly 6.x, AnnData, pandas 2.x, numpy, scanpy.

Architecture

src/pyviscel/
├── app.py               # Dash app factory + all callbacks (largest file)
├── ui_components.py     # Layout builders (no callbacks, pure HTML/Dash components)
├── plotting.py          # Plotly figure builders (no Dash, pure numpy/plotly)
├── cello_class.py       # Cello / CelloCollection data model
├── io.py                # load_adata / save_adata / validate_adata
├── dim_reduction.py     # PCA / tSNE / UMAP wrappers
├── clustering.py        # Leiden / Louvain / density clustering
├── differential_expression.py  # Chi-sq / MWU / sSeq DE
├── enrichment.py        # GO/KEGG via gseapy
├── heatmap.py           # Annotated heatmap
└── convert/
    └── from_r.py        # R VisCello → AnnData conversion

The app is structured as a factory function create_app(adata) inside app.py. All callbacks are registered inside that function (closure pattern) so they share access to the mutable adata object via _get_adata().

Key Dash / Plotly Gotchas (hard-won knowledge)

Plotly 6.x breaking changes

hoverinfo="skip" suppresses selectedData — points with hoverinfo="skip" are NOT included in the selectedData event in Plotly 6+. Use hoverinfo="none" instead.
This applies to every trace that the user must be able to lasso-select.

go.Scattergl vs go.Scatter

go.Scattergl does not reliably surface customdata in selectedData. Always use go.Scatter for any trace where lasso selection must extract customdata. (Scattergl is fine for display-only traces where selection is not needed.)

customdata extraction from selectedData

Points in selectedData["points"] may have customdata as a scalar or [scalar]. Always unwrap lists before casting to int:

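```python
def point_to_cell_index(pt: dict) -> int | None:
    """Global cell index for one selectedData point, or None if absent."""
    cd = pt.get("customdata")
    # customdata may be a scalar, [scalar], or nested lists; unwrap to the scalar
    while isinstance(cd, (list, tuple)):
        cd = cd[0] if cd else None
    return int(cd) if cd is not None else None
```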

Also provide a pointNumber fallback for traces where customdata may be absent.

Dynamic trace structure breaks lasso indices

If you remove cells from a trace on re-render, pointNumber indices shift. A preserved Plotly lasso selection will then map to wrong global cell indices. Fix: background traces must always contain ALL cells (static structure). Use selectionrevision (not uirevision) to clear the lasso highlight after each selection event without resetting zoom/pan.

uirevision vs selectionrevision

uirevision — preserves zoom, pan, camera angle. Change it to reset the view.
selectionrevision — controls selection highlight only. Change it to clear the lasso without resetting the view. Increment it after each selection event.
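The split between the two revision keys can be sketched as a layout fragment built in the render callback. This is a minimal sketch that assumes the layout is assembled as a plain dict. `view_key` and `selection_events` are hypothetical names, not the app's actual variables. `uirevision` and `selectionrevision` are real Plotly layout attributes.

```python
def scatter_layout(view_key: str, selection_events: int) -> dict:
    """Layout fragment for a lasso-selectable scatter (illustrative only).

    view_key: hypothetical projection identifier. Keeping it unchanged keeps
        uirevision stable, so zoom/pan survive re-renders.
    selection_events: hypothetical counter bumped after each handled selection.
        A new selectionrevision value clears the selection highlight only.
    """
    return {
        "dragmode": "lasso",
        "uirevision": view_key,                 # stable -> view preserved
        "selectionrevision": selection_events,  # changed -> selection cleared
    }
```

Changing projection (a new `view_key`) resets the view. Handling a lasso event (a new `selection_events`) clears only the highlight.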

Dash 4.x

Multiple callbacks writing the same Output require allow_duplicate=True.
prevent_initial_call=True is required on most callbacks to avoid firing at page load.

pandas 2.x

Categorical fillna: series.fillna("NA") raises TypeError on Categorical dtypes. Use: series.astype(object).fillna("NA").astype(str)
Safe in-place assignment: adata.obs.loc[bool_mask, col] = value (CoW-safe).

Cell Selection Pipeline

2-D projections

User lasso on vc-scatter
  → handle_cell_selection (selectedData)
  → vc-store-cells (list of global cell indices)
  → vc-cell-count (display label)
  → User clicks Confirm
  → confirm_selection
  → adata.obs["Manual_Selection"] = Group 1 / 2 / 3 ...
  → vc-store-group-counter incremented
  → vc-color-dd options updated (Manual_Selection appears)

3-D projections (camera-angle projection)

User rotates vc-scatter (3D)
  → track_3d_camera (relayoutData → vc-3d-camera)
  → User clicks "Snapshot Current View"
  → render_3d_proj_view
      reads vc-3d-camera + adata.obsm[proj_key]
      _project_3d_to_camera(xyz, camera) → px, py
      renders vc-3d-proj-view (go.Scatter, customdata=ci_array, dragmode=lasso)
  → User lassos on vc-3d-proj-view
  → handle_3d_proj_selection (selectedData)
  → vc-store-cells (same as 2-D from here)
  → User clicks Confirm → same confirm_selection callback

_project_3d_to_camera math

Compute forward vector: fwd = (center - eye) / |center - eye|
Right axis: right = cross(fwd, up) / |cross(fwd, up)|
Up-ortho axis: up_ortho = cross(right, fwd)
Normalize point cloud to [-1,1]³
px = xyz_norm @ right, py = xyz_norm @ up_ortho
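Below is a standalone NumPy sketch of the steps above. `project_to_camera_plane` is a hypothetical name and is not the app's `_project_3d_to_camera`. It assumes Plotly's camera dict layout (`{"eye": {"x", "y", "z"}, "up": ..., "center": ...}`) and falls back to Plotly's defaults when `up` or `center` is missing. It does not handle an `up` vector parallel to the viewing direction.

```python
import numpy as np


def _vec(d: dict) -> np.ndarray:
    """Plotly {"x", "y", "z"} dict -> float vector."""
    return np.array([d["x"], d["y"], d["z"]], dtype=float)


def project_to_camera_plane(xyz: np.ndarray, camera: dict) -> tuple[np.ndarray, np.ndarray]:
    eye = _vec(camera["eye"])
    center = _vec(camera.get("center", {"x": 0, "y": 0, "z": 0}))  # Plotly default
    up = _vec(camera.get("up", {"x": 0, "y": 0, "z": 1}))          # Plotly default

    fwd = center - eye
    fwd /= np.linalg.norm(fwd)            # viewing direction
    right = np.cross(fwd, up)
    right /= np.linalg.norm(right)        # screen x axis (fails if up is parallel to fwd)
    up_ortho = np.cross(right, fwd)       # screen y axis, orthogonal to both

    # Normalize each coordinate axis independently to [-1, 1]
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat axes
    xyz_norm = 2.0 * (xyz - lo) / span - 1.0

    return xyz_norm @ right, xyz_norm @ up_ortho
```

With the eye on the +z axis and `up` along +y, the projection reduces to the normalized x and y coordinates. That makes a quick sanity check.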

Important Store IDs

| Store ID | Type | Purpose |
|---|---|---|
| `vc-store-cells` | memory | Current lasso selection (list of global cell indices) |
| `vc-store-group-counter` | memory | Next group number for Manual_Selection |
| `vc-3d-camera` | memory | Last known Plotly camera dict for 3D scatter |
| `vc-store-sel-history` | memory | Legacy; kept in layout but no longer used by callbacks |

UI Component IDs (selection-related)

| ID | Component | Purpose |
|---|---|---|
| `vc-scatter` | dcc.Graph | Main scatter plot (2D or 3D) |
| `vc-cell-count` | html.Small | Displays "N cells selected" |
| `vc-confirm-annotation-btn` | dbc.Button | Saves current selection as next Group |
| `vc-3d-sel-panel` | html.Div | Hidden for 2D; shown for 3D |
| `vc-3d-snapshot-btn` | dbc.Button | Takes camera-angle snapshot |
| `vc-3d-proj-view` | dcc.Graph | 2D projection canvas (lasso here) |
| `vc-3d-proj-status` | html.Small | Status/instruction text |
| `vc-3d-proj-clear-btn` | dbc.Button | Clears projection + resets store-cells |

Testing

pytest tests/ -q          # expect 412 passed (count as of 2026-03-23)

Tests are in tests/ — one file per module.
test_app.py tests layout structure and callback helpers via direct function calls.
Never mock the AnnData object — tests build real small AnnData fixtures.
All gseapy API calls in test_enrichment.py are mocked with unittest.mock.patch.
All mygene.info API calls in test_enrichment.py are mocked with unittest.mock.patch("requests.post", ...).
All 412 tests must pass before committing.

What Was Changed (session history)

Bug fixes applied in sessions up to 2026-03-19

app.py

confirm_selection: Changed except Exception: raise PreventUpdate → always increment group_counter even when scatter re-render fails. This was the root cause of Manual_Selection never appearing in the Color By dropdown.
confirm_selection: Removed vc-store-sel-history State (no longer needed).
handle_cell_selection: Added pointNumber fallback + robust customdata unwrapping for both scalar and [scalar] formats.
_get_projection_options: Fixed fallback to guard with cello_name in adata.uns.get("cellos", {}) before iterating obsm.
Replaced entire 3D multi-view system (3×axis dropdowns, 3×view renders, 3×view handlers, apply/clear/summarize) with camera-angle projection approach:
- _project_3d_to_camera helper
- track_3d_camera callback
- render_3d_proj_view callback (Snapshot button)
- handle_3d_proj_selection callback (lasso → vc-store-cells)
- clear_3d_proj_selection callback (Clear button)
Added dcc.Store(id="vc-3d-camera") to layout.

plotting.py

expression_scatter: Changed all 4 go.Scattergl → go.Scatter; added ci_array = np.array(cell_indices) and customdata=ci_array[mask] to every trace so lasso selection works on gene expression views.
All traces: changed hoverinfo="skip" → hoverinfo="none" (Plotly 6 fix).
scatter_plot cover0 background trace: same hoverinfo fix.
Fixed the fillna("NA") TypeError on Categorical columns (see the pandas 2.x gotcha above).

ui_components.py

Removed _sel_view_block (3-panel multi-view layout).
Added _camera_sel_panel() with new IDs: vc-3d-snapshot-btn, vc-3d-proj-clear-btn, vc-3d-proj-status, vc-3d-proj-view.
vc-3d-sel-panel now renders _camera_sel_panel() instead of the old 3-grid layout.

Bug fixes applied in session 2026-03-20

app.py — track_3d_camera

Fixed 3-D camera tracking: Plotly 6 sends rotation events as {"scene.camera": {"eye": …, "up": …, "center": …}} (nested dict under one key), NOT as flat "scene.camera.eye" / "scene.camera.up" / "scene.camera.center" keys. The old code looked for the flat keys → always got empty dict → always raised PreventUpdate → vc-3d-camera store never updated → Snapshot always used default angle. Fix: check "scene.camera" first (nested form), then fall back to flat keys.

plotting.py — scatter_plot_3d (line ~676)

Fixed pandas Categorical fillna crash in the 3-D colour path. The 2-D path (scatter_plot) already used the safe pattern; the 3-D path did not. Changed values.fillna("NA") → values.astype(object).fillna("NA").astype(str). (Same fix as documented in the pandas 2.x gotcha above.)

Bug fixes applied in session 2026-03-20 (DE, enrichment, UI)

app.py

run_de (bidirectional DEGs): single run_de_test call, split by log2fc > 0 (Group 1) and log2fc < 0 (Group 2, fold-change negated). Both groups now appear in DE result tabs.
_de_df_to_records: fixed bool→float corruption: boolean columns were being rounded along with the numeric ones, so significant became 1.0/0.0 after the JSON round-trip, breaking downstream df[df["significant"]] filters. Columns to round were chosen with select_dtypes(include=[np.number]); bool columns are now excluded explicitly before rounding.
All df[df["significant"]] → df[df["significant"].astype(bool)] for safety.
update_palette_options / render_scatter: default colormap for gene expression changed from "rainbow2" to "viridis".
run_go_enrichment: lowered min_overlap from 5 to 3 so valid results are no longer silently discarded; added a diagnostic status message (gene count, organism, go_type) and full tracebacks in the error tabs. The vc-go-status div shows errors and completion status.
update_de_proj callback: added vc-de-proj-dd projection selector in DE panel.
render_de_scatter / render_de_gene_scatter: new callbacks for DE scatter + gene expression scatter using selected projection (supports 3D).
Added lazy gene search callbacks (search_gene_options, search_de_gene_options) — return ≤50 matches on keystroke instead of loading all var_names at start.

plotting.py

expression_scatter / expression_scatter_3d: default pal changed to "viridis".

ui_components.py

DE controls row: added vc-de-proj-dd projection dropdown (5-column layout).
results_panel: restructured from 2-col to 3-col — scatter | gene expression scatter | heatmap.
Added vc-de-gene-search dropdown and vc-de-gene-scatter graph (middle column).
Added vc-go-organism dropdown (default "hsa") in enrichment section.
GO_TYPES KEGG value corrected: "KEGG" → "kegg".
Added "all" option to GO_TYPES.
Added vc-go-status div for error/status display.

enrichment.py

Added _validate_gene_symbols() — rejects bool lists, "True"/"False" string lists, and auto-coerces pandas bool Series to gene names with a warning.
Added organisms: rno (rat), dme (fly), dre (zebrafish/fish), sce (yeast) to ENRICHR_LIBRARIES, _ENRICHR_ORGANISM, _VALID_ORGANISMS.
run_enrichment gets organism: str | None = None parameter that overrides adata.uns config.
Background warning in run_enrichment fires only when caller explicitly supplies background_symbols (not for the default adata.var_names).
_parse_enrichr_result: normalises Genes column from list or string → semicolon-separated string. gseapy sometimes returns a Python list instead of a string.
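The Genes normalization amounts to a per-cell coercion. This is a minimal sketch. `normalize_genes_field` is a hypothetical name. It assumes string values are already semicolon-separated and that missing values may appear as None or NaN.

```python
import math


def normalize_genes_field(value: object) -> str:
    """Coerce one Enrichr 'Genes' cell (list or string) to 'A;B;C'."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""  # missing value
    if isinstance(value, (list, tuple)):
        # gseapy sometimes returns a Python list; join non-empty entries
        return ";".join(str(g).strip() for g in value if str(g).strip())
    return str(value).strip()  # assumed to be semicolon-separated already
```

Applied column-wise, this would be `df["Genes"] = df["Genes"].map(normalize_genes_field)`.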

differential_expression.py

feature_name_column dtype check: if the configured column is not string/object dtype (e.g. a boolean "highly_variable" column), falls back to adata.var_names and clears the bad config entry from adata.uns so the warning fires only once per session.

tests/test_enrichment.py (30 new tests, 91 total)

TestValidateGeneSymbols — 8 tests: bool list, bool-string list, pandas bool Series, pandas Index, empty list.
TestParseEnrichrResultListGenes — 2 tests: list-type Genes column handling.
TestNewOrganisms — 12 tests: rno/dme/dre/sce registry + organism string + API routing.
TestRunEnrichment extended — 6 tests: organism= override for all new organisms.
Fixed test_background_ignored_warning to call run_enrichment (warning lives there, not in compute_go).

Changes applied in session 2026-03-23 (Heatmap UI + Full Enrichment Suite)

ui_components.py

Heatmap: replaced 10/20/50/100 button group with free-form number input (vc-de-top-n-input, default 50, min 1). Removed "⚠ Recommend ≤ 50 genes for performance" warning.
GO_ORGANISMS: removed rno (Enrichr does not support rat).
GO_TYPES: expanded to 9 options — BP, MF, CC, All GO (go_all), KEGG, WikiPathways (wiki), MSigDB Hallmark (msigdb), Reactome Pathways (reactome), All.
enrichment_section: full replacement — ORA/GSEA mode toggle (vc-enrich-mode), fast-mode checkbox (vc-enrich-fast-mode), side-by-side Group 1/Group 2 dotplot+table layout (vc-enrich-dotplot-g1/g2, vc-enrich-table-g1/g2), hidden GSEA panel (vc-enrich-gsea-results), library warning div (vc-go-lib-warning).

app.py

vc-hmap-topn-store default changed from 30 → 50.
store_top_n: rewritten to read from number input instead of 4 buttons.
download_heatmap: changed from write_html → write_image(format="png", scale=2) using kaleido.
Added stores: vc-store-enrich-g1, vc-store-enrich-g2, vc-store-gsea.
Removed old run_go_enrichment (single-group, tabs-based) and download_go_table (Excel) callbacks.
gate_library_options: new callback — hides MSigDB/Reactome for non-human/mouse organisms.
toggle_enrich_mode: new callback — shows ORA or GSEA results panel based on mode selector.
run_ora_enrichment: new callback — runs ORA for both DE groups simultaneously; produces side-by-side dotplots + tables; ID mismatch warning (<50% overlap).
run_gsea_enrichment: new callback — builds signed log2FC ranked list from DE results (g1 positive, g2 negated back to negative), runs run_gsea_prerank(), splits results by NES sign, renders mountain plots.
download_enrichment_csv: new callback — downloads ORA (both groups) or GSEA results as .csv.

enrichment.py — major overhaul

ENRICHR_LIBRARIES: fully replaced. GO 2025 for hsa/mmu, GO 2018 for dme/dre/sce/cel/rno. KEGG: KEGG_2019_Human / KEGG_2019_Mouse / KEGG_2019 (others). WikiPathways: WikiPathways_2024_Human / WikiPathways_2024_Mouse / WikiPathways_2018 (others). MSigDB (MSigDB_Hallmark_2020) and Reactome (Reactome_Pathways_2024) for hsa/mmu only.
_VALID_GO_TYPES: added wiki, msigdb, reactome, go_all.
_HUMAN_MOUSE_ONLY_TYPES = frozenset({"msigdb", "reactome"}): new constant — these types require mouse→human conversion for mmu.
mouse_to_human_online(): new function — replaces HMD file lookup with mygene.info API (2 batch POSTs, no key, ~0.4 s for 30 genes).
compute_go(): auto-triggers convert_mouse_to_human for mmu + msigdb/reactome; uses mouse_to_human_online() when hmd_path=None (no longer raises ValueError); added min_gene_set_size=10 / max_gene_set_size=500 filters; retry (3×, exponential backoff); low-overlap warning (<50%).
_parse_enrichr_result(): added gene_ratio = overlap_count / overlap_total column.
run_enrichment(): passes through min_gene_set_size / max_gene_set_size.
_compute_running_es(): new helper — weighted GSEA running enrichment score algorithm.
_parse_gsea_result() / _GSEA_COLS: new GSEA result normalizer.
run_gsea_prerank(): new function — wraps gseapy.prerank() with retry, returns {results, ranking, gene_sets} dict for downstream mountain plotting.
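For reference, the standard weighted GSEA running sum (Subramanian et al., 2005) can be sketched in pure Python. This is not the project's _compute_running_es. `weighted_running_es` is a hypothetical name, and the input is assumed to be a list of (gene, score) pairs already sorted by score, descending. With weight=1 the hit steps are proportional to |score|; with weight=0 every hit contributes equally.

```python
def weighted_running_es(ranked, gene_set, weight=1.0):
    """Running enrichment score and its peak for one gene set.

    ranked: list of (gene, score) sorted by score descending.
    Returns (running_scores, es) where es is the largest deviation from zero.
    """
    members = set(gene_set)
    hits = [gene in members for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    hit_norm = sum(abs(score) ** weight
                   for (_, score), is_hit in zip(ranked, hits) if is_hit)
    if n_hits == 0 or n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must partially overlap the ranking with non-zero scores")

    running, es = [], 0.0
    for (_, score), is_hit in zip(ranked, hits):
        if is_hit:
            es += abs(score) ** weight / hit_norm  # step up, weighted by |score|^weight
        else:
            es -= 1.0 / n_miss                     # uniform step down per miss
        running.append(es)
    return running, max(running, key=abs)
```

The hit steps sum to 1 and the miss steps sum to 1, so the curve always returns to zero at the end of the ranking. The mountain plot draws this curve.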

plotting.py

enrichment_dotplot(): new function — Plotly bubble chart (x = gene_ratio, size = overlap_count, color = pval_adj, top 10 terms).
gsea_mountain_plot(): new function — Plotly enrichment score curve with inline running ES computation, hit rug, peak marker, NES/FDR annotation.

tests/test_enrichment.py (22 new tests, 412 total)

Updated: test_hsa_bp_uses_2025_library, test_correct_library_for_hsa_kegg (→ KEGG_2019_Human), test_sce_kegg_library_defined (→ KEGG_2019), test_dme/dre_kegg_uses_kegg_2019, test_convert_mouse_to_human_uses_online_when_no_hmd_path.
Added: TestHumanMouseOnlyTypes (10 tests), TestMouseToHumanOnline (4 tests), TestRunGseaPrerank (5 tests), TestComputeRunningEs (3 tests).

System fix

Removed corrupt macOS metadata file /Volumes/Shared/Concord/src/._concord_sc.egg-info that was blocking all pip installs.
Installed kaleido (required for PNG heatmap export).

Enrichment Suite — Library Map (as of 2026-03-23)

| go_type | Human (hsa) | Mouse (mmu) | Fly/Fish/Yeast/Worm |
|---|---|---|---|
| BP | GO_Biological_Process_2025 | same | GO_Biological_Process_2018 |
| MF | GO_Molecular_Function_2025 | same | GO_Molecular_Function_2018 |
| CC | GO_Cellular_Component_2025 | same | GO_Cellular_Component_2018 |
| kegg | KEGG_2019_Human | KEGG_2019_Mouse | KEGG_2019 |
| wiki | WikiPathways_2024_Human | WikiPathways_2024_Mouse | WikiPathways_2018 |
| msigdb | MSigDB_Hallmark_2020 | same* | — |
| reactome | Reactome_Pathways_2024 | same* | — |

*Mouse msigdb/reactome: auto-converts mouse symbols → human orthologs via mygene.info before calling Enrichr.

GSEA Prerank Pipeline

DE results (vc-store-de)
  → g1 genes: use stored log2fc as-is  (positive)
  → g2 genes: negate stored log2fc back (negative, since app.py negates them on storage)
  → combine → pd.Series sorted descending
  → run_gsea_prerank(ranked_series, organism, go_type, permutations)
      → gseapy.get_library(lib)  ← downloads + caches gene set dict
      → gseapy.prerank(rnk=ranked_series, gene_sets=lib_dict, ...)
      → returns {results, ranking, gene_sets}
  → split by NES sign: positive NES = Group 1 up, negative NES = Group 2 up
  → gsea_mountain_plot() for top term in each direction
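The first three steps of the pipeline, building the signed ranking, can be sketched without pandas. This is a minimal sketch. `build_signed_ranking` is a hypothetical name, and the record schema (dicts with `gene` and `log2fc` keys) is an assumption. Group 2 values are stored negated (positive), so they are flipped back here.

```python
def build_signed_ranking(g1_records, g2_records):
    """Combine DE records into one (gene, signed log2FC) list, sorted descending."""
    scores = {}
    for rec in g1_records:
        scores[rec["gene"]] = float(rec["log2fc"])    # Group 1 up: keep positive
    for rec in g2_records:
        scores[rec["gene"]] = -float(rec["log2fc"])   # stored negated: flip back to negative
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

`pd.Series(dict(ranking))` would turn the result into the Series form that the pipeline passes on.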

pandas / numpy gotchas (additions)

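`select_dtypes(include=[np.number])` and boolean columns

In NumPy's type hierarchy, np.bool_ is not a subclass of np.number (np.issubdtype(np.bool_, np.number) is False), and pandas normally leaves bool columns out of df.select_dtypes(include=[np.number]). Do not rely on that alone: this project has already hit the bool→float corruption in _de_df_to_records. Always exclude bool columns explicitly before numeric operations:

```python
import numpy as np

# df: any pandas DataFrame about to be rounded or otherwise post-processed
bool_cols = set(df.select_dtypes(include=[bool]).columns)
numeric_cols = [c for c in df.select_dtypes(include=[np.number]).columns
                if c not in bool_cols]
```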

Bug fixes applied in session 2026-03-24 (GSEA runtime fix)

src/pyviscel/enrichment.py

Three bugs in run_gsea_prerank() and _parse_gsea_result() prevented GSEA from running:

Wrong parameter name (weighted_score_type → weight):
- gseapy v1.1.7 renamed weighted_score_type to weight in gseapy.prerank().
- The old call passed weighted_score_type=weighted_score_type, which was silently ignored (absorbed into **kwargs), causing gseapy to use its default and raise internally.
- Fix: gseapy.prerank(..., weight=weighted_score_type).
Duplicate "term" column in _parse_gsea_result:
- gseapy v1.1.7 res2d has two columns: "Name" (library source, e.g. "prerank") and "Term" (gene set name).
- The old rename_map mapped both "Name" → "term" and "Term" → "term", creating a duplicate column that caused df[_GSEA_COLS] to return a malformed DataFrame.
- Fix: drop "Name" before renaming; only "Term" → "term".
Percentage strings in tag_pct / gene_pct:
- gseapy returns "Tag %" and "Gene %" as strings like "10.00%". pd.to_numeric() with errors="coerce" silently turned them into NaN.
- Fix: strip the trailing %, convert to float, then divide by 100.
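The percentage fix can be sketched as a scalar converter. This is a minimal sketch. `percent_to_fraction` is a hypothetical name. Treating values without a % sign as already-fractional is an assumption, and unparseable input becomes NaN, mirroring errors="coerce".

```python
import math


def percent_to_fraction(value: object) -> float:
    """'10.00%' -> 0.1; values without '%' are assumed to be fractions already."""
    if isinstance(value, str):
        text = value.strip()
        try:
            if text.endswith("%"):
                return float(text[:-1]) / 100.0  # strip '%', convert, then scale
            return float(text)
        except ValueError:
            return math.nan
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan
```

A vectorized pandas alternative is `pd.to_numeric(s.astype(str).str.rstrip("%"), errors="coerce") / 100`, but that one also divides values that were already fractions.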

Root cause was identified via inspect.signature(gseapy.prerank) in the prior session.
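A guard built on inspect.signature could catch this class of bug before calling the library. This is a minimal sketch. `explicit_kwargs` and `check_kwargs` are hypothetical helpers, not part of gseapy. A function that accepts **kwargs will swallow misspelled or renamed keywords, so the check only considers parameters the function declares by name.

```python
import inspect


def explicit_kwargs(func) -> set[str]:
    """Keyword names func declares explicitly (anything else would land in **kwargs)."""
    return {
        name for name, p in inspect.signature(func).parameters.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }


def check_kwargs(func, **kwargs) -> None:
    """Raise TypeError if any keyword is not an explicitly declared parameter."""
    unknown = set(kwargs) - explicit_kwargs(func)
    if unknown:
        raise TypeError(f"{func.__name__} does not declare: {sorted(unknown)}")
```

For example, `check_kwargs(gseapy.prerank, weight=1.0)` would pass only if the installed gseapy declares `weight` explicitly.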

Known Issues (as of 2026-03-24)

dash_table.DataTable deprecation warning from Dash — no functional impact.
Enrichr API uses its own built-in background — background_symbols is ignored in online mode. Use compute_go_offline() for custom-background ORA.
GSEA Prerank with 1000 permutations can take several minutes; use fast mode (100 perms) for exploratory work.
mygene.info mouse→human conversion requires internet access; offline runs with MSigDB/Reactome for mmu will fail.
