This file is for Claude (and future AI assistants) working in this repository. Read this before making any changes.
PyViscel is a Python port of the VisCello R/Bioconductor single-cell explorer. It provides a Dash web application for interactive visualization and annotation of single-cell transcriptomics data stored in AnnData/h5ad format.
Stack: Python 3.12, Dash 4.x, Plotly 6.x, AnnData, pandas 2.x, numpy, scanpy.
src/pyviscel/
├── app.py # Dash app factory + all callbacks (largest file)
├── ui_components.py # Layout builders (no callbacks, pure HTML/Dash components)
├── plotting.py # Plotly figure builders (no Dash, pure numpy/plotly)
├── cello_class.py # Cello / CelloCollection data model
├── io.py # load_adata / save_adata / validate_adata
├── dim_reduction.py # PCA / tSNE / UMAP wrappers
├── clustering.py # Leiden / Louvain / density clustering
├── differential_expression.py # Chi-sq / MWU / sSeq DE
├── enrichment.py # GO/KEGG via gseapy
├── heatmap.py # Annotated heatmap
└── convert/
    └── from_r.py # R VisCello → AnnData conversion
The app is structured as a factory function create_app(adata) inside app.py.
All callbacks are registered inside that function (closure pattern) so they share
access to the mutable adata object via _get_adata().
`hoverinfo="skip"` suppresses `selectedData`: points with `hoverinfo="skip"` are NOT included in the `selectedData` event in Plotly 6+. Use `hoverinfo="none"` instead. This applies to every trace that the user must be able to lasso-select.
`go.Scattergl` does not reliably surface `customdata` in `selectedData`. Always use `go.Scatter` for any trace where lasso selection must extract customdata. (`Scattergl` is fine for display-only traces where selection is not needed.)
Points in `selectedData["points"]` may have `customdata` as a scalar or `[scalar]`.
Always unwrap lists before casting to int:

    cd = pt.get("customdata")
    while isinstance(cd, (list, tuple)):
        if not cd:
            cd = None
            break
        cd = cd[0]
    if cd is not None:
        idx = int(cd)

Also provide a `pointNumber` fallback for traces where customdata may be absent.
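A minimal sketch that combines the unwrapping loop with the `pointNumber` fallback. The helper names `unwrap_customdata` and `selected_cell_indices` are hypothetical (they are not taken from `app.py`), and the fallback is only meaningful under the assumption that the trace holds all cells in global order:

```python
from typing import Any, Optional


def unwrap_customdata(cd: Any) -> Optional[int]:
    """Peel nested lists/tuples until a scalar remains, then cast to int."""
    while isinstance(cd, (list, tuple)):
        if not cd:  # empty container: nothing usable
            return None
        cd = cd[0]
    return None if cd is None else int(cd)


def selected_cell_indices(selected_data: Optional[dict]) -> list[int]:
    """Collect cell indices from a selectedData payload (hypothetical helper)."""
    indices: list[int] = []
    for pt in (selected_data or {}).get("points", []):
        idx = unwrap_customdata(pt.get("customdata"))
        if idx is None and pt.get("pointNumber") is not None:
            # Fallback: assumes the trace contains ALL cells in global order.
            idx = int(pt["pointNumber"])
        if idx is not None:
            indices.append(idx)
    return indices


payload = {
    "points": [
        {"customdata": 7},    # scalar form
        {"customdata": [3]},  # [scalar] form
        {"pointNumber": 5},   # no customdata: fallback
        {"customdata": []},   # unusable point: skipped
    ]
}
cells = selected_cell_indices(payload)
```

Returning a plain list keeps the result JSON-serialisable, which matters if it is written to a `dcc.Store`.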
If you remove cells from a trace on re-render, pointNumber indices shift.
A preserved Plotly lasso selection will then map to wrong global cell indices.
Fix: background traces must always contain ALL cells (static structure).
Use selectionrevision (not uirevision) to clear the lasso highlight after
each selection event without resetting zoom/pan.
- `uirevision`: preserves zoom, pan, camera angle. Change it to reset the view.
- `selectionrevision`: controls selection highlight only. Change it to clear the lasso without resetting the view. Increment it after each selection event.
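As a rough illustration of the two revision keys, the sketch below builds the layout fragment a re-render might apply. The function name `layout_revisions` and the constant `"keep-view"` are assumptions made for this example, not code from the repository:

```python
def layout_revisions(selection_events: int, view_key: str = "keep-view") -> dict:
    """Return layout keys that keep the view but drop the lasso highlight."""
    return {
        # Same value on every render: zoom, pan and camera are preserved.
        "uirevision": view_key,
        # New value after each selection event: the highlight is cleared.
        "selectionrevision": selection_events,
    }


before = layout_revisions(selection_events=1)
after = layout_revisions(selection_events=2)
```

The returned dict could be applied with `fig.update_layout(**layout_revisions(n))`, where `n` is a counter the callback increments.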
- Multiple callbacks writing the same Output require `allow_duplicate=True`.
- `prevent_initial_call=True` is required on most callbacks to avoid firing at page load.
- Categorical `fillna`: `series.fillna("NA")` raises `TypeError` on Categorical dtypes when `"NA"` is not already one of the categories. Use: `series.astype(object).fillna("NA").astype(str)`
- Safe in-place assignment: `adata.obs.loc[bool_mask, col] = value` (CoW-safe).
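A small sketch of both patterns on a toy DataFrame standing in for `adata.obs`. The helper name `categorical_to_labels` and the column contents are made up for illustration:

```python
import pandas as pd


def categorical_to_labels(series: pd.Series, missing: str = "NA") -> pd.Series:
    """Convert a (possibly Categorical) column to plain strings, filling gaps."""
    # Going through object dtype avoids the "new category" error on fillna.
    return series.astype(object).fillna(missing).astype(str)


obs = pd.DataFrame({"cluster": pd.Series(["T cell", None, "B cell"], dtype="category")})
labels = categorical_to_labels(obs["cluster"])

# CoW-safe assignment: a single .loc call on the frame itself, no chained indexing.
obs["Manual_Selection"] = "NA"
mask = labels == "B cell"
obs.loc[mask, "Manual_Selection"] = "Group 1"
```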
User lasso on vc-scatter
→ handle_cell_selection (selectedData)
→ vc-store-cells (list of global cell indices)
→ vc-cell-count (display label)
→ User clicks Confirm
→ confirm_selection
→ adata.obs["Manual_Selection"] = Group 1 / 2 / 3 ...
→ vc-store-group-counter incremented
→ vc-color-dd options updated (Manual_Selection appears)
User rotates vc-scatter (3D)
→ track_3d_camera (relayoutData → vc-3d-camera)
→ User clicks "Snapshot Current View"
→ render_3d_proj_view
reads vc-3d-camera + adata.obsm[proj_key]
_project_3d_to_camera(xyz, camera) → px, py
renders vc-3d-proj-view (go.Scatter, customdata=ci_array, dragmode=lasso)
→ User lassos on vc-3d-proj-view
→ handle_3d_proj_selection (selectedData)
→ vc-store-cells (same as 2-D from here)
→ User clicks Confirm → same confirm_selection callback
- Compute forward vector: `fwd = (center - eye) / |center - eye|`
- Right axis: `right = cross(fwd, up) / |cross(fwd, up)|`
- Up-ortho axis: `up_ortho = cross(right, fwd)`
- Normalize point cloud to `[-1,1]³`
- `px = xyz_norm @ right`, `py = xyz_norm @ up_ortho`
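The projection steps can be sketched with NumPy as follows. This is an illustration, not the repository's `_project_3d_to_camera`: the name `project_points` is hypothetical, the fallback values are Plotly's default camera, and the normalization is assumed to be per-axis min-max scaling (the notes above do not say which scaling is used):

```python
import numpy as np


def _vec(d, default):
    """Read an {'x','y','z'} dict into an array, falling back to a default."""
    d = d or {}
    return np.array([d.get(k, v) for k, v in zip("xyz", default)], dtype=float)


def project_points(xyz, camera):
    """Project Nx3 points onto the camera plane; returns (px, py)."""
    eye = _vec(camera.get("eye"), (1.25, 1.25, 1.25))
    up = _vec(camera.get("up"), (0.0, 0.0, 1.0))
    center = _vec(camera.get("center"), (0.0, 0.0, 0.0))

    fwd = center - eye
    right = np.cross(fwd, up)
    if np.linalg.norm(fwd) == 0 or np.linalg.norm(right) == 0:
        raise ValueError("degenerate camera: eye == center or up parallel to view")
    fwd = fwd / np.linalg.norm(fwd)
    right = right / np.linalg.norm(right)
    up_ortho = np.cross(right, fwd)  # unit length because right is orthogonal to fwd

    # Assumption: per-axis min-max scaling to [-1, 1]; a constant axis maps to 0.
    xyz = np.asarray(xyz, dtype=float)
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    half_span = np.where(hi > lo, (hi - lo) / 2.0, 1.0)
    xyz_norm = (xyz - (lo + hi) / 2.0) / half_span

    return xyz_norm @ right, xyz_norm @ up_ortho


# Looking down the z axis with y up: px follows x, py follows y.
cam = {
    "eye": {"x": 0, "y": 0, "z": 2},
    "up": {"x": 0, "y": 1, "z": 0},
    "center": {"x": 0, "y": 0, "z": 0},
}
px, py = project_points(np.array([[0.0, 0.0, 0.0], [2.0, 4.0, 8.0]]), cam)
```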
| Store ID | Type | Purpose |
|---|---|---|
| `vc-store-cells` | memory | Current lasso selection (list of global cell indices) |
| `vc-store-group-counter` | memory | Next group number for Manual_Selection |
| `vc-3d-camera` | memory | Last known Plotly camera dict for 3D scatter |
| `vc-store-sel-history` | memory | Legacy: kept in layout but no longer used by callbacks |
| ID | Component | Purpose |
|---|---|---|
| `vc-scatter` | dcc.Graph | Main scatter plot (2D or 3D) |
| `vc-cell-count` | html.Small | Displays "N cells selected" |
| `vc-confirm-annotation-btn` | dbc.Button | Saves current selection as next Group |
| `vc-3d-sel-panel` | html.Div | Hidden for 2D; shown for 3D |
| `vc-3d-snapshot-btn` | dbc.Button | Takes camera-angle snapshot |
| `vc-3d-proj-view` | dcc.Graph | 2D projection canvas (lasso here) |
| `vc-3d-proj-status` | html.Small | Status/instruction text |
| `vc-3d-proj-clear-btn` | dbc.Button | Clears projection + resets store-cells |
`pytest tests/ -q  # should be 412 passed`
- Tests are in `tests/`, one file per module.
- `test_app.py` tests layout structure and callback helpers via direct function calls.
- Never mock the AnnData object; tests build real small AnnData fixtures.
- All gseapy API calls in `test_enrichment.py` are mocked with `unittest.mock.patch`.
- All mygene.info API calls in `test_enrichment.py` are mocked with `unittest.mock.patch("requests.post", ...)`.
- All 412 tests must pass before committing.
app.py
- `confirm_selection`: Changed `except Exception: raise PreventUpdate` → always increment `group_counter` even when scatter re-render fails. This was the root cause of `Manual_Selection` never appearing in the Color By dropdown.
- `confirm_selection`: Removed `vc-store-sel-history` State (no longer needed).
- `handle_cell_selection`: Added `pointNumber` fallback + robust customdata unwrapping for both scalar and `[scalar]` formats.
- `_get_projection_options`: Fixed fallback to guard with `cello_name in adata.uns.get("cellos", {})` before iterating obsm.
- Replaced entire 3D multi-view system (3×axis dropdowns, 3×view renders, 3×view handlers, apply/clear/summarize) with camera-angle projection approach:
  - `_project_3d_to_camera` helper
  - `track_3d_camera` callback
  - `render_3d_proj_view` callback (Snapshot button)
  - `handle_3d_proj_selection` callback (lasso → vc-store-cells)
  - `clear_3d_proj_selection` callback (Clear button)
- Added `dcc.Store(id="vc-3d-camera")` to layout.
plotting.py
- `expression_scatter`: Changed all 4 `go.Scattergl` → `go.Scatter`; added `ci_array = np.array(cell_indices)` and `customdata=ci_array[mask]` to every trace so lasso selection works on gene expression views.
- All traces: changed `hoverinfo="skip"` → `hoverinfo="none"` (Plotly 6 fix).
- `scatter_plot` cover0 background trace: same `hoverinfo` fix.
- Fixed `fillna("NA")` for Categorical columns (pandas 2.x CoW).
ui_components.py
- Removed `_sel_view_block` (3-panel multi-view layout).
- Added `_camera_sel_panel()` with new IDs: `vc-3d-snapshot-btn`, `vc-3d-proj-clear-btn`, `vc-3d-proj-status`, `vc-3d-proj-view`.
- `vc-3d-sel-panel` now renders `_camera_sel_panel()` instead of the old 3-grid layout.
app.py: `track_3d_camera`
- Fixed 3-D camera tracking: Plotly 6 sends rotation events as `{"scene.camera": {"eye": …, "up": …, "center": …}}` (nested dict under one key), NOT as flat `"scene.camera.eye"` / `"scene.camera.up"` / `"scene.camera.center"` keys. The old code looked for the flat keys → always got empty dict → always raised `PreventUpdate` → `vc-3d-camera` store never updated → Snapshot always used default angle. Fix: check `"scene.camera"` first (nested form), then fall back to flat keys.
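The nested-first, flat-fallback lookup described in this fix might look like the sketch below. The function name `extract_camera` is hypothetical and the real `track_3d_camera` callback may differ; an empty result is where a callback would raise `PreventUpdate`:

```python
CAMERA_PARTS = ("eye", "up", "center")


def extract_camera(relayout_data):
    """Return a camera dict from relayoutData, or {} if none is present."""
    if not relayout_data:
        return {}

    # Nested form: one "scene.camera" key holding the whole camera dict.
    nested = relayout_data.get("scene.camera")
    if isinstance(nested, dict):
        return {k: nested[k] for k in CAMERA_PARTS if k in nested}

    # Flat form: separate "scene.camera.eye" / ".up" / ".center" keys.
    return {
        part: relayout_data[f"scene.camera.{part}"]
        for part in CAMERA_PARTS
        if f"scene.camera.{part}" in relayout_data
    }


nested_event = {"scene.camera": {"eye": {"x": 1, "y": 2, "z": 3}, "up": {"x": 0, "y": 0, "z": 1}}}
flat_event = {"scene.camera.eye": {"x": 1, "y": 2, "z": 3}}
zoom_event = {"xaxis.range[0]": 0.5}
```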
plotting.py: `scatter_plot_3d` (line ~676)
- Fixed pandas Categorical `fillna` crash in the 3-D colour path. The 2-D path (`scatter_plot`) already used the safe pattern; the 3-D path did not. Changed `values.fillna("NA")` → `values.astype(object).fillna("NA").astype(str)`. (Same fix as documented in the pandas 2.x gotcha above.)
app.py
- `run_de` (bidirectional DEGs): single `run_de_test` call, split by `log2fc > 0` (Group 1) and `log2fc < 0` (Group 2, fold-change negated). Both groups now appear in DE result tabs.
- `_de_df_to_records`: fixed bool→float corruption from `select_dtypes(include=[np.number])`, which was also picking up the boolean column. Now explicitly excludes bool columns before rounding. Without this fix, `significant` became `1.0`/`0.0` after JSON round-trip, breaking downstream `df[df["significant"]]` filters.
- All `df[df["significant"]]` → `df[df["significant"].astype(bool)]` for safety.
- `update_palette_options` / `render_scatter`: default colormap for gene expression changed from `"rainbow2"` to `"viridis"`.
- `run_go_enrichment`: added `min_overlap=3` (was 5), diagnostic status message (gene count, organism, go_type), full traceback in error tabs. `vc-go-status` div shows errors and completion status. `min_overlap` lowered to avoid silently discarding valid results.
- `update_de_proj` callback: added `vc-de-proj-dd` projection selector in DE panel.
- `render_de_scatter` / `render_de_gene_scatter`: new callbacks for DE scatter + gene expression scatter using selected projection (supports 3D).
- Added lazy gene search callbacks (`search_gene_options`, `search_de_gene_options`): return ≤50 matches on keystroke instead of loading all var_names at start.
plotting.py
- `expression_scatter` / `expression_scatter_3d`: default `pal` changed to `"viridis"`.
ui_components.py
- DE controls row: added `vc-de-proj-dd` projection dropdown (5-column layout).
- `results_panel`: restructured from 2-col to 3-col: scatter | gene expression scatter | heatmap.
- Added `vc-de-gene-search` dropdown and `vc-de-gene-scatter` graph (middle column).
- Added `vc-go-organism` dropdown (default `"hsa"`) in enrichment section.
- `GO_TYPES` KEGG value corrected: `"KEGG"` → `"kegg"`.
- Added `"all"` option to `GO_TYPES`.
- Added `vc-go-status` div for error/status display.
enrichment.py
- Added `_validate_gene_symbols()`: rejects bool lists, `"True"`/`"False"` string lists, and auto-coerces pandas bool Series to gene names with a warning.
- Added organisms: `rno` (rat), `dme` (fly), `dre` (zebrafish/fish), `sce` (yeast) to `ENRICHR_LIBRARIES`, `_ENRICHR_ORGANISM`, `_VALID_ORGANISMS`.
- `run_enrichment` gets `organism: str | None = None` parameter that overrides `adata.uns` config.
- Background warning in `run_enrichment` fires only when caller explicitly supplies `background_symbols` (not for the default `adata.var_names`).
- `_parse_enrichr_result`: normalises `Genes` column from list or string → semicolon-separated string. gseapy sometimes returns a Python list instead of a string.
differential_expression.py
- `feature_name_column` dtype check: if the configured column is not string/object dtype (e.g. a boolean `"highly_variable"` column), falls back to `adata.var_names` and clears the bad config entry from `adata.uns` so the warning fires only once per session.
tests/test_enrichment.py (30 new tests, 91 total)
- `TestValidateGeneSymbols` (8 tests): bool list, bool-string list, pandas bool Series, pandas Index, empty list.
- `TestParseEnrichrResultListGenes` (2 tests): list-type Genes column handling.
- `TestNewOrganisms` (12 tests): rno/dme/dre/sce registry + organism string + API routing.
- `TestRunEnrichment` extended (6 tests): `organism=` override for all new organisms.
- Fixed `test_background_ignored_warning` to call `run_enrichment` (warning lives there, not in `compute_go`).
ui_components.py
- Heatmap: replaced 10/20/50/100 button group with free-form number input (`vc-de-top-n-input`, default 50, min 1). Removed "⚠ Recommend ≤ 50 genes for performance" warning.
- `GO_ORGANISMS`: removed `rno` (Enrichr does not support rat).
- `GO_TYPES`: expanded to 9 options: BP, MF, CC, All GO (`go_all`), KEGG, WikiPathways (`wiki`), MSigDB Hallmark (`msigdb`), Reactome Pathways (`reactome`), All.
- `enrichment_section`: full replacement with ORA/GSEA mode toggle (`vc-enrich-mode`), fast-mode checkbox (`vc-enrich-fast-mode`), side-by-side Group 1/Group 2 dotplot+table layout (`vc-enrich-dotplot-g1/g2`, `vc-enrich-table-g1/g2`), hidden GSEA panel (`vc-enrich-gsea-results`), library warning div (`vc-go-lib-warning`).
app.py
- `vc-hmap-topn-store` default changed from 30 → 50.
- `store_top_n`: rewritten to read from number input instead of 4 buttons.
- `download_heatmap`: changed from `write_html` → `write_image(format="png", scale=2)` using kaleido.
- Added stores: `vc-store-enrich-g1`, `vc-store-enrich-g2`, `vc-store-gsea`.
- Removed old `run_go_enrichment` (single-group, tabs-based) and `download_go_table` (Excel) callbacks.
- `gate_library_options`: new callback; hides MSigDB/Reactome for non-human/mouse organisms.
- `toggle_enrich_mode`: new callback; shows ORA or GSEA results panel based on mode selector.
- `run_ora_enrichment`: new callback; runs ORA for both DE groups simultaneously; produces side-by-side dotplots + tables; ID mismatch warning (<50% overlap).
- `run_gsea_enrichment`: new callback; builds signed log2FC ranked list from DE results (g1 positive, g2 negated back to negative), runs `run_gsea_prerank()`, splits results by NES sign, renders mountain plots.
- `download_enrichment_csv`: new callback; downloads ORA (both groups) or GSEA results as `.csv`.
enrichment.py: major overhaul
- `ENRICHR_LIBRARIES`: fully replaced. GO 2025 for hsa/mmu, GO 2018 for dme/dre/sce/cel/rno. KEGG: `KEGG_2019_Human` / `KEGG_2019_Mouse` / `KEGG_2019` (others). WikiPathways: `WikiPathways_2024_Human` / `WikiPathways_2024_Mouse` / `WikiPathways_2018` (others). MSigDB (`MSigDB_Hallmark_2020`) and Reactome (`Reactome_Pathways_2024`) for hsa/mmu only.
- `_VALID_GO_TYPES`: added `wiki`, `msigdb`, `reactome`, `go_all`.
- `_HUMAN_MOUSE_ONLY_TYPES = frozenset({"msigdb", "reactome"})`: new constant; these types require mouse→human conversion for mmu.
- `mouse_to_human_online()`: new function; replaces HMD file lookup with mygene.info API (2 batch POSTs, no key, ~0.4 s for 30 genes).
- `compute_go()`: auto-triggers `convert_mouse_to_human` for mmu + msigdb/reactome; uses `mouse_to_human_online()` when `hmd_path=None` (no longer raises ValueError); added `min_gene_set_size=10` / `max_gene_set_size=500` filters; retry (3×, exponential backoff); low-overlap warning (<50%).
- `_parse_enrichr_result()`: added `gene_ratio = overlap_count / overlap_total` column.
- `run_enrichment()`: passes through `min_gene_set_size` / `max_gene_set_size`.
- `_compute_running_es()`: new helper; weighted GSEA running enrichment score algorithm.
- `_parse_gsea_result()` / `_GSEA_COLS`: new GSEA result normalizer.
- `run_gsea_prerank()`: new function; wraps `gseapy.prerank()` with retry, returns `{results, ranking, gene_sets}` dict for downstream mountain plotting.
plotting.py
- `enrichment_dotplot()`: new function; Plotly bubble chart (x = gene_ratio, size = overlap_count, color = pval_adj, top 10 terms).
- `gsea_mountain_plot()`: new function; Plotly enrichment score curve with inline running ES computation, hit rug, peak marker, NES/FDR annotation.
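For orientation, the weighted running enrichment score behind a mountain plot can be sketched in plain Python. This follows the standard weighted GSEA running-sum definition and is not the repository's `_compute_running_es`; the function name and the toy data are assumptions:

```python
def running_enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running ES along a ranked list (genes sorted by descending score)."""
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weights = [abs(s) ** weight if h else 0.0 for s, h in zip(scores, hits)]
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must hit some, but not all, ranked genes")

    running, curve = 0.0, []
    for w, is_hit in zip(hit_weights, hits):
        # Hits step up in proportion to |score|**weight; misses step down evenly.
        running += w / total_hit if is_hit else -1.0 / n_miss
        curve.append(running)
    return curve


curve = running_enrichment_score(
    ranked_genes=["A", "B", "C", "D"],
    scores=[2.0, 1.0, -1.0, -2.0],
    gene_set={"A", "B"},
)
es = max(curve, key=abs)  # enrichment score: largest deviation from zero
```

The curve always returns to zero at the end of the list, which is a quick sanity check when plotting.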
tests/test_enrichment.py (22 new tests, 412 total)
- Updated: `test_hsa_bp_uses_2025_library`, `test_correct_library_for_hsa_kegg` (→ KEGG_2019_Human), `test_sce_kegg_library_defined` (→ KEGG_2019), `test_dme/dre_kegg_uses_kegg_2019`, `test_convert_mouse_to_human_uses_online_when_no_hmd_path`.
- Added: `TestHumanMouseOnlyTypes` (10 tests), `TestMouseToHumanOnline` (4 tests), `TestRunGseaPrerank` (5 tests), `TestComputeRunningEs` (3 tests).
System fix
- Removed corrupt macOS metadata file `/Volumes/Shared/Concord/src/._concord_sc.egg-info` that was blocking all pip installs.
- Installed `kaleido` (required for PNG heatmap export).
| go_type | Human (hsa) | Mouse (mmu) | Fly/Fish/Yeast/Worm |
|---|---|---|---|
| BP | GO_Biological_Process_2025 | same | GO_Biological_Process_2018 |
| MF | GO_Molecular_Function_2025 | same | GO_Molecular_Function_2018 |
| CC | GO_Cellular_Component_2025 | same | GO_Cellular_Component_2018 |
| kegg | KEGG_2019_Human | KEGG_2019_Mouse | KEGG_2019 |
| wiki | WikiPathways_2024_Human | WikiPathways_2024_Mouse | WikiPathways_2018 |
| msigdb | MSigDB_Hallmark_2020 | same* | — |
| reactome | Reactome_Pathways_2024 | same* | — |
*Mouse msigdb/reactome: auto-converts mouse symbols → human orthologs via mygene.info before calling Enrichr.
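The table can be read as a lookup from `(go_type, organism)` to an Enrichr library name. The sketch below encodes it that way; `pick_library` and the constants are hypothetical and do not reflect the actual layout of `ENRICHR_LIBRARIES`. Any organism other than `hsa`/`mmu` is treated as the last column:

```python
GO_NAMES = {
    "BP": "GO_Biological_Process",
    "MF": "GO_Molecular_Function",
    "CC": "GO_Cellular_Component",
}
SPECIES_SUFFIX = {"hsa": "Human", "mmu": "Mouse"}
HUMAN_MOUSE_ONLY = {
    "msigdb": "MSigDB_Hallmark_2020",
    "reactome": "Reactome_Pathways_2024",
}


def pick_library(go_type: str, organism: str) -> tuple:
    """Return (library_name, needs_mouse_to_human_conversion)."""
    is_hsa_mmu = organism in SPECIES_SUFFIX
    if go_type in GO_NAMES:
        year = "2025" if is_hsa_mmu else "2018"
        return f"{GO_NAMES[go_type]}_{year}", False
    if go_type == "kegg":
        name = f"KEGG_2019_{SPECIES_SUFFIX[organism]}" if is_hsa_mmu else "KEGG_2019"
        return name, False
    if go_type == "wiki":
        name = (
            f"WikiPathways_2024_{SPECIES_SUFFIX[organism]}"
            if is_hsa_mmu
            else "WikiPathways_2018"
        )
        return name, False
    if go_type in HUMAN_MOUSE_ONLY:
        if not is_hsa_mmu:
            raise ValueError(f"{go_type} is only available for hsa and mmu")
        # Mouse symbols must be mapped to human orthologs before the query.
        return HUMAN_MOUSE_ONLY[go_type], organism == "mmu"
    raise ValueError(f"unknown go_type: {go_type!r}")
```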
DE results (vc-store-de)
→ g1 genes: use stored log2fc as-is (positive)
→ g2 genes: negate stored log2fc back (negative, since app.py negates them on storage)
→ combine → pd.Series sorted descending
→ run_gsea_prerank(ranked_series, organism, go_type, permutations)
→ gseapy.get_library(lib) ← downloads + caches gene set dict
→ gseapy.prerank(rnk=ranked_series, gene_sets=lib_dict, ...)
→ returns {results, ranking, gene_sets}
→ split by NES sign: positive NES = Group 1 up, negative NES = Group 2 up
→ gsea_mountain_plot() for top term in each direction
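The first three steps of this flow (sign restoration, combination, sorting) might be sketched as follows. The function name `build_ranked_series` and the `gene` column name are assumptions; only `log2fc` appears in the notes above. Duplicate genes are resolved by keeping the first occurrence, which is also an assumption:

```python
import pandas as pd


def build_ranked_series(g1: pd.DataFrame, g2: pd.DataFrame) -> pd.Series:
    """Combine two DE tables into one signed, descending log2FC ranking."""
    # Group 1: stored fold changes are already positive.
    up = pd.Series(g1["log2fc"].to_numpy(), index=g1["gene"].to_numpy())
    # Group 2: stored values were negated on storage, so flip them back.
    down = pd.Series(-g2["log2fc"].to_numpy(), index=g2["gene"].to_numpy())

    ranked = pd.concat([up, down])
    ranked = ranked[~ranked.index.duplicated(keep="first")]
    return ranked.sort_values(ascending=False)


g1 = pd.DataFrame({"gene": ["A", "B"], "log2fc": [2.0, 0.5]})
g2 = pd.DataFrame({"gene": ["C", "D"], "log2fc": [1.0, 3.0]})
ranked = build_ranked_series(g1, g2)
```

A Series indexed by gene symbol is a form that `gseapy.prerank(rnk=...)` accepts.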
In Python, `bool` is a subclass of `int`. `np.bool_` itself is not a subclass of `np.number`, but boolean columns can still end up in the result of `df.select_dtypes(include=[np.number])` (for example, some pandas versions have treated the nullable boolean dtype as numeric).
Always exclude them explicitly before numeric operations:

    bool_cols = set(df.select_dtypes(include=[bool]).columns)
    numeric_cols = [c for c in df.select_dtypes(include=[np.number]).columns
                    if c not in bool_cols]

src/pyviscel/enrichment.py
Three bugs in `run_gsea_prerank()` and `_parse_gsea_result()` prevented GSEA from running:
- Wrong parameter name (`weighted_score_type` → `weight`):
  - gseapy v1.1.7 renamed `weighted_score_type` to `weight` in `gseapy.prerank()`.
  - The old call passed `weighted_score_type=weighted_score_type`, which was silently ignored (absorbed into `**kwargs`), causing gseapy to use its default and raise internally.
  - Fix: `gseapy.prerank(..., weight=weighted_score_type)`.
- Duplicate `"term"` column in `_parse_gsea_result`:
  - gseapy v1.1.7 `res2d` has two columns: `"Name"` (library source, e.g. `"prerank"`) and `"Term"` (gene set name).
  - The old `rename_map` mapped both `"Name"` → `"term"` and `"Term"` → `"term"`, creating a duplicate column that caused `df[_GSEA_COLS]` to return a malformed DataFrame.
  - Fix: drop `"Name"` before renaming; only `"Term"` → `"term"`.
- Percentage strings in `tag_pct` / `gene_pct`:
  - gseapy returns `"Tag %"` and `"Gene %"` as strings like `"10.00%"`. `pd.to_numeric()` with `errors="coerce"` silently turned them into `NaN`.
  - Fix: strip trailing `%` and divide by 100 before converting.

Root cause was identified via `inspect.signature(gseapy.prerank)` in the prior session.
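The percentage fix from the third bug can be illustrated with a small standalone converter. The name `percent_to_fraction` is hypothetical, and treating unparseable values as `NaN` is an assumption chosen to mirror `errors="coerce"`:

```python
import math


def percent_to_fraction(value) -> float:
    """Convert a value like '10.00%' to 0.1; return NaN if it cannot be parsed."""
    if value is None:
        return math.nan
    text = str(value).strip().rstrip("%")
    try:
        return float(text) / 100.0
    except ValueError:
        # Mirrors errors="coerce": bad input becomes NaN instead of raising.
        return math.nan


tag_pct = [percent_to_fraction(v) for v in ["10.00%", " 55.5% ", "n/a"]]
```

On a DataFrame column the same idea can be written with `str.rstrip("%")` followed by `pd.to_numeric(...) / 100`.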
- `dash_table.DataTable` deprecation warning from Dash: no functional impact.
- Enrichr API uses its own built-in background: `background_symbols` is ignored in online mode. Use `compute_go_offline()` for custom-background ORA.
- GSEA Prerank with 1000 permutations can take several minutes; use fast mode (100 perms) for exploratory work.
- mygene.info mouse→human conversion requires internet access; offline runs with MSigDB/Reactome for mmu will fail.