Create new "GFA tabix" adapter for multi-way synteny visualizations#5518
Draft
Implement SyRI structural classification (SYN/INV/TRANS/DUP) with plotsr-compatible coloring, 3-tier PIF format (full/summary/structural), format converters for SyRI output, BEDPE, rGFA, and MAF files, multi-pair PIF indexing, and a Quick Import wizard for single-file multi-genome setup.

Key changes:
- syriUtils.ts classifies alignments by structural type across all PIF tiers
- Summary tier uses absolute-position indel encoding (id:Z: tag) instead of CIGAR
- make-pif CLI accepts --format, --assemblies, --pairs, --merge-gap options
- PairwiseIndexedPAFAdapter supports 3-tier LOD and multi-pair PIF files
- Import form adds Quick Import tab and bulk assembly addition

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CLI --all-vs-all flag: scans PAF for unique assemblies, auto-orders by syntenic coverage, generates multi-pair PIF
- CLI --session flag: emits .session.json loadable via ?session=url
- plotsr genomes.txt parser for assembly name/ordering extraction
- Parser tests for SyRI output, BEDPE, MAF formats
- Auto-ordering algorithm tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SyRI parser: fix 12-column format (refChr refStart refEnd - - qryChr qryStart qryEnd ID parent type -), add HDR type support, skip NOTAL entries without query mapping
- BEDPE parser: search all columns for type tag (handles 7-col and 10-col)
- All-vs-all: handle HPRC naming convention (sample#hap#contig)
- Add plotsr Arabidopsis 4-way test data (col-0, ler, cvi, eri)

Validated on real data:
- plotsr Arabidopsis SyRI output → 44K lines, 4 structural types
- plotsr BEDPE → 7K lines with type classification
- HPRC chr1 untangle PAF (123K lines) → 2.3s processing, 46 assemblies

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Walk P-lines (GFA1) and W-lines (GFA1.1+) to identify shared segments between genome paths. Adjacent shared segments are merged into synteny blocks with proper coordinate tracking. Handles both PGGB-style pangenome graphs (P-lines) and minigraph-cactus style (W-lines).

Validated on PGGB chrM pangenome (4 human genomes): correctly extracts pairwise synteny blocks from graph structure without needing external tools like odgi or vg.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
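The merge step described above can be sketched roughly as follows. This is an illustrative sketch with hypothetical types and names, not the actual implementation; the real walker also tracks strand and handles per-pair coordinate systems.

```typescript
// Hypothetical shape of a shared segment between two genome paths
interface SharedSeg {
  refStart: number
  refEnd: number
  qryStart: number
  qryEnd: number
}

// Merge runs of segments that are adjacent on both genomes into one
// synteny block; non-adjacent segments start a new block
function mergeAdjacent(segs: SharedSeg[]): SharedSeg[] {
  const blocks: SharedSeg[] = []
  for (const s of segs) {
    const last = blocks[blocks.length - 1]
    if (last && last.refEnd === s.refStart && last.qryEnd === s.qryStart) {
      last.refEnd = s.refEnd // extend the current block
      last.qryEnd = s.qryEnd
    } else {
      blocks.push({ ...s }) // start a new block
    }
  }
  return blocks
}
```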
- Comprehensive next steps plan covering: runtime GFA integration, MultiLGVSyntenyDisplay, genome sub-selection, graph↔synteny integration
- Synthetic data generators: 3-way, 8-way, all-vs-all PAF, 4-genome GFA
- Download script for real data (plotsr, PGGB chrM, ntSynt great apes)
- Build script to convert all test data to PIF format
- ntSynt great apes synteny blocks (6 primate species)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New MultiLGVSyntenyDisplay shows stacked multi-genome synteny ribbons within a LinearGenomeView. Each row is a query genome colored by SyRI structural type. Supports genome sub-selection, configurable row height, and right-click to launch full synteny view.

Update PANGENOME_NEXT_STEPS.md with unified adapter architecture: all data sources (PIF, PAF, GFA server, GFA file) produce the same SyntenyFeature objects, making displays interchangeable with adapters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pes TS error

- Add "Launch N-way synteny view" menu item using init-based assembly loading
- Refactor 2-way launch to also use init approach for consistency
- Fix missing syriTypes destructuring in executeSyntenyInstanceData
- Update tracking docs (B3 completed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- getConfigOverrides: type getSnapshot return as Record<string, unknown>
- canvas LinearFeatureDisplay: use this.showDescriptions/geneGlyphMode for same-block getter references instead of self
- WiggleComponent: import WiggleDisplayModel type, annotate ticks map params
- MultiWiggle renderSvg: handle optional color/negColor from Source interface
- browser-tests redraw: type canvas element as HTMLCanvasElement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ests

The SVG renderer was using a different color resolution path than the canvas renderer (fallback to '#999' instead of model.posColor, different negColor handling). Now both renderers resolve colors identically:
- posColor = source.color ?? model.posColor
- negColor = overlay ? posColor : model.negColor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Creates indexed SQLite database from GFA files with tables for segments, paths, and path_steps. Enables runtime random-access queries by path name and offset range, which is the foundation for the GfaSyntenyAdapter.

- Schema: segments, paths, path_steps with cumulative_offset index
- Supports both P-lines (GFA1) and W-lines (GFA1.1+)
- Assembly filtering via --assemblies flag
- 7 unit tests covering schema, segments, paths, offset queries, and shared segment joins
- Update @types/node to v25 for node:sqlite types
- Fix rmdir → rm deprecation in jbrowse-desktop
- Remove stale ts-expect-error in jbrowse-img

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…PRC demo data

- GfaTabixAdapter: runtime synteny from GFA tabix files (pos.bed.gz + segs.bed.gz) with segment merging, unified header parsing, and getChromSizes() for auto-assembly
- make-gfa-tabix CLI: converts GFA to tabix with #sizes= header for chrom sizes
- Rust tools/gfa-to-tabix: streaming two-pass converter for large GFA (1.86M segments)
- MultiLGVSyntenyDisplay: auto-creates session assemblies from GFA header via FromConfigRegionsAdapter, with one-shot guard to avoid re-running on pan/zoom
- HPRC demo: chrM (44 haplotypes) + chr20 (90 haplotypes) configs using S3-hosted tabix files at jbrowse.org/demos/gfadata/hprc-v1.1-mc-grch38/
- Tests: 12 adapter tests (synthetic + HPRC chrM), 9 CLI tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…splay

- Replace fixed rowHeight with auto-calculated row height (height/numGenomes) that fits rows to display height, matching MultiWiggle pattern
- Add manual row height option (5/10/15/20/30px) alongside auto default
- Add colorBy property with strand (blue/orange), syri (SYN/INV/TRANS/DUP), and identity (green→red gradient) color schemes
- Convert 90-item 2-way synteny submenu to searchable dialog
- Simplify rendering component to accept model directly
- Adapt rendering for dense display: hide labels when rowHeight<12, hide separators when <4, hide legend when <8
- Remove unused rowHeight/maxRows config schema entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New `@jbrowse/plugin-graph` providing a pangenome graph visualization view based on BandageNG's rendering approach.

Core features:
- GFA 1.0/2.0 parser and graph converter with path support
- WebGL2 renderer with GLSL shaders for triangle-based graph rendering
- WebGPU renderer with WGSL shaders and identical visual output
- Canvas2D fallback renderer for environments without GPU support
- GraphRenderer facade with automatic backend selection and fallback (WebGPU → WebGL2 → Canvas2D), matching AlignmentsRenderer pattern
- MeshBuilder for efficient indexed triangle mesh construction
- Bezier tessellation for smooth edge curves with round caps
- Hit detection for node/edge hover and selection
- MST state model with pan/zoom/color scheme/hover/selection state
- Import form with file upload, URL input, and example graph
- Root configuration schema for graph genome datasets
- Widen Plugin.rootConfigurationSchema return type to accept IAnyType (eliminates ts-expect-error for types.maybe/types.array wrappers)

Registered in jbrowse-web as a core plugin with "Add → Graph genome view" menu item.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New `GfaAdapter` in comparative-adapters plugin that loads plain-text GFA files as synteny tracks with assemblyNames, matching the existing GfaTabixAdapter pattern but without requiring tabix preprocessing.

- Parses GFA1 S/L/P lines and GFA1.1 W-lines (walk lines)
- Builds path-based coordinate system from segment lengths
- Implements getMultiPairFeatures for synteny feature extraction
- Registered in syntenyTypes and multiPairTypes
- Replaces the graph plugin's rootConfigurationSchema approach with standard track configuration

Remove rootConfigurationSchema from graph plugin — GFA datasets are now configured as regular SyntenyTrack entries with GfaAdapter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rt culling

Emulates BandageNG's per-item independence and spatial culling patterns:
- Split geometry into edge/node/arrow sub-batches with vertex range tracking so hover/select updates only touch one element's colors via bufferSubData
- Move line thickness expansion from CPU to GPU shader (normal * thickness / scale), eliminating geometry rebuilds on zoom
- Add viewport culling with 150ms debounced rebuild on pan/zoom settle
- Add EdgeSpatialIndex for O(1) edge hit detection instead of O(E) linear scan
- Add incremental recolorNodes() for color scheme changes without geometry rebuild
- Remove scale, hoveredNode, hoveredEdge, selectedNode from BuildOptions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Separates multi-pair synteny (PairwiseIndexedPAFAdapter, GfaTabixAdapter, GfaAdapter) from regular pairwise SyntenyTrack by giving them their own track type. MultiLGVSyntenyDisplay is now registered to MultiSyntenyTrack instead of SyntenyTrack, removing the need for the display reordering hack in Core-preProcessTrackConfig.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rust tool (gfa-to-tabix):
- Add --bubbles <vcf> flag: reads VCF from vg deconstruct, computes CS between allele pairs at each snarl, outputs tabix-indexed bubbles.bed.gz
- Remove --aln/--aln-bin flags and all pairwise aln generation code
- Remove rayon dependency (no longer needed)
- Keep CS computation utilities for bubble CS generation

TypeScript:
- Remove binary aln reader/writer (binaryAlnReader.ts, binaryCs.ts)
- Remove alnBin config options from GfaTabix and Sharded adapters
- Update text aln reader to support identity column and skip CS at zoom-out
- Clean up binary aln references from adapter setup/dispatch

The bubble approach avoids O(n²) pairwise comparisons by leveraging graph structure: each snarl has a small number of alleles, and CS between allele pairs is precomputed once.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add bubblesLocation/bubblesIndex config to both adapter schemas
- Load bubbles.bed.gz via tabix in adapter constructor
- At zoomed-in views (bpPerPx < 50), query bubbles and annotate synteny features with precomputed CS from snarl allele pairs
- Remove all aln-related code (alnLocation config, alnFile, getMultiPairFeaturesFromAln, aln setup) since bubbles replaces it

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generate a synthetic pangenome on volvox ctgA with 50 samples sharing variants from a common pool (~396 sites: SNPs, indels, SVs including inversions). Produces segment-decomposed GFA, tabix-indexed position data, and bubbles BED with precomputed CS strings for base-level synteny detail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e to config

- Fix genome name mismatch in bubbles: Rust tool now maps VCF sample names to GFA genome names (e.g. CHM13 -> CHM13#0) with per-haplotype handling for diploid data. This was causing annotateFeaturesWithBubbleCs to silently skip all genomes, so no SNPs rendered.
- Add --output-config flag to gfa-to-tabix that writes a JBrowse config JSON with both GfaTabix synteny and VcfTabix variant tracks
- Add allele size/pair limits (MAX_ALLELE_LEN=10K, MAX_PAIRS_PER_SITE=500) to Rust bubble generation to prevent runaway output on multi-allelic SVs
- Add VcfTabixAdapter variant tracks to all HPRC chrM and chr20 configs
- Add 50-sample volvox pangenome track to main volvox config
- Remove redundant scripts/vcf-to-bubbles.ts (Rust tool is sole implementation)
- Add debug logging to annotateFeaturesWithBubbleCs for genome match diagnostics
- Regenerated and uploaded chrM and chr20 bubbles+VCF to S3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace O(n*m) bubble filtering with binary search in annotateFeaturesWithBubbleCs — bubbles are already sorted, so use binary search to find first overlap then scan forward
- Remove dead code from Rust tool: complement_byte, fill_seg_sequence_by_ord, emit_match_bin (leftovers from binary aln format)
- Update BUBBLES_NEXT_STEPS with prioritized roadmap: identity coloring (high), density heatmap and cross-linking (medium), GPU budget and dynamic vg adapter (low/needs evaluation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
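The binary-search filtering described above can be sketched as follows. This is a hypothetical illustration, not the actual code, and it assumes the sorted bubble records are non-overlapping so their end positions are also ordered: binary-search the first record ending after the query start, then scan forward until a record starts past the query end.

```typescript
// Hypothetical bubble record; the real records carry CS strings etc.
interface Bubble {
  start: number
  end: number
}

// Find all bubbles overlapping the half-open interval [qStart, qEnd)
function overlappingBubbles(bubbles: Bubble[], qStart: number, qEnd: number) {
  // lower_bound: first index whose record ends after qStart
  let lo = 0
  let hi = bubbles.length
  while (lo < hi) {
    const mid = (lo + hi) >> 1
    if (bubbles[mid].end <= qStart) {
      lo = mid + 1
    } else {
      hi = mid
    }
  }
  // linear scan forward until records start past the query end
  const hits: Bubble[] = []
  for (let i = lo; i < bubbles.length && bubbles[i].start < qEnd; i++) {
    hits.push(bubbles[i])
  }
  return hits
}
```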
The GFA tabix format stores path-level segment data but lacked graph topology (edges/links). This made subgraph extraction fundamentally broken: alt allele segments had ordinals unreachable from ref coordinate queries, producing linear chains instead of proper bubble structures.

Converter (Rust):
- Parse L-lines and build bidirectional adjacency lists
- Write edges.bin (10 bytes/edge: target_ord, orientations, target_len) and edges.idx (u64 byte offsets per ordinal)
- chr20: 50MB edges.bin + 15MB edges.idx vs 1.4GB segments.bin

Adapter (TypeScript):
- Add EdgeRecord type, parser, lazy-loaded edge index
- getSubgraph uses 1-hop edge lookup from viewport ref nodes to discover alt alleles precisely, emitting S+L lines only (no synthetic P-lines)
- Falls back to path-based inference when edge files absent
- For a single SNP: ~3-5 nodes instead of 647

Other improvements:
- GraphGenomeView: default drawPaths=false, add toggle in settings menu
- GfaAdapter: infer links from path adjacency when no L-lines exist
- Graph view launch: add error handling, clamp negative coordinates
- Add getSubgraph tests for both GfaAdapter and GfaTabixAdapter
- Add round-trip test verifying getSubgraph output parses through GraphGenomeView's parseGFA + convertGFAToGraph

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
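Decoding a fixed-width 10-byte edge record might look like the sketch below. The exact field widths are an assumption here (u32 target ordinal, u16 orientation flags, u32 target length, little-endian), chosen only to illustrate how a DataView parser over edges.bin could work.

```typescript
// Assumed 10-byte layout: u32 targetOrd | u16 orientations | u32 targetLen
interface EdgeRecord {
  targetOrd: number
  orientations: number
  targetLen: number
}

// Read one edge record at the given byte offset within the buffer
function readEdgeRecord(buf: ArrayBuffer, offset: number): EdgeRecord {
  const dv = new DataView(buf, offset, 10)
  return {
    targetOrd: dv.getUint32(0, true), // little-endian
    orientations: dv.getUint16(4, true),
    targetLen: dv.getUint32(6, true),
  }
}
```

With an edges.idx of u64 byte offsets per ordinal, looking up a node's adjacency becomes a seek into edges.bin followed by a run of these fixed-width reads.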
… plugins

The TreeSidebar component, tree drawing autoruns, cluster utilities, and d3-hierarchy2 fork were duplicated across variants and wiggle plugins. This consolidates them into a single shared package and updates all consumers to import directly from @jbrowse/tree-sidebar.

Also adds genome name overlay and debug logging to MultiLGVSyntenyDisplay for offset investigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… files (1 ref + 50 samples) with seeded mutations from the volvox reference
- scripts/build-volvox-pangenome.sh — full pipeline:
a. Runs the TS script to generate FASTAs
b. Runs pggb (via singularity) to build the pangenome graph
c. Runs vg deconstruct to produce a VCF with 384 variant records
d. Runs gfa-to-tabix --bubbles to produce all indexed files including per-genome bubble rows
(11,083 records with PanSN path names like sample01#0#ctgA)
- Test data regenerated with real pggb graph and real vg deconstruct VCF
- Bubbles file now has entries for every genome's coordinate system, so resolveTabixRefName works automatically for any-genome-as-reference
…xing

- Fix findBubblePairRecord to flip CS when view ref carries a higher-numbered allele than query (was returning wrong SNP direction)
- Fix bubbles constructor to also check localPath locations (not just uri)
- Rust tool now emits per-genome bubble rows with PanSN path names so bubbles.bed.gz is queryable from any genome via tabix
- Rust tool memory optimization: only keep ref path walk in memory, read segments.bin on-demand for ordinal lookups (~3GB → ~39MB)
- Rename volvox_pangenome → volvox_indel_pangenome for clarity
- Add integration tests for bubble CS from ref and non-ref perspectives
- Add test-bubbles.sh with 18 assertions validating Rust tool output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The labelW (120px sidebar) was being added to every feature's screen x-position, causing all features to render 120px too far right compared to the scalebar and other tracks. Features at the right edge of the viewport were pushed off-screen entirely.

- Remove labelW offset from feature positions in Canvas2D, GPU shaders (GLSL + WGSL), computeVisibleLabels, featureToRect, and hit testing
- Extract renderMultiSyntenyToCtx() shared function so renderSvg.tsx reuses the Canvas2D drawing code via SvgCanvas instead of duplicating it
- Re-draw sidebar background+labels after features in Canvas2D path so genome names aren't obscured by features extending into the label area
- Add per-row backgrounds to GenomeNameOverlay for GPU path coverage
- Remove now-unused labelW params from featureToRect, FeatureHighlightOverlay, and computeMultiSyntenyLabels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The labelW uniform was no longer read by any shader code after the position fix. Remove it from the render() signatures, writeUniforms(), shader struct declarations, and the MultiSyntenyGpuBackend interface. Also remove unused labelW params from featureToRect, FeatureHighlightOverlay, and computeMultiSyntenyLabels, and simplify redundant x1=px1 variable aliases in renderMultiSyntenyToCtx.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move drawInsertion/drawSerifs to packages/alignments-core so the insertion indicator rendering is shared between the alignments and synteny displays.

All insertion sizes now show their length:
- Large (>=10bp, >=15px wide): white text centered inside the box
- Long (>=10bp, <15px wide): label to the right of the bar
- Small (<10bp): label to the right of the 1px indicator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SVG/canvas text overlay (VisibleLabelsOverlay) now emits labels for small and long insertions too, not just large ones. Small/long labels render left-aligned in the insertion color to the right of the indicator, matching the Canvas2D drawInsertion behavior from alignments-core.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the intermediate "long" insertion type. Now there are only two:
- Small: 1px line with serifs and size label to the right
- Large (>=10bp AND >=15px wide): wide box with centered white text

Matches the alignments plugin's small/large classification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
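The two-way classification above boils down to a size-and-zoom threshold. A minimal sketch (hypothetical function name, not the plugin's actual code):

```typescript
// Classify an insertion glyph: "large" only when it is both >=10bp and
// renders at >=15px at the current zoom; everything else draws as the
// 1px indicator with serifs
function insertionGlyph(lenBp: number, bpPerPx: number): 'large' | 'small' {
  const widthPx = lenBp / bpPerPx
  return lenBp >= 10 && widthPx >= 15 ? 'large' : 'small'
}
```

Because widthPx shrinks as bpPerPx grows, a "large" insertion naturally degrades to the small indicator as you zoom out.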
- Move drawDeletion to alignments-core alongside drawInsertion so both CIGAR op drawing functions are shared consistently
- Extract addInsertionLabel helper in computeVisibleLabels to eliminate duplicated insertion label logic between addCigarLabels and addCsLabels
- Inline trivial local variables (px, pw) in drawCigarOps/drawCsOps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Small insertions now just show the indicator (line + serifs) without a size label, matching the alignments plugin behavior. Text only appears when the insertion is large enough on screen (>=10bp AND >=15px wide). This means the labels naturally disappear as you zoom out.

Removed the now-unused textAlign/color fields from VisibleLabel and reverted VisibleLabelsOverlay to its simpler form.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e colors

Consolidate all CIGAR/CS op drawing and parsing into alignments-core so both plugins/alignments and plugins/linear-comparative-view share identical rendering logic from one source. Delete cigarConstants.ts since everything it contained now lives in alignments-core.

Use theme palette colors (via useTheme/createJBrowseTheme) instead of hardcoded DEFAULT_SYNTENY_COLORS for deletion, insertion, and base colors. Remove DEFAULT_SYNTENY_COLORS from multiSyntenyBackendTypes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
labelW was removed from computeMultiSyntenyLabels params but left in the dependency array, causing unnecessary recomputation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	.gitignore
#	plugins/alignments/src/LinearAlignmentsDisplay/components/shaders/utils.ts
#	plugins/wiggle/src/LinearWiggleDisplay/components/WiggleComponent.tsx
#	pnpm-lock.yaml
#	tsconfig.json
Background
JBrowse 2 can show multiple pairwise comparisons, but it is not very smooth:
a) you have to manually go through several steps in the linear synteny import form
b) it uses multiple files e.g. human vs mouse paf, and mouse vs rat paf
c) it also takes up a lot of visual space on the user's screen: each row is a full-featured linear genome view, and currently cannot be collapsed down to a single 'line'
This PR
This PR designs a new data format called "GFA tabix" to help index into a pangenome graph (containing multiple whole-genome assemblies) and adds several custom visualizations to help incorporate it into JBrowse 2
Screenshots
4-way Arabidopsis from a single file
Volvox_del.fa and volvox_ins.fa vs the reference volvox (simple demo data)

HPRC mitochondrial genome 44 samples

Data format
We made a Rust program (the converter) that walks every path once, accumulating segment lengths to compute genomic coordinates, then creates two files.
Firstly, it creates a tabix-indexed file called pos.bed.gz that maps genome coordinates (e.g. the region of interest you are browsing on hg38) to the GFA data that overlaps that region, giving you the graph 'segment ids'.
Then you can secondarily query another file we created, segments.gz (bgzip-compressed and gzi-indexed, plus a segment id → byte offset index, kind of like a .fai for FASTA), which does the reverse: it maps each segment back to the genome coordinates of every genome that passes through that segment of the graph.
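The coordinate computation the converter performs, accumulating segment lengths along each path into genomic start positions, can be sketched as follows (illustrative only; the real tool is the Rust converter, and the names here are hypothetical):

```typescript
// Walk one path's segment lengths in order, recording each segment's
// genomic start coordinate as the running sum of preceding lengths
function cumulativeOffsets(segmentLengths: number[]): number[] {
  const starts: number[] = []
  let offset = 0
  for (const len of segmentLengths) {
    starts.push(offset) // this segment starts where the previous ended
    offset += len
  }
  return starts
}
```

Doing this in a single pass over every path is what makes both outputs possible: pos.bed.gz needs the genomic interval each segment occupies on each path, and the segment index needs the reverse mapping.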
Crucially, pos.bed.gz does not have "reference bias": it encodes the entire graph and can be queried from any genome in any way you want. You query it using the PanSN naming scheme, which is essentially just a simple way of prepending the assembly name to the chromosome name, and data from every assembly is thus stored in that single file https://github.com/pangenome/PanSN-spec
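A PanSN name has the shape sample#haplotype#contig (e.g. sample01#0#ctgA). Splitting one apart is trivial; this is just an illustrative helper, not the adapter's actual code:

```typescript
// Split a PanSN-style path name (sample#haplotype#contig) into parts,
// as used when building a tabix query against pos.bed.gz
function parsePanSN(name: string) {
  const [sample, haplotype, contig] = name.split('#')
  return { sample, haplotype, contig }
}
```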
That is the gist of it. I was very happy to make some progress on it, so I am opening a very early draft PR as usual.
Note: I also looked at existing tools that are sort of similar, and https://jmonlong.github.io/manu-vggafannot/ ('gafannot') was a helpful motivator. It has some similar concepts, but gafannot indexes something called GAF (graph alignment format), e.g. high-throughput reads or even GFF3 features aligned to the graph, whereas we are indexing the graph (GFA) itself!
Note: I used Claude Code extensively to draft this, and it will need further review and validation. A more 'AI-generated summary' is in PR_SUMMARY_SIMPLIFIED.md on this branch, but I didn't want to subject people to raw AI-generated text 🤖