Releases: ArcInstitute/cyto
cyto-0.4.0
Highlights
This release is a major architectural overhaul with a net reduction of ~1,770 lines of code across 124 files. Three crates were removed, the mapper system was rewritten from scratch around a trait-based design, a new geometry DSL replaces all hardcoded offsets, and barcode correction was folded into the map step. Flex-V2 (384-plex) is now fully supported alongside V1.
New Features
Geometry DSL
A domain-specific language for specifying read geometry, replacing all hardcoded offsets and scattered CLI flags (-B, -u, --spacer, --offset, --lookback).
- Components: `[barcode]`, `[umi:N]`, `[probe]`, `[anchor]`, `[protospacer]`, `[gex]`
- Skip regions: `[:N]` for anonymous spacers
- Paired-end separator: `|` splits R1 from R2
- Example: `[barcode][umi:12] | [gex][:18][probe]`
- Geometry is resolved against loaded libraries to compute concrete byte offsets
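To make the grammar above concrete, here is a minimal parsing sketch. It is not cyto's parser: the `Component` enum, `parse_geometry`, and the error strings are hypothetical, and the sketch assumes that `|` is mandatory and that only `[umi:N]` and `[:N]` carry a length.

```rust
/// One element of a read geometry. Variant names mirror the DSL tokens;
/// the type itself is hypothetical and not taken from cyto.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Component {
    Barcode,
    Umi(usize),
    Probe,
    Anchor,
    Protospacer,
    Gex,
    Skip(usize),
}

/// Parse the text between one pair of brackets, e.g. "umi:12" or ":18".
fn parse_token(token: &str) -> Result<Component, String> {
    let (name, len) = match token.split_once(':') {
        Some((name, len)) => {
            let len = len
                .parse::<usize>()
                .map_err(|_| format!("bad length in [{}]", token))?;
            (name, Some(len))
        }
        None => (token, None),
    };
    match (name, len) {
        ("barcode", None) => Ok(Component::Barcode),
        ("umi", Some(n)) => Ok(Component::Umi(n)),
        ("probe", None) => Ok(Component::Probe),
        ("anchor", None) => Ok(Component::Anchor),
        ("protospacer", None) => Ok(Component::Protospacer),
        ("gex", None) => Ok(Component::Gex),
        ("", Some(n)) => Ok(Component::Skip(n)), // anonymous spacer [:N]
        _ => Err(format!("unknown component [{}]", token)),
    }
}

/// Parse one mate, e.g. "[barcode][umi:12]".
fn parse_mate(s: &str) -> Result<Vec<Component>, String> {
    let mut out = Vec::new();
    let mut rest = s.trim();
    while !rest.is_empty() {
        if !rest.starts_with('[') {
            return Err(format!("expected '[' at {:?}", rest));
        }
        let end = rest
            .find(']')
            .ok_or_else(|| format!("missing ']' in {:?}", rest))?;
        out.push(parse_token(&rest[1..end])?);
        rest = rest[end + 1..].trim_start();
    }
    Ok(out)
}

/// Split on '|' and parse R1 and R2 separately.
pub fn parse_geometry(s: &str) -> Result<(Vec<Component>, Vec<Component>), String> {
    let (r1, r2) = s
        .split_once('|')
        .ok_or_else(|| "missing '|' separator".to_string())?;
    Ok((parse_mate(r1)?, parse_mate(r2)?))
}
```

Resolving the parsed components into byte offsets would additionally need the lengths of the loaded barcode, probe, and feature libraries, which this sketch leaves out.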
Geometry Presets
Five built-in presets selectable via --preset:
| Preset | Geometry |
|---|---|
| `gex-v1` | `[barcode][umi:12] \| [gex][:18][probe]` |
| `gex-v2` | `[barcode][umi:12][:10][probe] \| [gex]` |
| `crispr-v1` | `[barcode][umi:12] \| [probe][anchor][protospacer]` |
| `crispr-v2` | `[barcode][umi:12][:10][probe] \| [:14][anchor][protospacer]` |
| `crispr-proper` | `[barcode][umi:12] \| [:18][probe][anchor][protospacer]` |
V2 presets automatically set remap_window=5.
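The preset table can be read as a lookup from name to geometry string plus a remap window. The sketch below is illustrative only: the `Preset` struct and `preset` function are hypothetical names, and using `None` to mean "leave `remap_window` at its default" for non-V2 presets is an assumption, since these notes only state the value for V2.

```rust
/// A resolved preset. Hypothetical type, not cyto's own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Preset {
    pub geometry: &'static str,
    /// `Some(5)` for V2 presets; `None` means "keep the default" (assumption).
    pub remap_window: Option<usize>,
}

/// Look up a preset by the name passed to `--preset`.
pub fn preset(name: &str) -> Option<Preset> {
    let (geometry, is_v2) = match name {
        "gex-v1" => ("[barcode][umi:12] | [gex][:18][probe]", false),
        "gex-v2" => ("[barcode][umi:12][:10][probe] | [gex]", true),
        "crispr-v1" => ("[barcode][umi:12] | [probe][anchor][protospacer]", false),
        "crispr-v2" => (
            "[barcode][umi:12][:10][probe] | [:14][anchor][protospacer]",
            true,
        ),
        "crispr-proper" => (
            "[barcode][umi:12] | [:18][probe][anchor][protospacer]",
            false,
        ),
        _ => return None,
    };
    Some(Preset {
        geometry,
        remap_window: if is_v2 { Some(5) } else { None },
    })
}
```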
Flex-V2 (384-plex) Support
Full support for 10x Genomics Flex-V2 chemistry across both GEX and CRISPR workflows, including the new geometry presets and remap window handling.
Optional Probe Demultiplexing
Probes are now fully optional. Three modes are supported:
- Probed (preset): `--preset gex-v1 -p probes.txt` (demux to per-probe IBU files)
- Probed (custom): `--geometry "..." -p probes.txt`
- Unprobed: `--geometry "[barcode][umi:12]|[gex]"` without `-p` (writes a single IBU file)
Validation catches mismatches (e.g., [probe] in geometry without a probe file).
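The three modes and the mismatch check can be sketched as a single decision over two inputs. `OutputMode` and `resolve_mode` are hypothetical names. Rejecting a probe file supplied without `[probe]` in the geometry is an assumption; only the opposite mismatch is confirmed by these notes.

```rust
/// How mapped records are written. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq)]
pub enum OutputMode {
    /// One IBU file per probe.
    PerProbe,
    /// A single IBU file.
    SingleFile,
}

/// Decide the output mode from the geometry string and the optional `-p` path.
pub fn resolve_mode(geometry: &str, probe_file: Option<&str>) -> Result<OutputMode, String> {
    let has_probe = geometry.contains("[probe]");
    match (has_probe, probe_file) {
        (true, Some(_)) => Ok(OutputMode::PerProbe),
        (false, None) => Ok(OutputMode::SingleFile),
        // Mismatch named in the release notes.
        (true, None) => Err("geometry contains [probe] but no probe file was given".to_string()),
        // Assumed to be an error as well; not confirmed by the notes.
        (false, Some(path)) => Err(format!(
            "probe file {} was given but geometry has no [probe]",
            path
        )),
    }
}
```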
Inline Barcode Correction
Barcode correction is now performed at map time via WhitelistMapper (backed by seqhash), eliminating the separate post-processing step. This simplifies the workflow pipeline from map → sort → barcode → sort → umi → sort → count to map → sort → umi-correct → count.
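For readers unfamiliar with whitelist correction, the general idea can be sketched with a plain `HashMap`. This is not the `seqhash` API or cyto's `WhitelistMapper`: the `Whitelist` type below is hypothetical, it handles only single-base substitutions, and it rejects any observed barcode that is one mismatch away from more than one whitelist entry.

```rust
use std::collections::HashMap;

/// Toy whitelist index: exact entries plus all one-substitution neighbors.
pub struct Whitelist {
    barcodes: Vec<Vec<u8>>,
    /// Maps a sequence to the index of its parent barcode.
    /// `None` marks a neighbor shared by several barcodes (ambiguous).
    lookup: HashMap<Vec<u8>, Option<usize>>,
}

impl Whitelist {
    pub fn new(barcodes: &[&str]) -> Self {
        let barcodes: Vec<Vec<u8>> = barcodes.iter().map(|b| b.as_bytes().to_vec()).collect();
        let mut lookup: HashMap<Vec<u8>, Option<usize>> = HashMap::new();
        // Register every one-mismatch neighbor of every barcode.
        for (idx, bc) in barcodes.iter().enumerate() {
            for pos in 0..bc.len() {
                for &base in b"ACGT" {
                    if base == bc[pos] {
                        continue;
                    }
                    let mut variant = bc.clone();
                    variant[pos] = base;
                    lookup
                        .entry(variant)
                        .and_modify(|parent| {
                            if *parent != Some(idx) {
                                *parent = None; // reachable from two barcodes
                            }
                        })
                        .or_insert(Some(idx));
                }
            }
        }
        // Exact whitelist entries always win over neighbor entries.
        for (idx, bc) in barcodes.iter().enumerate() {
            lookup.insert(bc.clone(), Some(idx));
        }
        Self { barcodes, lookup }
    }

    /// Return the corrected barcode, or `None` if unknown or ambiguous.
    pub fn correct(&self, observed: &[u8]) -> Option<&[u8]> {
        let idx = (*self.lookup.get(observed)?)?;
        Some(&self.barcodes[idx])
    }
}
```

Because the lookup happens per read, correction can run inside the map step instead of as a separate pass over a sorted IBU file, which is the workflow simplification described above.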
Minimum IBU Record Threshold
New --min-ibu-records flag (default 1000) automatically removes sparse IBU files after mapping, preventing downstream issues from near-empty outputs.
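The thresholding rule can be sketched as a partition over per-file record counts. The function name and the `(path, count)` input shape are hypothetical, and the sketch assumes a file is removed when it holds strictly fewer records than the threshold; whether a file with exactly 1000 records is kept is not stated in these notes.

```rust
/// Default taken from the release notes (`--min-ibu-records`, default 1000).
pub const DEFAULT_MIN_IBU_RECORDS: u64 = 1000;

/// Split outputs into (kept, removed) paths by record count.
/// Assumption: files with fewer than `min_records` records are removed.
pub fn partition_by_records(
    outputs: &[(String, u64)],
    min_records: u64,
) -> (Vec<&str>, Vec<&str>) {
    let mut kept = Vec::new();
    let mut removed = Vec::new();
    for (path, n_records) in outputs {
        if *n_records >= min_records {
            kept.push(path.as_str());
        } else {
            removed.push(path.as_str());
        }
    }
    (kept, removed)
}
```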
Probe Alias Regex Filtering
--probe-regex flag enables regex-based alias filtering on probe files during loading.
Improved Statistics
- Per-library metadata: name, element count, hash count, position, mate, window, exact match mode, build time
- Detailed unmapped breakdowns: missing probe, missing feature, missing whitelist, failed UMI quality, truncated UMI — all with fraction computation
- Statistics written as three JSON files: `mapping_lib.json`, `mapping_map.json`, `mapping_run.json`
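As an illustration of the unmapped breakdown, the sketch below tallies the five categories and derives fractions. The struct and field names are hypothetical, and taking fractions relative to the total number of input reads is an assumption; the actual JSON schema is not described in these notes.

```rust
/// Counts for each unmapped category listed in the release notes.
/// Hypothetical struct; the real JSON layout may differ.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct UnmappedCounts {
    pub missing_probe: u64,
    pub missing_feature: u64,
    pub missing_whitelist: u64,
    pub failed_umi_quality: u64,
    pub truncated_umi: u64,
}

impl UnmappedCounts {
    /// Total unmapped reads is the sum over all categories.
    pub fn total(&self) -> u64 {
        self.missing_probe
            + self.missing_feature
            + self.missing_whitelist
            + self.failed_umi_quality
            + self.truncated_umi
    }
}

/// Fraction of `count` over `total_reads`, defined as 0.0 for empty input.
pub fn fraction(count: u64, total_reads: u64) -> f64 {
    if total_reads == 0 {
        0.0
    } else {
        count as f64 / total_reads as f64
    }
}
```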
Architecture Changes
Crates Removed
- `cyto-core`: All types were either inlined into consuming crates or rewritten. ~2,400 lines removed.
- `cyto-view`: Standalone view command removed.
- `cyto-ibu-barcode-correct`: Barcode correction folded into the map step.
Mapper Rewrite
The entire mapper system was rewritten around a Mapper trait with typestate resolution (Unpositioned → Ready). Individual mappers:
- `WhitelistMapper`: cell barcode matching via `seqhash::SeqHash`
- `GexMapper`: gene expression via `seqhash::SplitSeqHash` with agreed-index resolution
- `CrisprMapper`: two-step anchor + protospacer matching via `seqhash::MultiLenSeqHash`
- `ProbeMapper`: probe demultiplexing via `seqhash::SeqHash`
- `UmiMapper`: UMI extraction with quality thresholding and 2-bit encoding
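The typestate idea (Unpositioned → Ready) can be illustrated with a toy mapper. Only the state names and the existence of a `Mapper` trait come from these notes; the trait signature, `ToyMapper`, and the plain `HashMap` lookup are assumptions and do not reflect cyto's or `seqhash`'s real API. The point of the pattern is that `map` exists only on the `Ready` state, so calling it before offsets are resolved fails to compile.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;

/// State markers: a mapper starts unpositioned and becomes ready
/// once its offset within the read has been resolved.
pub struct Unpositioned;
pub struct Ready;

/// Hypothetical trait; the real `Mapper` signature may differ.
pub trait Mapper {
    /// Return the library index of the element found in `read`, if any.
    fn map(&self, read: &[u8]) -> Option<usize>;
}

/// Toy exact-match mapper over equal-length sequences.
pub struct ToyMapper<S> {
    library: HashMap<Vec<u8>, usize>,
    seq_len: usize,
    offset: usize,
    _state: PhantomData<S>,
}

impl ToyMapper<Unpositioned> {
    /// Build the library. Assumes all sequences share one length.
    pub fn new(seqs: &[&str]) -> Self {
        let library = seqs
            .iter()
            .enumerate()
            .map(|(idx, s)| (s.as_bytes().to_vec(), idx))
            .collect();
        let seq_len = seqs.first().map_or(0, |s| s.len());
        Self { library, seq_len, offset: 0, _state: PhantomData }
    }

    /// Consume the unpositioned mapper and fix its byte offset.
    pub fn resolve(self, offset: usize) -> ToyMapper<Ready> {
        ToyMapper {
            library: self.library,
            seq_len: self.seq_len,
            offset,
            _state: PhantomData,
        }
    }
}

// `map` is only implemented for the `Ready` state.
impl Mapper for ToyMapper<Ready> {
    fn map(&self, read: &[u8]) -> Option<usize> {
        let window = read.get(self.offset..self.offset + self.seq_len)?;
        self.library.get(window).copied()
    }
}
```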
Core sequence mapping functionality has moved from `disambiseq` to `seqhash`.
Unified Processor
A single generic MapProcessor<M: Mapper> replaces separate probed/unprobed implementations. It implements both binseq::ParallelProcessor and paraseq::PairedParallelProcessor with thread-local statistics accumulation.
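A rough sketch of the shape of a processor that is generic over the mapper and accumulates statistics per thread before merging. It uses only `std::thread::scope` and does not implement `binseq::ParallelProcessor` or `paraseq::PairedParallelProcessor`; the `Mapper` signature, `Stats`, `PrefixMapper`, and the `run` method are all hypothetical.

```rust
use std::thread;

/// Hypothetical mapper interface; `Sync` lets workers share one instance.
pub trait Mapper: Sync {
    fn map(&self, read: &[u8]) -> Option<usize>;
}

/// Toy mapper: a read maps (to index 0) if it starts with `prefix`.
pub struct PrefixMapper {
    pub prefix: Vec<u8>,
}

impl Mapper for PrefixMapper {
    fn map(&self, read: &[u8]) -> Option<usize> {
        if read.starts_with(&self.prefix) { Some(0) } else { None }
    }
}

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Stats {
    pub mapped: u64,
    pub unmapped: u64,
}

impl Stats {
    fn merge(&mut self, other: Stats) {
        self.mapped += other.mapped;
        self.unmapped += other.unmapped;
    }
}

/// One processor type for any mapper.
pub struct MapProcessor<M: Mapper> {
    mapper: M,
}

impl<M: Mapper> MapProcessor<M> {
    pub fn new(mapper: M) -> Self {
        Self { mapper }
    }

    /// Map all reads on `n_threads` workers and merge the per-thread stats.
    pub fn run(&self, reads: &[Vec<u8>], n_threads: usize) -> Stats {
        let n_threads = n_threads.max(1);
        let chunk_len = ((reads.len() + n_threads - 1) / n_threads).max(1);
        let mut total = Stats::default();
        thread::scope(|scope| {
            let handles: Vec<_> = reads
                .chunks(chunk_len)
                .map(|batch| {
                    scope.spawn(move || {
                        // Thread-local accumulator: no locking while mapping.
                        let mut local = Stats::default();
                        for read in batch {
                            match self.mapper.map(read) {
                                Some(_) => local.mapped += 1,
                                None => local.unmapped += 1,
                            }
                        }
                        local
                    })
                })
                .collect();
            for handle in handles {
                total.merge(handle.join().expect("worker thread panicked"));
            }
        });
        total
    }
}
```

Keeping counters thread-local and merging once per worker avoids contention on a shared statistics object in the hot loop.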
Relocated Types
- `FeatureWriter` trait: `cyto-core::io` → `cyto-io::feature`
- `BarcodeIndexCount[s]` and `deduplicate_umis`: `cyto-core` → `cyto-ibu-count::dedup`
CLI Changes
Renamed/Removed Flags
- `--no-remap` → `--remap-window`
- Removed: `-B` (barcode length), `-u` (UMI length), `--spacer`, `--offset`, `--lookback`, `-H`/`--with-header`
- Removed from workflow: `--skip-barcode`, `--bc-exact`, `--skip-bc-second-pass` (second pass was removed)
- `-w`/`--whitelist` is now required at map time (moved from workflow-level)
Default Thread Count
Default threads changed from 8 to 0 (auto-detect all available cores).
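One common way to implement "0 means auto-detect" in Rust is `std::thread::available_parallelism`. The helper below is a sketch of that convention; whether cyto resolves the value this way is an assumption, and `resolve_threads` is a hypothetical name.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Turn a user-supplied thread count into a concrete one.
/// `0` means "use all available cores"; falls back to 1 if detection fails.
pub fn resolve_threads(requested: usize) -> usize {
    if requested == 0 {
        thread::available_parallelism()
            .map(NonZeroUsize::get)
            .unwrap_or(1)
    } else {
        requested
    }
}
```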
Moved CLI Struct
Cli struct moved from cyto-cli to the main cyto crate to keep the version in sync with the binary.
Bug Fixes
- Fixed incorrect unmapped total calculation
- Fixed CRISPR-V2 remap window not being applied
- Fixed `crispr-v2` preset not being exposed to the CLI
- Added validation that BINSEQ inputs are paired (early error on unpaired files)
- Better error messages for missing BINSEQ files and empty IBU outputs
CI
- Added 4 new integration test jobs for unprobed workflows (GEX + CRISPR, map + workflow)
- Expanded format matrix to include `vbq` and `cbq` alongside `fastq` and `bq`
- All tests now include whitelist and preset flags
Dependencies
- Added: `seqhash 0.1.5`, `regex`, `derive_more`, `serde`/`serde_json`/`csv` (to `cyto-map`)
- Bumped: `bitnuc` 0.4.0 → 0.4.1, `pycyto` 0.1.13 → 0.1.14
- Removed: all `cyto-core` dependencies across the workspace
What's Changed
- 1 include bitnuc to encode and decode barcodes and umi as integers by @noamteyssier in #2
- 3 restructure cli interface to make use of ibu serialization by @noamteyssier in #4
- feat: completely rename package by @noamteyssier in #6
- 7 split cli into separate related submodules by @noamteyssier in #8
- 10 build in ibu text by @noamteyssier in #13
- 11 build in ibu sort by @noamteyssier in #14
- 12 build in ibu count by @noamteyssier in #15
- 9 introduce the ibu submodule by @noamteyssier in #16
- 17 allow multiple fastq inputs for merging lanes by @noamteyssier in #18
- docs: added README by @noamteyssier in #20
- 21 use preallocated reusable buffer in unpacking by @noamteyssier in #22
- 23 make mappers compatible with binseq inputs by @noamteyssier in #26
- 25 output features for flex by @noamteyssier in #27
- 28 implement barcode whitelisting and error correction by @noamteyssier in #29
- 31 implement one off mismatch detection by @noamteyssier in #32
- 24 make mappers compatible with seq io parallel by @noamteyssier in #35
- 36 have mappers be arc able instead of cloned by @noamteyssier in #37
- 38 include runtime statistics in log by @noamteyssier in #39
- 40 setup a non specific geometry mapper by @noamteyssier in #42
- 41 have mappers use positional retries by @noamteyssier in #43
- 44 introduce a mechanism for automatically determining offset on generic by @noamteyssier in #45
- 46 support new binseq reader by @noamteyssier in #47
- 48 link to binseq upstream by @noamteyssier in #49
- 50 introduce a workflow command for common workflows by @noamteyssier in #51
- 52 allow workflows to run sort and counts in background tasks by @noamteyssier in #53
- 54 use bitnuc mismatch for disambiguating barcodes by @noamteyssier in #55
- 56 fix issue with probe mapping where not all records are being written to ibu by @noamteyssier in #57
- 58 fix correct probe barcode designation by @noamteyssier in #59
- 60 provide feature file to ibu view to print name by @noamteyssier in #61
- 62 update flex mapping to allow for higher order mismatches by @noamteyssier in #64
- 63 perform weight based calculation for cell barcode correction by @noamteyssier in #65
- 66 create a new ibu subcommand for umi correction by @noamteyssier in #67
- 68 improve workflow command by @noamteyssier in #69
- 70 improve compile times by @noamteyssier in #71
- 72 rename correct to barcode by @noamteyssier in #73
- 74 migrate to new binseq lib by @noamteyssier in #75
- refactor: move into crates subdir and update paths in cargo toml by @noamteyssier in https://github.com/ArcInstitute/c...