Highlights
This release is a major architectural overhaul with a net reduction of ~1,770 lines of code across 124 files. Three crates were removed, the mapper system was rewritten from scratch around a trait-based design, a new geometry DSL replaces all hardcoded offsets, and barcode correction was folded into the map step. Flex-V2 (384-plex) is now fully supported alongside V1.
New Features
Geometry DSL
A domain-specific language for specifying read geometry, replacing all hardcoded offsets and scattered CLI flags (-B, -u, --spacer, --offset, --lookback).
- Components: `[barcode]`, `[umi:N]`, `[probe]`, `[anchor]`, `[protospacer]`, `[gex]`
- Skip regions: `[:N]` for anonymous spacers
- Paired-end separator: `|` splits R1 from R2
- Example: `[barcode][umi:12] | [gex][:18][probe]`
- Geometry is resolved against loaded libraries to compute concrete byte offsets
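To make the grammar concrete, here is a minimal sketch of how a geometry string could be parsed into typed components. It is not the parser shipped in this release: the `Component` enum, `parse_mate`, and `parse_geometry` are hypothetical names, and the sketch assumes both mates are always present and that only `umi` and anonymous skips carry a length. Resolving `[barcode]` or `[probe]` to byte offsets against loaded libraries is left out.

```rust
/// One parsed element of a geometry string (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Component {
    Barcode,
    Umi(usize),
    Probe,
    Anchor,
    Protospacer,
    Gex,
    Skip(usize),
}

/// Parse one mate, e.g. "[barcode][umi:12]", into its components.
fn parse_mate(mate: &str) -> Result<Vec<Component>, String> {
    let mut out = Vec::new();
    let mut rest = mate.trim();
    while !rest.is_empty() {
        let body = rest
            .strip_prefix('[')
            .ok_or_else(|| format!("expected '[' at: {rest}"))?;
        let end = body
            .find(']')
            .ok_or_else(|| format!("missing ']' in: {rest}"))?;
        let token = &body[..end];
        let component = match token.split_once(':') {
            // Tokens without a length: sized later from the loaded libraries.
            None => match token {
                "barcode" => Component::Barcode,
                "probe" => Component::Probe,
                "anchor" => Component::Anchor,
                "protospacer" => Component::Protospacer,
                "gex" => Component::Gex,
                other => return Err(format!("unknown component: {other}")),
            },
            // Tokens with an explicit length: "[umi:N]" or the anonymous "[:N]".
            Some((name, len)) => {
                let n: usize = len
                    .parse()
                    .map_err(|_| format!("invalid length: {len}"))?;
                match name {
                    "" => Component::Skip(n),
                    "umi" => Component::Umi(n),
                    other => return Err(format!("unexpected length on: {other}")),
                }
            }
        };
        out.push(component);
        rest = body[end + 1..].trim_start();
    }
    Ok(out)
}

/// Split a geometry on '|' into (R1, R2) component lists.
/// Assumption: a geometry without '|' is rejected.
fn parse_geometry(geometry: &str) -> Result<(Vec<Component>, Vec<Component>), String> {
    let (r1, r2) = geometry
        .split_once('|')
        .ok_or_else(|| "expected '|' between R1 and R2".to_string())?;
    Ok((parse_mate(r1)?, parse_mate(r2)?))
}
```

Calling `parse_geometry("[barcode][umi:12] | [gex][:18][probe]")` yields a barcode and a 12-base UMI on R1, and a GEX region, an 18-base skip, and a probe on R2.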
Geometry Presets
Five built-in presets selectable via --preset:
| Preset | Geometry |
|---|---|
| `gex-v1` | `[barcode][umi:12] \| [gex][:18][probe]` |
| `gex-v2` | `[barcode][umi:12][:10][probe] \| [gex]` |
| `crispr-v1` | `[barcode][umi:12] \| [probe][anchor][protospacer]` |
| `crispr-v2` | `[barcode][umi:12][:10][probe] \| [:14][anchor][protospacer]` |
| `crispr-proper` | `[barcode][umi:12] \| [:18][probe][anchor][protospacer]` |
V2 presets automatically set remap_window=5.
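One way to picture a preset is as a name that expands to a geometry string plus an optional remap window. The sketch below encodes that idea; the geometry strings are copied from the table above, while the function name `preset`, the tuple return shape, and the use of `None` for presets that leave the remap window unset are assumptions of this example, not the crate's actual API.

```rust
/// Expand a preset name into (geometry string, remap window).
/// `None` for the window means the preset does not set it (an assumption
/// of this sketch; the release notes only state that V2 presets use 5).
fn preset(name: &str) -> Option<(&'static str, Option<usize>)> {
    let entry = match name {
        "gex-v1" => ("[barcode][umi:12] | [gex][:18][probe]", None),
        "gex-v2" => ("[barcode][umi:12][:10][probe] | [gex]", Some(5)),
        "crispr-v1" => ("[barcode][umi:12] | [probe][anchor][protospacer]", None),
        "crispr-v2" => (
            "[barcode][umi:12][:10][probe] | [:14][anchor][protospacer]",
            Some(5),
        ),
        "crispr-proper" => (
            "[barcode][umi:12] | [:18][probe][anchor][protospacer]",
            None,
        ),
        // Unknown preset names are reported to the caller as `None`.
        _ => return None,
    };
    Some(entry)
}
```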
Flex-V2 (384-plex) Support
Full support for 10x Genomics Flex-V2 chemistry across both GEX and CRISPR workflows, including the new geometry presets and remap window handling.
Optional Probe Demultiplexing
Probes are now fully optional. Three modes are supported:
- Probed (preset): `--preset gex-v1 -p probes.txt` demultiplexes to per-probe IBU files
- Probed (custom): `--geometry "..." -p probes.txt`
- Unprobed: `--geometry "[barcode][umi:12]|[gex]"` without `-p` writes a single IBU file
Validation catches mismatches (e.g., [probe] in geometry without a probe file).
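The kind of check described here can be sketched as a small function over the geometry string and the optional probe path. The release notes only mention the first error case (a `[probe]` component without a probe file); the reverse case and the function name `validate_probe_usage` are assumptions added for illustration.

```rust
/// Reject combinations where the geometry and the probe file disagree.
/// `geometry` is the raw DSL string; `probe_file` is the optional `-p` path.
fn validate_probe_usage(geometry: &str, probe_file: Option<&str>) -> Result<(), String> {
    let has_probe = geometry.contains("[probe]");
    match (has_probe, probe_file) {
        // Case named in the release notes: [probe] without a probe file.
        (true, None) => Err("geometry contains [probe] but no probe file was given".into()),
        // Assumed reverse case: a probe file that the geometry never uses.
        (false, Some(path)) => Err(format!(
            "probe file {path} given but geometry has no [probe]"
        )),
        _ => Ok(()),
    }
}
```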
Inline Barcode Correction
Barcode correction is now performed at map time via WhitelistMapper (backed by seqhash), eliminating the separate post-processing step. This simplifies the workflow pipeline from map → sort → barcode → sort → umi → sort → count to map → sort → umi-correct → count.
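For intuition about what correcting at map time involves, the sketch below builds a toy whitelist index that resolves an observed barcode to its whitelist entry in a single lookup. It is a stand-in and does not reproduce the `seqhash` API or `WhitelistMapper`: `ToyWhitelist` is a hypothetical type, and the one-mismatch tolerance with ambiguous variants discarded is an assumption of this example.

```rust
use std::collections::{HashMap, HashSet};

/// Toy whitelist index: maps every barcode and each of its one-mismatch
/// variants to the index of the parent barcode. Illustration only.
struct ToyWhitelist {
    lookup: HashMap<Vec<u8>, usize>,
}

impl ToyWhitelist {
    fn new(barcodes: &[&[u8]]) -> Self {
        let mut lookup: HashMap<Vec<u8>, usize> = HashMap::new();
        let mut ambiguous: HashSet<Vec<u8>> = HashSet::new();
        // Pass 1: register every one-mismatch variant of every barcode.
        for (idx, bc) in barcodes.iter().enumerate() {
            for pos in 0..bc.len() {
                for &base in b"ACGT" {
                    if base == bc[pos] {
                        continue;
                    }
                    let mut variant = bc.to_vec();
                    variant[pos] = base;
                    if ambiguous.contains(&variant) {
                        continue;
                    }
                    let existing = lookup.get(&variant).copied();
                    match existing {
                        // Reachable from two different parents: drop it.
                        Some(other) if other != idx => {
                            lookup.remove(&variant);
                            ambiguous.insert(variant);
                        }
                        _ => {
                            lookup.insert(variant, idx);
                        }
                    }
                }
            }
        }
        // Pass 2: exact whitelist entries always map to themselves.
        for (idx, bc) in barcodes.iter().enumerate() {
            lookup.insert(bc.to_vec(), idx);
        }
        Self { lookup }
    }

    /// Return the whitelist index for an observed barcode, if unambiguous.
    fn correct(&self, observed: &[u8]) -> Option<usize> {
        self.lookup.get(observed).copied()
    }
}
```

Because the index is built once up front, each read costs a single hash lookup during mapping, which is what makes a separate correction pass over the IBU file unnecessary in this toy model.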
Minimum IBU Record Threshold
New --min-ibu-records flag (default 1000) automatically removes sparse IBU files after mapping, preventing downstream issues from near-empty outputs.
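The threshold amounts to partitioning the outputs by record count. A minimal sketch, assuming a file is removed when it holds strictly fewer records than the threshold (the notes do not say whether the comparison is inclusive); `partition_by_min_records` and the file names in the usage note are hypothetical.

```rust
/// Split (file name, record count) pairs into files to keep and files to remove.
/// Assumption: a file is removed when it has strictly fewer than `min_records`.
fn partition_by_min_records(
    files: &[(&str, u64)],
    min_records: u64,
) -> (Vec<String>, Vec<String>) {
    let mut keep = Vec::new();
    let mut remove = Vec::new();
    for &(name, n_records) in files {
        if n_records >= min_records {
            keep.push(name.to_string());
        } else {
            remove.push(name.to_string());
        }
    }
    (keep, remove)
}
```

With a threshold of 1000, a file such as `probe_b.ibu` holding 12 records would land in the removal list, while one holding exactly 1000 would be kept under this sketch's assumption.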
Probe Alias Regex Filtering
--probe-regex flag enables regex-based alias filtering on probe files during loading.
Improved Statistics
- Per-library metadata: name, element count, hash count, position, mate, window, exact match mode, build time
- Detailed unmapped breakdowns: missing probe, missing feature, missing whitelist, failed UMI quality, truncated UMI — all with fraction computation
- Statistics written as three JSON files: `mapping_lib.json`, `mapping_map.json`, `mapping_run.json`
Architecture Changes
Crates Removed
- `cyto-core`: All types were either inlined into consuming crates or rewritten. ~2,400 lines removed.
- `cyto-view`: Standalone view command removed.
- `cyto-ibu-barcode-correct`: Barcode correction folded into the map step.
Mapper Rewrite
The entire mapper system was rewritten around a Mapper trait with typestate resolution (Unpositioned → Ready). Individual mappers:
- `WhitelistMapper`: cell barcode matching via `seqhash::SeqHash`
- `GexMapper`: gene expression via `seqhash::SplitSeqHash` with agreed-index resolution
- `CrisprMapper`: two-step anchor + protospacer matching via `seqhash::MultiLenSeqHash`
- `ProbeMapper`: probe demultiplexing via `seqhash::SeqHash`
- `UmiMapper`: UMI extraction with quality thresholding and 2-bit encoding

Core sequence mapping functionality has moved from disambiseq to seqhash.
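The typestate idea can be illustrated in a few lines. In this sketch the state names `Unpositioned` and `Ready` come from the notes above, but `ToyMapper`, its fields, and its methods are hypothetical and far simpler than the real mappers: the point is only that a mapper cannot be queried until its position has been resolved, and that the compiler enforces this.

```rust
/// State before the mapper's offset in the read is known.
struct Unpositioned;

/// State after geometry resolution has assigned a concrete byte offset.
struct Ready {
    offset: usize,
}

/// A toy mapper parameterised by its resolution state.
struct ToyMapper<S> {
    len: usize,
    state: S,
}

impl ToyMapper<Unpositioned> {
    fn new(len: usize) -> Self {
        Self { len, state: Unpositioned }
    }

    /// Consume the unpositioned mapper and return one that can be queried.
    fn resolve(self, offset: usize) -> ToyMapper<Ready> {
        ToyMapper { len: self.len, state: Ready { offset } }
    }
}

impl ToyMapper<Ready> {
    /// Only defined in the `Ready` state, so calling it on an unpositioned
    /// mapper is a compile-time error rather than a runtime check.
    fn extract<'a>(&self, read: &'a [u8]) -> Option<&'a [u8]> {
        let start = self.state.offset;
        read.get(start..start + self.len)
    }
}
```

`ToyMapper::new(4).resolve(2).extract(read)` slices four bytes starting at offset 2 and returns `None` when the read is too short, whereas `ToyMapper::new(4).extract(read)` does not compile.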
Unified Processor
A single generic MapProcessor<M: Mapper> replaces separate probed/unprobed implementations. It implements both binseq::ParallelProcessor and paraseq::PairedParallelProcessor with thread-local statistics accumulation.
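The pattern of one generic processor with per-thread statistics can be sketched with the standard library alone. This example does not use the `binseq` or `paraseq` processor traits; the `Mapper` trait shown here, `Stats`, `PrefixMapper`, and `process` are all hypothetical simplifications. Each worker accumulates into its own `Stats` value and the results are merged once at the end, so no lock is taken in the hot loop.

```rust
use std::thread;

/// Hypothetical minimal mapper interface for this sketch.
trait Mapper: Sync {
    /// Return a feature index if the read maps.
    fn map(&self, read: &[u8]) -> Option<usize>;
}

/// Toy mapper: a read maps (to feature 0) when it starts with `prefix`.
struct PrefixMapper {
    prefix: Vec<u8>,
}

impl Mapper for PrefixMapper {
    fn map(&self, read: &[u8]) -> Option<usize> {
        if read.starts_with(&self.prefix) { Some(0) } else { None }
    }
}

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct Stats {
    mapped: u64,
    unmapped: u64,
}

impl Stats {
    fn merge(&mut self, other: Stats) {
        self.mapped += other.mapped;
        self.unmapped += other.unmapped;
    }
}

/// Process `reads` on up to `n_threads` workers, each keeping its own
/// `Stats`, then merge the per-thread results after the workers finish.
fn process<M: Mapper>(mapper: &M, reads: &[&[u8]], n_threads: usize) -> Stats {
    let n = n_threads.max(1);
    let chunk = ((reads.len() + n - 1) / n).max(1);
    let mut total = Stats::default();
    thread::scope(|scope| {
        let handles: Vec<_> = reads
            .chunks(chunk)
            .map(|batch| {
                scope.spawn(move || {
                    // Thread-local accumulator: no sharing while mapping.
                    let mut local = Stats::default();
                    for read in batch {
                        match mapper.map(read) {
                            Some(_) => local.mapped += 1,
                            None => local.unmapped += 1,
                        }
                    }
                    local
                })
            })
            .collect();
        for handle in handles {
            total.merge(handle.join().expect("worker panicked"));
        }
    });
    total
}
```

Because `process` is generic over `M: Mapper`, the same function serves any mapper type, which mirrors how a single processor can replace separate probed and unprobed implementations.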
Relocated Types
- `FeatureWriter` trait: `cyto-core::io` → `cyto-io::feature`
- `BarcodeIndexCount[s]` and `deduplicate_umis`: `cyto-core` → `cyto-ibu-count::dedup`
CLI Changes
Renamed/Removed Flags
- `--no-remap` → `--remap-window`
- Removed: `-B` (barcode length), `-u` (UMI length), `--spacer`, `--offset`, `--lookback`, `-H`/`--with-header`
- Removed from workflow: `--skip-barcode`, `--bc-exact`, `--skip-bc-second-pass` (second pass was removed)
- `-w`/`--whitelist` is now required at map time (moved from workflow-level)
Default Thread Count
Default threads changed from 8 to 0 (auto-detect all available cores).
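A common way to implement "0 means auto-detect" in Rust is to fall back to `std::thread::available_parallelism`. The helper below is a sketch of that convention, not the code in this release; `resolve_threads` is a hypothetical name, and falling back to a single thread when detection fails is an assumption.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Resolve a requested thread count, treating 0 as "use every available core".
fn resolve_threads(requested: usize) -> usize {
    if requested == 0 {
        // Detection can fail on some platforms; fall back to one thread.
        thread::available_parallelism()
            .map(NonZeroUsize::get)
            .unwrap_or(1)
    } else {
        requested
    }
}
```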
Moved CLI Struct
Cli struct moved from cyto-cli to the main cyto crate to keep the version in sync with the binary.
Bug Fixes
- Fixed incorrect unmapped total calculation
- Fixed CRISPR-V2 remap window not being applied
- Fixed `crispr-v2` preset not being exposed to the CLI
- Added validation that BINSEQ inputs are paired (early error on unpaired files)
- Better error messages for missing BINSEQ files and empty IBU outputs
CI
- Added 4 new integration test jobs for unprobed workflows (GEX + CRISPR, map + workflow)
- Expanded format matrix to include `vbq` and `cbq` alongside `fastq` and `bq`
- All tests now include whitelist and preset flags
Dependencies
- Added: `seqhash` 0.1.5, `regex`, `derive_more`, `serde`/`serde_json`/`csv` (to `cyto-map`)
- Bumped: `bitnuc` 0.4.0 → 0.4.1, `pycyto` 0.1.13 → 0.1.14
- Removed: all `cyto-core` dependencies across the workspace
What's Changed
- 1 include bitnuc to encode and decode barcodes and umi as integers by @noamteyssier in #2
- 3 restructure cli interface to make use of ibu serialization by @noamteyssier in #4
- feat: completely rename package by @noamteyssier in #6
- 7 split cli into separate related submodules by @noamteyssier in #8
- 10 build in ibu text by @noamteyssier in #13
- 11 build in ibu sort by @noamteyssier in #14
- 12 build in ibu count by @noamteyssier in #15
- 9 introduce the ibu submodule by @noamteyssier in #16
- 17 allow multiple fastq inputs for merging lanes by @noamteyssier in #18
- docs: added README by @noamteyssier in #20
- 21 use preallocated reusable buffer in unpacking by @noamteyssier in #22
- 23 make mappers compatible with binseq inputs by @noamteyssier in #26
- 25 output features for flex by @noamteyssier in #27
- 28 implement barcode whitelisting and error correction by @noamteyssier in #29
- 31 implement one off mismatch detection by @noamteyssier in #32
- 24 make mappers compatible with seq io parallel by @noamteyssier in #35
- 36 have mappers be arc able instead of cloned by @noamteyssier in #37
- 38 include runtime statistics in log by @noamteyssier in #39
- 40 setup a non specific geometry mapper by @noamteyssier in #42
- 41 have mappers use positional retries by @noamteyssier in #43
- 44 introduce a mechanism for automatically determining offset on generic by @noamteyssier in #45
- 46 support new binseq reader by @noamteyssier in #47
- 48 link to binseq upstream by @noamteyssier in #49
- 50 introduce a workflow command for common workflows by @noamteyssier in #51
- 52 allow workflows to run sort and counts in background tasks by @noamteyssier in #53
- 54 use bitnuc mismatch for disambiguating barcodes by @noamteyssier in #55
- 56 fix issue with probe mapping where not all records are being written to ibu by @noamteyssier in #57
- 58 fix correct probe barcode designation by @noamteyssier in #59
- 60 provide feature file to ibu view to print name by @noamteyssier in #61
- 62 update flex mapping to allow for higher order mismatches by @noamteyssier in #64
- 63 perform weight based calculation for cell barcode correction by @noamteyssier in #65
- 66 create a new ibu subcommand for umi correction by @noamteyssier in #67
- 68 improve workflow command by @noamteyssier in #69
- 70 improve compile times by @noamteyssier in #71
- 72 rename correct to barcode by @noamteyssier in #73
- 74 migrate to new binseq lib by @noamteyssier in #75
- refactor: move into crates subdir and update paths in cargo toml by @noamteyssier in #78
- 33 have transcript gene level collapsing in count by @noamteyssier in #80
- 81 update paraseq version by @noamteyssier in #82
- 84 allow variable length sequences by @noamteyssier in #85
- 86 support vbq by @noamteyssier in #88
- 92 fix multiple occurence of gene in count by @noamteyssier in #95
- 102 include a cyto ibu cat command by @noamteyssier in #104
- 96 rename flex to gex by @noamteyssier in #105
- 106 output mtx by @noamteyssier in #107
- 103 write outputs to a subdirectory instead of in the root by @noamteyssier in #108
- Setup logging throughout crates by @noamteyssier in #109
- 110 pass mtx through workflow by @noamteyssier in #111
- 112 build integration ci by @noamteyssier in #113
- 115 add an h5ad flag to workflows by @noamteyssier in #117
- 114 add barcode number to barcode name by @noamteyssier in #118
- 119 make h5ad output the default by @noamteyssier in #120
- 123 integrate cell filter into workflow by @noamteyssier in #124
- 125 fix logging in workflow filtering by @noamteyssier in #126
- 127 integrate geomux into crispr workflow by @noamteyssier in #129
- 130 spin out scripts into a pycyto command by @noamteyssier in #131
- 128 write cell filter logs into stats by @noamteyssier in #132
- 133 update transparent uv installation to use version and control version with consts by @noamteyssier in #134
- 116 allow multiple inputs at a time by @noamteyssier in #135
- 137 panic on mapping overextension of sequence buffer in vbq by @noamteyssier in #139
- 140 run umi correction in parallel by @noamteyssier in #141
- 143 update default geomux version by @noamteyssier in #144
- 146 add a simple file marking done when done with workflow by @noamteyssier in #148
- 145 write log to output file in workflow by @noamteyssier in #149
- 147 option to keep ibu in workflow by @noamteyssier in #150
- 151 update pycyto dependency by @noamteyssier in #152
- 154 add a lookback flag for crispr probes by @noamteyssier in #155
- 136 make position remapping the default mapping style by @noamteyssier in #156
- 101 include a cyto ibu reads command by @noamteyssier in #157
- 158 add reads to workflow by @noamteyssier in #159
- 160 add probe regex by @noamteyssier in #161
- 162 make geomux arguments configurable by @noamteyssier in #163
- 153 update pycyto dependency by @noamteyssier in #164
- 165 update geomux version by @noamteyssier in #166
- 167 bump geomux version by @noamteyssier in #168
- 169 out of memory sorting for ibu sort by @noamteyssier in #170
- 171 update pycyto version by @noamteyssier in #172
- 173 remove higher distance barcode correction by @noamteyssier in #177
- 176 upgrade bitnuc version by @noamteyssier in #178
- 175 upgrade paraseq dependency by @noamteyssier in #179
- dep(gzp): upgrade incompatible by @noamteyssier in #180
- V0.6.19 by @noamteyssier in #181
- 183 replace kosaraju algorithm with single pass dfs connected components by @noamteyssier in #184
- 186 update cell filter and pycyto dependencies by @noamteyssier in #187
- 188 output sorted ibu on umi correction by @noamteyssier in #189
- 192 update to ibu v2 by @noamteyssier in #193
- 194 reduce memory overhead between shared whitelist instances by @noamteyssier in #195
- 196 remove unnecessary sort at beginning of barcode correction by @noamteyssier in #198
- 197 add per step barcode timing statistics by @noamteyssier in #199
- dep(binseq): update version by @noamteyssier in #201
- Dev by @noamteyssier in #202
- Prepare for publish by @noamteyssier in #203
- 206 update binseq version by @noamteyssier in #207
- Improve core mappers and generalize geometries by @noamteyssier in #210
- Introduce minimum record count on ibu output by @noamteyssier in #214
- feat: better error message by @noamteyssier in #215
- Create claude md files by @noamteyssier in #216
- refactor: rename from run_gex2 and run_crispr2 to run_gex and run_crispr by @noamteyssier in #217
- dep(pycyto): bump semver by @noamteyssier in #218
- Feat/support non probed inputs with new mappers by @noamteyssier in #222
- Cyto 0.4.0 by @noamteyssier in #220
New Contributors
- @noamteyssier made their first contribution in #2
Full Changelog: https://github.com/ArcInstitute/cyto/commits/cyto-0.4.0