Releases: ArcInstitute/cyto

cyto-0.4.0

03 Mar 21:26
4e51798

Highlights

This release is a major architectural overhaul with a net reduction of ~1,770 lines of code across 124 files. Three crates were removed, the mapper system was rewritten from scratch around a trait-based design, a new geometry DSL replaces all hardcoded offsets, and barcode correction was folded into the map step. Flex-V2 (384-plex) is now fully supported alongside V1.

New Features

Geometry DSL

A domain-specific language for specifying read geometry, replacing all hardcoded offsets and scattered CLI flags (-B, -u, --spacer, --offset, --lookback).

  • Components: [barcode], [umi:N], [probe], [anchor], [protospacer], [gex]
  • Skip regions: [:N] for anonymous spacers
  • Paired-end separator: | splits R1 from R2
  • Example: [barcode][umi:12] | [gex][:18][probe]
  • Geometry is resolved against loaded libraries to compute concrete byte offsets
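A minimal sketch of how a geometry string of this shape could be parsed and resolved to byte ranges. This is not cyto's implementation: the `Component` type, the function names, and the `lib_len` lookup (standing in for "resolved against loaded libraries") are all hypothetical.

```rust
/// One region of a read, as written in the geometry string.
#[derive(Debug, Clone, PartialEq)]
enum Component {
    /// A named region such as `barcode`, with a length when one is given (`umi:12`).
    Named { name: String, len: Option<usize> },
    /// An anonymous spacer such as `[:18]`.
    Skip(usize),
}

/// Parse one mate (the text on one side of `|`) into its components.
fn parse_mate(s: &str) -> Result<Vec<Component>, String> {
    let mut out = Vec::new();
    let mut rest = s.trim();
    while !rest.is_empty() {
        let inner = rest
            .strip_prefix('[')
            .ok_or_else(|| format!("expected '[' at {rest:?}"))?;
        let end = inner
            .find(']')
            .ok_or_else(|| format!("unclosed '[' at {rest:?}"))?;
        let token = &inner[..end];
        rest = inner[end + 1..].trim_start();
        let comp = match token.split_once(':') {
            Some(("", n)) => Component::Skip(
                n.parse::<usize>().map_err(|_| format!("bad skip length {n:?}"))?,
            ),
            Some((name, n)) => Component::Named {
                name: name.to_string(),
                len: Some(n.parse::<usize>().map_err(|_| format!("bad length {n:?}"))?),
            },
            None if !token.is_empty() => Component::Named { name: token.to_string(), len: None },
            None => return Err("empty component".to_string()),
        };
        out.push(comp);
    }
    Ok(out)
}

/// Parse a full geometry into (R1, R2); R2 is empty when there is no `|`.
fn parse_geometry(s: &str) -> Result<(Vec<Component>, Vec<Component>), String> {
    match s.split_once('|') {
        Some((r1, r2)) => Ok((parse_mate(r1)?, parse_mate(r2)?)),
        None => Ok((parse_mate(s)?, Vec::new())),
    }
}

/// Resolve one mate into (name, start, end) byte ranges. `lib_len` is a
/// hypothetical lookup for components whose length is not written inline.
fn resolve(
    mate: &[Component],
    lib_len: impl Fn(&str) -> Option<usize>,
) -> Result<Vec<(String, usize, usize)>, String> {
    let mut pos: usize = 0;
    let mut out = Vec::new();
    for c in mate {
        match c {
            Component::Skip(n) => pos += *n,
            Component::Named { name, len } => {
                let n = match len {
                    Some(n) => *n,
                    None => lib_len(name).ok_or_else(|| format!("no length for {name}"))?,
                };
                out.push((name.clone(), pos, pos + n));
                pos += n;
            }
        }
    }
    Ok(out)
}
```

Skip regions advance the cursor without producing a range, which is why `[:18]` needs no name.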

Geometry Presets

Five built-in presets selectable via --preset:

Preset         Geometry
gex-v1         [barcode][umi:12] | [gex][:18][probe]
gex-v2         [barcode][umi:12][:10][probe] | [gex]
crispr-v1      [barcode][umi:12] | [probe][anchor][protospacer]
crispr-v2      [barcode][umi:12][:10][probe] | [:14][anchor][protospacer]
crispr-proper  [barcode][umi:12] | [:18][probe][anchor][protospacer]

V2 presets automatically set remap_window=5.
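The preset table and the V2 remap rule can be written as a plain lookup. This is a sketch only: the function names are hypothetical, and returning `None` for non-V2 presets simply means "leave the default alone", since the notes do not state that default.

```rust
/// Map a preset name to its geometry string (values copied from the table above).
fn preset_geometry(name: &str) -> Option<&'static str> {
    match name {
        "gex-v1" => Some("[barcode][umi:12] | [gex][:18][probe]"),
        "gex-v2" => Some("[barcode][umi:12][:10][probe] | [gex]"),
        "crispr-v1" => Some("[barcode][umi:12] | [probe][anchor][protospacer]"),
        "crispr-v2" => Some("[barcode][umi:12][:10][probe] | [:14][anchor][protospacer]"),
        "crispr-proper" => Some("[barcode][umi:12] | [:18][probe][anchor][protospacer]"),
        _ => None,
    }
}

/// V2 presets imply remap_window = 5; other presets return None here.
fn preset_remap_window(name: &str) -> Option<usize> {
    if preset_geometry(name).is_some() && name.ends_with("-v2") {
        Some(5)
    } else {
        None
    }
}
```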

Flex-V2 (384-plex) Support

Full support for 10x Genomics Flex-V2 chemistry across both GEX and CRISPR workflows, including the new geometry presets and remap window handling.

Optional Probe Demultiplexing

Probes are now fully optional. Three modes are supported:

  • Probed (preset): --preset gex-v1 -p probes.txt — demux to per-probe IBU files
  • Probed (custom): --geometry "..." -p probes.txt
  • Unprobed: --geometry "[barcode][umi:12]|[gex]" without -p — writes a single IBU file

Validation catches mismatches (e.g., [probe] in geometry without a probe file).
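The mismatch check might look like the following sketch. The names are hypothetical, and rejecting a probe file supplied without a [probe] component is an assumption: the notes only give the opposite case as an example.

```rust
#[derive(Debug, PartialEq)]
enum ProbeMode {
    /// Demultiplex to per-probe IBU files.
    Probed,
    /// Write a single IBU file.
    Unprobed,
}

/// Decide the mode from the geometry string and the optional probe file path.
fn probe_mode(geometry: &str, probe_file: Option<&str>) -> Result<ProbeMode, String> {
    let has_probe = geometry.contains("[probe]");
    match (has_probe, probe_file) {
        (true, Some(_)) => Ok(ProbeMode::Probed),
        (false, None) => Ok(ProbeMode::Unprobed),
        (true, None) => Err("geometry contains [probe] but no probe file was given".into()),
        // Assumed to be an error as well; the notes do not spell this case out.
        (false, Some(_)) => Err("probe file given but geometry has no [probe]".into()),
    }
}
```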

Inline Barcode Correction

Barcode correction is now performed at map time via WhitelistMapper (backed by seqhash), eliminating the separate post-processing step. This simplifies the workflow pipeline from "map → sort → barcode → sort → umi → sort → count" to "map → sort → umi-correct → count".
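To illustrate what correcting a barcode at map time means, here is a generic one-mismatch whitelist lookup using only the standard library. It is not the seqhash API and may not match how WhitelistMapper resolves ambiguity; the function name and the "unique neighbour or nothing" rule are assumptions.

```rust
use std::collections::HashSet;

/// Return the whitelist barcode equal to `bc`, or the unique whitelist entry
/// at Hamming distance 1. Ambiguous or unmatched barcodes return None.
fn correct_barcode(bc: &[u8], whitelist: &HashSet<Vec<u8>>) -> Option<Vec<u8>> {
    if whitelist.contains(bc) {
        return Some(bc.to_vec());
    }
    let mut hit: Option<Vec<u8>> = None;
    let mut cand = bc.to_vec();
    for i in 0..bc.len() {
        for &base in b"ACGT" {
            if base == bc[i] {
                continue;
            }
            cand[i] = base;
            if whitelist.contains(&cand) {
                if hit.is_some() {
                    return None; // two different neighbours: ambiguous
                }
                hit = Some(cand.clone());
            }
        }
        cand[i] = bc[i]; // restore before moving to the next position
    }
    hit
}
```

Doing this lookup while each read is mapped is what removes the separate barcode pass and its two surrounding sorts.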

Minimum IBU Record Threshold

New --min-ibu-records flag (default 1000) automatically removes sparse IBU files after mapping, preventing downstream issues from near-empty outputs.
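The thresholding amounts to a partition over per-file record counts, sketched below. The function name is hypothetical, and treating the threshold as inclusive (a file with exactly the minimum is kept) is an assumption.

```rust
/// Split (path, record_count) pairs into files to keep and files to remove.
/// Files with at least `min_records` records are kept (assumed inclusive).
fn partition_by_min_records<'a>(
    files: &[(&'a str, u64)],
    min_records: u64,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut keep = Vec::new();
    let mut remove = Vec::new();
    for &(path, n) in files {
        if n >= min_records {
            keep.push(path);
        } else {
            remove.push(path);
        }
    }
    (keep, remove)
}
```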

Probe Alias Regex Filtering

--probe-regex flag enables regex-based alias filtering on probe files during loading.

Improved Statistics

  • Per-library metadata: name, element count, hash count, position, mate, window, exact match mode, build time
  • Detailed unmapped breakdowns: missing probe, missing feature, missing whitelist, failed UMI quality, truncated UMI — all with fraction computation
  • Statistics written as three JSON files: mapping_lib.json, mapping_map.json, mapping_run.json
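A sketch of the unmapped breakdown with fraction computation. The struct and field names mirror the categories listed above but are hypothetical, as is the choice to report 0.0 when there are no reads.

```rust
/// Counts of reads that failed to map, by reason.
#[derive(Debug, Default, Clone, Copy)]
struct Unmapped {
    missing_probe: u64,
    missing_feature: u64,
    missing_whitelist: u64,
    failed_umi_quality: u64,
    truncated_umi: u64,
}

impl Unmapped {
    /// Total unmapped reads: the sum of every category.
    fn total(&self) -> u64 {
        self.missing_probe
            + self.missing_feature
            + self.missing_whitelist
            + self.failed_umi_quality
            + self.truncated_umi
    }
}

/// Fraction of `count` over `total_reads`, defined as 0.0 for an empty run.
fn fraction(count: u64, total_reads: u64) -> f64 {
    if total_reads == 0 {
        0.0
    } else {
        count as f64 / total_reads as f64
    }
}
```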

Architecture Changes

Crates Removed

  • cyto-core — All types were either inlined into consuming crates or rewritten. ~2,400 lines removed.
  • cyto-view — Standalone view command removed.
  • cyto-ibu-barcode-correct — Barcode correction folded into the map step.

Mapper Rewrite

The entire mapper system was rewritten around a Mapper trait with typestate resolution (Unpositioned → Ready). Individual mappers:

  • WhitelistMapper — cell barcode matching via seqhash::SeqHash
  • GexMapper — gene expression via seqhash::SplitSeqHash with agreed-index resolution
  • CrisprMapper — two-step anchor + protospacer matching via seqhash::MultiLenSeqHash
  • ProbeMapper — probe demultiplexing via seqhash::SeqHash
  • UmiMapper — UMI extraction with quality thresholding and 2-bit encoding
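For the UMI step, quality thresholding and 2-bit packing can be sketched as follows. The bit assignment (A=0, C=1, G=2, T=3), the bit order, and comparing raw quality bytes against `min_qual` are assumptions; bitnuc's actual layout may differ.

```rust
/// Pack the first `umi_len` bases into 2 bits each, first base in the highest
/// bits. Returns None for a truncated UMI, a low-quality base, or a non-ACGT base.
fn encode_umi(seq: &[u8], qual: &[u8], umi_len: usize, min_qual: u8) -> Option<u64> {
    // A u64 holds at most 32 bases at 2 bits per base.
    if umi_len > 32 || seq.len() < umi_len || qual.len() < umi_len {
        return None;
    }
    let mut code = 0u64;
    for i in 0..umi_len {
        if qual[i] < min_qual {
            return None;
        }
        let bits: u64 = match seq[i] {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        code = (code << 2) | bits;
    }
    Some(code)
}
```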

Core sequence mapping functionality has moved from disambiseq to seqhash.
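The trait-plus-typestate idea can be illustrated with a toy exact-match mapper. This sketch reads the typestate as two states, Unpositioned and Ready. The trait signature, `ExactMapper`, and the HashMap index are hypothetical and do not reflect cyto's or seqhash's actual types.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;

/// Typestate markers: a mapper starts Unpositioned and becomes Ready once
/// its offset within the read is known.
struct Unpositioned;
struct Ready;

/// Hypothetical stand-in for the Mapper trait.
trait Mapper {
    type Output;
    fn map(&self, read: &[u8]) -> Option<Self::Output>;
}

/// Exact-match mapper over fixed-length sequences (illustrative only).
struct ExactMapper<State> {
    index: HashMap<Vec<u8>, usize>,
    len: usize,
    offset: usize,
    _state: PhantomData<State>,
}

impl ExactMapper<Unpositioned> {
    /// Build the lookup table; all sequences are assumed to share one length.
    fn new(library: &[&str]) -> Self {
        let index: HashMap<Vec<u8>, usize> = library
            .iter()
            .enumerate()
            .map(|(i, s)| (s.as_bytes().to_vec(), i))
            .collect();
        let len = library.first().map_or(0, |s| s.len());
        ExactMapper { index, len, offset: 0, _state: PhantomData }
    }

    /// Fix the read offset, consuming the unpositioned mapper.
    fn position(self, offset: usize) -> ExactMapper<Ready> {
        ExactMapper { index: self.index, len: self.len, offset, _state: PhantomData }
    }
}

/// Only a positioned mapper implements Mapper, so mapping with an
/// unresolved offset is a compile-time error rather than a runtime bug.
impl Mapper for ExactMapper<Ready> {
    type Output = usize;
    fn map(&self, read: &[u8]) -> Option<usize> {
        let window = read.get(self.offset..self.offset + self.len)?;
        self.index.get(window).copied()
    }
}
```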

Unified Processor

A single generic MapProcessor<M: Mapper> replaces separate probed/unprobed implementations. It implements both binseq::ParallelProcessor and paraseq::PairedParallelProcessor with thread-local statistics accumulation.
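A rough sketch of a processor that is generic over the mapper and accumulates statistics per thread before merging. It uses std::thread::scope in place of the binseq and paraseq processor traits, whose interfaces are not reproduced here. The `Stats` type, `run` method, and `PrefixMapper` are hypothetical.

```rust
use std::thread;

/// Simplified mapper trait; Sync so one mapper can be shared across threads.
trait Mapper: Sync {
    fn map(&self, read: &[u8]) -> Option<usize>;
}

#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Stats {
    mapped: u64,
    unmapped: u64,
}

impl Stats {
    fn merge(&mut self, other: Stats) {
        self.mapped += other.mapped;
        self.unmapped += other.unmapped;
    }
}

/// One processor for any mapper, instead of one implementation per mode.
struct MapProcessor<M: Mapper> {
    mapper: M,
}

impl<M: Mapper> MapProcessor<M> {
    fn run(&self, reads: &[Vec<u8>], n_threads: usize) -> Stats {
        let n = n_threads.max(1);
        let chunk = ((reads.len() + n - 1) / n).max(1);
        thread::scope(|s| {
            let handles: Vec<_> = reads
                .chunks(chunk)
                .map(|part| {
                    s.spawn(move || {
                        // Thread-local accumulator: no locking in the hot loop.
                        let mut local = Stats::default();
                        for r in part {
                            match self.mapper.map(r) {
                                Some(_) => local.mapped += 1,
                                None => local.unmapped += 1,
                            }
                        }
                        local
                    })
                })
                .collect();
            let mut total = Stats::default();
            for h in handles {
                total.merge(h.join().expect("worker thread panicked"));
            }
            total
        })
    }
}

/// Toy mapper for demonstration: a read maps if it starts with the given byte.
struct PrefixMapper(u8);

impl Mapper for PrefixMapper {
    fn map(&self, read: &[u8]) -> Option<usize> {
        if read.first() == Some(&self.0) { Some(0) } else { None }
    }
}
```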

Relocated Types

  • FeatureWriter trait: cyto-core::io → cyto-io::feature
  • BarcodeIndexCount[s] and deduplicate_umis: cyto-core → cyto-ibu-count::dedup

CLI Changes

Renamed/Removed Flags

  • --no-remap → --remap-window
  • Removed: -B (barcode length), -u (UMI length), --spacer, --offset, --lookback, -H/--with-header
  • Removed from workflow: --skip-barcode, --bc-exact, --skip-bc-second-pass (second pass was removed)
  • -w/--whitelist is now required at map time (moved from workflow-level)

Default Thread Count

Default threads changed from 8 to 0 (auto-detect all available cores).
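One common way to implement "0 means auto-detect" with the standard library is shown below. The function name is hypothetical, and falling back to a single thread when detection fails is an assumption.

```rust
use std::thread;

/// Turn the requested thread count into a concrete one: 0 selects the
/// parallelism reported by the OS, falling back to 1 if it is unavailable.
fn resolve_threads(requested: usize) -> usize {
    if requested == 0 {
        thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
    } else {
        requested
    }
}
```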

Moved CLI Struct

Cli struct moved from cyto-cli to the main cyto crate to keep the version in sync with the binary.

Bug Fixes

  • Fixed incorrect unmapped total calculation
  • Fixed CRISPR-V2 remap window not being applied
  • Fixed crispr-v2 preset not being exposed to the CLI
  • Added validation that BINSEQ inputs are paired (early error on unpaired files)
  • Better error messages for missing BINSEQ files and empty IBU outputs

CI

  • Added 4 new integration test jobs for unprobed workflows (GEX + CRISPR, map + workflow)
  • Expanded format matrix to include vbq and cbq alongside fastq and bq
  • All tests now include whitelist and preset flags

Dependencies

  • Added: seqhash 0.1.5, regex, derive_more, serde/serde_json/csv (to cyto-map)
  • Bumped: bitnuc 0.4.0 → 0.4.1, pycyto 0.1.13 → 0.1.14
  • Removed: all cyto-core dependencies across the workspace
