feat: Add cluster, complement, and subtract range operations#311
Force-pushed from 88147be to bd62c71.
Force-pushed from b74cbec to b069f28.
Integrate three new bioframe-compatible interval operations from upstream datafusion-bio-functions PR #17:

- cluster: Assign cluster IDs to overlapping/nearby intervals
- complement: Compute gaps between intervals (with optional chromsizes view)
- subtract: Remove from one set of intervals the portions that overlap another set

All operations support the full polars-bio pipeline: DataFrame, LazyFrame, pandas, file paths, and projection pushdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
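The semantics of the three operations can be sketched as a naive pure-Python reference. This is not the polars-bio or upstream DataFusion implementation. It assumes half-open [start, end) coordinates, intervals given as (contig, start, end) tuples, a hypothetical `min_dist` parameter for cluster, and a plain dict of contig lengths standing in for the chromsizes view.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: an interval is (contig, start, end) with half-open [start, end) coordinates.
Interval = Tuple[str, int, int]
I64_MAX = 2**63 - 1  # upper bound used when no chromsizes are given


def cluster(intervals: List[Interval], min_dist: int = 0) -> List[Tuple[Interval, int]]:
    """Label each interval with a cluster ID. An interval joins the current
    cluster when its gap to the cluster's running end is <= min_dist."""
    labeled: List[Tuple[Interval, int]] = []
    cluster_id = -1
    cur_contig: Optional[str] = None
    cur_end = 0
    for iv in sorted(intervals):
        contig, start, end = iv
        if contig != cur_contig or start - cur_end > min_dist:
            cluster_id += 1  # open a new cluster
            cur_contig, cur_end = contig, end
        else:
            cur_end = max(cur_end, end)  # extend the current cluster
        labeled.append((iv, cluster_id))
    return labeled


def _by_contig(intervals: List[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group (start, end) pairs by contig, sorted by start."""
    grouped: Dict[str, List[Tuple[int, int]]] = {}
    for contig, start, end in sorted(intervals):
        grouped.setdefault(contig, []).append((start, end))
    return grouped


def complement(intervals: List[Interval],
               chromsizes: Optional[Dict[str, int]] = None) -> List[Interval]:
    """Return the gaps not covered by any interval. With chromsizes, only the
    listed contigs are reported, each bounded by its length; without it, each
    contig that has intervals is treated as spanning [0, I64_MAX)."""
    spans = _by_contig(intervals)
    contigs = chromsizes if chromsizes is not None else spans
    gaps: List[Interval] = []
    for contig in sorted(contigs):
        limit = chromsizes[contig] if chromsizes is not None else I64_MAX
        cursor = 0  # first position not yet known to be covered
        for start, end in spans.get(contig, []):
            gap_end = min(start, limit)
            if gap_end > cursor:
                gaps.append((contig, cursor, gap_end))
            cursor = max(cursor, end)
        if cursor < limit:
            gaps.append((contig, cursor, limit))
    return gaps


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Remove from each interval in `a` every portion overlapped by `b`
    (naive O(len(a) * len(b)) scan)."""
    b_spans = _by_contig(b)
    pieces: List[Interval] = []
    for contig, start, end in sorted(a):
        cursor = start  # start of the not-yet-emitted remainder
        for b_start, b_end in b_spans.get(contig, []):
            if b_end <= cursor or b_start >= end:
                continue  # no overlap with the remainder
            if b_start > cursor:
                pieces.append((contig, cursor, b_start))
            cursor = b_end
            if cursor >= end:
                break
        if cursor < end:
            pieces.append((contig, cursor, end))
    return pieces
```

For example, `cluster([("chr1", 1, 5), ("chr1", 3, 8), ("chr1", 20, 25)])` labels the first two intervals 0 and the third 1, and calling `complement` without chromsizes leaves a final gap ending at `I64_MAX`, which is why an explicit chromsizes view is usually wanted.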
Force-pushed from b069f28 to 6f22c04.
Verifies that operations correctly handle DataFrames with extra columns beyond the core contig/pos_start/pos_end triplet. Covers overlap, nearest, count_overlaps, coverage, merge, cluster, complement, and subtract across pandas, polars DataFrame, and polars LazyFrame output types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from f741b26 to d96527f.
…entation

Update the feature comparison table, the API comparison table, the coordinate-system mermaid diagram, and the algorithm description to include the newly implemented cluster, complement, and subtract operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from d96527f to 4dac3b7.
Verify and correct the feature matrix against actual library APIs:

- subtract: add PyRanges0, PyGenomics, GenomicRanges support
- complement: add GenomicRanges support (via gaps())
- merge: add PyGenomics support (via iterate_reduced())
- count_overlaps: remove incorrect PyGenomics checkmark

Add documentation links in the API comparison table:

- PyRanges1: cluster_overlaps, complement_ranges, subtract_overlaps, join_overlaps, nearest_ranges
- PyRanges0: subtract
- GenomicRanges: gaps (complement), subtract

Add footnotes 6-9 with links for methods with different names (iterate_reduced, reduce, iterate_complement, gaps).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude pls review
Claude finished @mwiewior's task in 2m 45s.

Code Review
Overall, the implementation is clean, well-tested, and consistent with the existing codebase patterns. The new operations follow the established dispatch model (…). A few findings are below: one is a meaningful bug risk, and the rest are minor.

1. …
Integrate upstream datafusion-bio-functions PR #21, which adds extra-column preservation to ClusterProvider and SubtractProvider using a two-path design built on FullBatchCollector and the Arrow take kernel.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
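The take-based approach to preserving extra columns can be illustrated with a pure-Python stand-in for Arrow arrays. Assumptions, not confirmed by the PR: the collected input (FullBatchCollector) keeps every input row addressable by index, the provider records for each output row the input row it was derived from (`source_rows`), and the fast path skips gathering when only the core triplet is present. `take` and `with_extra_columns` are hypothetical helpers, not the upstream Rust API.

```python
from typing import Dict, List, Sequence

# Column-oriented batch: column name -> values (a stand-in for an Arrow RecordBatch).
Columns = Dict[str, List[object]]
CORE = ("contig", "pos_start", "pos_end")


def take(values: Sequence[object], indices: Sequence[int]) -> List[object]:
    """Gather values by row index, mirroring the semantics of Arrow's take kernel."""
    return [values[i] for i in indices]


def with_extra_columns(input_batch: Columns, result_core: Columns,
                       source_rows: Sequence[int]) -> Columns:
    """Attach non-core input columns to an operation's result.

    source_rows[k] is the input row that output row k was derived from
    (hypothetical bookkeeping; e.g. subtract may emit several rows per input).
    """
    extra = [name for name in input_batch if name not in CORE]
    out = dict(result_core)
    if not extra:
        return out  # fast path: only the core triplet, nothing to gather
    for name in extra:
        out[name] = take(input_batch[name], source_rows)
    return out
```

Because output rows can repeat or drop input rows (subtract splitting one interval into two pieces yields source rows `[0, 0, ...]`), a gather by index keeps every extra column row-aligned without per-column special cases.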
- Replace unreliable id()-based view table naming with a monotonic counter (itertools.count) to prevent name collisions after garbage collection (#1)
- Add a warning when complement() is called without view_df, since contigs will then span [0, i64::MAX), which is rarely useful (#6)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
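In CPython, `id()` returns an object's memory address, which can be handed out again once the object is garbage-collected, so two different view DataFrames could end up with the same table name. A minimal sketch of both fixes follows, assuming a module-level counter and a `view_df` argument that is `None` when no chromsizes are supplied. `next_view_table_name` and `warn_if_no_view` are hypothetical names; the PR's registration helper is `_register_view_table()`, whose internals may differ.

```python
import itertools
import warnings
from typing import Optional

# Module-level monotonic counter: its values are never reused, unlike id().
_view_counter = itertools.count()


def next_view_table_name(prefix: str = "view") -> str:
    """Return a unique table name for registering a view (hypothetical helper)."""
    return f"{prefix}_{next(_view_counter)}"


def warn_if_no_view(view_df: Optional[object]) -> None:
    """Warn when complement() would fall back to unbounded contigs."""
    if view_df is None:
        warnings.warn(
            "complement() called without view_df; each contig will span "
            "[0, i64::MAX), which is rarely useful. Pass chromosome sizes as view_df.",
            UserWarning,
            stacklevel=2,
        )
```

A counter trades human-meaningful names for a hard uniqueness guarantee within the process, which is what table registration needs.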
Force-pushed from b339ea4 to 412b246.
These range operations were added in #311 but were missing from the Polars LazyFrame .pb extension namespace (polars_ext.py).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- `cluster`: Assign cluster IDs to overlapping/nearby intervals (single-table, like `merge`)
- `complement`: Compute gaps between intervals with an optional chromsizes view table (new pattern)
- `subtract`: Remove overlapping portions from one set of intervals using another (two-table, like `overlap`)

Changes
Rust:
- Update the `datafusion-bio-function-ranges` upstream dependency to a rev with `ClusterProvider`, `ComplementProvider`, `SubtractProvider`
- Add `Subtract = 5` to the `RangeOp` enum, and `view_table`/`view_columns` fields to `RangeOptions`
- Add `do_cluster`, `do_complement`, `do_subtract` dispatch functions in `operation.rs`

Python:
- Add `_generate_cluster_schema`, `_generate_complement_schema`, `_generate_subtract_schema`
- Add `cluster()`, `complement()`, `subtract()` methods on `IntervalOperations`
- Add a `_register_view_table()` helper for complement's chromsizes view registration
- Expose `pb.cluster()`, `pb.complement()`, `pb.subtract()` in the public API

Test plan
- `cargo check` passes
- `maturin develop --release` builds successfully
- `python -m pytest tests/test_bioframe.py -v`: all 16 tests pass (6 new)

🤖 Generated with Claude Code