feat: Use upstream OverlapProvider and MergeProvider#310
Conversation
|
@claude pls review |
|
Claude finished @mwiewior's task in 1m 39s —— View job PR Review
Overall this is a clean, well-motivated refactor. Replacing the hand-crafted Python merge implementation and the SQL-generation overlap logic with upstream providers shrinks the codebase meaningfully while gaining new capabilities (file-path and LazyFrame inputs for merge). Below are the findings, ordered roughly by severity. Issues / Concerns1.
|
7d8ab1a to
0e3c52b
Compare
Replace hand-crafted SQL generation for overlap with the upstream OverlapProvider from datafusion-bio-function-ranges, and replace the ~160-line Python window-function merge implementation with the upstream MergeProvider routed through the Rust backend. This gains file-path and LazyFrame streaming support for merge (previously DataFrame-only) and removes ~260 lines of code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
0e3c52b to
9f4ebc2
Compare
Summary
src/query.rswith the upstreamOverlapProviderfromdatafusion-bio-function-ranges(PR Add overlap and merge UDTVFs datafusion-bio-functions#16)MergeProviderrouted through the Rust backendRangeOp::Mergevariant andmin_distfield toRangeOptionsfor merge configurationoverlap_query()and unusedprojection_columnsfield fromQueryParamspyarrow,set_coordinate_system,get_coordinate_system,logger)Merge now gains file-path and LazyFrame streaming support (previously DataFrame-only). Net effect: ~260 lines removed.
Depends on: biodatageeks/datafusion-bio-functions#16 (update rev to main after merge)
Test plan
cargo checkpasses with no warningsmaturin develop --releasebuilds successfullytest_bioframe.py,test_overlap_algorithms.py,test_suffix_handling.py)datafusion-bio-function-rangesrev to main branch commit after upstream PR merges🤖 Generated with Claude Code