Columnar rendering: complete migration (Prompts 2.3–10.1) #179

Open
antiguru wants to merge 11 commits into columnar_rendering from claude/complete-prompt-task-KOaiS
@antiguru
Owner

Summary

Completes the columnar rendering migration (Prompts 2.3 through 10.1), building on the foundation from the earlier PR (Prompts 0.1–2.2).

Changes by phase

  • Phase 2 (pass-through operators): Constant operator emits columnar collections
  • Phase 3 (MFP): Added as_columnar_collection_core method; wired Get::Collection and Mfp to prefer columnar path
  • Phase 4 (FlatMap): Accept columnar input, emit columnar output at boundaries
  • Phase 5 (ArrangeBy): ensure_collections handles columnar-only inputs for arrangement creation
  • Phase 6 (Stateful): Reduce, TopK accept columnar input; Threshold verified (arrangement-only, no changes needed)
  • Phase 7 (Joins): Linear join accepts columnar input; Delta join produces columnar output
  • Phase 8 (Sinks): Sink boundary converts columnar→Vec for persist
  • Phase 9 (Cleanup): Removed collection field from CollectionBundle entirely. Data flows exclusively through columnar_collection. from_collections auto-converts Vec→columnar. Added as_vec_collection() for on-demand conversion at operator boundaries. Net -50 lines.
  • Phase 10 (Research): Investigated columnar arrangement spines — feasible but requires schema propagation, new BatchContainer impls, and vectorized eval as prerequisites

Key architectural decisions

  • Columnar-first: All source operators (persist, constant) produce columnar-only bundles
  • On-demand Vec conversion: as_vec_collection() converts columnar→Vec at operator boundaries (arrangements, sinks, etc.)
  • from_collections auto-converts: Operators producing Vec output seamlessly convert to columnar
  • Error streams stay Vec: DataflowError is not suited for columnar layout
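The decisions above can be illustrated with a minimal sketch. It is not the PR's code: `ColumnarUpdates` and `Bundle` are hypothetical stand-ins, with `String`, `u64`, and `i64` assumed in place of `Row`, the timestamp type, and `Diff`. Only the method names `from_collections` and `as_vec_collection` are taken from the description.

```rust
/// Hypothetical stand-in for a columnar batch: one Vec per field.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ColumnarUpdates {
    pub rows: Vec<String>,
    pub times: Vec<u64>,
    pub diffs: Vec<i64>,
}

/// Hypothetical bundle that stores only the columnar form.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Bundle {
    columnar: ColumnarUpdates,
}

impl Bundle {
    /// Accepts row-oriented updates and converts them to columns once, on entry.
    pub fn from_collections(updates: Vec<(String, u64, i64)>) -> Self {
        let mut columnar = ColumnarUpdates::default();
        for (row, time, diff) in updates {
            columnar.rows.push(row);
            columnar.times.push(time);
            columnar.diffs.push(diff);
        }
        Bundle { columnar }
    }

    /// Materializes owned tuples only when a caller asks for them,
    /// for example at an arrangement or sink boundary.
    pub fn as_vec_collection(&self) -> Vec<(String, u64, i64)> {
        let c = &self.columnar;
        (0..c.rows.len())
            .map(|i| (c.rows[i].clone(), c.times[i], c.diffs[i]))
            .collect()
    }
}
```

In this sketch the row-oriented form never lives in the bundle, so the cost of conversion is paid only by the operators that call `as_vec_collection`.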

Skipped

Test plan

  • cargo check -p mz-compute passes
  • cargo check -p mz-compute --tests passes
  • cargo clippy -p mz-compute --all-targets — zero warnings
  • bin/fmt (rustfmt) passes
  • Unit tests in render/columnar.rs cover round-trip, negate, union, and constant conversions

https://claude.ai/code/session_01JHo5sTCSGPW5NavNE2b49d

claude added 11 commits March 25, 2026 10:24
- Use Rc::clone(&results) instead of results.clone() for Rc pointers
- Replace vec![...] with array literal where Vec is unnecessary
- Replace Iterator::zip (disallowed) with direct assert_eq
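A small illustration of the first two cleanups, assuming a test helper shaped roughly like the one below (the function name and element type are hypothetical). `Rc::clone(&results)` makes it explicit that only the reference count is bumped, and the array literal avoids a heap allocation for a fixed list of values.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical helper: pushes fixed values through a second handle to a
/// shared results buffer and returns what was recorded.
pub fn record_values() -> Vec<i32> {
    let results: Rc<RefCell<Vec<i32>>> = Rc::new(RefCell::new(Vec::new()));

    // Explicit form: clearly a cheap pointer clone, not a deep copy of the Vec.
    let handle = Rc::clone(&results);

    // Array literal instead of vec![...]: no allocation is needed to iterate.
    for value in [1, 2, 3] {
        handle.borrow_mut().push(value);
    }

    // Both handles point at the same buffer.
    assert_eq!(Rc::strong_count(&results), 2);

    let recorded = results.borrow().clone();
    recorded
}
```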

https://claude.ai/code/session_01JHo5sTCSGPW5NavNE2b49d
…atches

New prompts 11.1–11.6 to eliminate columnar→Vec conversions by operating
directly on &RowRef from columnar containers. Key insight: DatumVec's
borrow_with already accepts &RowRef (the columnar Ref<'_, Row> type),
so operators can process columnar data without materializing owned Rows.

https://claude.ai/code/session_01JHo5sTCSGPW5NavNE2b49d
…rompt 11.1)

Add a columnar path to CollectionBundle::flat_map that iterates the
columnar container via into_index_iter(), passing &RowRef directly to
borrow_with_limit. This eliminates the columnar→Vec conversion and
avoids allocating owned Row values. Only timestamps and diffs are
converted to owned (cheap scalar copies).
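The shape of that columnar path can be sketched roughly as follows. This is an assumption-laden illustration rather than the actual `flat_map` implementation: `Column` is a hypothetical container, `&[i64]` stands in for `&RowRef`, and index-based iteration stands in for `into_index_iter()`.

```rust
/// Hypothetical columnar container; each row is a slice of integers here.
pub struct Column {
    pub rows: Vec<Vec<i64>>,
    pub times: Vec<u64>,
    pub diffs: Vec<i64>,
}

/// Applies `logic` to each borrowed row and tags every output with the
/// row's time and diff. Rows are never cloned; only the scalar time and
/// diff are copied out of the container.
pub fn flat_map_columnar<F, I>(col: &Column, mut logic: F) -> Vec<(i64, u64, i64)>
where
    F: FnMut(&[i64]) -> I,
    I: IntoIterator<Item = i64>,
{
    let mut out = Vec::new();
    for i in 0..col.rows.len() {
        let time = col.times[i]; // cheap scalar copy
        let diff = col.diffs[i]; // cheap scalar copy
        for value in logic(&col.rows[i]) {
            out.push((value, time, diff));
        }
    }
    out
}
```

A caller might pass a closure such as `|row| row.iter().map(|d| d * 10).collect::<Vec<_>>()`, which reads the borrowed row and produces new values without taking ownership of the input.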

https://claude.ai/code/session_01JHo5sTCSGPW5NavNE2b49d
…tly (Prompt 11.2)

Add as_specific_columnar_collection that returns the columnar collection
without conversion when key is None. Optimize as_columnar_collection_core
to detect identity MFPs and skip the columnar→Vec→columnar round-trip.
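The identity short-circuit can be sketched as below. The `Mfp` struct here is a hypothetical, projection-only simplification (the real MFP also carries map and filter expressions), and `Cow` is used only to make the "no conversion" case visible in the types.

```rust
use std::borrow::Cow;

/// Hypothetical, projection-only stand-in for an MFP.
pub struct Mfp {
    pub projection: Vec<usize>,
    pub input_arity: usize,
}

impl Mfp {
    /// True when the projection keeps every input column in its original order.
    pub fn is_identity(&self) -> bool {
        self.projection.len() == self.input_arity
            && self.projection.iter().enumerate().all(|(i, c)| i == *c)
    }
}

/// Returns the input untouched for identity MFPs; otherwise builds new rows.
pub fn apply<'a>(mfp: &Mfp, rows: &'a [Vec<i64>]) -> Cow<'a, [Vec<i64>]> {
    if mfp.is_identity() {
        // Fast path: hand back the existing data without any round-trip.
        return Cow::Borrowed(rows);
    }
    let projected: Vec<Vec<i64>> = rows
        .iter()
        .map(|row| mfp.projection.iter().map(|&c| row[c]).collect())
        .collect();
    Cow::Owned(projected)
}
```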

https://claude.ai/code/session_01JHo5sTCSGPW5NavNE2b49d
Identity MFPs return columnar directly (11.2). Non-identity MFPs
iterate columnar via &RowRef (11.1) but output is Vec-based due to
map_fallible Ok/Err split. Updated doc comment to reflect current state.
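To show why a fallible map naturally yields two row-oriented outputs, here is a minimal sketch of an Ok/Err split. The function below is a plain-Rust illustration that borrows the name `map_fallible` from the commit message; it is not the dataflow operator itself, and its signature is an assumption.

```rust
/// Applies a fallible function to each borrowed input and routes results
/// into separate success and error vectors.
pub fn map_fallible<T, U, E, F>(input: &[T], mut f: F) -> (Vec<U>, Vec<E>)
where
    F: FnMut(&T) -> Result<U, E>,
{
    let mut oks = Vec::new();
    let mut errs = Vec::new();
    for item in input {
        match f(item) {
            Ok(value) => oks.push(value),
            Err(error) => errs.push(error),
        }
    }
    (oks, errs)
}
```

The input is still read by reference, which corresponds to the `&RowRef` iteration from 11.1; only the two outputs are materialized as vectors.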

https://claude.ai/code/session_01JHo5sTCSGPW5NavNE2b49d
render_reduce calls flat_map which now iterates &RowRef directly from
columnar containers (11.1). No Vec conversion needed for key/value
extraction. Verification only, no code changes.

https://claude.ai/code/session_01JHo5sTCSGPW5NavNE2b49d
…t 11.5)

Add a columnar path to render_flat_map that uses unary_fallible directly
on Column<(Row, T, Diff)> containers. Iterates &RowRef without allocating
owned Rows for expression evaluation. Changed drain_through_mfp to accept
&RowRef. Vec fallback retained for arrangement key paths.

https://claude.ai/code/session_01JHo5sTCSGPW5NavNE2b49d
…ctly (Prompt 11.6)

Add arrange_columnar_collection that takes ColumnarCollection and iterates
&RowRef from columnar containers for key/value expression evaluation,
avoiding the columnar→Vec conversion. Wire ensure_collections to use it
when identity MFP + no input_key + columnar available. The passthrough
stream stays columnar throughout the arrangement loop.

https://claude.ai/code/session_01JHo5sTCSGPW5NavNE2b49d
Replace Columnar::into_owned with copy_from on reusable buffers in all
columnar iteration loops. This avoids allocating new Row/Timestamp/Diff
values each iteration, reusing the buffer's existing allocation instead.

Affected operators: ColumnarToVec, NegateColumnar, ColumnarFlatMap,
FlatMapStageColumnar, FormArrangementKeyColumnar.
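The buffer-reuse idea can be illustrated with the standard library's `Clone::clone_from`, which plays a comparable role for `String` to what `copy_from` plays for columnar references. This is an analogy under that assumption, not the columnar crate's API; the function name is hypothetical.

```rust
/// Sums row lengths while copying each row into one reusable buffer.
/// `clone_from` overwrites the buffer in place and keeps its allocation
/// when the capacity is already sufficient, instead of allocating a new
/// String per iteration as `row.clone()` would.
pub fn sum_lengths_reusing(rows: &[String], buffer: &mut String) -> usize {
    let mut total = 0;
    for row in rows {
        buffer.clone_from(row); // reuse the existing allocation
        total += buffer.len();
    }
    total
}
```

With a buffer created once outside the loop, for example `String::with_capacity(64)`, rows that fit within that capacity are copied without any further allocation.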

https://claude.ai/code/session_01JHo5sTCSGPW5NavNE2b49d
Change the flat_map closure signature from (DatumVecBorrow, T, Diff)
to (DatumVecBorrow, &T, &Diff). This eliminates unnecessary clones in
the columnar path (references to copy_from buffers are passed directly)
and in the arrangement path (owned values from buffer.drain are passed
by reference). Callers clone/copy only when they actually need ownership.

https://claude.ai/code/session_01JHo5sTCSGPW5NavNE2b49d