Skip to content

Conversation

@antiguru
Copy link
Member

In a quick test, this reduces the number of rows feeding into the output as follows:

  • Batches: 70%
  • Records: 26%

Tested with introspection debugging turned on.

Motivation

  • This PR adds a feature that has not yet been specified: Make the introspection dataflows as fast as possible.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

In a quick test, this reduces the number of rows feeding into the output
as follows:
* Batches: 70%
* Records: 26%

Tested with introspection debugging turned on.

Signed-off-by: Moritz Hoffmann <[email protected]>
@antiguru antiguru marked this pull request as ready for review October 12, 2023 15:38
@antiguru antiguru requested a review from a team October 12, 2023 15:38
Copy link
Contributor

@vmarcos vmarcos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Minor comment below, totally optional (up to your preference!).

let arrangement_batches = batches
.as_collection()
.mz_arrange_core::<_, RowSpine<_, _, _, _>>(
Exchange::new(move |_| u64::cast_from(worker_id)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that this matters much, but the Exchanges at this position are merely defensive, right? In the previous version of the code, the data were just pipelined, so using a Pipeline pact here would also be an option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, and thinking about it I'm not sure it makes a lot of sense! But I don't want to change it as part of this PR because there are other places that use the same construct, and it's better to change all at the same time.

The exchange would go away if/once #22384 lands because then we make it an invariant that log data gets processed locally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this is where we decided to keep these exchanges: #18062 (comment).

But I also find Pipeline more readable here, as it conveys what invariants we believe to be true. When seeing the exchange without an explanation you might start to doubt your understanding of how events flow through these dataflows.

@antiguru antiguru merged commit ff77df5 into MaterializeInc:main Oct 17, 2023
@antiguru antiguru deleted the logging_prearrange branch October 17, 2023 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants