@ggevay ggevay commented Dec 14, 2025

This is another preparatory PR before adding statement logging to the frontend peek sequencing (#34305). This one only adds some new tests.

We can no longer avoid fixing https://github.com/MaterializeInc/database-issues/issues/7304, because it will happen more often with the frontend peek sequencing (we await more on the frontend task, so there is more time for the future to get dropped). However, the only viable fix that I can see will cost us some test coverage: my plan is to remove the assertion in ExecuteContextExtra::drop and instead make it mark the statement finished in the statement log. This means that if some code somewhere forgets to mark a statement finished, that assertion would no longer fire.
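To make the planned change concrete, here is a minimal, standalone sketch of a drop guard that records a finished state instead of asserting. It is not the actual Materialize code: `ContextExtra`, `StatementLog`, `FinishedStatus`, and the `DroppedWithoutRetire` variant are hypothetical stand-ins invented for illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical outcome recorded in the statement log.
#[derive(Debug, Clone, PartialEq)]
enum FinishedStatus {
    Success,
    /// Hypothetical state for a context that was dropped before being retired.
    DroppedWithoutRetire,
}

/// Hypothetical stand-in for the statement log: (statement id, status) entries.
type StatementLog = Rc<RefCell<Vec<(u64, FinishedStatus)>>>;

/// Hypothetical stand-in for `ExecuteContextExtra`: tracks one in-flight statement.
struct ContextExtra {
    statement_id: Option<u64>,
    log: StatementLog,
}

impl ContextExtra {
    fn new(statement_id: u64, log: StatementLog) -> Self {
        ContextExtra { statement_id: Some(statement_id), log }
    }

    /// Normal path: explicitly mark the statement finished.
    fn retire(mut self, status: FinishedStatus) {
        if let Some(id) = self.statement_id.take() {
            self.log.borrow_mut().push((id, status));
        }
        // `self` is dropped here; `drop` sees `None` and does nothing.
    }
}

impl Drop for ContextExtra {
    fn drop(&mut self) {
        // Instead of asserting that the statement was retired, record a
        // finished state so the log never keeps a dangling entry.
        if let Some(id) = self.statement_id.take() {
            self.log
                .borrow_mut()
                .push((id, FinishedStatus::DroppedWithoutRetire));
        }
    }
}
```

With a guard like this, a dropped future still leaves a finished entry in the log, which is exactly why a forgotten `retire` call would no longer be caught by an assertion and has to be caught by tests instead.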

So this PR adds tests that explicitly check that all statements reach a finished state in the statement log. When my next PR adds statement logging to the frontend peek sequencing and removes the assertion, these tests will keep checking for the finished states. (In the current PR, the new tests only check that the finished_at and finished_status fields are not null, but after the assertion is removed, finished_status will have a new possible state, so I'll modify these tests to also alert on that new state.)
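As a rough illustration of what such a check boils down to, the sketch below filters statement-log rows for missing finished fields. It assumes a hypothetical in-memory `LogRow` type; the real tests query the statement log, and the name of the future new state is not given here, so the stricter variant takes it as a parameter.

```rust
/// Hypothetical in-memory view of one statement-log row. Only the two fields
/// named in the PR description are modeled, both as nullable text.
#[derive(Debug, Clone)]
struct LogRow {
    statement_id: u64,
    finished_at: Option<String>,
    finished_status: Option<String>,
}

/// Returns the ids of statements that never reached a finished state,
/// i.e. rows where `finished_at` or `finished_status` is still null.
fn unfinished_statements(rows: &[LogRow]) -> Vec<u64> {
    rows.iter()
        .filter(|r| r.finished_at.is_none() || r.finished_status.is_none())
        .map(|r| r.statement_id)
        .collect()
}

/// Stricter variant: additionally flags rows whose status equals
/// `flagged_status`, a placeholder for the not-yet-named new state.
fn suspicious_statements(rows: &[LogRow], flagged_status: &str) -> Vec<u64> {
    rows.iter()
        .filter(|r| {
            r.finished_at.is_none()
                || r.finished_status.is_none()
                || r.finished_status.as_deref() == Some(flagged_status)
        })
        .map(|r| r.statement_id)
        .collect()
}
```

A test would then assert that the returned list is empty once the log has been flushed.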

There are 3 commits:

  1. Does the check after every .td file that is run naked, i.e., when Testdrive is run directly rather than called from Cluster tests and other harnesses. See the doc comment of check_statement_logging for the reasoning.
  2. Does the check in the Rust tests that involve statement logging.
  3. Does the check in statement-logging.td. This is not covered by 1., because this .td is run as a Cluster test.

The new tests take a few seconds to run after every .td file (they have to wait for the 5-second buffering of writes to the statement log), so I made it configurable whether Testdrive does these checks, and enabled them in only one Nightly job, the one that seemed to be the fastest. Let me know if this is OK.
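The cost comes from having to wait for buffered log writes before the check can succeed. A minimal sketch of an opt-in, poll-until-deadline check follows, using only the standard library; `retry_until` and `maybe_check` are hypothetical helpers, not Testdrive functions, and the timeout and interval values are left to the caller.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `check` until it returns `Ok(())` or `timeout` elapses.
/// Returns the last error if the deadline passes.
fn retry_until<F>(timeout: Duration, interval: Duration, mut check: F) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    let deadline = Instant::now() + timeout;
    loop {
        match check() {
            Ok(()) => return Ok(()),
            // Give up once the deadline has passed, reporting the last error.
            Err(e) if Instant::now() >= deadline => return Err(e),
            // Otherwise wait a little and try again.
            Err(_) => sleep(interval),
        }
    }
}

/// Runs the check only when `enabled` is true, mirroring an opt-in flag.
/// When disabled, it returns immediately without calling `check`.
fn maybe_check<F>(
    enabled: bool,
    timeout: Duration,
    interval: Duration,
    check: F,
) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    if !enabled {
        return Ok(());
    }
    retry_until(timeout, interval, check)
}
```

Because the wait is paid once per test file, gating it behind a flag keeps the default runs fast while still exercising the check in the one job that enables it.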

Nightly, running just the Testdrives: https://buildkite.com/materialize/nightly/builds/14493

@ggevay ggevay added the T-testing Theme: tests or test infrastructure label Dec 14, 2025
@ggevay ggevay marked this pull request as ready for review December 14, 2025 18:33
@ggevay ggevay requested review from a team as code owners December 14, 2025 18:33
@ggevay ggevay requested review from aljoscha and def- December 14, 2025 18:33

@def- def- left a comment

We also have many other tests using testdrive internally, which don't benefit from --check-statement-logging. I triggered a test run to see how it would look if we enabled --check-statement-logging in all tests:
https://buildkite.com/materialize/test/builds/113436
https://buildkite.com/materialize/nightly/builds/14501
Edit: Looks good so far!

  - ./ci/plugins/mzcompose:
      composition: testdrive
-     args: [--default-size=1, --slow]
+     args: [--default-size=1, --slow, --check-statement-logging]

Why only in this test and not in all testdrive tests?

#[clap(long, default_value_t = ConsistencyCheckLevel::default(), value_enum)]
consistency_checks: ConsistencyCheckLevel,
/// Whether to run statement logging consistency checks (adds a few seconds at the end of every
/// test file).

I see, doesn't hurt in Nightly though! Does it matter how the clusters are configured? I guess not, so maybe in just one test is fine.

Contributor Author

Yeah, it seems these different Testdrive jobs differ only in things that don't really matter for this consistency check, so I decided not to slow down all of them, only the fastest one.

ggevay commented Dec 15, 2025

> We also have many other tests using testdrive internally, which don't benefit from --check-statement-logging. I triggered a test run to see how it would look if we enabled --check-statement-logging in all tests:

Thank you! I see some timeouts and other issues, but it's not immediately obvious to me whether they are related. Do the backup tests use Testdrive?

But actually, I'm a bit afraid of making other tests slightly flaky. For example, if a test restarts the system (e.g. by killing envd), then there might be rare flakes in this statement logging check, as mentioned in the doc comment of check_statement_logging. So even if one or two Nightlies pass, I'm not sure that I'd like to turn this on in all tests. (The "pure" Testdrive tests, where the PR added --check-statement-logging, don't do system restarts, right?)

def- commented Dec 15, 2025

I'm not suggesting to turn it on in all tests! I think the timeouts are related, as this makes every test slower. I just wanted a one-off run to see if anything else falls over atm.

ggevay commented Dec 15, 2025

I see, ok, thank you!

@ggevay ggevay merged commit 3fc3f17 into MaterializeInc:main Dec 16, 2025
143 checks passed