fix: r_cte wrong/flaky results under concurrency #19439
KKould wants to merge 7 commits into databendlabs:main
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f6e0c6ce82
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/query/service/src/pipelines/processors/transforms/transform_recursive_cte_source.rs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 24cf459611
if short.len() >= 32 {
    break;
Preserve full query-id entropy in RCTE table prefix
make_rcte_prefix stops after collecting 32 alphanumeric characters, so two concurrent queries whose IDs share the same first 32 alnum chars will still map to the same internal __rcte_* table names. That reintroduces the same cross-query interference this patch is trying to eliminate (wrong/flaky recursive CTE results) for clients that provide custom/long query IDs. Generate the prefix from the full query ID (e.g., full sanitized ID or a hash of it) instead of truncating here.
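One way to follow this suggestion is sketched below. It is not the code from the PR: the function name `make_rcte_prefix_sketch`, its signature, and the exact prefix layout are assumptions made for illustration. The idea is to keep a short sanitized fragment only for readability and to take the distinguishing part from a hash of the full, untruncated query ID.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical sketch; the real `make_rcte_prefix` signature is not shown
/// in this excerpt, so the name and layout here are assumptions.
fn make_rcte_prefix_sketch(query_id: &str) -> String {
    // Short readable fragment, kept only to make the table name easier to
    // recognise while debugging. It is NOT relied on for uniqueness.
    let short: String = query_id
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .take(16)
        .collect();

    // Hash the full, unsanitized query ID so that IDs sharing a long common
    // prefix still produce different table prefixes.
    let mut hasher = DefaultHasher::new();
    query_id.hash(&mut hasher);

    format!("__rcte_{}_{:016x}_", short, hasher.finish())
}
```

Calling this with two IDs that agree on their first 32 alphanumeric characters but differ later should give two different prefixes, while the same ID gives the same prefix for every step of one query. A 64-bit hash makes collisions very unlikely rather than impossible, and `DefaultHasher` output is not guaranteed to be stable across Rust releases, so this only suits names that live within a single process; embedding the full sanitized ID is the alternative if strict uniqueness is required.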
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
fixes: Tracking: Recursive CTE result may be random #19398
Fix flaky / non-deterministic results in WITH RECURSIVE ... UNION ALL ... caused by cross-query interference on recursive CTE internal MEMORY tables.
Make the recursive CTE internal table names query-unique (prefix with query id) so concurrent queries (sqllogictest parallelism in CI) can no longer create/drop/recreate the same internal table name and corrupt each other’s recursion state.
Add a deterministic regression test hook and a stable repro test that simulates the interference between step=0 and step=1.
Root Cause
Recursive CTE is executed in steps. In step=0, Databend creates internal Engine=Memory tables (one per RecursiveCteScan) and writes prepared blocks keyed by exec_id. In step>=1 it reads from the same internal MEMORY table to continue recursion.
Previously, those internal MEMORY tables were created in the current database using stable names taken directly from the recursive scan name / CTE alias (e.g. lines, paths). This makes the internal tables globally visible by (tenant, catalog, database, table_name) and not query-private.
In CI, sqllogictest frequently runs with --parallel > 1 against a single databend-query instance. Multiple concurrent sqllogictest queries can therefore execute recursive CTEs that share common aliases like lines. Because each recursive CTE run also drops its internal tables at the end, one query can drop/recreate the same internal table name while another query is between step=0 and step=1, causing step=1 to see an empty/replaced table and terminate early (e.g. returning seed-only results such as 1 instead of the correct 1000). This appears as a rare, timing-dependent “random result” in CI.
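The interleaving described above can be modelled with a toy catalog. This is a simplified sketch, not Databend code: the `Catalog` type, the helper names, and the choice to treat a missing table as an empty input are all assumptions made for illustration. Query B starts, query A runs step=0, B finishes and drops its internal table, then A runs step=1.

```rust
use std::collections::HashMap;

/// Toy stand-in for a database catalog: table name -> rows (hypothetical).
type Catalog = HashMap<String, Vec<u64>>;

/// step=0: (re)create the internal table and write the seed row.
fn step0(catalog: &mut Catalog, table: &str, seed: u64) {
    catalog.insert(table.to_string(), vec![seed]);
}

/// End of a recursive CTE run: drop the internal table.
fn cleanup(catalog: &mut Catalog, table: &str) {
    catalog.remove(table);
}

/// step>=1: read the rows written by the previous step.
/// A missing table is modelled as an empty input (an assumption).
fn step_n_input(catalog: &Catalog, table: &str) -> Vec<u64> {
    catalog.get(table).cloned().unwrap_or_default()
}

/// Replays the problematic interleaving and returns what query A sees
/// when it reaches step=1.
fn rows_seen_by_a(name_a: &str, name_b: &str) -> Vec<u64> {
    let mut catalog = Catalog::new();
    step0(&mut catalog, name_b, 1); // query B starts
    step0(&mut catalog, name_a, 1); // query A runs step=0
    cleanup(&mut catalog, name_b); // query B finishes and drops its table
    step_n_input(&catalog, name_a) // query A runs step=1
}
```

With a shared name, `rows_seen_by_a("lines", "lines")` returns an empty vector, which corresponds to the recursion ending early with seed-only results. With query-prefixed names such as `"__rcte_qa_lines"` and `"__rcte_qb_lines"` (illustrative names, not the actual format), query A still sees its seed row because B's cleanup touches a different key.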
CI Execution Chain
Tests
Type of change