Derive pagination LIMIT/OFFSET from a SQL parser#286
Conversation
The grid pagination rewriter read the user's LIMIT/OFFSET with a hand-rolled token scanner that only recognised the trailing `LIMIT <n> OFFSET <n>` shape. It mis-handled MySQL's `LIMIT <offset>, <count>`, `OFFSET` before `LIMIT`, and numeric expressions, producing wrong values or a malformed appended query. Parse the query with sqlparser using the driver's dialect and read LIMIT/OFFSET from the AST instead, normalising the MySQL comma form. The trailing clause is stripped at its keyword (consistent with what the parser saw) and the new pagination clause is rendered from a LimitClause AST node, then concatenated to the verbatim sliced base so leading comments, inline hints, and formatting are preserved. The token scanner is kept as a fallback for inputs the parser rejects, so behaviour never regresses. FETCH FIRST ... ROWS is out of scope and defers to the fallback. Builds on #275.
Code Review SummaryStatus: No Issues Found | Recommendation: Merge Files Reviewed (21 files)
Reviewed by kimi-k2.6-20260420 · 174,393 tokens |
NewtTheWolf
left a comment
There was a problem hiding this comment.
Validated the parser-based rewriter end-to-end against a seeded perf_demo (50k rows, MySQL + Postgres). The headline cases are solid — MySQL LIMIT <off>, <cnt>, OFFSET before LIMIT, backtick identifiers and inline leading hints all paginate correctly across pages. 👍
But I hit four issues where the rewriter mishandles inputs — three confirmed directly via the new "executed query" the UI shows. Requesting changes for the two correctness ones (CTE + the strip desync); the FETCH item is smaller.
Confirmed at runtime (executed query in brackets):
-
CTEs are never paginated.
parse_paginationparses the whole statement and matchesStatement::Query, which already includesWITH … SELECT. But the call-site gate is stillis_select_query()— astarts_with("SELECT")string check — so a CTE falls through to the non-paginated branch and is hard-truncated atpage_sizewith no paging controls. The new parser could handle it; the gate blocks it. (all three drivers) -
strip_at_limit_keyworddesyncs from the parser for non-numeric clauses. When the parser reportshas_limit_clause = truebut the clause isn't plain digits, both the token heuristic and thestrip_limit_offsetfallback fail to strip it, so pagination is appended to the un-stripped query:… LIMIT ALL→… LIMIT ALL LIMIT 501 OFFSET 0→syntax error at or near "LIMIT"… LIMIT 2000 -- note→… LIMIT 2000 -- note LIMIT 501 OFFSET 0→ the--comments out the appended pagination, which is then silently dropped (the same query without the comment strips cleanly to… LIMIT 501 OFFSET 0).
Root cause: once the parser has recognized the clause, stripping re-tokenizes heuristically instead of cutting at the span the parser already computed.
-
FETCH FIRST … ROWSfallback produces a mixed clause.… FETCH FIRST 5000 ROWS ONLY→… FETCH FIRST 5000 ROWS ONLY LIMIT 501 OFFSET 0→ DB error. Out of scope per the PR — but the inline comment claims it defers "rather than producing a mixed clause", and it does produce one. (suggestion inline)
Recommendations
- (1) Derive paginatability from the parse, not the string prefix: paginate iff the query parses to a single
Statement::Query— cleanly includes CTEs/VALUES, excludesSHOW/EXPLAIN/DDL. (Don't widen toreturns_result_set, which would try to paginateSHOW/EXPLAINand append an invalidLIMIT.) - (2) When
has_limit_clauseis true, cut at the clause's source span from the parse instead of re-scanning tokens, soLIMIT ALL, trailing comments, etc. can't desync.
Details inline.
| /// keyword regardless of the value shape, so MySQL's `LIMIT <offset>, <count>` | ||
| /// is handled. Falls back to [`strip_limit_offset`] if no trailing clause is | ||
| /// found (kept total; the parser having reported a clause makes this rare). | ||
| fn strip_at_limit_keyword(query: &str) -> String { |
There was a problem hiding this comment.
Findings 2 & 4 — strip desync. When the parser reported a clause (has_limit_clause = true) but it isn't plain numeric, this keyword scan and the strip_limit_offset fallback both fail to strip it, and build_paginated_query then appends pagination to the un-stripped query. Confirmed at runtime:
… LIMIT ALL→ executed… LIMIT ALL LIMIT 501 OFFSET 0→syntax error at or near "LIMIT"… LIMIT 2000 -- note→ executed… LIMIT 2000 -- note LIMIT 501 OFFSET 0→ the--comments out the appendedLIMIT 501 OFFSET 0, so pagination is silently dropped (the same query without the comment strips cleanly to… LIMIT 501 OFFSET 0).
is_pagination_tail_token doesn't cover ALL / comment tokens, and the fallback only strips trailing numeric clauses — so any clause the parser accepts but these heuristics don't recognize desyncs. Since the parser already located the clause, consider cutting at its AST source span when has_limit_clause is true, instead of re-tokenizing.
There was a problem hiding this comment.
Fixed in d96ec81, taking your span suggestion. When the parser reports a clause, the strip now cuts at the clause's AST source span instead of re-tokenizing — with a walk-back over the leading keyword, since sqlparser's spans anchor at the clause's first value (LIMIT/OFFSET belong to no AST node).
Two wrinkles surfaced while wiring it up:
- A bare
LIMIT ALLnever reaches the AST at all —parse_optional_limit_clausefolds it into "no limit clause" (sqlparser 0.62, parser/mod.rs:13104) — so there is no span to anchor to. It's located by a trailing-token check instead.LIMIT ALL OFFSET ndoes produce a clause and anchors to the OFFSET value, peeling the danglingLIMIT ALLon the way back. - Queries with locking clauses defer to the fallback (like FETCH), because cutting span-to-end would silently drop
FOR UPDATEalong with the clause.
The heuristic tokenizer also skips --//* */ comments now, so the parse-error fallback path can't be shielded by a trailing comment either. Both runtime cases from the review are covered by tests, plus LIMIT ALL, LIMIT ALL OFFSET n, a trailing block comment, and a non-literal LIMIT 10 + 5.
| // FETCH FIRST … ROWS ONLY is out of scope; defer to the fallback path so | ||
| // its behaviour is unchanged rather than producing a mixed clause. |
There was a problem hiding this comment.
Confirmed at runtime: … FETCH FIRST 5000 ROWS ONLY paginates to … FETCH FIRST 5000 ROWS ONLY LIMIT 501 OFFSET 0 (DB error) — so the fallback does produce a mixed clause. Handling FETCH properly is out of scope, but the comment is misleading; fixing it to match:
| // FETCH FIRST … ROWS ONLY is out of scope; defer to the fallback path so | |
| // its behaviour is unchanged rather than producing a mixed clause. | |
| // FETCH FIRST … ROWS ONLY is out of scope; defer to the fallback path. | |
| // NB: the fallback does not strip FETCH, so a paginated FETCH query gets a | |
| // trailing LIMIT/OFFSET appended (a mixed clause the DB rejects). |
There was a problem hiding this comment.
Applied your suggested wording in d96ec81 — the comment now states the fallback does append a mixed clause for FETCH queries.
| let (final_query, pagination_meta) = if is_select && limit.is_some() { | ||
| let l = limit.unwrap(); | ||
| let data_query = crate::drivers::common::build_paginated_query(query, l, page); | ||
| let data_query = crate::drivers::common::build_paginated_query( |
There was a problem hiding this comment.
Finding 1 — CTEs are never paginated. This new build_paginated_query call is only reached when is_select (line 811) is true, and is_select = is_select_query() = starts_with("SELECT"). So a WITH … SELECT CTE never reaches the parser path you just added — it falls through to the non-paginated branch and is hard-truncated at page_size (confirmed at runtime: no AUTO-PAGINATED badge, can't reach page 2).
parse_pagination already matches Statement::Query, which includes CTEs — so the new parser would handle them; only the string-prefix gate blocks it. Suggest deriving the gate from the parse (Statement::Query) instead of starts_with("SELECT"). Same gate in mysql/mod.rs:948 and the sqlite driver.
There was a problem hiding this comment.
Fixed in d96ec81. The gate is now is_paginatable_query(): paginate iff the query parses to a single Statement::Query — CTEs and VALUES get in, SHOW/EXPLAIN/DDL stay out (per your note, it does not widen to returns_result_set). is_select_query() survives only as the fallback when the parser rejects the input, so dialect quirks sqlparser can't parse keep their old prefix-based routing. All three drivers migrated (postgres, mysql, sqlite).
… span Addresses review feedback on #286: - Paginate iff the query parses to a single Statement::Query instead of a starts_with("SELECT") check, so CTEs (WITH ... SELECT) and VALUES reach the parser-based rewriter while SHOW/EXPLAIN/DDL stay out. The prefix check remains as fallback when the parser rejects the input. - Strip a parsed LIMIT/OFFSET clause by cutting at its AST source span rather than re-scanning tokens, so clauses the heuristics don't recognise (LIMIT ALL, trailing comments, non-literal expressions) can no longer desync and produce mixed clauses or silently dropped pagination. A bare LIMIT ALL is folded into "no clause" by sqlparser and is located textually; queries with locking clauses defer to the fallback so FOR UPDATE is never silently dropped. - Make the heuristic tokenizer comment-aware so the fallback strip and extract scans cannot be shielded by trailing comments either. - Fix the misleading FETCH comment: the fallback does produce a mixed clause for FETCH FIRST queries.
Builds on #275.
Background
PR #275 fixed the grid pagination rewriter to fold the user's
OFFSETinto theper-page offset, but it did so by extending the hand-rolled token scanner in
src-tauri/src/drivers/common/query.rs, which only recognises the trailing… LIMIT <n> OFFSET <n>shape.That heuristic is fragile. It does not understand:
LIMIT <offset>, <count>syntaxOFFSETbeforeLIMIT(valid in Postgres)In those cases the values are read wrong (or dropped) and the stripped base can be
inconsistent with what was extracted, producing a malformed appended query.
The change
sqlparsercrate (v0.62) and readQuery.limit_clausefrom the AST,parsed with the correct per-driver dialect (
PaginationDialect::{MySql,Postgres,Sqlite}threaded through
build_paginated_query). MySQL'sLIMIT <offset>, <count>isnormalised to the same
(limit, offset)shape.LIMIT/OFFSETkeyword (reusing the existingposition-aware tokenizer, which collapses parenthesised subqueries so inner
LIMITs are never touched), consistent with what the parser saw.LimitClauseAST node and concatenate itto the verbatim sliced base, so leading comments, inline
/*+ hints */, andthe body's formatting are preserved (no full-query reserialization).
so behaviour never regresses.
FETCH FIRST … ROWSis out of scope and defers tothe fallback.
Tests
Existing
build_paginated_querytests updated to pass a dialect; new cases added forthe MySQL comma form (pages 1 & 2),
OFFSETbeforeLIMIT, backtick identifiers,inline-hint preservation, and the parse-error fallback.
cargo test drivers::common— 63 passed.