Skip to content

Derive pagination LIMIT/OFFSET from a SQL parser#286

Open
debba wants to merge 4 commits into
mainfrom
feat/pagination-sql-parser
Open

Derive pagination LIMIT/OFFSET from a SQL parser#286
debba wants to merge 4 commits into
mainfrom
feat/pagination-sql-parser

Conversation

@debba

@debba debba commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Builds on #275.

Background

PR #275 fixed the grid pagination rewriter to fold the user's OFFSET into the
per-page offset, but it did so by extending the hand-rolled token scanner in
src-tauri/src/drivers/common/query.rs, which only recognises the trailing
… LIMIT <n> OFFSET <n> shape.

That heuristic is fragile. It does not understand:

  • MySQL's LIMIT <offset>, <count> syntax
  • OFFSET before LIMIT (valid in Postgres)
  • numeric expressions / placeholders

In those cases the values are read wrong (or dropped) and the stripped base can be
inconsistent with what was extracted, producing a malformed appended query.

The change

  • Add the sqlparser crate (v0.62) and read Query.limit_clause from the AST,
    parsed with the correct per-driver dialect (PaginationDialect::{MySql,Postgres,Sqlite}
    threaded through build_paginated_query). MySQL's LIMIT <offset>, <count> is
    normalised to the same (limit, offset) shape.
  • Strip the trailing clause at its LIMIT/OFFSET keyword (reusing the existing
    position-aware tokenizer, which collapses parenthesised subqueries so inner
    LIMITs are never touched), consistent with what the parser saw.
  • Render the new pagination clause from a LimitClause AST node and concatenate it
    to the verbatim sliced base, so leading comments, inline /*+ hints */, and
    the body's formatting are preserved (no full-query reserialization).
  • Keep the original token scanner as a fallback for inputs the parser rejects,
    so behaviour never regresses. FETCH FIRST … ROWS is out of scope and defers to
    the fallback.

Tests

Existing build_paginated_query tests updated to pass a dialect; new cases added for
the MySQL comma form (pages 1 & 2), OFFSET before LIMIT, backtick identifiers,
inline-hint preservation, and the parse-error fallback. cargo test drivers::common
— 63 passed.

debba added 3 commits June 4, 2026 14:18
The grid pagination rewriter read the user's LIMIT/OFFSET with a
hand-rolled token scanner that only recognised the trailing
`LIMIT <n> OFFSET <n>` shape. It mis-handled MySQL's
`LIMIT <offset>, <count>`, `OFFSET` before `LIMIT`, and numeric
expressions, producing wrong values or a malformed appended query.

Parse the query with sqlparser using the driver's dialect and read
LIMIT/OFFSET from the AST instead, normalising the MySQL comma form.
The trailing clause is stripped at its keyword (consistent with what
the parser saw) and the new pagination clause is rendered from a
LimitClause AST node, then concatenated to the verbatim sliced base so
leading comments, inline hints, and formatting are preserved. The
token scanner is kept as a fallback for inputs the parser rejects, so
behaviour never regresses. FETCH FIRST ... ROWS is out of scope and
defers to the fallback.

Builds on #275.
@kilo-code-bot

kilo-code-bot Bot commented Jun 4, 2026

Copy link
Copy Markdown

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (21 files)
  • src-tauri/Cargo.lock - dependency additions (sqlparser, recursive, stacker, etc.) and version bump to 0.13.0
  • src-tauri/Cargo.toml - adds sqlparser = "0.62" dependency
  • src-tauri/src/drivers/common.rs - re-exports new symbols
  • src-tauri/src/drivers/common/query.rs - AST-based pagination rewriter with dialect-aware parsing and fallback heuristics
  • src-tauri/src/drivers/common/tests.rs - updated existing tests + new coverage for MySQL comma syntax, offset-before-limit, backticks, hints, and parse-error fallback
  • src-tauri/src/drivers/mysql/mod.rs - wires MySQL dialect and error annotation
  • src-tauri/src/drivers/postgres/mod.rs - wires Postgres dialect and error annotation
  • src-tauri/src/drivers/sqlite/mod.rs - wires SQLite dialect and error annotation
  • src/components/notebook/SqlCell.tsx - passes originalQuery to result component
  • src/components/notebook/SqlCellResult.tsx - accepts and forwards originalQuery
  • src/components/ui/ErrorDisplay.tsx - displays original/executed query via collapsible panels
  • src/components/ui/ResultEntryContent.tsx - passes entry.query to ErrorDisplay
  • src/pages/Editor.tsx - passes activeTab.query to ErrorDisplay
  • src/i18n/locales/{de,en,es,fr,it,ja,ru,zh}.json - adds show/hide query translation keys

Reviewed by kimi-k2.6-20260420 · 174,393 tokens

@debba debba requested a review from NewtTheWolf June 8, 2026 17:31

@NewtTheWolf NewtTheWolf left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validated the parser-based rewriter end-to-end against a seeded perf_demo (50k rows, MySQL + Postgres). The headline cases are solid — MySQL LIMIT <off>, <cnt>, OFFSET before LIMIT, backtick identifiers and inline leading hints all paginate correctly across pages. 👍

But I hit four issues where the rewriter mishandles inputs — three confirmed directly via the new "executed query" the UI shows. Requesting changes for the two correctness ones (CTE + the strip desync); the FETCH item is smaller.

Confirmed at runtime (executed query in brackets):

  1. CTEs are never paginated. parse_pagination parses the whole statement and matches Statement::Query, which already includes WITH … SELECT. But the call-site gate is still is_select_query() — a starts_with("SELECT") string check — so a CTE falls through to the non-paginated branch and is hard-truncated at page_size with no paging controls. The new parser could handle it; the gate blocks it. (all three drivers)

  2. strip_at_limit_keyword desyncs from the parser for non-numeric clauses. When the parser reports has_limit_clause = true but the clause isn't plain digits, both the token heuristic and the strip_limit_offset fallback fail to strip it, so pagination is appended to the un-stripped query:

    • … LIMIT ALL… LIMIT ALL LIMIT 501 OFFSET 0syntax error at or near "LIMIT"
    • … LIMIT 2000 -- note… LIMIT 2000 -- note LIMIT 501 OFFSET 0 → the -- comments out the appended pagination, which is then silently dropped (the same query without the comment strips cleanly to … LIMIT 501 OFFSET 0).

    Root cause: once the parser has recognized the clause, stripping re-tokenizes heuristically instead of cutting at the span the parser already computed.

  3. FETCH FIRST … ROWS fallback produces a mixed clause. … FETCH FIRST 5000 ROWS ONLY… FETCH FIRST 5000 ROWS ONLY LIMIT 501 OFFSET 0 → DB error. Out of scope per the PR — but the inline comment claims it defers "rather than producing a mixed clause", and it does produce one. (suggestion inline)

Recommendations

  • (1) Derive paginatability from the parse, not the string prefix: paginate iff the query parses to a single Statement::Query — cleanly includes CTEs/VALUES, excludes SHOW/EXPLAIN/DDL. (Don't widen to returns_result_set, which would try to paginate SHOW/EXPLAIN and append an invalid LIMIT.)
  • (2) When has_limit_clause is true, cut at the clause's source span from the parse instead of re-scanning tokens, so LIMIT ALL, trailing comments, etc. can't desync.

Details inline.

/// keyword regardless of the value shape, so MySQL's `LIMIT <offset>, <count>`
/// is handled. Falls back to [`strip_limit_offset`] if no trailing clause is
/// found (kept total; the parser having reported a clause makes this rare).
fn strip_at_limit_keyword(query: &str) -> String {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings 2 & 4 — strip desync. When the parser reported a clause (has_limit_clause = true) but it isn't plain numeric, this keyword scan and the strip_limit_offset fallback both fail to strip it, and build_paginated_query then appends pagination to the un-stripped query. Confirmed at runtime:

  • … LIMIT ALL → executed … LIMIT ALL LIMIT 501 OFFSET 0syntax error at or near "LIMIT"
  • … LIMIT 2000 -- note → executed … LIMIT 2000 -- note LIMIT 501 OFFSET 0 → the -- comments out the appended LIMIT 501 OFFSET 0, so pagination is silently dropped (the same query without the comment strips cleanly to … LIMIT 501 OFFSET 0).

is_pagination_tail_token doesn't cover ALL / comment tokens, and the fallback only strips trailing numeric clauses — so any clause the parser accepts but these heuristics don't recognize desyncs. Since the parser already located the clause, consider cutting at its AST source span when has_limit_clause is true, instead of re-tokenizing.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d96ec81, taking your span suggestion. When the parser reports a clause, the strip now cuts at the clause's AST source span instead of re-tokenizing — with a walk-back over the leading keyword, since sqlparser's spans anchor at the clause's first value (LIMIT/OFFSET belong to no AST node).

Two wrinkles surfaced while wiring it up:

  • A bare LIMIT ALL never reaches the AST at all — parse_optional_limit_clause folds it into "no limit clause" (sqlparser 0.62, parser/mod.rs:13104) — so there is no span to anchor to. It's located by a trailing-token check instead. LIMIT ALL OFFSET n does produce a clause and anchors to the OFFSET value, peeling the dangling LIMIT ALL on the way back.
  • Queries with locking clauses defer to the fallback (like FETCH), because cutting span-to-end would silently drop FOR UPDATE along with the clause.

The heuristic tokenizer also skips --//* */ comments now, so the parse-error fallback path can't be shielded by a trailing comment either. Both runtime cases from the review are covered by tests, plus LIMIT ALL, LIMIT ALL OFFSET n, a trailing block comment, and a non-literal LIMIT 10 + 5.

Comment thread src-tauri/src/drivers/common/query.rs Outdated
Comment on lines +348 to +349
// FETCH FIRST … ROWS ONLY is out of scope; defer to the fallback path so
// its behaviour is unchanged rather than producing a mixed clause.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed at runtime: … FETCH FIRST 5000 ROWS ONLY paginates to … FETCH FIRST 5000 ROWS ONLY LIMIT 501 OFFSET 0 (DB error) — so the fallback does produce a mixed clause. Handling FETCH properly is out of scope, but the comment is misleading; fixing it to match:

Suggested change
// FETCH FIRST … ROWS ONLY is out of scope; defer to the fallback path so
// its behaviour is unchanged rather than producing a mixed clause.
// FETCH FIRST … ROWS ONLY is out of scope; defer to the fallback path.
// NB: the fallback does not strip FETCH, so a paginated FETCH query gets a
// trailing LIMIT/OFFSET appended (a mixed clause the DB rejects).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied your suggested wording in d96ec81 — the comment now states the fallback does append a mixed clause for FETCH queries.

let (final_query, pagination_meta) = if is_select && limit.is_some() {
let l = limit.unwrap();
let data_query = crate::drivers::common::build_paginated_query(query, l, page);
let data_query = crate::drivers::common::build_paginated_query(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding 1 — CTEs are never paginated. This new build_paginated_query call is only reached when is_select (line 811) is true, and is_select = is_select_query() = starts_with("SELECT"). So a WITH … SELECT CTE never reaches the parser path you just added — it falls through to the non-paginated branch and is hard-truncated at page_size (confirmed at runtime: no AUTO-PAGINATED badge, can't reach page 2).

parse_pagination already matches Statement::Query, which includes CTEs — so the new parser would handle them; only the string-prefix gate blocks it. Suggest deriving the gate from the parse (Statement::Query) instead of starts_with("SELECT"). Same gate in mysql/mod.rs:948 and the sqlite driver.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d96ec81. The gate is now is_paginatable_query(): paginate iff the query parses to a single Statement::Query — CTEs and VALUES get in, SHOW/EXPLAIN/DDL stay out (per your note, it does not widen to returns_result_set). is_select_query() survives only as the fallback when the parser rejects the input, so dialect quirks sqlparser can't parse keep their old prefix-based routing. All three drivers migrated (postgres, mysql, sqlite).

… span

Addresses review feedback on #286:

- Paginate iff the query parses to a single Statement::Query instead of
  a starts_with("SELECT") check, so CTEs (WITH ... SELECT) and VALUES
  reach the parser-based rewriter while SHOW/EXPLAIN/DDL stay out. The
  prefix check remains as fallback when the parser rejects the input.
- Strip a parsed LIMIT/OFFSET clause by cutting at its AST source span
  rather than re-scanning tokens, so clauses the heuristics don't
  recognise (LIMIT ALL, trailing comments, non-literal expressions)
  can no longer desync and produce mixed clauses or silently dropped
  pagination. A bare LIMIT ALL is folded into "no clause" by sqlparser
  and is located textually; queries with locking clauses defer to the
  fallback so FOR UPDATE is never silently dropped.
- Make the heuristic tokenizer comment-aware so the fallback strip and
  extract scans cannot be shielded by trailing comments either.
- Fix the misleading FETCH comment: the fallback does produce a mixed
  clause for FETCH FIRST queries.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants