
refactor: copy 0.14.1 parser from zig std #330

Merged
DonIsaac merged 3 commits into main from don/refactor/copy-zig-0.14-parser
Nov 29, 2025
Conversation

@DonIsaac DonIsaac commented Nov 29, 2025

Part of #323

This PR copies over the source code for Ast, Parse, etc. from the Zig 0.14.1 standard library. Migrating to 0.15 is a large lift because both the writers and the parser were rewritten. Vendoring the 0.14.1 sources lets us break the migration into smaller pieces.

Summary by CodeRabbit

  • Refactor

    • Modernized compatibility to use a versioned Zig compatibility bundle (v0.14.1), centralizing standard pieces.
  • New Features

    • Full-featured tokenizer covering Zig syntax, operators, keywords, and numeric formats.
    • Robust string/char literal parsing with escape and unicode handling.
    • Primitive-name detection utility.
  • Chores

    • Removed an internal printer variant and cleaned up deprecated references.


coderabbitai bot commented Nov 29, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds a versioned Zig bundle at src/zig.zig (@"0.14.1") with tokenizer, string literal, and primitives modules; updates ~25 modules to source Ast/Token/Tokenizer/Loc from that bundle instead of std.zig; adds full tokenizer/string-literal implementations; removes popNoIndent from the printer API.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Versioned Zig wrapper & root export | src/zig.zig, src/root.zig | New public @"0.14.1" struct bundling Ast, Parse, Token, Tokenizer, primitives, string_literal; root.zig now exports zig. |
| Zig 0.14.1 utility modules | src/zig/0.14.1/tokenizer.zig, src/zig/0.14.1/string_literal.zig, src/zig/0.14.1/primitives.zig, src/zig/0.14.1/LICENSE | Adds a full tokenizer (Token/Tag/Tokenizer API + tests), a string/char literal parser with detailed errors, a primitive-name checker, and MIT license file. |
| Semantic modules | src/Semantic.zig, src/Semantic/Parse.zig, src/Semantic/ast.zig, src/Semantic/tokenizer.zig | Switched Ast/Token/Tokenizer imports to zig.zig@"0.14.1"; ast.zig exposes NodeIndex and MaybeTokenId. |
| Linter rules & context | src/linter/lint_context.zig, src/linter/rules/*; src/linter/rules/allocator_first_param.zig, .../avoid_as.zig, .../case_convention.zig, .../duplicate_case.zig, .../homeless_try.zig, .../must_return_ref.zig, .../no_catch_return.zig, .../no_print.zig, .../no_return_try.zig, .../no_unresolved.zig, .../returned_stack_reference.zig, .../suppressed_errors.zig, .../unsafe_undefined.zig, .../useless_error_return.zig | Updated many linter files to source Ast, Token, Loc from the versioned zig bundle instead of std.zig. |
| Printer & visitor modules | src/printer/AstPrinter.zig, src/printer/SemanticPrinter.zig, src/printer/Printer.zig, src/visit/walk.zig, src/visit/walk_test.zig | Switched Node/Ast imports to zig.zig@"0.14.1"; Printer.zig removed pub fn popNoIndent(self: *Printer) void and consolidated indentation behavior into pop. |
| Span & source updates | src/span.zig, src/source.zig | span.zig now uses zig.Ast.Span and zig.Token.Loc; the only change to source.zig is the removal of a commented-out line. |
| Tests & tooling config | .typos.toml, src/visit/walk_test.zig | Added tokenizer path to typos exclude; updated tests to reference zig.Ast where applicable. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Areas to focus review on:

  • src/zig/0.14.1/tokenizer.zig — large state machine, many lexical edge cases and tests.
  • src/zig/0.14.1/string_literal.zig — escape handling, Unicode/offset reporting, allocator paths.
  • Consistency across many linter rule files for the new zig import (ensure correct relative import paths and type aliases).
  • src/printer/Printer.zig — removal of popNoIndent and changed indentation/emit ordering.

Poem

🐰 I bundled Zig bits in my cozy den,
Tokens, strings, and primitives — all in one den.
I hopped through modules, changed imports with care,
Now parsers and printers find home everywhere.
Hop hop, the codebase smells like fresh clover and air.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: copying the 0.14.1 parser from Zig's standard library into the repository, which is reflected across the comprehensive refactoring of 30+ files to use versioned zig imports. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e2f3939 and 37cbbc4.

📒 Files selected for processing (1)
  • .typos.toml (1 hunks)

@github-actions github-actions bot added the A-semantic (Area - semantic analysis) and A-linter (Area - linter and lint rules) labels Nov 29, 2025
@DonIsaac DonIsaac marked this pull request as ready for review November 29, 2025 01:21
codecov bot commented Nov 29, 2025

Codecov Report

❌ Patch coverage is 98.05014% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.96%. Comparing base (cf52a34) to head (37cbbc4).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/zig/0.14.1/tokenizer.zig | 98.04% | 14 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #330      +/-   ##
==========================================
- Coverage   91.67%   89.96%   -1.72%     
==========================================
  Files          91       94       +3     
  Lines        7808    11478    +3670     
==========================================
+ Hits         7158    10326    +3168     
- Misses        650     1152     +502     

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/linter/rules/useless_error_return.zig (1)

334-336: Bug: Incorrect method call passes self twice.

The method call self.visit_switch_case_one(self, node) incorrectly passes self as an explicit argument when it's already bound via the method call syntax.

Apply this diff to fix the bug:

     pub fn visit_switch_case_one_inline(self: *Visitor, node: Node.Index) VisitError!walk.WalkState {
-        return self.visit_switch_case_one(self, node);
+        return self.visit_switch_case_one(node);
     }
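
For context on the fix above, here is a minimal sketch of Zig's method-call syntax. The `Visitor` type and its functions are hypothetical stand-ins, not the code from useless_error_return.zig. It shows that `self.f(x)` already supplies `self` as the first argument, so the explicit equivalent passes `self` exactly once.

```zig
const std = @import("std");

// Hypothetical, simplified visitor; names are illustrative only.
const Visitor = struct {
    visited: u32 = 0,

    fn visitOne(self: *Visitor, node: u32) u32 {
        self.visited += 1;
        return node;
    }

    // Method-call syntax: `self` is supplied implicitly as the first argument.
    fn visitOneInline(self: *Visitor, node: u32) u32 {
        return self.visitOne(node);
    }

    // Equivalent explicit form: call through the type and pass `self` once.
    fn visitOneExplicit(self: *Visitor, node: u32) u32 {
        return Visitor.visitOne(self, node);
    }
};

pub fn main() void {
    var v: Visitor = .{};
    _ = v.visitOneInline(1);
    _ = v.visitOneExplicit(2);
    std.debug.print("visited {d} nodes\n", .{v.visited});
}
```

Zig analyzes function bodies lazily, so a call with an extra argument can go unnoticed until the enclosing function is actually referenced.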
src/linter/rules/case_convention.zig (1)

150-155: Use declaration node index instead of symbol index for AST node lookup

At lines 152-153, you're indexing ast.nodes.items(.main_token) with the symbol-table index id, when you should use the declaration node index decl. Symbol indices and node indices are distinct and stored in separate data structures—using id risks pointing at the wrong node or going out of bounds, producing incorrect diagnostic spans.

The declaration node index is already available from line 134: const decl: Node.Index = symbols.items(.decl)[id].

     const ast = ctx.ast();
-    const fn_keyword_token_idx: Ast.TokenIndex = ast.nodes.items(.main_token)[id];
+    const fn_keyword_token_idx: Ast.TokenIndex = ast.nodes.items(.main_token)[decl];
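
One way to make this class of mix-up a compile error is to give each index its own type. The sketch below is an illustration only: `SymbolId`, `NodeIndex`, `Tables`, and the example data are hypothetical and much simpler than the project's symbol table and AST. It shows the two-step lookup (symbol id to declaration node, then node to main token) that the suggested fix performs.

```zig
const std = @import("std");

// Hypothetical stand-ins: distinct enum types so the two index spaces
// cannot be swapped silently.
const SymbolId = enum(u32) { _ };
const NodeIndex = enum(u32) { _ };
const TokenIndex = u32;

const Tables = struct {
    symbol_decl: []const NodeIndex, // indexed by SymbolId
    node_main_token: []const TokenIndex, // indexed by NodeIndex

    fn declOf(self: Tables, id: SymbolId) NodeIndex {
        return self.symbol_decl[@intFromEnum(id)];
    }

    fn mainToken(self: Tables, node: NodeIndex) TokenIndex {
        return self.node_main_token[@intFromEnum(node)];
    }
};

// Example data: symbol 0 is declared by node 2, whose main token is 12.
const example_decls = [_]NodeIndex{@enumFromInt(2)};
const example_tokens = [_]TokenIndex{ 10, 11, 12 };
const example: Tables = .{
    .symbol_decl = &example_decls,
    .node_main_token = &example_tokens,
};

pub fn main() void {
    const id: SymbolId = @enumFromInt(0);
    // Correct: go through the declaration node, not the symbol id.
    const tok = example.mainToken(example.declOf(id));
    std.debug.print("main token: {d}\n", .{tok});
    // example.mainToken(id) would not compile: SymbolId is not NodeIndex.
}
```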
🧹 Nitpick comments (3)
src/linter/rules/duplicate_case.zig (1)

62-64: Consider consolidating Ast imports.

The file imports zig directly (line 53) and also gets Ast via Semantic (line 63). Since Semantic.Ast already re-exports from the versioned zig module, the direct zig import is only used for Loc at line 55. This works correctly but creates two paths to essentially the same types.

src/zig/0.14.1/tokenizer.zig (2)

341-344: Consider project linting rules for std.debug.print.

The dump function uses std.debug.print which is flagged by the project's linter. Since this is vendored code from Zig stdlib, you may want to either:

  1. Add a lint suppression comment if the project supports it
  2. Accept this as intentional for debugging purposes

404-411: Consider adding safety comments for undefined values.

The static analysis tool flags the undefined values for tag and end. While this is safe because both are guaranteed to be set before any return path, the project's linting rules require safety comments.

     pub fn next(self: *Tokenizer) Token {
         var result: Token = .{
-            .tag = undefined,
+            .tag = undefined, // Set in all state branches before return
             .loc = .{
                 .start = self.index,
-                .end = undefined,
+                .end = undefined, // Set at line 1105 before return
             },
         };
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf52a34 and e2f3939.

📒 Files selected for processing (32)
  • src/Semantic.zig (1 hunks)
  • src/Semantic/Parse.zig (1 hunks)
  • src/Semantic/ast.zig (1 hunks)
  • src/Semantic/tokenizer.zig (2 hunks)
  • src/linter/lint_context.zig (1 hunks)
  • src/linter/rules/allocator_first_param.zig (1 hunks)
  • src/linter/rules/avoid_as.zig (1 hunks)
  • src/linter/rules/case_convention.zig (1 hunks)
  • src/linter/rules/duplicate_case.zig (1 hunks)
  • src/linter/rules/homeless_try.zig (1 hunks)
  • src/linter/rules/must_return_ref.zig (1 hunks)
  • src/linter/rules/no_catch_return.zig (1 hunks)
  • src/linter/rules/no_print.zig (1 hunks)
  • src/linter/rules/no_return_try.zig (1 hunks)
  • src/linter/rules/no_unresolved.zig (1 hunks)
  • src/linter/rules/returned_stack_reference.zig (1 hunks)
  • src/linter/rules/suppressed_errors.zig (1 hunks)
  • src/linter/rules/unsafe_undefined.zig (1 hunks)
  • src/linter/rules/useless_error_return.zig (1 hunks)
  • src/printer/AstPrinter.zig (1 hunks)
  • src/printer/Printer.zig (0 hunks)
  • src/printer/SemanticPrinter.zig (1 hunks)
  • src/root.zig (2 hunks)
  • src/source.zig (0 hunks)
  • src/span.zig (4 hunks)
  • src/visit/walk.zig (6 hunks)
  • src/visit/walk_test.zig (1 hunks)
  • src/zig.zig (1 hunks)
  • src/zig/0.14.1/LICENSE (1 hunks)
  • src/zig/0.14.1/primitives.zig (1 hunks)
  • src/zig/0.14.1/string_literal.zig (1 hunks)
  • src/zig/0.14.1/tokenizer.zig (1 hunks)
💤 Files with no reviewable changes (2)
  • src/source.zig
  • src/printer/Printer.zig
🧰 Additional context used
🪛 GitHub Actions: autofix.ci
src/zig/0.14.1/tokenizer.zig

[warning] 1462-1462: ba should be by, be.

🪛 GitHub Check: autofix
src/zig/0.14.1/tokenizer.zig

[warning] 1462-1462:
"ba" should be "by" or "be".

🪛 GitHub Check: Lint Changed
src/zig/0.14.1/tokenizer.zig

[warning] 409-409: unsafe-undefined
undefined is missing a safety comment


[warning] 406-406: unsafe-undefined
undefined is missing a safety comment


[warning] 343-343: no-print
Using std.debug.print is not allowed.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Collect and upload test coverage
🔇 Additional comments (39)
src/zig/0.14.1/LICENSE (1)

1-21: LGTM! Proper license attribution for copied Zig standard library code.

Including the MIT license file is the correct approach when copying code from the Zig standard library.

src/zig/0.14.1/primitives.zig (1)

1-64: LGTM! Well-structured primitive type detection.

The implementation correctly handles:

  • Named primitives via the static map
  • Integer types (i/u prefix + digits) via character checking
  • Edge cases like empty strings and underscores

The tests cover the key cases appropriately.
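
A quick way to see the cases listed above is to call the checker directly. This sketch uses `std.zig.primitives.isPrimitive` from a Zig 0.14.x toolchain as a stand-in for the vendored copy, on the assumption that the copied file keeps the same function.

```zig
const std = @import("std");

// Stand-in for the vendored src/zig/0.14.1/primitives.zig (assumption).
const isPrimitive = std.zig.primitives.isPrimitive;

pub fn main() void {
    const names = [_][]const u8{ "bool", "u8", "i128", "comptime_float", "", "_", "my_type" };
    for (names) |name| {
        std.debug.print("\"{s}\": {}\n", .{ name, isPrimitive(name) });
    }
}
```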

src/linter/rules/useless_error_return.zig (1)

69-70: Import change aligns with the versioned Zig module pattern.

The switch from std.zig.Ast to zig.Ast via the versioned module is consistent with the PR's goal.

src/linter/rules/avoid_as.zig (1)

41-46: Import change aligns with the versioned Zig module pattern.

The switch to the versioned zig.zig module is consistent with the rest of the PR.

src/linter/rules/homeless_try.zig (1)

69-70: Import change aligns with the versioned Zig module pattern.

Consistent with the PR's approach to centralize versioned Zig imports.

src/linter/rules/returned_stack_reference.zig (1)

57-58: Import change aligns with the versioned Zig module pattern.

Consistent with the PR's centralized versioned Zig imports approach.

src/linter/rules/allocator_first_param.zig (1)

56-58: Import change aligns with the versioned Zig module pattern.

Consistent with the PR's centralized versioned Zig imports approach.

src/visit/walk_test.zig (1)

3-5: LGTM!

The import migration to the versioned Zig module is correct and consistent with the PR's approach. The @"0.14.1" syntax properly accesses the versioned namespace from the centralized zig.zig module.

src/linter/rules/no_return_try.zig (1)

46-48: LGTM!

The versioned import migration is correct. The relative path ../../zig.zig properly resolves from the src/linter/rules/ directory.

src/linter/rules/must_return_ref.zig (1)

57-59: LGTM!

The import migration is consistent with the rest of the PR. The rule's functionality for detecting potentially leaked allocations via returned copies remains unchanged.

src/Semantic/ast.zig (1)

6-20: LGTM!

This is a well-structured centralization of AST type re-exports. The RawToken struct correctly mirrors the non-public token struct used in Zig's AST SOA, and the new NodeIndex and MaybeTokenId type aliases provide clean abstractions for dependent modules.

src/linter/rules/duplicate_case.zig (1)

53-55: Note the inconsistent AI summary.

The enriched summary references no-print.zig but this file is duplicate_case.zig. The import changes themselves are correct and consistent with the PR's migration pattern.

src/Semantic/Parse.zig (1)

45-48: Ast now sourced from zig.zig 0.14.1; wiring looks consistent

Redirecting Ast to @import("../zig.zig").@"0.14.1".Ast is consistent with the rest of the PR, and the existing uses (Ast.parse, ast.deinit, field type) remain valid as long as the copied 0.14.1 Ast API matches what Semantic exposes.

Please run the existing semantic/linter tests to confirm spans, token indices, and parse results are unchanged with the new Ast implementation.
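
As a small smoke test of the Ast.parse and ast.deinit pairing mentioned above, something like the following could be used. It is a sketch that calls `std.zig.Ast` from a Zig 0.14.x toolchain as a stand-in for the vendored copy; `countRootDecls` and `error.ParseFailed` are hypothetical names introduced here.

```zig
const std = @import("std");

// Stand-in for @import("../zig.zig").@"0.14.1".Ast (assumption).
const Ast = std.zig.Ast;

/// Parses `source` and returns the number of top-level declarations.
/// Fails with a hypothetical error if the parser reported any errors.
fn countRootDecls(gpa: std.mem.Allocator, source: [:0]const u8) !usize {
    var ast = try Ast.parse(gpa, source, .zig);
    defer ast.deinit(gpa); // frees tokens, nodes, extra data, and errors
    if (ast.errors.len != 0) return error.ParseFailed;
    return ast.rootDecls().len;
}

pub fn main() !void {
    const n = try countRootDecls(std.heap.page_allocator, "const a = 1;\nfn f() void {}\n");
    std.debug.print("root decls: {d}\n", .{n});
}
```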

src/linter/rules/no_unresolved.zig (1)

40-41: no-unresolved now using versioned Ast; no logic change

Binding Ast from zig.zig (0.14.1) keeps the rule’s use of Ast.Node.Tag and ctx.ast() consistent with the new semantic context wiring; no control-flow or diagnostic behavior changes here.

It’s still worth re-running this rule’s tests to ensure the new Ast definition doesn’t affect tag values or node indices.

src/linter/rules/no_catch_return.zig (1)

44-49: Swapping to zig.zig Ast/Token is consistent with semantic context

Using zig.Ast and zig.Token here aligns this rule with the rest of the refactor; the existing logic over Node.Tag, Token.Tag, and TokenIndex remains structurally identical.

Please confirm that Semantic’s ast.tokens is built from the same zig.Token type so the tag array indexing remains valid.

src/linter/lint_context.zig (1)

285-287: Central Ast alias now bound to zig.zig 0.14.1

Pointing Ast at ../zig.zig.@"0.14.1".Ast keeps the linter context in sync with the new parser bundle; ast(), spanN, spanT, and commentsBefore continue to operate on the same tree as Semantic.parse.ast.

Double-check that Semantic.parse.ast is also typed as this same Ast to avoid subtle pointer/ABI mismatches.

src/printer/AstPrinter.zig (1)

196-200: AstPrinter now targets versioned Ast; API usage remains valid

Rebinding Ast to zig.Ast matches the parser refactor, and all uses (rootDecls, fullVarDecl, fullCall, fullContainerDecl, token accessors) remain compatible with the 0.14.1 AST API.

Running the AST printer tests (or snapshot outputs) will help confirm there are no differences in tag names or slice boundaries with the new Ast source.

src/linter/rules/case_convention.zig (1)

39-41: case-convention hooked up to versioned Ast; behavior unchanged

Importing zig.zig 0.14.1 and aliasing Ast = zig.Ast keeps this rule aligned with the new parser types; Ast.TokenIndex, Ast.full.FnProto, and node access all stay on the same tree as the rest of the linter.

Please ensure LinterContext.ast() also returns this same Ast type so fullFnProto and token index math remain valid.

src/linter/rules/suppressed_errors.zig (1)

64-66: suppressed-errors now using versioned Ast; rule logic intact

Binding Ast from zig.zig (0.14.1) cleanly aligns this rule with the new parser; uses of Node, TokenIndex, nodeToSpan, token starts, and tags remain structurally the same.

Re-run this rule’s tests to confirm spans around catch bodies and unreachable tokens are unchanged with the new Ast implementation.

src/printer/SemanticPrinter.zig (1)

222-223: SemanticPrinter now keys off zig.zig Node; tag printing remains consistent

Switching Node to zig.Ast.Node keeps the PrintableReference.node field and tag extraction (ast.nodes.items(.tag)) aligned with the new 0.14.1 AST definitions without changing output structure.

Please re-run the semantic printer output tests (if any) to ensure node tag names and values still match expectations after the move to zig.Ast.Node.

src/Semantic/tokenizer.zig (1)

4-4: LGTM! Clean migration to versioned imports.

The updates to use the centralized zig.zig versioned module for Token and Tokenizer types are consistent and maintain the existing API surface.

Also applies to: 9-9, 58-58

src/linter/rules/no_print.zig (1)

70-72: LGTM! Consistent with versioned import strategy.

The Loc type alias now properly references the versioned Zig module.

src/root.zig (2)

3-3: LGTM! Good centralization of versioned imports.

Exporting the versioned Zig module as a public constant provides a clean access point for downstream modules.


24-25: LGTM! Test references ensure proper symbol tracking.

src/Semantic.zig (1)

146-147: LGTM! Public API now uses versioned Ast.

The shift from std.zig.Ast to the versioned zig.Ast improves version control and sets up the foundation for future migration to 0.15.

src/visit/walk.zig (1)

31-31: LGTM! Comprehensive update to versioned imports.

All references to AST types—in code, documentation, and tests—have been consistently updated to use the versioned zig.Ast. The mechanical nature of these changes maintains correctness.

Also applies to: 402-402, 1187-1190, 1231-1231, 1252-1252

src/span.zig (1)

3-3: LGTM! Type conversions updated for versioned module.

The updates to Span.from() and LabeledSpan.from() properly handle the versioned zig.Ast.Span and zig.Token.Loc types. Documentation is also kept in sync.

Also applies to: 65-66, 157-158, 231-231

src/linter/rules/unsafe_undefined.zig (1)

141-142: LGTM! Consistent with versioned import pattern.

src/zig.zig (1)

1-11: LGTM! Well-structured versioned module wrapper.

The centralized wrapper with explicit version namespacing (@"0.14.1") is a solid design that will facilitate the future migration to 0.15. The module cleanly exposes the key components (Ast, Parse, Token, Tokenizer, primitives, string_literal) needed by the rest of the codebase.
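
For readers unfamiliar with the pattern, a version-namespaced wrapper can be sketched roughly as below. This is not the contents of src/zig.zig: the members here alias `std.zig` so the snippet compiles on its own, whereas the PR points them at the vendored files under src/zig/0.14.1/. Parse is left out because `std.zig` does not export it.

```zig
const std = @import("std");

/// Sketch of a version-namespaced bundle. The `@"..."` syntax lets a
/// declaration use a name that is not a plain identifier.
pub const @"0.14.1" = struct {
    // Aliases to std.zig are an assumption for illustration only.
    pub const Ast = std.zig.Ast;
    pub const Token = std.zig.Token;
    pub const Tokenizer = std.zig.Tokenizer;
    pub const primitives = std.zig.primitives;
    pub const string_literal = std.zig.string_literal;
};

// A consumer selects one version once, then uses short aliases.
const zig = @"0.14.1";
const Ast = zig.Ast;
const Loc = zig.Token.Loc;

pub fn main() void {
    const loc: Loc = .{ .start = 0, .end = 5 };
    std.debug.print("span length: {d}\n", .{loc.end - loc.start});
}
```

Adding a second struct for a later version next to the first would let individual modules switch over one at a time, which matches the incremental migration the PR description aims for.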

src/zig/0.14.1/string_literal.zig (6)

1-118: Well-structured error types and formatting.

The error union types (ParsedCharLiteral, Result, Error) follow Zig idioms nicely. The Error.fmt method provides good diagnostics with contextual error messages.


120-153: LGTM!

The parseCharLiteral function correctly handles UTF-8 codepoints and escape sequences with proper validation.


155-240: LGTM!

The escape sequence parsing handles all standard Zig escapes with proper validation and error reporting.


325-361: LGTM!

The parseWrite function correctly handles UTF-8 encoding for unicode escapes and rejects invalid embedded newlines.


363-373: LGTM!

Memory management is correct with defer buf.deinit() ensuring cleanup on all paths.
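
The escape handling and allocation behavior described in the comments above can be exercised with a short sketch. It uses `std.zig.string_literal` from a Zig 0.14.x toolchain as a stand-in for the vendored copy; `charValue` is a hypothetical helper added for the example.

```zig
const std = @import("std");

// Stand-in for the vendored src/zig/0.14.1/string_literal.zig (assumption).
const string_literal = std.zig.string_literal;

/// Returns the decoded codepoint, or null if the literal is malformed.
/// `literal` includes the surrounding single quotes.
fn charValue(literal: []const u8) ?u21 {
    return switch (string_literal.parseCharLiteral(literal)) {
        .success => |codepoint| codepoint,
        .failure => null,
    };
}

pub fn main() !void {
    const gpa = std.heap.page_allocator;
    // The input includes the surrounding double quotes, as in source code.
    const decoded = try string_literal.parseAlloc(gpa, "\"a\\x41\\n\"");
    defer gpa.free(decoded); // caller owns the returned slice
    std.debug.print("{d} bytes decoded\n", .{decoded.len});
    std.debug.print("codepoint: {d}\n", .{charValue("'\\n'").?});
}
```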


242-391: Good test coverage.

Tests comprehensively cover success cases (ASCII, UTF-8, hex escapes, unicode escapes) and error cases (invalid escapes, missing digits, malformed unicode).

src/zig/0.14.1/tokenizer.zig (4)

3-335: LGTM!

The Token struct provides a comprehensive and well-organized token model with proper keyword lookup via StaticStringMap.


1451-1488: The "0ba" test case is intentional, not a typo.

The pipeline warning about "ba" on line 1462 is a false positive. 0ba is intentionally testing how the tokenizer handles an invalid binary literal (0b prefix followed by the non-binary digit a). This is correct test coverage.


1714-1776: Excellent property-based test coverage.

The testPropertiesUpheld function verifies critical tokenizer invariants including location consistency, EOF handling, and control character restrictions.


412-1103: LGTM!

The tokenizer state machine is comprehensive and handles all Zig syntax correctly, including edge cases like UTF-8 BOM, CR/LF normalization, and various numeric literal formats.
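
To get a feel for the Token/Tokenizer API, the loop below walks a source string until `.eof`. It is a sketch that uses `std.zig.Tokenizer` from a Zig 0.14.x toolchain as a stand-in for the vendored copy; `countTokens` is a hypothetical helper.

```zig
const std = @import("std");

// Stand-in for the vendored src/zig/0.14.1/tokenizer.zig (assumption).
const Tokenizer = std.zig.Tokenizer;

/// Counts the tokens in `source`, excluding the final `.eof` token.
fn countTokens(source: [:0]const u8) usize {
    var tokenizer = Tokenizer.init(source);
    var count: usize = 0;
    while (true) {
        const tok = tokenizer.next();
        if (tok.tag == .eof) break;
        count += 1;
    }
    return count;
}

pub fn main() void {
    const source: [:0]const u8 = "const x = 42;";
    var tokenizer = Tokenizer.init(source);
    while (true) {
        const tok = tokenizer.next();
        if (tok.tag == .eof) break;
        // Each token carries a tag and a byte range into the source buffer.
        const text = source[tok.loc.start..tok.loc.end];
        std.debug.print("{s}: \"{s}\"\n", .{ @tagName(tok.tag), text });
    }
}
```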

@DonIsaac DonIsaac merged commit 48d0266 into main Nov 29, 2025
15 of 17 checks passed
@DonIsaac DonIsaac deleted the don/refactor/copy-zig-0.14-parser branch November 29, 2025 01:29