Commit 4429f4b

feat(baml_language): Implement BAML V2 lexer with complete operator support (#2686)
Implemented a lossless, incremental lexer for BAML V2 that produces structural tokens only, leaving keyword detection to the parser.

Changes:
- Replace keyword tokens (Function, Class, Enum, etc.) with single Word token
- Add raw string support with multiple delimiter levels (#"..."#, ##"..."##, etc.)
- Add comprehensive operator support:
  * Arithmetic: +, -, *, /, %, ++, --
  * Assignment: =, +=, -=, *=, /=, %=, &=, |=, ^=, <<=, >>=
  * Comparison: ==, !=, <, >, <=, >=
  * Logical: &&, ||, !
  * Bitwise: &, |, ^, ~, <<, >>
  * Other: ::, ;, .
- Add documentation for unquoted string handling (parser responsibility)
- Update SyntaxKind enum with all new token types
- Update parser to map all new tokens correctly
- Add comprehensive test coverage (20 lexer tests)

Design decisions:
- Lexer is "dumb" - only identifies token boundaries, not semantic meaning
- Parser checks Word token text to determine if it's a keyword
- Raw strings are opaque - Jinja/template parsing happens in parser layer
- Lossless tokenization preserved for perfect source reconstruction
- Operator ordering ensures longer tokens match first (e.g., <<= before << before <)

All tests passing:
- 20 lexer unit tests
- 20 integration tests
- Updated snapshots to reflect new token structure

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduces a lossless structural lexer with raw-string and comprehensive operator support, updates parser/syntax mappings, refreshes tests/snapshots, and wires project/workspace APIs.
>
> - **Lexer (baml_lexer)**:
>   - Replace keyword tokens with `Word`; add `Quote`/`Hash` and raw-string handling strategy.
>   - Add comprehensive operators/punctuation (assignment, comparison, logical, bitwise, shift, arithmetic) and preserve whitespace/comments.
>   - Document unquoted/raw string behavior; add extensive unit tests.
> - **Parser & Syntax (baml_parser, baml_syntax)**:
>   - Map all new `TokenKind` variants to `SyntaxKind`.
>   - Extend `SyntaxKind` with `WORD`, `QUOTE`, `HASH`, new operators, and punctuation.
> - **DB/Workspace (baml_db, baml_workspace)**:
>   - `RootDatabase::add_file` now creates `SourceFile` with `FileId`; add `set_project_root`; re-export workspace.
>   - Add file discovery (`discover_baml_files`) with tests.
> - **Tests/Infra (baml_tests)**:
>   - Update generated tests and snapshots to new tokenization.
>   - Add benchmark generation scaffolding for incremental/scale benches.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 64982ce. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
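
The design decisions in the commit message (a single Word token, longest-match operator ordering, opaque raw strings, lossless output) can be illustrated with a minimal sketch. This is not the `baml_lexer` implementation: `TokenKind`, `Token`, `OPERATORS`, `raw_string_len`, `lex`, and `is_keyword` are hypothetical names chosen for illustration, and the sketch assumes a raw string is emitted as one token, whereas the actual lexer may emit separate `Hash` and `Quote` tokens and leave assembly to the parser.

```rust
/// Hypothetical token kinds; the real `TokenKind` in `baml_lexer` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Word,
    Whitespace,
    RawString,
    Operator,
    Unknown,
}

/// A token borrows its text from the source, so nothing is lost or copied.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token<'a> {
    kind: TokenKind,
    text: &'a str,
}

/// Operators listed longest-first, so `<<=` is tried before `<<`, and `<<` before `<`.
const OPERATORS: &[&str] = &[
    "<<=", ">>=",
    "++", "--", "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=",
    "==", "!=", "<=", ">=", "&&", "||", "<<", ">>", "::",
    "+", "-", "*", "/", "%", "=", "<", ">", "!", "&", "|", "^", "~", ";", ".",
];

/// Byte length of a raw string such as `#"..."#` or `##"..."##` at the start
/// of `s`, or `None` if `s` does not begin with a complete raw string.
fn raw_string_len(s: &str) -> Option<usize> {
    let hashes = s.bytes().take_while(|&b| b == b'#').count();
    if hashes == 0 || s.as_bytes().get(hashes) != Some(&b'"') {
        return None;
    }
    // The closing delimiter is a quote followed by the same number of hashes.
    let closer = format!("\"{}", "#".repeat(hashes));
    let body_start = hashes + 1;
    s[body_start..]
        .find(&closer)
        .map(|i| body_start + i + closer.len())
}

/// Splits `src` into tokens without dropping any byte, including whitespace.
fn lex(src: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < src.len() {
        let rest = &src[pos..];
        let first = rest.chars().next().expect("pos is inside src");
        let (kind, len) = if first.is_whitespace() {
            let len = rest.find(|c: char| !c.is_whitespace()).unwrap_or(rest.len());
            (TokenKind::Whitespace, len)
        } else if first.is_alphanumeric() || first == '_' {
            // Keywords and identifiers are not distinguished here.
            let len = rest
                .find(|c: char| !(c.is_alphanumeric() || c == '_'))
                .unwrap_or(rest.len());
            (TokenKind::Word, len)
        } else if let Some(len) = raw_string_len(rest) {
            (TokenKind::RawString, len)
        } else if let Some(op) = OPERATORS.iter().find(|op| rest.starts_with(**op)) {
            (TokenKind::Operator, op.len())
        } else {
            // Anything unrecognized is kept as a one-character token.
            (TokenKind::Unknown, first.len_utf8())
        };
        tokens.push(Token { kind, text: &rest[..len] });
        pos += len;
    }
    tokens
}

/// Parser-side keyword check on a Word token's text (illustrative subset only).
fn is_keyword(word: &str) -> bool {
    matches!(word, "function" | "class" | "enum")
}
```

Under these assumptions, concatenating the `text` of every token returned by `lex(src)` gives back `src` unchanged, which is the lossless property the commit describes, and the parser rather than the lexer decides whether a `Word` such as `class` is a keyword by calling something like `is_keyword`.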
1 parent 514afc9 commit 4429f4b

16 files changed, +704 -128 lines changed


baml_language/crates/baml_base/src/core_types.rs

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 //! Core types used throughout the compiler.
 
-use smol_str::SmolStr;
 use std::fmt;
+
+use smol_str::SmolStr;
 use text_size::{TextRange, TextSize};
 
 /// Unique identifier for a source file

baml_language/crates/baml_base/src/files.rs

Lines changed: 2 additions & 1 deletion
@@ -2,9 +2,10 @@
 //!
 //! Defines the core structures for accessing file contents and paths.
 
-use crate::FileId;
 use std::path::PathBuf;
 
+use crate::FileId;
+
 /// Input structure representing a source file in the compilation.
 ///
 /// This is a salsa input, which means it's the primary way to provide

baml_language/crates/baml_db/src/lib.rs

Lines changed: 4 additions & 3 deletions
@@ -3,9 +3,10 @@
 //! This crate purely combines all the compiler traits into a single database.
 //! All testing happens in the separate `baml_tests` crate.
 
-use std::path::PathBuf;
-use std::sync::Arc;
-use std::sync::atomic::AtomicU32;
+use std::{
+    path::PathBuf,
+    sync::{Arc, atomic::AtomicU32},
+};
 
 // Re-export all public APIs
 pub use baml_base::*;

baml_language/crates/baml_diagnostics/src/lib.rs

Lines changed: 4 additions & 2 deletions
@@ -3,9 +3,10 @@
 //! This crate converts compiler diagnostics into beautiful error messages.
 //! It doesn't define error types - those live in each compiler phase.
 
+use std::collections::HashMap;
+
 use ariadne::{Color, Label, Report, ReportKind, Source};
 use baml_base::{Diagnostic, FileId, Severity, Span};
-use std::collections::HashMap;
 
 /// Convert a compiler diagnostic to Ariadne span format.
 fn span_to_ariadne(span: Span) -> (usize, std::ops::Range<usize>) {
@@ -72,10 +73,11 @@ pub fn render_diagnostics(
 
 #[cfg(test)]
 mod tests {
-    use super::*;
     use baml_base::{FileId, Span};
     use text_size::{TextRange, TextSize};
 
+    use super::*;
+
     // A simple test diagnostic
     #[derive(Debug)]
     struct TestError {
