
Commit 18b3ceb

test(baml_parser): Add comprehensive parser test infrastructure (#2688)
Adds extensive test coverage for the BAML V2 parser with focus on incremental parsing capabilities.

## Test Projects

- **parser_strings**: Simple strings, raw strings, unicode, nested quotes
- **parser_error_recovery**: Unclosed strings, missing braces, invalid syntax
- **parser_expressions**: Binary ops, precedence, if/match expressions
- **parser_speculative**: LLM vs expression function disambiguation
- **parser_stress**: Deeply nested structures, large files (1000 classes), complex strings

## Incremental Parsing Tests

Added benchmark infrastructure for incremental parsing:

- Single character edits to strings
- Fixing unclosed string errors
- Adding attributes to existing fields

These verify >95% node reuse for small edits.

## Test Utilities

- Node reuse metrics calculation
- Losslessness verification (tree reconstructs source exactly)
- String edit helpers (insert, delete, replace)
- Tree comparison and traversal safety checks

## AST Enhancements

Extended AST with function type helpers:

- LlmFunctionBody and ExprFunctionBody nodes
- is_llm_function() and is_expr_function() accessors
- Separate accessors for llm_body() vs expr_body()

All tests generate insta snapshots for easy review.

---

> [!NOTE]
> Adds parser snapshot suites (strings, stress) and introduces incremental parsing test utilities for incremental edits and node reuse measurement.
>
> - **Tests**:
>   - **Parser Snapshots**: Add `parser_strings` and `parser_stress` suites with lexer/parser/HIR/THIR/diagnostics/codegen snapshots, including `complex_strings`, `deeply_nested`, and a `large_file` (1000 classes) verifying no errors.
> - **Infrastructure**:
>   - **Incremental Parsing Utils**: New `baml_language/crates/baml_tests/src/utils/mod.rs` providing node reuse metrics, losslessness checks, edit helpers (insert/delete/replace), tree equivalence, common edit patterns, and performance measurement; wired via `pub mod utils` in `baml_tests/src/lib.rs`.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 08bcd9d.</sup>
1 parent a4d10d4 commit 18b3ceb

117 files changed: +47275 -44 lines changed


baml_language/crates/baml_syntax/src/ast.rs

Lines changed: 23 additions & 1 deletion
@@ -57,6 +57,8 @@ ast_node!(TypeAliasDef, TYPE_ALIAS_DEF);
 ast_node!(ParameterList, PARAMETER_LIST);
 ast_node!(Parameter, PARAMETER);
 ast_node!(FunctionBody, FUNCTION_BODY);
+ast_node!(LlmFunctionBody, LLM_FUNCTION_BODY);
+ast_node!(ExprFunctionBody, EXPR_FUNCTION_BODY);
 ast_node!(Field, FIELD);
 ast_node!(EnumVariant, ENUM_VARIANT);
 ast_node!(ConfigBlock, CONFIG_BLOCK);
@@ -103,10 +105,30 @@ impl FunctionDef {
         self.syntax.children().find_map(TypeExpr::cast)
     }
 
-    /// Get the function body.
+    /// Get the function body (generic, could be any type).
     pub fn body(&self) -> Option<FunctionBody> {
         self.syntax.children().find_map(FunctionBody::cast)
     }
+
+    /// Get the LLM function body if this is an LLM function.
+    pub fn llm_body(&self) -> Option<LlmFunctionBody> {
+        self.syntax.children().find_map(LlmFunctionBody::cast)
+    }
+
+    /// Get the expression function body if this is an expression function.
+    pub fn expr_body(&self) -> Option<ExprFunctionBody> {
+        self.syntax.children().find_map(ExprFunctionBody::cast)
+    }
+
+    /// Check if this is an LLM function.
+    pub fn is_llm_function(&self) -> bool {
+        self.llm_body().is_some()
+    }
+
+    /// Check if this is an expression function.
+    pub fn is_expr_function(&self) -> bool {
+        self.expr_body().is_some()
+    }
 }
 
 impl ParameterList {
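
The accessors added above all follow one pattern: scan the direct children of the function node and keep the first child that casts to the wanted typed wrapper. The following is a minimal, self-contained sketch of that pattern. The `SyntaxNode` type, the trimmed `SyntaxKind` enum, the hand-written `cast` functions, and the `describe` helper are simplified stand-ins invented for illustration; the real crate generates its wrappers with `ast_node!` over its own syntax tree, and where the parser actually places body nodes is not shown in this diff.

```rust
// Hypothetical, simplified stand-ins for the real syntax tree types.
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyntaxKind {
    FUNCTION_DEF,
    PARAMETER_LIST,
    LLM_FUNCTION_BODY,
    EXPR_FUNCTION_BODY,
}

#[derive(Debug)]
struct SyntaxNode {
    kind: SyntaxKind,
    children: Vec<SyntaxNode>,
}

impl SyntaxNode {
    fn new(kind: SyntaxKind, children: Vec<SyntaxNode>) -> Self {
        SyntaxNode { kind, children }
    }

    fn children(&self) -> std::slice::Iter<'_, SyntaxNode> {
        self.children.iter()
    }
}

/// Typed wrapper that can only be built from an LLM_FUNCTION_BODY node.
#[allow(dead_code)]
struct LlmFunctionBody<'a>(&'a SyntaxNode);

impl<'a> LlmFunctionBody<'a> {
    fn cast(node: &'a SyntaxNode) -> Option<Self> {
        if node.kind == SyntaxKind::LLM_FUNCTION_BODY { Some(Self(node)) } else { None }
    }
}

/// Typed wrapper that can only be built from an EXPR_FUNCTION_BODY node.
#[allow(dead_code)]
struct ExprFunctionBody<'a>(&'a SyntaxNode);

impl<'a> ExprFunctionBody<'a> {
    fn cast(node: &'a SyntaxNode) -> Option<Self> {
        if node.kind == SyntaxKind::EXPR_FUNCTION_BODY { Some(Self(node)) } else { None }
    }
}

struct FunctionDef<'a> {
    syntax: &'a SyntaxNode,
}

impl<'a> FunctionDef<'a> {
    fn cast(node: &'a SyntaxNode) -> Option<Self> {
        if node.kind == SyntaxKind::FUNCTION_DEF { Some(Self { syntax: node }) } else { None }
    }

    /// First direct child that is an LLM body, if any.
    fn llm_body(&self) -> Option<LlmFunctionBody<'a>> {
        self.syntax.children().find_map(LlmFunctionBody::cast)
    }

    /// First direct child that is an expression body, if any.
    fn expr_body(&self) -> Option<ExprFunctionBody<'a>> {
        self.syntax.children().find_map(ExprFunctionBody::cast)
    }

    fn is_llm_function(&self) -> bool {
        self.llm_body().is_some()
    }

    fn is_expr_function(&self) -> bool {
        self.expr_body().is_some()
    }
}

/// Hypothetical caller: branch on the function flavor using the two predicates.
fn describe(def: &FunctionDef<'_>) -> &'static str {
    match (def.is_llm_function(), def.is_expr_function()) {
        (true, false) => "llm",
        (false, true) => "expr",
        (true, true) => "ambiguous",
        (false, false) => "unknown",
    }
}
```

Because both predicates are derived from `Option`-returning accessors, a caller can either branch on the booleans or pattern-match on `llm_body()` / `expr_body()` directly when it needs the body node itself. A function node with neither child falls into the `"unknown"` arm here, which is how this sketch would treat a function whose body failed to parse.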

baml_language/crates/baml_syntax/src/syntax_kind.rs

Lines changed: 3 additions & 0 deletions
@@ -106,8 +106,11 @@ pub enum SyntaxKind {
     PARAMETER_LIST,
     PARAMETER,
     FUNCTION_BODY,
+    LLM_FUNCTION_BODY, // Function body with client/prompt
+    EXPR_FUNCTION_BODY, // Function body with expressions/statements
     PROMPT_FIELD,
     CLIENT_REFERENCE,
+    CLIENT_FIELD, // 'client' field in LLM function
     DEFAULT_IMPL,
 
     // Class components
Lines changed: 10 additions & 2 deletions
@@ -1,2 +1,10 @@
-# Treat insta snapshots as generated so GitHub collapses them
-snapshots/*.snap linguist-generated=true
+# Snapshot test files
+*.snap linguist-generated=true
+*.snap -diff
+*.snap merge=ours
+
+# Alternative: treat as binary (more aggressive, hides content entirely)
+# *.snap binary
+
+# Mark snapshots directory as generated
+snapshots/** linguist-generated=true
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+class User {
+  name string @alias("user_name")
+}
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+class User {
+  name string
+}
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# Add Attribute
+
+Tests incremental parsing when adding an attribute to a field.
+
+Expected behavior:
+- Class node should be reused
+- Field node might be reparsed but efficiently
+- Overall node reuse should be >80%
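
The reuse threshold above needs a concrete definition of "node reuse". The utilities in `baml_tests/src/utils/mod.rs` are not visible in this view, so the following is only one plausible definition, sketched under assumptions: subtrees are shared between the old and new tree by reference counting, a node counts as reused when the very same allocation appears in the old tree, and a reused subtree contributes all of its descendants. The `Node` type and every function name here are hypothetical.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical immutable tree node; subtrees are shared through `Rc`.
#[allow(dead_code)]
struct Node {
    kind: &'static str,
    children: Vec<Rc<Node>>,
}

fn node(kind: &'static str, children: Vec<Rc<Node>>) -> Rc<Node> {
    Rc::new(Node { kind, children })
}

/// Total number of nodes in the subtree rooted at `n`.
fn count_nodes(n: &Rc<Node>) -> usize {
    1 + n.children.iter().map(count_nodes).sum::<usize>()
}

/// Record the address of every node in the old tree.
fn collect_ptrs(n: &Rc<Node>, out: &mut HashSet<*const Node>) {
    out.insert(Rc::as_ptr(n));
    for child in &n.children {
        collect_ptrs(child, out);
    }
}

/// Count nodes of the new tree whose allocation already existed in the old tree.
fn count_reused(n: &Rc<Node>, old: &HashSet<*const Node>) -> usize {
    if old.contains(&Rc::as_ptr(n)) {
        // Same allocation, so the whole subtree was carried over untouched.
        count_nodes(n)
    } else {
        n.children.iter().map(|c| count_reused(c, old)).sum()
    }
}

/// Percentage of nodes in `new` that are shared with `old`.
/// The old tree must still be alive so pointer identity is meaningful.
fn reuse_percentage(old: &Rc<Node>, new: &Rc<Node>) -> f64 {
    let mut old_ptrs = HashSet::new();
    collect_ptrs(old, &mut old_ptrs);
    100.0 * count_reused(new, &old_ptrs) as f64 / count_nodes(new) as f64
}
```

Under this definition every ancestor of an edited node counts as rebuilt, since a new parent must be allocated to point at the new child. On a three-line fixture that makes high percentages hard to reach, so the real metric may well count reuse differently (for example, by structural equality or at token granularity).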
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+class User {
+  name "hello!" // Added one character
+}
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+class User {
+  name "hello"
+}
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# Add String Character
+
+Tests incremental parsing when adding a single character to a string literal.
+
+Expected behavior:
+- Parser should reuse the class and field nodes
+- Only the string literal node should be reparsed
+- Node reuse should be >95%
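
A before/after pair like this one can also be produced programmatically from a single source string, which is presumably what the commit's string edit helpers (insert, delete, replace) are for. Their actual signatures are not shown here, so the functions below are hypothetical equivalents built only on the standard library. Offsets are byte offsets and must fall on UTF-8 character boundaries, otherwise the slicing panics.

```rust
use std::ops::Range;

/// Return a copy of `source` with `text` inserted at byte `offset`.
fn insert_at(source: &str, offset: usize, text: &str) -> String {
    let mut edited = String::with_capacity(source.len() + text.len());
    edited.push_str(&source[..offset]);
    edited.push_str(text);
    edited.push_str(&source[offset..]);
    edited
}

/// Return a copy of `source` with the bytes in `range` swapped for `text`.
fn replace_range(source: &str, range: Range<usize>, text: &str) -> String {
    let mut edited = source.to_string();
    edited.replace_range(range, text);
    edited
}

/// Return a copy of `source` with the bytes in `range` removed.
fn delete_range(source: &str, range: Range<usize>) -> String {
    replace_range(source, range, "")
}
```

For the fixture above, locating the end of `hello` with `str::find` and calling `insert_at(before, offset, "!")` turns `name "hello"` into `name "hello!"`; applying `delete_range` to the same one-byte span reverses the edit, which is a convenient round-trip check for the helpers themselves.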
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+class User {
+  name "unclosed" // Fixed the string
+}
