|
| 1 | +# Copilot / AI instructions for SqlScriptDOM |
| 2 | + |
1 | 3 | ScriptDom is a library for parsing and generating T-SQL scripts. It is primarily used by DacFx to build database projects, perform schema comparisons, and generate scripts for deployment. |
2 | 4 |
|
3 | | -T-SQL syntax definitions are defined in the .g files in SqlScriptDom/Parser/TSql/. The file names map to SQL Server versions, e.g. TSql170.g corresponds to the syntax definitions for SQL Server 2025, TSql160.g to SQL Server 2022, etc. Syntax for Azure SQL Database should always be based on the latest SQL Server version. |
| 5 | +## Key points (quick read) |
| 6 | +- Grammar files live in: `SqlScriptDom/Parser/TSql/` — each file corresponds to a SQL Server version (e.g. `TSql170.g` for 170 / SQL Server 2025). |
| 7 | +- Grammar format: ANTLR v2. Generated C# lexer/parser code is produced during the build (see `GenerateFiles.props`). |
| 8 | +- Build & tests: use the .NET SDK pinned in `global.json`. Typical commands from repo root: |
| 9 | + - `dotnet build -c Debug` |
| 10 | + - `dotnet test Test/SqlDom/UTSqlScriptDom.csproj -c Debug` |
| 11 | +- To regenerate parser/token/AST sources explicitly, build the main project (generation targets are hooked into its build): |
| 12 | + - `dotnet build SqlScriptDom/Microsoft.SqlServer.TransactSql.ScriptDom.csproj -c Debug` |
| 13 | + - (or) `dotnet msbuild SqlScriptDom/Microsoft.SqlServer.TransactSql.ScriptDom.csproj "-t:GLexerParserCompile;GSqlTokenTypesCompile;CreateAST" -p:Configuration=Debug` (the target list is quoted so the shell does not treat `;` as a command separator) |
| 14 | + |
| 15 | +## Why files are generated and where |
| 16 | +- `SqlScriptDom/GenerateFiles.props` contains the MSBuild targets invoked during the library build: |
| 17 | + - `GSqlTokenTypesCompile` / `GLexerParserCompile` -> run ANTLR and post-process its output (PowerShell/sed scripts) |
| 18 | + - `CreateAST` -> runs the AstGen tool (from `tools/AstGen`) to generate AST visitor/fragment classes |
| 19 | + - `GenerateEverything` -> runs ScriptGenSettingsGenerator and TokenListGenerator |
| 20 | +- The ANTLR binary is downloaded to the path defined in `Directory.Build.props` (`AntlrLocation`) when the build runs (via the `InstallAntlr` target). |
| 21 | +- Generated C# files are written to `$(CsGenIntermediateOutputPath)` (under `obj/...` by default). Do not hand-edit generated files — change the .g grammar or post-processing scripts instead. |
| 22 | + |
| 23 | +## Important files and folders (read these first) |
| 24 | +- `SqlScriptDom/Parser/TSql/*.g` — ANTLR v2 grammar files (TSql80..TSql170 etc.). Example: `TSql170.g` defines the syntax introduced in version 170. |
| 25 | +- `SqlScriptDom/GenerateFiles.props` and `Directory.Build.props` — define the code generation targets and the ANTLR location. |
| 26 | +- `SqlScriptDom/ParserPostProcessing.sed`, `LexerPostProcessing.sed`, `TSqlTokenTypes.ps1` — post-processing for generated C# sources and tokens. |
| 27 | +- `tools/` — contains code generators used during build: `AstGen`, `ScriptGenSettingsGenerator`, `TokenListGenerator`. |
| 28 | +- `Test/SqlDom/` — unit tests, baselines and test scripts. See `Only170SyntaxTests.cs`, `TestScripts/`, and `Baselines170/`. |
| 29 | + |
| 30 | +## Developer workflow & conventions (typical change cycle) |
| 31 | +1. Add/modify grammar rule(s) in the correct `TSql*.g` (pick the _version_ the syntax belongs to). |
| 32 | +2. If tokens or token ordering change, update `TSqlTokenTypes.g` (and the sed/ps1 post-processors if necessary). |
| 33 | +3. Rebuild the ScriptDom project to regenerate parser and AST (`dotnet build` will run generation). Use the targeted msbuild targets if you only want generation. |
| 34 | +4. Add tests: |
| 35 | + - Put the input SQL in `Test/SqlDom/TestScripts/` (the filename is case-sensitive and is used as an embedded resource). |
| 36 | + - Add/confirm baseline output in `Test/SqlDom/Baselines<version>/` (the UT project embeds these baselines as resources). |
| 37 | + - Update the appropriate `Only<version>SyntaxTests.cs` (e.g., `Only170SyntaxTests.cs`) by adding a `ParserTest170("MyNewTest.sql", ...)` entry. See `ParserTest.cs` and `ParserTestOutput.cs` for helper constructors and verification semantics. |
| 38 | +5. Run `dotnet test Test/SqlDom/UTSqlScriptDom.csproj -c Debug` and iterate until tests pass. |
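
As a rough illustration of step 4, the shell sketch below checks that a test script and its baseline are both present, with exactly matching case, before the tests are run. The helper names `has_exact_name` and `check_test_pair` are hypothetical, and the sketch assumes the baseline uses the same file name as the input script, which the steps above do not state explicitly.

```shell
# has_exact_name DIR NAME
# Succeeds only if DIR contains an entry spelled exactly NAME. Matching the
# directory listing as whole lines keeps the check case-exact even on
# case-insensitive file systems.
has_exact_name() {
    ls -1 "$1" 2>/dev/null | grep -qxF -- "$2"
}

# check_test_pair NAME VERSION [REPO_ROOT]   (hypothetical helper)
# Assumption: the baseline has the same file name as the input script.
check_test_pair() {
    name=$1
    version=$2
    root=${3:-.}
    status=0
    for dir in "$root/Test/SqlDom/TestScripts" "$root/Test/SqlDom/Baselines$version"; do
        if ! has_exact_name "$dir" "$name"; then
            echo "missing: $dir/$name" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run from the repo root, `check_test_pair MyNewTest.sql 170` would report any missing file on stderr and return non-zero, which makes it easy to chain in front of `dotnet test`.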
| 39 | + |
| 40 | +## Testing details and how tests assert correctness |
| 41 | +- Tests run a full parse -> script generator -> reparse round-trip. Baseline comparison verifies pretty-printed generated scripts exactly match the stored baseline. |
| 42 | +- Expected parse errors (where applicable) are verified by number and exact error messages; test helpers live in `ParserTest.cs`, `ParserTestOutput.cs`, and `ParserTestUtils.cs`. |
| 43 | +- If a test fails due to mismatch in generated script, compare the generated output (the test harness logs it) against the baseline to spot formatting/structure differences. |
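
Because the baseline comparison is exact, it can help to know whether a mismatch is only a spacing difference or a real structural one. The sketch below is one possible way to triage that, assuming the generated script from the test log has been saved to a file by hand; the function name `classify_baseline_diff` is hypothetical and is not part of the test harness.

```shell
# classify_baseline_diff BASELINE GENERATED   (hypothetical helper)
# Prints one of: "match", "whitespace-only difference", "content difference",
# or "missing file". Returns 0 only on an exact match.
classify_baseline_diff() {
    baseline=$1
    generated=$2
    if [ ! -f "$baseline" ] || [ ! -f "$generated" ]; then
        echo "missing file"
        return 2
    fi
    if cmp -s "$baseline" "$generated"; then
        # Byte-for-byte identical, which is what the tests require.
        echo "match"
        return 0
    fi
    if diff -b "$baseline" "$generated" >/dev/null; then
        # Identical once runs of whitespace are treated as equivalent.
        echo "whitespace-only difference"
    else
        echo "content difference"
    fi
    return 1
}
```

A "whitespace-only difference" usually points at formatting in the script generator or the baseline file, while a "content difference" is worth inspecting with `diff -u` on the same two files.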
| 44 | + |
| 45 | +## Bug Fixing and Baseline Generation |
| 46 | +For a practical guide on fixing bugs, including the detailed workflow for generating test baselines, see the [Bug Fixing Guide](BUG_FIXING_GUIDE.md). |
| 47 | + |
| 48 | +## Editing generated outputs, debugging generation |
| 49 | +- Never hand-edit generated files as a permanent fix (they live in `$(CsGenIntermediateOutputPath)`, under `obj/...` by default). Instead change: |
| 50 | + - `.g` grammar files |
| 51 | + - post-processing scripts (`*.ps1`/`*.sed`) |
| 52 | + - AST XML in `SqlScriptDom/Parser/TSql/Ast.xml` if AST node shapes need to change (used by `tools/AstGen`). |
| 53 | +- To see ANTLR output/errors, force verbose generation by setting the MSBuild property `OutputErrorInLexerParserCompile=true` on the command line (e.g. `dotnet msbuild -t:GLexerParserCompile -p:OutputErrorInLexerParserCompile=true`). |
| 54 | +- If the ANTLR download fails during the build, manually download `antlr-2.7.5.jar` (on non-Windows) or the `.exe` (on Windows) and place it at the location defined in `Directory.Build.props`, or override `AntlrLocation` when invoking msbuild. |
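
The two debugging options above can be combined on one command line. The sketch below only prints such a command rather than running it; the function name `print_generation_cmd` is hypothetical, and whether `AntlrLocation` expects a directory or a full file path is not stated here, so check `Directory.Build.props` before relying on the override.

```shell
# print_generation_cmd [ANTLR_LOCATION]   (hypothetical helper)
# Prints, without running it, a dotnet msbuild command line that regenerates
# the lexer/parser with ANTLR errors shown. If ANTLR_LOCATION is given, the
# AntlrLocation property is overridden with it.
# Assumption: ANTLR_LOCATION contains no spaces (the value is not quoted).
print_generation_cmd() {
    cmd='dotnet msbuild SqlScriptDom/Microsoft.SqlServer.TransactSql.ScriptDom.csproj'
    cmd="$cmd -t:GLexerParserCompile -p:OutputErrorInLexerParserCompile=true"
    if [ -n "${1:-}" ]; then
        cmd="$cmd -p:AntlrLocation=$1"
    fi
    printf '%s\n' "$cmd"
}
```

Printing the command first makes it easy to review the property values before pasting the line into a terminal at the repo root.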
| 55 | + |
4 | 56 |
|
5 | | -The grammar files are in ANTLR v2 format. C# code is generated from these grammar files as part of the build process. |
| 57 | +## Patterns & code style to follow (examples you will see) |
| 58 | +- Grammar rule pattern: `ruleName returns [Type vResult = this.FragmentFactory.CreateFragment<Type>()] { ... } : ( alternatives ) ;` — this pattern initializes an AST fragment via FragmentFactory. |
| 59 | +- Parser-generated code frequently uses `Match(<token>, CodeGenerationSupporter.<Symbol>)` and `ThrowParseErrorException("SQLxxxx", ...)` for diagnostics. |
| 60 | +- The codebase prefers using the factory and fragment visitors for AST creation and script generation. Look at `ScriptDom/SqlServer/ScriptGenerator` for script generation patterns. |
6 | 61 |
|
7 | | -For each new syntax definition, ScriptDom needs to be able to parse it successfully, and roundtrip back to the original script via the script generator. |
| 62 | +## Grammar Gotchas & Common Pitfalls |
| 63 | +- **Operator vs. Function-Style Predicates:** Be careful to distinguish between standard T-SQL operators (like `NOT LIKE`, `>`, `=`) and the function-style predicates used in some contexts (like `package.equals(...)` in `CREATE EVENT SESSION`). For example, `NOT LIKE` in an event session's `WHERE` clause is a standard comparison operator, not a function call. Always verify the exact T-SQL syntax before modifying the grammar. |
| 64 | +- **Logical `NOT` vs. Compound Operators:** The grammar handles the logical `NOT` operator (e.g., `WHERE NOT (condition)`) in a general way, often in a `booleanExpressionUnary` rule. This is distinct from compound operators like `NOT LIKE` or `NOT IN`, which are typically parsed as a single unit within a comparison rule. Don't assume that because `NOT` is supported, `NOT LIKE` will be automatically supported in all predicate contexts. |
8 | 65 |
|
9 | | -Changes need to have accompanying tests in Only170SyntaxTests.cs or the one for its respective version. The test framework should already verify the parser and script generator; you just need to add the test scripts to TestScripts and corresponding Baselines folder. Older syntaxes should be supported unless explicitly stated otherwise. |
|