| 1 | +# Delphi/Pascal Support for Context-Engine |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +This PR adds support for **Delphi/Pascal** source files and **DFM/FMX form files** to the Context-Engine. The implementation follows the existing `language_mappings` system and integrates into the current architecture. |
| 6 | + |
| 7 | +### About Delphi/Pascal |
| 8 | + |
| 9 | +[Delphi](https://www.embarcadero.com/products/delphi) is a commercial IDE and compiler for Object Pascal, widely used for building native Windows desktop applications, cross-platform mobile apps (via FireMonkey), and server-side systems. It has a large legacy codebase footprint, particularly in enterprise and industrial environments. |
| 10 | + |
| 11 | +Key characteristics relevant to indexing: |
| 12 | +- **`.pas` units** contain Object Pascal source code with a distinctive `interface`/`implementation` section structure and `uses` clauses for dependency management. |
| 13 | +- **`.dfm`/`.fmx` form files** are Delphi-specific declarative files that describe UI layouts (component trees, properties, and event handler bindings). They are not code but have strong cross-references into `.pas` units. |
| 14 | +- **`.dpr`/`.dpk` project files** are Pascal source files that define the entry point for applications and packages respectively. |
| 15 | +- **`.lpr` project files** are the [Lazarus](https://www.lazarus-ide.org/) (Free Pascal) equivalent of `.dpr` files, used by the open-source Lazarus IDE. |
| 16 | + |
| 17 | +**Scope:** 11 files changed, ~1700 lines added, of which ~940 are tests. |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## What Was Implemented? |
| 22 | + |
| 23 | +### 1. File Detection (`config.py`) |
| 24 | + |
| 25 | +New file extensions registered in `CODE_EXTS`: |
| 26 | + |
| 27 | +| Extension | Language | Description | |
| 28 | +|-----------|----------|-------------| |
| 29 | +| `.pas` | `pascal` | Delphi/Lazarus unit | |
| 30 | +| `.dpr` | `pascal` | Delphi project file | |
| 31 | +| `.dpk` | `pascal` | Delphi package file | |
| 32 | +| `.lpr` | `pascal` | Lazarus project file | |
| 33 | +| `.dfm` | `dfm` | VCL form file | |
| 34 | +| `.fmx` | `dfm` | FireMonkey form file | |
| 35 | + |
| 36 | +Additional exclusions for Delphi-specific artifacts: |
| 37 | +- Directories: `__history`, `__recovery` (Delphi IDE backups) |
| 38 | +- Files: `*.dcu`, `*.dcp`, `*.dcpil` (compiled binaries) |
| 39 | + |
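The detection rules above can be sketched roughly as follows. This is a minimal illustration, not the code in `config.py`: the names `PASCAL_EXTS`, `EXCLUDED_DIRS`, `EXCLUDED_FILE_GLOBS`, and `detect_language` are hypothetical, and case-insensitive matching is an assumption.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical stand-ins for the entries this PR registers in config.py.
PASCAL_EXTS = {
    ".pas": "pascal",
    ".dpr": "pascal",
    ".dpk": "pascal",
    ".lpr": "pascal",
    ".dfm": "dfm",
    ".fmx": "dfm",
}
EXCLUDED_DIRS = {"__history", "__recovery"}
EXCLUDED_FILE_GLOBS = ("*.dcu", "*.dcp", "*.dcpil")


def detect_language(path):
    """Return the language key for a path, or None if excluded or unknown."""
    p = PurePosixPath(path)
    # Skip anything stored under a Delphi IDE backup directory.
    if any(part in EXCLUDED_DIRS for part in p.parts[:-1]):
        return None
    # Skip compiled artifacts (assumption: matched case-insensitively).
    name = p.name.lower()
    if any(fnmatch(name, pattern) for pattern in EXCLUDED_FILE_GLOBS):
        return None
    return PASCAL_EXTS.get(p.suffix.lower())
```

With this sketch, `detect_language("src/UAuth.pas")` yields `"pascal"`, while a path under `__history` or a `.dcu` file yields `None`.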
| 40 | +### 2. PascalMapping (`language_mappings/pascal.py`) |
| 41 | + |
| 42 | +Full language mapping for Pascal/Delphi using a **dual approach**: |
| 43 | + |
| 44 | +- **Tree-sitter queries** (used only when `tree_sitter_pascal` is installed) for the AST nodes `declProc`, `declClass`, `declIntf`, `declEnum`, `declType`, `declConst`, `defProc` |
| 45 | +- **Regex extraction** (always available), which currently serves as the primary implementation because no compatible `tree_sitter_pascal` package is available |
| 46 | + |
| 47 | +Extracted concepts: |
| 48 | + |
| 49 | +| Concept | Example | Metadata Kind | |
| 50 | +|---------|---------|---------------| |
| 51 | +| Classes | `TMyClass = class(TBase)` | `class` | |
| 52 | +| Records | `TPoint = record` | `record` | |
| 53 | +| Interfaces | `ILogger = interface` | `interface` | |
| 54 | +| Enumerations | `TStatus = (stNew, stActive)` | `enum` | |
| 55 | +| Procedures/Functions | `procedure Execute;` | `function` | |
| 56 | +| Methods | `procedure TMyClass.Execute;` | `method` | |
| 57 | +| Constants | `const MAX = 100;` | `constant` | |
| 58 | +| Type aliases | `TStringList = TList<string>;` | `type_alias` | |
| 59 | +| Uses clauses | `uses System.SysUtils;` | `import` | |
| 60 | + |
| 61 | +**Built-in filter:** RTL/VCL/FMX standard units (System, SysUtils, Classes, etc.) are recognized as built-ins to prevent false cross-references. |
| 62 | + |
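To illustrate how a regex pass can assign the metadata kinds from the table above, here is a minimal sketch. It assumes each input line is a single declaration taken from a `type` section; `TYPE_DECL` and `classify_type_decl` are hypothetical names, and the real patterns in `pascal.py` may differ (for example, generic type names are not handled here).

```python
import re

# Illustrative pattern: matches "Name = <right-hand side>" declarations.
TYPE_DECL = re.compile(
    r"^\s*(?P<name>[A-Za-z_]\w*)\s*=\s*(?P<rhs>.+?)\s*;?\s*$"
)


def classify_type_decl(line):
    """Map one type declaration line to a {"name", "kind"} dict, or None."""
    m = TYPE_DECL.match(line)
    if not m:
        return None
    head = m.group("rhs").lower()
    if re.match(r"class\b", head):
        kind = "class"
    elif re.match(r"(packed\s+)?record\b", head):
        kind = "record"
    elif re.match(r"interface\b", head):
        kind = "interface"
    elif head.startswith("("):
        # A parenthesised identifier list is an enumeration.
        kind = "enum"
    else:
        kind = "type_alias"
    return {"name": m.group("name"), "kind": kind}
```

A line-based classifier like this cannot tell a `type` section from a `const` section by itself, so a real implementation would need to track the current section keyword.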
| 63 | +### 3. DfmMapping (`language_mappings/dfm.py`) |
| 64 | + |
| 65 | +Standalone, purely regex-based mapping for DFM/FMX form files (no tree-sitter needed): |
| 66 | + |
| 67 | +- Detects component declarations (`object ButtonLogin: TButton`) |
| 68 | +- Detects nested components with hierarchy tracking |
| 69 | +- Extracts event handler bindings (`OnClick = ButtonLoginClick`) as cross-file references |
| 70 | +- Handles `inherited` forms correctly |
| 71 | +- Skips multiline properties and item collections |
| 72 | + |
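The behaviour listed above (component detection, hierarchy tracking, event bindings, skipping item collections) could look roughly like the sketch below. It is an assumption-laden illustration rather than the code in `dfm.py`: `parse_dfm` and the three patterns are hypothetical, and any `On...` property with an identifier value is treated as an event binding.

```python
import re

# Illustrative patterns; the actual expressions in dfm.py may differ.
OBJECT_RE = re.compile(
    r"^\s*(?:object|inherited|inline)\s+(?P<name>\w+)\s*:\s*(?P<type>\w+)",
    re.IGNORECASE,
)
EVENT_RE = re.compile(r"^\s*(?P<event>On\w+)\s*=\s*(?P<handler>[A-Za-z_]\w*)\s*$")
END_RE = re.compile(r"^\s*end\s*$", re.IGNORECASE)


def parse_dfm(text):
    """Return (components, events) found in DFM/FMX text."""
    components, events = [], []
    stack = []            # names of the currently open components
    collection_depth = 0  # > 0 while inside an "<item ... end>" collection
    for line in text.splitlines():
        stripped = line.strip()
        if collection_depth:
            # Skip collection contents, tracking nested "<" ... ">" pairs.
            if stripped.endswith("<"):
                collection_depth += 1
            elif stripped.endswith(">") and not stripped.endswith("<>"):
                collection_depth -= 1
            continue
        if stripped.endswith("<"):
            collection_depth = 1
            continue
        m = OBJECT_RE.match(line)
        if m:
            parent = stack[-1] if stack else None
            components.append(
                {"name": m.group("name"), "type": m.group("type"), "parent": parent}
            )
            stack.append(m.group("name"))
            continue
        if END_RE.match(line):
            if stack:
                stack.pop()
            continue
        m = EVENT_RE.match(line)
        if m and stack:
            events.append(
                {
                    "component": stack[-1],
                    "event": m.group("event"),
                    "handler": m.group("handler"),
                }
            )
    return components, events
```

The handler names returned in `events` are the values that would be resolved against methods in the matching `.pas` unit.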
| 73 | +### 4. Import Extraction (`metadata.py`) |
| 74 | + |
| 75 | +Pascal `uses` clauses are extracted in both single-line and multi-line form: |
| 76 | + |
| 77 | +```pascal |
| 78 | +uses |
| 79 | + System.SysUtils, |
| 80 | + System.Classes, |
| 81 | + UAuth; |
| 82 | +``` |
| 83 | + |
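| 84 | +Keywords such as `uses`, `in`, `interface`, and `implementation` are filtered out so that they are not reported as unit names. (`in` appears in project files, e.g. `UAuth in 'UAuth.pas'`.) |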
| 85 | + |
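A minimal sketch of this kind of extraction is shown below. It is not the code in `metadata.py`: `extract_uses` and the patterns are hypothetical, and the comment stripping is deliberately naive (it does not account for comment characters inside string literals).

```python
import re

# Illustrative patterns, assumed for this sketch only.
COMMENT_RE = re.compile(r"\{.*?\}|\(\*.*?\*\)|//[^\n]*", re.DOTALL)
USES_RE = re.compile(r"\buses\b(?P<body>.*?);", re.IGNORECASE | re.DOTALL)
IN_PATH_RE = re.compile(r"\bin\s+'[^']*'", re.IGNORECASE)


def extract_uses(source):
    """Return the unit names from all uses clauses, in order, without duplicates."""
    source = COMMENT_RE.sub(" ", source)
    units = []
    for m in USES_RE.finditer(source):
        # Drop the "in 'path'" suffix that project files attach to unit names.
        body = IN_PATH_RE.sub("", m.group("body"))
        for item in body.split(","):
            name = item.strip()
            if name and name not in units:
                units.append(name)
    return units
```

Applied to the example above, this returns `System.SysUtils`, `System.Classes`, and `UAuth`; filtering standard units would be a separate step.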
| 86 | +### 5. Symbol Extraction (`symbols.py`) |
| 87 | + |
| 88 | +`_extract_symbols_pascal()` extracts all symbol types with correct `kind`, `name`, and `path` (e.g., `TMyClass.Execute` for methods). |
| 89 | + |
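As an illustration of how `kind`, `name`, and `path` relate for routines, consider the sketch below. It is not `_extract_symbols_pascal()` itself: `extract_routine_symbols` and `ROUTINE_RE` are hypothetical, and using the unqualified identifier as `name` is an assumption. A routine heading with a qualified name is treated as a method, an unqualified one as a function, matching the examples in the concepts table.

```python
import re

# Illustrative pattern: matches routine headings, optionally "class" methods.
ROUTINE_RE = re.compile(
    r"^\s*(?:class\s+)?(?:procedure|function|constructor|destructor)\s+"
    r"(?P<path>[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)",
    re.IGNORECASE | re.MULTILINE,
)


def extract_routine_symbols(source):
    """Return one symbol dict per routine heading found in the source."""
    symbols = []
    for m in ROUTINE_RE.finditer(source):
        path = m.group("path")
        # "TMyClass.Execute" -> owner "TMyClass", name "Execute".
        owner, _, name = path.rpartition(".")
        symbols.append(
            {
                "kind": "method" if owner else "function",
                "name": name,
                "path": path,
            }
        )
    return symbols
```

Method declarations inside a class body carry no qualifier, so a fuller implementation would need to track the enclosing class to classify them.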
| 90 | +### 6. Tree-sitter Integration (`tree_sitter.py`) |
| 91 | + |
| 92 | +`tree_sitter_pascal` has been added as an optional entry in the language loader. Since loading is wrapped in `try/except`, the loader falls back to the regex implementation when the `tree_sitter_pascal` Python package is not installed. |
| 93 | + |
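The fallback pattern described here can be sketched as follows. The helper names `load_optional_grammar` and `pick_backend` are hypothetical and do not come from `tree_sitter.py`; catching the broad `Exception` is an assumption made so that a broken native extension also triggers the fallback.

```python
import importlib


def load_optional_grammar(module_name):
    """Return the imported grammar module, or None if it cannot be loaded."""
    try:
        return importlib.import_module(module_name)
    except Exception:
        # Missing package or broken build: report "not available" to the caller.
        return None


def pick_backend(module_name="tree_sitter_pascal"):
    """Choose "tree-sitter" when the grammar loads, otherwise "regex"."""
    if load_optional_grammar(module_name) is not None:
        return "tree-sitter"
    return "regex"
```

Because the decision is made at load time, installing a compatible grammar package later switches the backend without touching the mapping code.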
| 94 | +### 7. Language Registry (`language_mappings/__init__.py`) |
| 95 | + |
| 96 | +Three new entries: |
| 97 | +- `"pascal"` → `PascalMapping` |
| 98 | +- `"delphi"` → `PascalMapping` (alias) |
| 99 | +- `"dfm"` → `DfmMapping` |
| 100 | + |
| 101 | +--- |
| 102 | + |
| 103 | +## Design Decisions |
| 104 | + |
| 105 | +### Why a Dual Approach (Regex + Optional Tree-sitter)? |
| 106 | + |
| 107 | +There is no official `tree_sitter_pascal` Python package on PyPI that is compatible with the tree-sitter 0.25+ API. The regex fallback therefore serves as the primary implementation. Once a compatible package becomes available, the tree-sitter integration will activate automatically, without any code changes. |
| 108 | + |
| 109 | +### Why Separate Mappings for `.pas` and `.dfm`? |
| 110 | + |
| 111 | +DFM/FMX files use a declarative format (nested `object` blocks with property assignments) rather than Pascal statements. A separate `DfmMapping` with its own `"dfm"` language key is cleaner than mixing both formats into `PascalMapping`. |
| 112 | + |
| 113 | +### Reference: Codegraph |
| 114 | + |
| 115 | +The Delphi support in Codegraph (TypeScript/web-tree-sitter) served as the reference implementation for AST node types, built-in filters, and DFM parsing. |
| 116 | + |
| 117 | +--- |
| 118 | + |
| 119 | +## Tests |
| 120 | + |
| 121 | +**~120 new tests** across four test files: |
| 122 | + |
| 123 | +| Test File | Tests | Coverage | |
| 124 | +|-----------|-------|----------| |
| 125 | +| `test_pascal_language_mapping.py` | 47 | PascalMapping: instantiation, queries, import extraction, symbol extraction, metadata, realistic fixtures | |
| 126 | +| `test_dfm_language_mapping.py` | ~30 | DfmMapping: instantiation, components, events, hierarchy, multiline properties, collections | |
| 127 | +| `test_language_coverage.py` | 7 | Integration: Pascal uses-imports, Delphi alias, symbol coverage | |
| 128 | +| `test_ast_analyzer_mappings.py` | 1 | Mapping count updated from 32 to 35 (three new registry entries) | |
| 129 | + |
| 130 | +All tests pass (`1197 passed`). |
| 131 | + |
| 132 | +--- |
| 133 | + |
| 134 | +## Files Changed |
| 135 | + |
| 136 | +| File | Type | Description | |
| 137 | +|------|------|-------------| |
| 138 | +| `scripts/ingest/config.py` | Modified | +6 extensions, +5 exclusions | |
| 139 | +| `scripts/ingest/language_mappings/pascal.py` | **New** | PascalMapping (~400 lines) | |
| 140 | +| `scripts/ingest/language_mappings/dfm.py` | **New** | DfmMapping (~220 lines) | |
| 141 | +| `scripts/ingest/language_mappings/__init__.py` | Modified | +3 registry entries | |
| 142 | +| `scripts/ingest/metadata.py` | Modified | +Pascal uses-clause extraction | |
| 143 | +| `scripts/ingest/symbols.py` | Modified | +`_extract_symbols_pascal()` | |
| 144 | +| `scripts/ingest/tree_sitter.py` | Modified | +optional `tree_sitter_pascal` entry | |
| 145 | +| `tests/test_pascal_language_mapping.py` | **New** | 47 unit tests | |
| 146 | +| `tests/test_dfm_language_mapping.py` | **New** | DFM tests | |
| 147 | +| `tests/test_language_coverage.py` | Modified | +Pascal integration tests | |
| 148 | +| `tests/test_ast_analyzer_mappings.py` | Modified | Mapping count updated | |