Commit a32a0fa

Merge pull request #228 from omonien/delphi-support
feat: Add Delphi/Pascal support (PascalMapping, DfmMapping & full test suite)
2 parents 45a3330 + 1646709 commit a32a0fa

12 files changed: +1854 -2 lines changed

docs/DELPHI_SUPPORT.md

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
# Delphi/Pascal Support for Context-Engine

## Summary

This PR adds full support for **Delphi/Pascal** source files and **DFM/FMX form files** to the Context-Engine. The implementation follows the existing `language_mappings` system and integrates seamlessly into the current architecture.

### About Delphi/Pascal

[Delphi](https://www.embarcadero.com/products/delphi) is a commercial IDE and compiler for Object Pascal, widely used for building native Windows desktop applications, cross-platform mobile apps (via FireMonkey), and server-side systems. It has a large legacy codebase footprint, particularly in enterprise and industrial environments.

Key characteristics relevant to indexing:
- **`.pas` units** contain Object Pascal source code with a distinctive `interface`/`implementation` section structure and `uses` clauses for dependency management.
- **`.dfm`/`.fmx` form files** are Delphi-specific declarative files that describe UI layouts (component trees, properties, and event handler bindings). They are not code but have strong cross-references into `.pas` units.
- **`.dpr`/`.dpk` project files** are Pascal source files that define the entry point for applications and packages respectively.
- **`.lpr` project files** are the [Lazarus](https://www.lazarus-ide.org/) (Free Pascal) equivalent of `.dpr` files, used by the open-source Lazarus IDE.

**Scope:** 11 files changed, ~1700 lines added, of which ~940 are tests.

---

## What Was Implemented?

### 1. File Detection (`config.py`)

New file extensions registered in `CODE_EXTS`:

| Extension | Language | Description |
|-----------|----------|-------------|
| `.pas` | `pascal` | Delphi/Lazarus unit |
| `.dpr` | `pascal` | Delphi project file |
| `.dpk` | `pascal` | Delphi package file |
| `.lpr` | `pascal` | Lazarus project file |
| `.dfm` | `dfm` | VCL form file |
| `.fmx` | `dfm` | FireMonkey form file |

Additional exclusions for Delphi-specific artifacts:
- Directories: `__history`, `__recovery` (Delphi IDE backups)
- Files: `*.dcu`, `*.dcp`, `*.dcpil` (compiled binaries)

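
The detection rules above can be sketched as a small lookup. This is a minimal illustration, not the code in `config.py`: the names `PASCAL_EXTS`, `EXCLUDED_DIRS`, `EXCLUDED_FILE_GLOBS`, and `detect_language` are hypothetical, and lower-casing the file name before matching is an assumption about how case should be handled.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical tables holding only the entries described in this document.
PASCAL_EXTS = {
    ".pas": "pascal",
    ".dpr": "pascal",
    ".dpk": "pascal",
    ".lpr": "pascal",
    ".dfm": "dfm",
    ".fmx": "dfm",
}
EXCLUDED_DIRS = {"__history", "__recovery"}
EXCLUDED_FILE_GLOBS = ("*.dcu", "*.dcp", "*.dcpil")


def detect_language(path: str) -> Optional[str]:
    """Return the language key for a path, or None if it should be skipped."""
    p = PurePosixPath(path)
    # Skip anything located under an IDE backup directory.
    if any(part in EXCLUDED_DIRS for part in p.parts[:-1]):
        return None
    # Skip compiled artifacts (assumption: matching is case-insensitive).
    name = p.name.lower()
    if any(fnmatchcase(name, glob) for glob in EXCLUDED_FILE_GLOBS):
        return None
    return PASCAL_EXTS.get(p.suffix.lower())
```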
### 2. PascalMapping (`language_mappings/pascal.py`)

Full language mapping for Pascal/Delphi using a **dual approach**:

- **Tree-sitter queries** (when `tree_sitter_pascal` is installed) for AST nodes: `declProc`, `declClass`, `declIntf`, `declEnum`, `declType`, `declConst`, `defProc`
- **Regex extraction** (always available), which currently serves as the primary implementation

Extracted concepts:

| Concept | Example | Metadata Kind |
|---------|---------|---------------|
| Classes | `TMyClass = class(TBase)` | `class` |
| Records | `TPoint = record` | `record` |
| Interfaces | `ILogger = interface` | `interface` |
| Enumerations | `TStatus = (stNew, stActive)` | `enum` |
| Procedures/Functions | `procedure Execute;` | `function` |
| Methods | `procedure TMyClass.Execute;` | `method` |
| Constants | `const MAX = 100;` | `constant` |
| Type aliases | `TStringList = TList<string>;` | `type_alias` |
| Uses clauses | `uses System.SysUtils;` | `import` |

**Built-in filter:** RTL/VCL/FMX standard units (System, SysUtils, Classes, etc.) are recognized as built-ins to prevent false cross-references.

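
To illustrate how a regex pass can assign the metadata kinds in the table above, here is a minimal sketch for type declarations only. It is not the pattern set used by `PascalMapping`: `TYPE_DECL`, `classify`, and `extract_types` are hypothetical names, and the sketch assumes one declaration per line. It does not track `type`/`const` sections, so it would mislabel a bare `MAX = 100;` line as a type alias, and it ignores generic parameters on the left-hand side.

```python
import re

# Hypothetical pattern: "<Name> = <right-hand side>" on a single line.
TYPE_DECL = re.compile(
    r"^[ \t]*(?P<name>[A-Za-z_]\w*)[ \t]*=[ \t]*(?P<rhs>[^;\n]*)",
    re.MULTILINE,
)


def classify(rhs: str) -> str:
    """Map the right-hand side of a declaration to a metadata kind."""
    head = rhs.strip().lower()  # Pascal keywords are case-insensitive
    if re.match(r"class\b", head):
        return "class"
    if re.match(r"(packed\s+)?record\b", head):
        return "record"
    if re.match(r"interface\b", head):
        return "interface"
    if head.startswith("("):
        return "enum"
    return "type_alias"


def extract_types(source: str) -> list:
    """Return (name, kind) pairs for every declaration-shaped line."""
    return [
        (m.group("name"), classify(m.group("rhs")))
        for m in TYPE_DECL.finditer(source)
    ]
```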
### 3. DfmMapping (`language_mappings/dfm.py`)

Standalone, purely regex-based mapping for DFM/FMX form files (no tree-sitter needed):

- Detects component declarations (`object ButtonLogin: TButton`)
- Detects nested components with hierarchy tracking
- Extracts event handler bindings (`OnClick = ButtonLoginClick`) as cross-file references
- Handles `inherited` forms correctly
- Skips multiline properties and item collections

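
The behaviour listed above can be sketched with a line-based scan and a stack for the component hierarchy. This is an illustrative sketch under stated assumptions, not the code in `dfm.py`: `parse_dfm`, the regexes, and the output dictionaries are hypothetical, nested collections are not handled, and an event is assumed to be any property whose name starts with `On` followed by an uppercase letter.

```python
import re

OBJECT_RE = re.compile(
    r"^(?:object|inherited|inline)\s+(\w+)\s*:\s*(\w+)", re.IGNORECASE
)
EVENT_RE = re.compile(r"^(On[A-Z]\w*)\s*=\s*(\w+)$")
# Multiline values open with one of these tokens and end with its closer.
MULTILINE_CLOSERS = {"<": ">", "(": ")", "{": "}"}


def parse_dfm(text):
    """Return (components, events) found in DFM/FMX text."""
    stack, components, events = [], [], []
    closer = None  # set while skipping a multiline property or collection
    for raw in text.splitlines():
        line = raw.strip()
        if closer is not None:
            if line.endswith(closer):
                closer = None
            continue
        if " = " in line and line[-1] in MULTILINE_CLOSERS:
            closer = MULTILINE_CLOSERS[line[-1]]
            continue
        m = OBJECT_RE.match(line)
        if m:
            name, cls = m.groups()
            parent = stack[-1] if stack else None
            components.append({"name": name, "class": cls, "parent": parent})
            stack.append(name)
        elif line.lower() == "end":
            if stack:
                stack.pop()  # leave the current component
        else:
            m = EVENT_RE.match(line)
            if m and stack:
                events.append(
                    {"component": stack[-1], "event": m.group(1), "handler": m.group(2)}
                )
    return components, events
```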
### 4. Import Extraction (`metadata.py`)

Pascal `uses` clauses are extracted in both single-line and multi-line form:

```pascal
uses
  System.SysUtils,
  System.Classes,
  UAuth;
```

Keywords such as `uses`, `in`, `interface`, and `implementation` are filtered out of the results.

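
As a rough illustration of the extraction described above, the sketch below collects unit names from every `uses` clause. The names `USES_RE` and `extract_uses` are hypothetical, and the sketch assumes the clause contains no comments; it is not the implementation in `metadata.py`.

```python
import re

# A uses clause runs from the keyword to the next semicolon.
USES_RE = re.compile(r"\buses\b(?P<body>[^;]*);", re.IGNORECASE)


def extract_uses(source: str) -> list:
    """Return the unit names listed in all uses clauses, in source order."""
    units = []
    for m in USES_RE.finditer(source):
        for entry in m.group("body").split(","):
            tokens = entry.split()
            if tokens:
                # "UAuth in 'UAuth.pas'" (project files): keep only the unit
                # name, which drops the `in` keyword and the path literal.
                units.append(tokens[0])
    return units
```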
### 5. Symbol Extraction (`symbols.py`)

`_extract_symbols_pascal()` extracts all symbol types with correct `kind`, `name`, and `path` (e.g., `TMyClass.Execute` for methods).

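
The relationship between `kind`, `name`, and `path` can be shown with a small sketch for routines. This is not `_extract_symbols_pascal()` itself: `ROUTINE_RE` and `extract_routines` are hypothetical, and the sketch assumes that a dotted name marks a method implementation while an undotted name is a standalone routine, so method declarations inside a class body would be reported as functions.

```python
import re

ROUTINE_RE = re.compile(
    r"^[ \t]*(?:class[ \t]+)?(?:procedure|function|constructor|destructor)[ \t]+"
    r"(?P<path>[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)",
    re.IGNORECASE | re.MULTILINE,
)


def extract_routines(source: str) -> list:
    """Return one symbol dict per routine heading found in the source."""
    symbols = []
    for m in ROUTINE_RE.finditer(source):
        path = m.group("path")
        # "TMyClass.Execute" -> owner "TMyClass", name "Execute";
        # "Execute" -> owner "", name "Execute".
        owner, _, name = path.rpartition(".")
        kind = "method" if owner else "function"
        symbols.append({"kind": kind, "name": name, "path": path})
    return symbols
```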
### 6. Tree-sitter Integration (`tree_sitter.py`)

`tree_sitter_pascal` has been added as an optional entry in the language loader. Because loading is wrapped in `try/except`, the loader falls back to the regex implementation when the Python package is not installed.

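
The optional-loading pattern described above might look like the following sketch. The helper names `load_optional_grammar` and `choose_backend` are hypothetical and the real loader in `tree_sitter.py` may be structured differently; only the `try/except` fallback idea is taken from this document.

```python
import importlib


def load_optional_grammar(module_name: str):
    """Return the imported grammar module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Missing optional dependency: the caller uses the regex path instead.
        return None


def choose_backend(module_name: str = "tree_sitter_pascal") -> str:
    """Pick "tree-sitter" when the grammar imports, otherwise "regex"."""
    grammar = load_optional_grammar(module_name)
    return "tree-sitter" if grammar is not None else "regex"
```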
### 7. Language Registry (`language_mappings/__init__.py`)

Three new entries:
- `"pascal"` → `PascalMapping`
- `"delphi"` → `PascalMapping` (alias)
- `"dfm"` → `DfmMapping`

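
A minimal sketch of how the alias behaves at lookup time is shown below. The two classes are empty stand-ins for the real mappings, and `REGISTRY` and `mapping_for` are hypothetical names; lower-casing the key is an assumption, not something this document states.

```python
class PascalMapping:
    """Stand-in for the real mapping class."""


class DfmMapping:
    """Stand-in for the real mapping class."""


# Both "pascal" and "delphi" resolve to the same class.
REGISTRY = {
    "pascal": PascalMapping,
    "delphi": PascalMapping,  # alias
    "dfm": DfmMapping,
}


def mapping_for(language: str):
    """Instantiate the mapping registered for a language key, if any."""
    cls = REGISTRY.get(language.lower())
    return cls() if cls is not None else None
```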
---

## Design Decisions

### Why a Dual Approach (Regex + Optional Tree-sitter)?

There is no official `tree_sitter_pascal` Python package on PyPI compatible with the tree-sitter 0.25+ API. The regex implementation therefore serves as the primary one. Once a compatible package becomes available, the tree-sitter integration will activate automatically, with no code changes.

### Why Separate Mappings for `.pas` and `.dfm`?

DFM/FMX files have a completely different format from Pascal code (property declarations rather than a programming language). A separate `DfmMapping` with its own `"dfm"` language key is cleaner than mixing everything into `PascalMapping`.

### Reference: Codegraph

The Delphi support in Codegraph (TypeScript/web-tree-sitter) served as the reference implementation for AST node types, built-in filters, and DFM parsing.

---

## Tests

**~120 new tests** across four test files:

| Test File | Tests | Coverage |
|-----------|-------|----------|
| `test_pascal_language_mapping.py` | 47 | PascalMapping: instantiation, queries, import extraction, symbol extraction, metadata, realistic fixtures |
| `test_dfm_language_mapping.py` | ~30 | DfmMapping: instantiation, components, events, hierarchy, multiline properties, collections |
| `test_language_coverage.py` | 7 | Integration: Pascal uses-imports, Delphi alias, symbol coverage |
| `test_ast_analyzer_mappings.py` | 1 | Mapping count updated from 32 → 35 |

All tests pass (`1197 passed`).

---

## Files Changed

| File | Type | Description |
|------|------|-------------|
| `scripts/ingest/config.py` | Modified | +6 extensions, +5 exclusions |
| `scripts/ingest/language_mappings/pascal.py` | **New** | PascalMapping (~400 lines) |
| `scripts/ingest/language_mappings/dfm.py` | **New** | DfmMapping (~220 lines) |
| `scripts/ingest/language_mappings/__init__.py` | Modified | +3 registry entries |
| `scripts/ingest/metadata.py` | Modified | +Pascal uses-clause extraction |
| `scripts/ingest/symbols.py` | Modified | +`_extract_symbols_pascal()` |
| `scripts/ingest/tree_sitter.py` | Modified | +optional `tree_sitter_pascal` entry |
| `tests/test_pascal_language_mapping.py` | **New** | 47 unit tests |
| `tests/test_dfm_language_mapping.py` | **New** | DFM tests |
| `tests/test_language_coverage.py` | Modified | +Pascal integration tests |
| `tests/test_ast_analyzer_mappings.py` | Modified | Mapping count updated |

scripts/ingest/config.py

Lines changed: 12 additions & 0 deletions
@@ -177,6 +177,13 @@ def _env_truthy(val: str | None, default: bool) -> bool:
     ".vhdl": "vhdl",
     ".asm": "assembly",
     ".s": "assembly",
+    # Delphi/Pascal
+    ".pas": "pascal",  # Delphi/Lazarus Unit
+    ".dpr": "pascal",  # Delphi Project
+    ".dpk": "pascal",  # Delphi Package
+    ".lpr": "pascal",  # Lazarus Project
+    ".dfm": "dfm",  # VCL Form (separate mapping, since the format is different)
+    ".fmx": "dfm",  # FireMonkey Form
 }

 # Files matched by name (no extension or special names)
@@ -220,6 +227,8 @@ def _env_truthy(val: str | None, default: bool) -> bool:
     "obj",
     "TestResults",
     "/.git",
+    "/__history",  # Delphi IDE Backup
+    "/__recovery",  # Delphi IDE Recovery
 ]

 # Glob patterns for directories (matched against basename)
@@ -236,6 +245,9 @@ def _env_truthy(val: str | None, default: bool) -> bool:
     "tokenizer.json",
     "*.whl",
     "*.tar.gz",
+    "*.dcu",  # Delphi Compiled Unit
+    "*.dcp",  # Delphi Compiled Package
+    "*.dcpil",  # IL file
 ]

 _ANY_DEPTH_EXCLUDE_DIR_NAMES = {

scripts/ingest/language_mappings/__init__.py

Lines changed: 7 additions & 0 deletions
@@ -28,6 +28,8 @@
 from .matlab import MatlabMapping
 from .objc import ObjCMapping

+from .pascal import PascalMapping
+from .dfm import DfmMapping
 from .php import PHPMapping
 from .python import PythonMapping
 from .rust import RustMapping
@@ -64,6 +66,9 @@
     "matlab": MatlabMapping,
     "objc": ObjCMapping,

+    "pascal": PascalMapping,
+    "delphi": PascalMapping,  # Alias
+    "dfm": DfmMapping,
     "php": PHPMapping,
     "python": PythonMapping,
     "rust": RustMapping,
@@ -122,6 +127,8 @@ def supported_languages() -> List[str]:
     "MatlabMapping",
     "ObjCMapping",

+    "PascalMapping",
+    "DfmMapping",
     "PHPMapping",
     "PythonMapping",
     "RustMapping",
