Commit d13ea7f

lint
1 parent 6b1b4d2 commit d13ea7f

5 files changed

+277
-6
lines changed

.rustfmt.toml

Lines changed: 0 additions & 2 deletions
@@ -1,2 +0,0 @@
-imports_granularity = "Item"
-unstable_features = true

NEWPLAN.md

Lines changed: 276 additions & 0 deletions
@@ -0,0 +1,276 @@
1+
# Django Template Language Processing: Phased Analysis Approach
2+
3+
## Current State Assessment
4+
5+
Currently, the Django template parser in `crates/djls-template-ast` uses a one-pass parsing approach:
6+
7+
1. **Lexical Analysis:** The `Lexer` tokenizes the input template into a `TokenStream`.
8+
2. **Combined Syntactic & Semantic Analysis:** The `Parser` processes the tokens and directly generates a rich AST (`Ast`) with `Node` objects. This single pass simultaneously:
9+
* Determines syntactic structure (recognizing tags, variables, text).
10+
* Applies semantic meaning based on `TagSpecs` (understanding container/branch/closing relationships, tag arguments).
11+
* Builds the final nested tree structure.
12+
3. **Validation:** After parsing, a separate `Validator` runs on the rich AST to check for *additional* errors (some semantic checks might have already happened during the parse).
13+
14+
The current implementation mixes syntactic structure recognition with semantic interpretation within a single parsing step.
15+
16+
## Goal
17+
18+
Refactor the language processing pipeline to clearly separate syntactic analysis from semantic analysis, aligning with standard compiler/interpreter design principles.
19+
20+
1. **Phase 1: Syntactic Analysis (Parsing)**
21+
* Process the `TokenStream` to validate the grammatical structure of the template.
22+
* Produce a representation of the raw syntax, like a **Syntax Tree** (potentially a flat `NodeList` of simple nodes), *without* interpreting tag meanings or relationships.
23+
* Focus solely on whether the token sequence forms a valid Django template structure according to the language grammar.
24+
25+
2. **Phase 2: Semantic Analysis (Single-File)**
26+
* Process the Syntax Tree generated by the Parser.
27+
* Apply `TagSpecs` to understand the *meaning* and behavior of tags.
28+
* Resolve tag relationships (container, branch, closing) within the single file.
29+
* Analyze and validate tag arguments and variable filters.
30+
* Build the final, rich, semantically-aware AST with proper nesting and semantic information.
31+
32+
3. **Phase 3: Semantic Analysis (Cross-File / Project)**
33+
* Analyze multiple single-file ASTs in the context of a project.
34+
* Handle template inheritance (`extends`, `include`, `block`) by building an inheritance graph.
35+
* Perform cross-file validations and provide cross-file code intelligence features (handled primarily by the LSP Server).
36+
37+
## Architecture: Phased Analysis Pipeline
38+
39+
We will structure the processing pipeline based on distinct analysis phases:
40+
41+
1. **Syntax Layer (Parser)**:
42+
* **Lexical Analysis:** `Source -> TokenStream` (Existing `Lexer`).
43+
* **Syntactic Analysis:** `TokenStream -> Syntax Tree` (e.g., `NodeList` of `SimpleNode`). Focuses on grammar and structure only. Generates syntax errors.
44+
45+
2. **Single-File Semantics Layer (Semantic Analyzer)**:
46+
* **Semantic Analysis:** `Syntax Tree -> Rich AST` (Existing `Ast` structure). Applies `TagSpecs`, resolves tag relationships, validates arguments/filters within one file. Builds the nested AST. Generates single-file semantic errors.
47+
48+
3. **Cross-File Semantics Layer (LSP Server / Project Analyzer)**:
49+
* **Project-Level Analysis:** `Multiple ASTs -> Inheritance Graph / Cross-File Insights`. Handles `extends`, `include`, `block` resolution across files. Performs cross-file validation. Generates cross-file semantic errors.
50+
51+
This phased approach separates concerns effectively, mirroring how compilers and interpreters process code.
52+
53+
## Rationale for Phased Analysis
54+
55+
### Benefits for Django Template Processing
56+
57+
1. **Alignment with Language Features**:
58+
* **Container Tags**: Correctly matching `if`/`endif`, `for`/`endfor` pairs and their branches (`elif`, `else`, `empty`) is naturally handled during semantic analysis after the basic structure is known.
59+
* **Custom Tag Libraries**: `{% load %}` directives can be identified syntactically by the Parser, with the actual tag definitions applied during Semantic Analysis.
60+
* **Filter Chains & Arguments**: Syntax can be parsed first, with validation and semantic processing (argument checking, filter resolution) deferred to Semantic Analysis.
61+
62+
2. **LSP-Specific Advantages**:
63+
* **Faster Syntactic Feedback**: The Parser (Syntax Layer) can run quickly, providing immediate feedback on basic syntax errors as the user types.
64+
* **Clearer Error Categorization**: Errors are naturally categorized by the phase that detects them (Syntax Errors vs. Semantic Errors vs. Cross-File Errors).
65+
* **Improved Handling of Incomplete Code**: The Parser does not depend on semantic context, so it can produce a Syntax Tree (complete, or partial after syntax-error recovery) even when semantic errors exist, allowing basic features such as highlighting to keep working.
66+
* **Enhanced Code Intelligence**: The dedicated Semantic Analysis phase builds a richer AST, enabling more accurate completions, hover information, and navigation within a file.
67+
68+
3. **Technical Benefits**:
69+
* **Separation of Concerns**: Clear distinction between validating grammar (Parser) and interpreting meaning (Semantic Analyzer).
70+
* **Simplified Error Recovery**: The Parser can focus on recovering from syntax errors without the complexity of semantic context.
71+
* **More Maintainable Code**: Isolating semantic logic (tag spec application, relationship resolution) makes the system easier to understand, modify, and extend.
72+
* **Potentially Better Incremental Performance**: Changes might only require re-running the Parser on a small section and then re-running Semantic Analysis only on the affected parts of the Syntax Tree.
73+
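The "Filter Chains & Arguments" point above can be illustrated with a minimal sketch of the deferred step: splitting the raw content of a `{{ ... }}` node into variable bits and filters. The names `VariableExpr`, `Filter`, and `parse_variable` are hypothetical and are not taken from the crate. The sketch assumes filters are separated by `|` and take at most one `:`-separated argument, and it ignores backslash escapes inside quoted strings.

```rust
// Hypothetical types for illustration; the crate's own `DjangoFilter` may differ.
#[derive(Debug, PartialEq)]
pub struct Filter {
    pub name: String,
    pub arg: Option<String>,
}

#[derive(Debug, PartialEq)]
pub struct VariableExpr {
    pub bits: Vec<String>,
    pub filters: Vec<Filter>,
}

// Split on `sep`, but never inside single- or double-quoted strings.
fn split_outside_quotes(input: &str, sep: char) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut quote: Option<char> = None;
    for ch in input.chars() {
        match (quote, ch) {
            (Some(q), c) if c == q => {
                quote = None;
                current.push(c);
            }
            (Some(_), c) => current.push(c),
            (None, '"') | (None, '\'') => {
                quote = Some(ch);
                current.push(ch);
            }
            (None, c) if c == sep => parts.push(std::mem::take(&mut current)),
            (None, c) => current.push(c),
        }
    }
    parts.push(current);
    parts
}

// Returns `None` when the variable or any filter name is empty.
pub fn parse_variable(content: &str) -> Option<VariableExpr> {
    let mut parts = split_outside_quotes(content.trim(), '|').into_iter();
    let head = parts.next()?;
    let head = head.trim();
    if head.is_empty() {
        return None;
    }
    let bits = head.split('.').map(str::to_string).collect();
    let mut filters = Vec::new();
    for part in parts {
        let part = part.trim();
        let (name, arg) = match part.split_once(':') {
            Some((n, a)) => (n.trim(), Some(a.trim().to_string())),
            None => (part, None),
        };
        if name.is_empty() {
            return None;
        }
        filters.push(Filter { name: name.to_string(), arg });
    }
    Some(VariableExpr { bits, filters })
}
```

Because the Parser only stores the raw string, a function like this can live entirely in the Semantic Analyzer; whether a filter actually exists would be a further check against a filter registry.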
74+
### Performance Considerations for LSP Context
75+
76+
1. **Incremental Processing Optimization**:
77+
* Small text changes might only require re-lexing and re-parsing a small portion of the source, updating the Syntax Tree locally.
78+
* Semantic Analysis can then be re-run, potentially only on the changed sub-tree and its ancestors/dependents.
79+
* Enables more granular invalidation and reprocessing.
80+
81+
2. **Lazy Evaluation**:
82+
* Semantic Analysis could potentially be executed lazily, only when features requiring the rich AST are invoked. Basic syntax checks use only the Parser's output.
83+
84+
3. **Caching Opportunities**:
85+
* The Syntax Tree (`NodeList`) output by the Parser is a potential caching point.
86+
* Results from Semantic Analysis (the rich AST) can also be cached.
87+
88+
4. **Asynchronous Processing**:
89+
* The fast Parser phase could run synchronously for immediate feedback, while the potentially slower Semantic Analysis phase(s) could run asynchronously.
90+
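The lazy-evaluation and caching ideas above could look roughly like the following sketch. Everything here is an assumption made for illustration: `Document`, `SyntaxTree`, `RichAst`, and the toy `parse`/`analyze` functions stand in for the real pipeline. The syntax tree is built eagerly, while the rich AST is computed on first request and cached until the source changes.

```rust
use std::cell::{Cell, OnceCell};

// Toy stand-ins for `NodeList` and the rich `Ast`.
pub struct SyntaxTree {
    pub tag_count: usize,
}

pub struct RichAst {
    pub tag_count: usize,
}

pub struct Document {
    syntax: SyntaxTree,       // always available: cheap syntactic pass
    ast: OnceCell<RichAst>,   // filled lazily by the semantic pass
    analysis_runs: Cell<u32>, // instrumentation for this sketch only
}

impl Document {
    pub fn new(source: &str) -> Self {
        Document {
            syntax: parse(source),
            ast: OnceCell::new(),
            analysis_runs: Cell::new(0),
        }
    }

    // Syntax-only features never trigger semantic analysis.
    pub fn syntax(&self) -> &SyntaxTree {
        &self.syntax
    }

    // Semantic analysis runs at most once per version of the source.
    pub fn ast(&self) -> &RichAst {
        self.ast.get_or_init(|| {
            self.analysis_runs.set(self.analysis_runs.get() + 1);
            analyze(&self.syntax)
        })
    }

    // An edit re-parses immediately and drops the cached AST.
    pub fn update(&mut self, source: &str) {
        self.syntax = parse(source);
        self.ast = OnceCell::new();
    }

    pub fn analysis_runs(&self) -> u32 {
        self.analysis_runs.get()
    }
}

// Placeholder "parser": counts `{%` openers.
fn parse(source: &str) -> SyntaxTree {
    SyntaxTree { tag_count: source.matches("{%").count() }
}

// Placeholder "semantic analyzer".
fn analyze(syntax: &SyntaxTree) -> RichAst {
    RichAst { tag_count: syntax.tag_count }
}
```

`std::cell::OnceCell` is single-threaded; a server that shares documents across threads would more likely reach for `std::sync::OnceLock`. This sketch invalidates the whole AST on every edit, so the finer-grained sub-tree invalidation described above is not shown.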
91+
## Detailed Design
92+
93+
### Syntax Tree Structure (`NodeList`)
94+
95+
The output of the **Parser (Syntax Layer)** will be a `NodeList`, representing the basic syntactic structure. It's a flat, sequential list corresponding closely to the significant tokens:
96+
97+
```rust
// Output of the Parser (Syntax Layer)
pub struct NodeList {
    nodes: Vec<SimpleNode>,
    line_offsets: LineOffsets, // Derived during lexing/parsing
}

// Represents a node recognized purely based on syntax
pub enum SimpleNode {
    Tag {
        name: String,    // Syntactically identified tag name
        content: String, // Raw content inside {% ... %}
        span: Span,
    },
    Variable {
        content: String, // Raw content inside {{ ... }}
        span: Span,
    },
    Text {
        content: String,
        span: Span,
    },
    Comment {
        // Only {# ... #} comments recognized here
        content: String,
        span: Span,
    },
    // Maybe other types like HtmlTag if needed for basic structure
}
```
126+
127+
Key characteristics:
128+
1. Represents output of *syntactic analysis* only.
129+
2. `SimpleNode` variants contain raw content, minimal processing.
130+
3. Flat list structure (or a very basic tree if preferred).
131+
4. No semantic understanding (tag types, relationships, filters).
132+
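As a rough illustration of these characteristics, the sketch below maps lexer tokens one-to-one onto `SimpleNode`s without consulting `TagSpecs`. The `Token` enum, the simplified `Span`, the `SyntaxError` type, and `parse_tokens` are assumptions made for this example; the crate's real token and span types are likely to differ.

```rust
// Hypothetical, simplified stand-ins; the crate's real `Span` and tokens may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub length: usize,
}

// Assumed lexer output: one token per region, holding the raw inner text.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Block(String, Span),    // {% ... %}
    Variable(String, Span), // {{ ... }}
    Comment(String, Span),  // {# ... #}
    Text(String, Span),
}

#[derive(Debug, Clone, PartialEq)]
pub enum SimpleNode {
    Tag { name: String, content: String, span: Span },
    Variable { content: String, span: Span },
    Text { content: String, span: Span },
    Comment { content: String, span: Span },
}

#[derive(Debug, Clone, PartialEq)]
pub enum SyntaxError {
    EmptyTag { span: Span },
}

// Purely syntactic pass: no tag meanings, no nesting, no filter parsing.
pub fn parse_tokens(tokens: &[Token]) -> (Vec<SimpleNode>, Vec<SyntaxError>) {
    let mut nodes = Vec::new();
    let mut errors = Vec::new();
    for token in tokens {
        match token {
            Token::Block(raw, span) => {
                let content = raw.trim();
                // The tag name is just the first whitespace-separated word.
                match content.split_whitespace().next() {
                    Some(name) => nodes.push(SimpleNode::Tag {
                        name: name.to_string(),
                        content: content.to_string(),
                        span: *span,
                    }),
                    // Record the error and keep going (simple recovery).
                    None => errors.push(SyntaxError::EmptyTag { span: *span }),
                }
            }
            Token::Variable(raw, span) => nodes.push(SimpleNode::Variable {
                content: raw.trim().to_string(),
                span: *span,
            }),
            Token::Comment(raw, span) => nodes.push(SimpleNode::Comment {
                content: raw.trim().to_string(),
                span: *span,
            }),
            Token::Text(raw, span) => nodes.push(SimpleNode::Text {
                content: raw.clone(),
                span: *span,
            }),
        }
    }
    (nodes, errors)
}
```

Note that `{% endif %}` comes out as an ordinary `Tag` node here; pairing it with its `{% if %}` is left to the semantic phase.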
133+
### Semantic Analysis Process (Single-File)
134+
135+
The **Semantic Analyzer (Single-File Semantics Layer)** takes the `NodeList` (Syntax Tree) as input and produces the rich `Ast`:
136+
137+
1. Iterate through the `NodeList`.
138+
2. For `SimpleNode::Tag` nodes:
139+
* Look up the `tag.name` in the `TagSpecs`.
140+
* Based on the `TagSpec` (Container, Single, Inclusion):
141+
* **Container:** Find matching closing tags (`SimpleNode::Tag` with expected name) later in the `NodeList`. Identify intermediate branch tags. Recursively process nodes between the opening/closing/branch tags to build nested `Node::Block(Block::Container)` or `Node::Block(Block::Branch)`.
142+
* **Single:** Create a `Node::Block(Block::Single)`.
143+
* **Inclusion:** Create a `Node::Block(Block::Inclusion)`. Identify template name argument syntactically.
144+
* Parse and validate tag arguments (`tag.content`) according to `ArgSpec` in the `TagSpec`.
145+
3. For `SimpleNode::Variable` nodes:
146+
* Parse the `variable.content` into variable bits and `DjangoFilter`s.
147+
* Validate filter syntax. (Actual filter existence/argument validation might involve `TagSpecs` or a separate filter registry).
148+
* Create a `Node::Variable`.
149+
4. For `SimpleNode::Text` and `SimpleNode::Comment`: Create corresponding `Node::Text` / `Node::Comment`.
150+
5. Assemble these rich `Node` objects into the final nested `Ast` structure.
151+
6. Collect semantic errors encountered during this process (mismatched tags, invalid arguments, unknown filters, etc.).
152+
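One possible way to implement the container/branch matching in step 2 is a stack of open containers walked over the flat list, sketched below. This is an illustration under simplifying assumptions, not the crate's algorithm: `SimpleNode` and `Node` are reduced to the bare minimum, spans and arguments are omitted, and the static `SPECS` table is a hypothetical stand-in for `TagSpecs`.

```rust
// Hypothetical, much-reduced stand-ins for `SimpleNode`, `TagSpecs`, and `Node`.
#[derive(Debug, Clone, PartialEq)]
pub enum SimpleNode {
    Tag(String),
    Text(String),
}

pub struct ContainerSpec {
    pub name: &'static str,
    pub closing: &'static str,
    pub branches: &'static [&'static str],
}

static SPECS: &[ContainerSpec] = &[
    ContainerSpec { name: "if", closing: "endif", branches: &["elif", "else"] },
    ContainerSpec { name: "for", closing: "endfor", branches: &["empty"] },
];

#[derive(Debug, PartialEq)]
pub enum Node {
    Text(String),
    Single(String),
    Container { name: String, body: Vec<Node>, branches: Vec<(String, Vec<Node>)> },
}

#[derive(Debug, PartialEq)]
pub enum SemanticError {
    Unclosed(String),
    Unexpected(String),
}

// One open container whose closing tag has not been seen yet.
struct Frame {
    spec: &'static ContainerSpec,
    body: Vec<Node>,
    branches: Vec<(String, Vec<Node>)>,
}

impl Frame {
    fn into_node(self) -> Node {
        Node::Container {
            name: self.spec.name.to_string(),
            body: self.body,
            branches: self.branches,
        }
    }
}

// Append a node to the innermost open container's current branch, or to the root.
fn attach(stack: &mut Vec<Frame>, root: &mut Vec<Node>, node: Node) {
    match stack.last_mut() {
        Some(frame) => match frame.branches.last_mut() {
            Some((_, children)) => children.push(node),
            None => frame.body.push(node),
        },
        None => root.push(node),
    }
}

pub fn analyze(nodes: &[SimpleNode]) -> (Vec<Node>, Vec<SemanticError>) {
    let (mut root, mut stack, mut errors) = (Vec::new(), Vec::<Frame>::new(), Vec::new());
    for node in nodes {
        let tag = match node {
            SimpleNode::Text(text) => {
                attach(&mut stack, &mut root, Node::Text(text.clone()));
                continue;
            }
            SimpleNode::Tag(name) => name.as_str(),
        };
        let closes = stack.last().map_or(false, |f| f.spec.closing == tag);
        let branches = stack
            .last()
            .map_or(false, |f| f.spec.branches.iter().any(|b| *b == tag));
        if closes {
            if let Some(frame) = stack.pop() {
                attach(&mut stack, &mut root, frame.into_node());
            }
        } else if branches {
            if let Some(frame) = stack.last_mut() {
                frame.branches.push((tag.to_string(), Vec::new()));
            }
        } else if let Some(spec) = SPECS.iter().find(|s| s.name == tag) {
            stack.push(Frame { spec, body: Vec::new(), branches: Vec::new() });
        } else if SPECS
            .iter()
            .any(|s| s.closing == tag || s.branches.iter().any(|b| *b == tag))
        {
            // A closing or branch tag that matches no open container.
            errors.push(SemanticError::Unexpected(tag.to_string()));
        } else {
            attach(&mut stack, &mut root, Node::Single(tag.to_string()));
        }
    }
    // Anything still open is reported, but its partial subtree is kept.
    while let Some(frame) = stack.pop() {
        errors.push(SemanticError::Unclosed(frame.spec.name.to_string()));
        attach(&mut stack, &mut root, frame.into_node());
    }
    (root, errors)
}
```

A single forward pass with a stack avoids scanning ahead for each closing tag, and it keeps partial subtrees when a container is never closed, which matters for editor features working on incomplete code.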
153+
### Template Inheritance Handling (Cross-File Semantics)
154+
155+
Template inheritance (`extends`, `block`, `include`) is handled by the **LSP Server / Project Analyzer (Cross-File Semantics Layer)**:
156+
157+
1. The Semantic Analyzer (Single-File) identifies inheritance-related tags (`{% extends %}`, `{% block %}`, `{% include %}`) and represents them in the rich `Ast` like other tags (e.g., as `Block::Single`, `Block::Container`, or `Block::Inclusion`).
158+
2. The LSP Server Layer:
159+
* Collects `Ast`s from all relevant project files.
160+
* Builds an inheritance graph based on `extends` and `include` relationships found in the ASTs.
161+
* Resolves `block` overrides across the inheritance chain.
162+
* Provides cross-file validation (circular extends, missing templates/blocks).
163+
* Powers LSP features requiring cross-file knowledge.
164+
165+
This separation aligns with Django's own rendering process where inheritance is resolved after individual templates are parsed.
166+
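The cross-file checks listed above (circular `extends`, missing templates) can be sketched as a walk along the `extends` edges. The map shape used here, template name to optional parent name, and the names `inheritance_chain` and `InheritanceError` are assumptions for illustration; a real implementation would derive the edges from each file's `Ast` and would also need to handle `include` and `block` resolution.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum InheritanceError {
    Circular(String),
    MissingTemplate(String),
}

// `templates` maps every known template to the parent named in its
// `{% extends %}` tag, or `None` if it extends nothing (assumed shape).
// Returns the chain from `start` up to the root template.
pub fn inheritance_chain(
    templates: &HashMap<String, Option<String>>,
    start: &str,
) -> Result<Vec<String>, InheritanceError> {
    let mut chain = Vec::new();
    let mut seen = HashSet::new();
    let mut current = start.to_string();
    loop {
        // Visiting a template twice means the extends chain loops.
        if !seen.insert(current.clone()) {
            return Err(InheritanceError::Circular(current));
        }
        let parent = match templates.get(&current) {
            Some(parent) => parent.clone(),
            None => return Err(InheritanceError::MissingTemplate(current)),
        };
        chain.push(current);
        match parent {
            Some(next) => current = next,
            None => return Ok(chain),
        }
    }
}
```

Block override resolution could then walk the returned chain from the root template down to the child, letting later entries replace earlier `block` definitions of the same name.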
167+
## Implementation Plan
168+
169+
### Phase 1: Define Syntax Tree Structure
170+
- [ ] Define `SimpleNode` enum (Tag, Variable, Text, Comment).
171+
- [ ] Define `NodeList` struct holding `Vec<SimpleNode>` and `LineOffsets`.
172+
- [ ] Implement basic methods for `NodeList`.
173+
- [ ] Ensure `Span` information is accurately captured for `SimpleNode`s.
174+
- [ ] Add tests for the `NodeList` and `SimpleNode` structures.
175+
176+
### Phase 2: Implement Syntactic Parser
177+
- [ ] Refactor `parser.rs` into a `SyntacticParser` (or similar name).
178+
- [ ] Implement logic to consume `TokenStream` and produce a `NodeList`.
179+
- [ ] Focus solely on recognizing syntactic structures (`{% .. %}`, `{{ .. }}`, etc.) and mapping them to `SimpleNode`s *without* using `TagSpecs`.
180+
- [ ] Handle basic syntax error detection (e.g., unclosed `{%`).
181+
- [ ] Implement error recovery to continue parsing and produce a partial `NodeList`.
182+
- [ ] Preserve accurate `Span` information from tokens to `SimpleNode`s.
183+
- [ ] Create unit tests specifically for the Syntactic Parser.
184+
185+
### Phase 3: Implement Single-File Semantic Analyzer
186+
- [ ] Create a new `SemanticAnalyzer` struct/module.
187+
- [ ] Implement the logic to process an input `NodeList` (Syntax Tree).
188+
- [ ] Integrate `TagSpecs` lookup.
189+
- [ ] Implement algorithms for matching container/branch/closing tags based on `TagSpecs`.
190+
- [ ] Implement logic to parse/validate tag arguments based on `ArgSpec`.
191+
- [ ] Implement logic to parse/validate variable filters.
192+
- [ ] Build the final rich `Ast` tree structure with correct nesting.
193+
- [ ] Collect and report semantic errors found during analysis (e.g., missing `endif`, bad arguments).
194+
- [ ] Create unit tests for the Semantic Analyzer.
195+
196+
### Phase 4: LSP-Specific Optimizations
197+
- [ ] Implement incremental parsing support for the Syntactic Parser (Phase 2).
198+
- [ ] Add caching for the `NodeList` (Syntax Tree).
199+
- [ ] Implement selective invalidation/reprocessing for the Semantic Analyzer (Phase 3) based on changes in the `NodeList`.
200+
- [ ] Explore lazy execution of Semantic Analysis for syntax-only operations.
201+
- [ ] Add performance benchmarks focused on editing scenarios and LSP request timings.
202+
203+
### Phase 5: Implement Cross-File Semantic Analysis Framework (LSP Layer)
204+
- [ ] Design components within the LSP server for managing multiple ASTs.
205+
- [ ] Create data structures for the template inheritance graph.
206+
- [ ] Implement logic to build the graph by analyzing `extends`/`include` tags in ASTs.
207+
- [ ] Implement `block` resolution logic across the inheritance chain.
208+
- [ ] Create interfaces for LSP features (diagnostics, navigation) to query this cross-file information.
209+
- [ ] Implement tests for template inheritance resolution.
210+
211+
### Phase 6: Error Handling Strategy
212+
- [ ] Define distinct error types for Syntax Errors (from Parser), Single-File Semantic Errors (from Analyzer), and Cross-File Errors (from LSP Layer).
213+
- [ ] Implement robust error collection mechanisms for each phase.
214+
- [ ] Ensure all errors retain accurate `Span` information traceable to the original source.
215+
- [ ] Provide clear error messages indicating the nature (syntax vs. semantic) and source of the error.
216+
- [ ] Implement error recovery in both the Parser and Semantic Analyzer.
217+
- [ ] Add tests specifically for error handling and recovery across phases.
218+
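A minimal sketch of what phase-tagged errors might look like, assuming a single enum with one variant per phase. `PipelineError`, `Phase`, and the simplified `Span` are hypothetical names chosen for this example; the crate's existing `TemplateError` and `AstError` types may be extended differently.

```rust
use std::fmt;

// Simplified stand-in for the crate's `Span`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub length: usize,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Phase {
    Syntax,
    Semantic,
    CrossFile,
}

// One variant per pipeline phase; every variant keeps its source span.
#[derive(Debug, Clone, PartialEq)]
pub enum PipelineError {
    Syntax { message: String, span: Span },
    Semantic { message: String, span: Span },
    CrossFile { message: String, span: Span, other_template: String },
}

impl PipelineError {
    pub fn phase(&self) -> Phase {
        match self {
            PipelineError::Syntax { .. } => Phase::Syntax,
            PipelineError::Semantic { .. } => Phase::Semantic,
            PipelineError::CrossFile { .. } => Phase::CrossFile,
        }
    }

    pub fn span(&self) -> Span {
        match self {
            PipelineError::Syntax { span, .. }
            | PipelineError::Semantic { span, .. }
            | PipelineError::CrossFile { span, .. } => *span,
        }
    }
}

impl fmt::Display for PipelineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let label = match self.phase() {
            Phase::Syntax => "syntax",
            Phase::Semantic => "semantic",
            Phase::CrossFile => "cross-file",
        };
        let message = match self {
            PipelineError::Syntax { message, .. }
            | PipelineError::Semantic { message, .. }
            | PipelineError::CrossFile { message, .. } => message,
        };
        write!(f, "{} error at offset {}: {}", label, self.span().start, message)
    }
}
```

Making the span mandatory on every variant would let `span()` return `Span` directly instead of the `Option<Span>` used by the current `TemplateError::span`.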
219+
### Phase 7: Update Validation and Public API
220+
- [ ] Review/Update the existing `Validator` (`validator.rs`). Much of its logic will move into the Semantic Analyzer (Phase 3). Determine if a separate final validation step is still needed on the rich AST.
221+
- [ ] Update the public API in `lib.rs` (e.g., `parse_template`) to reflect the new internal pipeline. Options:
222+
* Expose only the final rich `Ast` and combined errors.
223+
* Optionally expose the intermediate `NodeList` (Syntax Tree) for tools needing only syntax info.
224+
- [ ] Maintain backward compatibility during transition if possible, or clearly document API changes.
225+
- [ ] Update documentation for the new architecture and API.
226+
227+
### Phase 8: Testing and Performance Optimization
228+
- [ ] Create comprehensive integration tests covering the entire pipeline (Lexer -> Parser -> Semantic Analyzer).
229+
- [ ] Test complex templates with nesting, various tags, filters, etc.
230+
- [ ] Test template inheritance scenarios via the LSP layer integration.
231+
- [ ] Benchmark performance of each phase and the end-to-end process. Compare against the old one-pass approach.
232+
- [ ] Identify and optimize bottlenecks, particularly for incremental updates.
233+
- [ ] Document performance characteristics and trade-offs.
234+
235+
## Progressive Implementation Strategy
236+
237+
(This can remain largely the same as the original plan)
238+
239+
1. **Initial Development Phase**: Implement the Syntactic Parser and Semantic Analyzer alongside the existing parser, using a feature flag or configuration option to switch.
240+
2. **Testing Phase**: Run both the old and new pipelines on test suites, compare AST outputs (where possible) and error reporting, fix discrepancies.
241+
3. **Transition Phase**: Default to the original parser but allow opt-in to the new phased pipeline via configuration. Gather feedback.
242+
4. **Completion Phase**: Make the new phased pipeline the default. Deprecate and eventually remove the old one-pass parser code.
243+
244+
## Detailed Progress Checklist
245+
246+
(Update checklist items based on the revised phase names and tasks described above)
247+
248+
### Phase 1: Define Syntax Tree Structure
249+
- [ ] Define `SimpleNode` enum...
250+
- [ ] Define `NodeList` struct...
251+
... (etc.)
252+
253+
### Phase 2: Implement Syntactic Parser
254+
- [ ] Create `SyntacticParser` module/struct...
255+
- [ ] Implement `TokenStream` to `NodeList` conversion...
256+
- [ ] Add syntax error collection...
257+
... (etc.)
258+
259+
### Phase 3: Implement Single-File Semantic Analyzer
260+
- [ ] Create `SemanticAnalyzer` module/struct...
261+
- [ ] Implement `NodeList` processing...
262+
- [ ] Add `TagSpecs` integration...
263+
... (etc.)
264+
265+
*(Continue updating checklists for Phases 4-8 similarly)*
266+
267+
## Notes and LSP Considerations
268+
269+
- **Clear Phasing**: The Syntax -> Single-File Semantics -> Cross-File Semantics phasing provides a clean workflow, isolating different levels of complexity.
270+
- **Responsiveness**: The primary LSP benefit comes from the fast **Parser (Syntax Layer)** providing quick syntax validation. Semantic analysis can potentially run asynchronously or with delays.
271+
- **Memory Usage**: Storing the intermediate Syntax Tree (`NodeList`) plus the final rich `Ast` will increase memory usage compared to the one-pass approach. Monitor impact.
272+
- **Incremental Updates**: Key advantage for LSP. Changes trigger local re-parsing (syntactic analysis), potentially followed by targeted re-analysis (single-file semantic analysis) of affected nodes/sub-trees. Requires careful dependency tracking.
273+
- **Error Resilience**: The Parser can produce a useful Syntax Tree even if semantic errors exist later. Errors are clearly tied to the phase (Syntax, Semantic) that found them.
274+
- **Template Inheritance**: Explicitly handled in the final, cross-file semantic phase (LSP Layer), keeping the core parser and single-file analyzer focused.
275+
- **Custom Tags**: `{% load %}` identified by the Parser; tag definitions applied by the Semantic Analyzer using updated `TagSpecs`.
276+
- **Performance Monitoring**: Crucial to benchmark phase timings, especially for incremental updates, to ensure LSP responsiveness goals are met.

crates/djls-template-ast/src/error.rs

Lines changed: 1 addition & 3 deletions
@@ -44,9 +44,7 @@ impl From<std::io::Error> for TemplateError {
 impl TemplateError {
     pub fn span(&self) -> Option<Span> {
         match self {
-            TemplateError::Validation(AstError::InvalidTagStructure { span, .. }) => {
-                Some(*span)
-            }
+            TemplateError::Validation(AstError::InvalidTagStructure { span, .. }) => Some(*span),
             _ => None,
         }
     }

crates/djls-template-ast/src/lib.rs

Lines changed: 0 additions & 1 deletion
@@ -4,7 +4,6 @@ mod lexer;
 mod parser;
 mod tagspecs;
 mod tokens;
-mod validator;
 
 use ast::NodeList;
 pub use error::{to_lsp_diagnostic, QuickFix, TemplateError};

crates/djls-template-ast/src/validator.rs

Whitespace-only changes.
