lint

joshuadavidthomas · joshuadavidthomas · commit d13ea7f65a64 · 2025-04-19T23:15:23.000-05:00
diff --git a/.rustfmt.toml b/.rustfmt.toml
@@ -1,2 +0,0 @@
-imports_granularity = "Item"
-unstable_features = true
diff --git a/NEWPLAN.md b/NEWPLAN.md
@@ -0,0 +1,276 @@
+# Django Template Language Processing: Phased Analysis Approach
+
+## Current State Assessment
+
+The Django template parser in `crates/djls-template-ast` currently lexes the input, parses it in a single pass, and then validates the result:
+
+1.  **Lexical Analysis:** The `Lexer` tokenizes the input template into a `TokenStream`.
+2.  **Combined Syntactic & Semantic Analysis:** The `Parser` processes the tokens and directly generates a rich AST (`Ast`) with `Node` objects. This single pass simultaneously:
+    *   Determines syntactic structure (recognizing tags, variables, text).
+    *   Applies semantic meaning based on `TagSpecs` (understanding container/branch/closing relationships, tag arguments).
+    *   Builds the final nested tree structure.
+3.  **Validation:** After parsing, a separate `Validator` runs on the rich AST to check for *additional* errors (some semantic checks might have already happened during the parse).
+
+The current implementation mixes syntactic structure recognition with semantic interpretation within a single parsing step.
+
+## Goal
+
+Refactor the language processing pipeline to clearly separate syntactic analysis from semantic analysis, aligning with standard compiler/interpreter design principles.
+
+1.  **Phase 1: Syntactic Analysis (Parsing)**
+    *   Process the `TokenStream` to validate the grammatical structure of the template.
+    *   Produce a representation of the raw syntax, such as a **Syntax Tree** (potentially a flat `NodeList` of simple nodes), *without* interpreting tag meanings or relationships.
+    *   Focus solely on whether the token sequence forms a valid Django template structure according to the language grammar.
+
+2.  **Phase 2: Semantic Analysis (Single-File)**
+    *   Process the Syntax Tree generated by the Parser.
+    *   Apply `TagSpecs` to understand the *meaning* and behavior of tags.
+    *   Resolve tag relationships (container, branch, closing) within the single file.
+    *   Analyze and validate tag arguments and variable filters.
+    *   Build the final, rich, semantically-aware AST with proper nesting and semantic information.
+
+3.  **Phase 3: Semantic Analysis (Cross-File / Project)**
+    *   Analyze multiple single-file ASTs in the context of a project.
+    *   Handle template inheritance (`extends`, `include`, `block`) by building an inheritance graph.
+    *   Perform cross-file validations and provide cross-file code intelligence features (handled primarily by the LSP Server).
+
+## Architecture: Phased Analysis Pipeline
+
+We will structure the processing pipeline based on distinct analysis phases:
+
+1.  **Syntax Layer (Parser)**:
+    *   **Lexical Analysis:** `Source -> TokenStream` (Existing `Lexer`).
+    *   **Syntactic Analysis:** `TokenStream -> Syntax Tree` (e.g., `NodeList` of `SimpleNode`). Focuses on grammar and structure only. Generates syntax errors.
+
+2.  **Single-File Semantics Layer (Semantic Analyzer)**:
+    *   **Semantic Analysis:** `Syntax Tree -> Rich AST` (Existing `Ast` structure). Applies `TagSpecs`, resolves tag relationships, validates arguments/filters within one file. Builds the nested AST. Generates single-file semantic errors.
+
+3.  **Cross-File Semantics Layer (LSP Server / Project Analyzer)**:
+    *   **Project-Level Analysis:** `Multiple ASTs -> Inheritance Graph / Cross-File Insights`. Handles `extends`, `include`, `block` resolution across files. Performs cross-file validation. Generates cross-file semantic errors.
+
+This phased approach separates concerns effectively, mirroring how compilers and interpreters process code.
+
+## Rationale for Phased Analysis
+
+### Benefits for Django Template Processing
+
+1.  **Alignment with Language Features**:
+    *   **Container Tags**: Correctly matching `if`/`endif`, `for`/`endfor` pairs and their branches (`elif`, `else`, `empty`) is naturally handled during semantic analysis after the basic structure is known.
+    *   **Custom Tag Libraries**: `{% load %}` directives can be identified syntactically by the Parser, with the actual tag definitions applied during Semantic Analysis.
+    *   **Filter Chains & Arguments**: Syntax can be parsed first, with validation and semantic processing (argument checking, filter resolution) deferred to Semantic Analysis.
+
+2.  **LSP-Specific Advantages**:
+    *   **Faster Syntactic Feedback**: The Parser (Syntax Layer) can run quickly, providing immediate feedback on basic syntax errors as the user types.
+    *   **Clearer Error Categorization**: Errors are naturally categorized by the phase that detects them (Syntax Errors vs. Semantic Errors vs. Cross-File Errors).
+    *   **Improved Handling of Incomplete Code**: Because the Parser ignores tag semantics, it can still produce a Syntax Tree (possibly partial) when the template contains semantic errors such as an unclosed `{% if %}`, allowing basic features (highlighting) to function.
+    *   **Enhanced Code Intelligence**: The dedicated Semantic Analysis phase builds a richer AST, enabling more accurate completions, hover information, and navigation within a file.
+
+3.  **Technical Benefits**:
+    *   **Separation of Concerns**: Clear distinction between validating grammar (Parser) and interpreting meaning (Semantic Analyzer).
+    *   **Simplified Error Recovery**: The Parser can focus on recovering from syntax errors without the complexity of semantic context.
+    *   **More Maintainable Code**: Isolating semantic logic (tag spec application, relationship resolution) makes the system easier to understand, modify, and extend.
+    *   **Potentially Better Incremental Performance**: Changes might only require re-running the Parser on a small section and then re-running Semantic Analysis only on the affected parts of the Syntax Tree.
+
+### Performance Considerations for LSP Context
+
+1.  **Incremental Processing Optimization**:
+    *   Small text changes might only require re-lexing and re-parsing a small portion of the source, updating the Syntax Tree locally.
+    *   Semantic Analysis can then be re-run, potentially only on the changed sub-tree and its ancestors/dependents.
+    *   Enables more granular invalidation and reprocessing.
+
+2.  **Lazy Evaluation**:
+    *   Semantic Analysis could potentially be executed lazily, only when features requiring the rich AST are invoked. Basic syntax checks use only the Parser's output.
+
+3.  **Caching Opportunities**:
+    *   The Syntax Tree (`NodeList`) output by the Parser is a potential caching point.
+    *   Results from Semantic Analysis (the rich AST) can also be cached.
+
+4.  **Asynchronous Processing**:
+    *   The fast Parser phase could run synchronously for immediate feedback, while the potentially slower Semantic Analysis phase(s) could run asynchronously.
+
+## Detailed Design
+
+### Syntax Tree Structure (`NodeList`)
+
+The output of the **Parser (Syntax Layer)** will be a `NodeList`, representing the basic syntactic structure. It's a flat, sequential list corresponding closely to the significant tokens:
+
+```rust
+// Output of the Parser (Syntax Layer)
+pub struct NodeList {
+    nodes: Vec<SimpleNode>,
+    line_offsets: LineOffsets, // Derived during lexing/parsing
+}
+
+// Represents a node recognized purely based on syntax
+pub enum SimpleNode {
+    Tag {
+        name: String,      // Syntactically identified tag name
+        content: String,   // Raw content inside {% ... %}
+        span: Span,
+    },
+    Variable {
+        content: String,   // Raw content inside {{ ... }}
+        span: Span,
+    },
+    Text {
+        content: String,
+        span: Span,
+    },
+    Comment {              // Only {# ... #} comments recognized here
+        content: String,
+        span: Span,
+    },
+    // Maybe other types like HtmlTag if needed for basic structure
+}
+```
+
+Key characteristics:
+1.  Represents output of *syntactic analysis* only.
+2.  `SimpleNode` variants contain raw content, minimal processing.
+3.  Flat list structure (or a very basic tree if preferred).
+4.  No semantic understanding (tag types, relationships, filters).
+
+### Semantic Analysis Process (Single-File)
+
+The **Semantic Analyzer (Single-File Semantics Layer)** takes the `NodeList` (Syntax Tree) as input and produces the rich `Ast`:
+
+1.  Iterate through the `NodeList`.
+2.  For `SimpleNode::Tag` nodes:
+    *   Look up the `tag.name` in the `TagSpecs`.
+    *   Based on the `TagSpec` (Container, Single, Inclusion):
+        *   **Container:** Find matching closing tags (`SimpleNode::Tag` with expected name) later in the `NodeList`. Identify intermediate branch tags. Recursively process nodes between the opening/closing/branch tags to build nested `Node::Block(Block::Container)` or `Node::Block(Block::Branch)`.
+        *   **Single:** Create a `Node::Block(Block::Single)`.
+        *   **Inclusion:** Create a `Node::Block(Block::Inclusion)`. Identify template name argument syntactically.
+    *   Parse and validate tag arguments (`tag.content`) according to `ArgSpec` in the `TagSpec`.
+3.  For `SimpleNode::Variable` nodes:
+    *   Parse the `variable.content` into variable bits and `DjangoFilter`s.
+    *   Validate filter syntax. (Actual filter existence/argument validation might involve `TagSpecs` or a separate filter registry).
+    *   Create a `Node::Variable`.
+4.  For `SimpleNode::Text` and `SimpleNode::Comment`: Create corresponding `Node::Text` / `Node::Comment`.
+5.  Assemble these rich `Node` objects into the final nested `Ast` structure.
+6.  Collect semantic errors encountered during this process (mismatched tags, invalid arguments, unknown filters, etc.).
+
+### Template Inheritance Handling (Cross-File Semantics)
+
+Template inheritance (`extends`, `block`, `include`) is handled by the **LSP Server / Project Analyzer (Cross-File Semantics Layer)**:
+
+1.  The Semantic Analyzer (Single-File) identifies inheritance-related tags (`{% extends %}`, `{% block %}`, `{% include %}`) and represents them in the rich `Ast` like other tags (e.g., as `Block::Single`, `Block::Container`, or `Block::Inclusion`).
+2.  The LSP Server Layer:
+    *   Collects `Ast`s from all relevant project files.
+    *   Builds an inheritance graph based on `extends` and `include` relationships found in the ASTs.
+    *   Resolves `block` overrides across the inheritance chain.
+    *   Provides cross-file validation (circular extends, missing templates/blocks).
+    *   Powers LSP features requiring cross-file knowledge.
+
+This separation mirrors Django's own behavior: each template is parsed on its own, and `{% extends %}` is resolved afterwards, at render time.
+
+## Implementation Plan
+
+### Phase 1: Define Syntax Tree Structure
+- [ ] Define `SimpleNode` enum (Tag, Variable, Text, Comment).
+- [ ] Define `NodeList` struct holding `Vec<SimpleNode>` and `LineOffsets`.
+- [ ] Implement basic methods for `NodeList`.
+- [ ] Ensure `Span` information is accurately captured for `SimpleNode`s.
+- [ ] Add tests for the `NodeList` and `SimpleNode` structures.
+
+### Phase 2: Implement Syntactic Parser
+- [ ] Refactor `parser.rs` into a `SyntacticParser` (or similar name).
+- [ ] Implement logic to consume `TokenStream` and produce a `NodeList`.
+- [ ] Focus solely on recognizing syntactic structures (`{% .. %}`, `{{ .. }}`, etc.) and mapping them to `SimpleNode`s *without* using `TagSpecs`.
+- [ ] Handle basic syntax error detection (e.g., unclosed `{%`).
+- [ ] Implement error recovery to continue parsing and produce a partial `NodeList`.
+- [ ] Preserve accurate `Span` information from tokens to `SimpleNode`s.
+- [ ] Create unit tests specifically for the Syntactic Parser.
+
+### Phase 3: Implement Single-File Semantic Analyzer
+- [ ] Create a new `SemanticAnalyzer` struct/module.
+- [ ] Implement the logic to process an input `NodeList` (Syntax Tree).
+- [ ] Integrate `TagSpecs` lookup.
+- [ ] Implement algorithms for matching container/branch/closing tags based on `TagSpecs`.
+- [ ] Implement logic to parse/validate tag arguments based on `ArgSpec`.
+- [ ] Implement logic to parse/validate variable filters.
+- [ ] Build the final rich `Ast` tree structure with correct nesting.
+- [ ] Collect and report semantic errors found during analysis (e.g., missing `endif`, bad arguments).
+- [ ] Create unit tests for the Semantic Analyzer.
+
+### Phase 4: LSP-Specific Optimizations
+- [ ] Implement incremental parsing support for the Syntactic Parser (Phase 2).
+- [ ] Add caching for the `NodeList` (Syntax Tree).
+- [ ] Implement selective invalidation/reprocessing for the Semantic Analyzer (Phase 3) based on changes in the `NodeList`.
+- [ ] Explore lazy execution of Semantic Analysis for syntax-only operations.
+- [ ] Add performance benchmarks focused on editing scenarios and LSP request timings.
+
+### Phase 5: Implement Cross-File Semantic Analysis Framework (LSP Layer)
+- [ ] Design components within the LSP server for managing multiple ASTs.
+- [ ] Create data structures for the template inheritance graph.
+- [ ] Implement logic to build the graph by analyzing `extends`/`include` tags in ASTs.
+- [ ] Implement `block` resolution logic across the inheritance chain.
+- [ ] Create interfaces for LSP features (diagnostics, navigation) to query this cross-file information.
+- [ ] Implement tests for template inheritance resolution.
+
+### Phase 6: Error Handling Strategy
+- [ ] Define distinct error types for Syntax Errors (from Parser), Single-File Semantic Errors (from Analyzer), and Cross-File Errors (from LSP Layer).
+- [ ] Implement robust error collection mechanisms for each phase.
+- [ ] Ensure all errors retain accurate `Span` information traceable to the original source.
+- [ ] Provide clear error messages indicating the nature (syntax vs. semantic) and source of the error.
+- [ ] Implement error recovery in both the Parser and Semantic Analyzer.
+- [ ] Add tests specifically for error handling and recovery across phases.
+
+### Phase 7: Update Validation and Public API
+- [ ] Review/Update the existing `Validator` (`validator.rs`). Much of its logic will move into the Semantic Analyzer (Phase 3). Determine if a separate final validation step is still needed on the rich AST.
+- [ ] Update the public API in `lib.rs` (e.g., `parse_template`) to reflect the new internal pipeline. Options:
+    *   Expose only the final rich `Ast` and combined errors.
+    *   Optionally expose the intermediate `NodeList` (Syntax Tree) for tools needing only syntax info.
+- [ ] Maintain backward compatibility during transition if possible, or clearly document API changes.
+- [ ] Update documentation for the new architecture and API.
+
+### Phase 8: Testing and Performance Optimization
+- [ ] Create comprehensive integration tests covering the entire pipeline (Lexer -> Parser -> Semantic Analyzer).
+- [ ] Test complex templates with nesting, various tags, filters, etc.
+- [ ] Test template inheritance scenarios via the LSP layer integration.
+- [ ] Benchmark performance of each phase and the end-to-end process. Compare against the old one-pass approach.
+- [ ] Identify and optimize bottlenecks, particularly for incremental updates.
+- [ ] Document performance characteristics and trade-offs.
+
+## Progressive Implementation Strategy
+
+(This can remain largely the same as the original plan)
+
+1.  **Initial Development Phase**: Implement the Syntactic Parser and Semantic Analyzer alongside the existing parser, using a feature flag or configuration option to switch.
+2.  **Testing Phase**: Run both the old and new pipelines on test suites, compare AST outputs (where possible) and error reporting, fix discrepancies.
+3.  **Transition Phase**: Default to the original parser but allow opt-in to the new phased pipeline via configuration. Gather feedback.
+4.  **Completion Phase**: Make the new phased pipeline the default. Deprecate and eventually remove the old one-pass parser code.
+
+## Detailed Progress Checklist
+
+(Update checklist items based on the revised phase names and tasks described above)
+
+### Phase 1: Define Syntax Tree Structure
+- [ ] Define `SimpleNode` enum...
+- [ ] Define `NodeList` struct...
+... (etc.)
+
+### Phase 2: Implement Syntactic Parser
+- [ ] Create `SyntacticParser` module/struct...
+- [ ] Implement `TokenStream` to `NodeList` conversion...
+- [ ] Add syntax error collection...
+... (etc.)
+
+### Phase 3: Implement Single-File Semantic Analyzer
+- [ ] Create `SemanticAnalyzer` module/struct...
+- [ ] Implement `NodeList` processing...
+- [ ] Add `TagSpecs` integration...
+... (etc.)
+
+*(Continue updating checklists for Phases 4-8 similarly)*
+
+## Notes and LSP Considerations
+
+-   **Clear Phasing**: The Syntax -> Single-File Semantics -> Cross-File Semantics phasing provides a clean workflow, isolating different levels of complexity.
+-   **Responsiveness**: The primary LSP benefit comes from the fast **Parser (Syntax Layer)** providing quick syntax validation. Semantic analysis can potentially run asynchronously or with delays.
+-   **Memory Usage**: Storing the intermediate Syntax Tree (`NodeList`) plus the final rich `Ast` will increase memory usage compared to the one-pass approach. Monitor impact.
+-   **Incremental Updates**: Key advantage for LSP. Changes trigger local re-parsing (Syntax Layer), potentially followed by targeted re-analysis (Single-File Semantics Layer) of affected nodes/sub-trees. Requires careful dependency tracking.
+-   **Error Resilience**: The Parser can produce a useful Syntax Tree even when the template contains semantic errors. Errors are clearly tied to the phase (Syntax, Semantic) that found them.
+-   **Template Inheritance**: Explicitly handled in the final, cross-file semantic phase (LSP Layer), keeping the core parser and single-file analyzer focused.
+-   **Custom Tags**: `{% load %}` identified by the Parser; tag definitions applied by the Semantic Analyzer using updated `TagSpecs`.
+-   **Performance Monitoring**: Crucial to benchmark phase timings, especially for incremental updates, to ensure LSP responsiveness goals are met.
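
Reviewer sketch (not part of the commit): the "Semantic Analysis Process (Single-File)" steps in `NEWPLAN.md` amount to folding a flat node list into a nested tree with a stack. The block below is a minimal illustration of that idea only. Everything in it is an assumption made for the example: the `SimpleNode` and `Node` types are cut-down stand-ins for the ones in the plan (no spans, no branch tags, no argument parsing), `closing_tag_for` is a hypothetical hard-coded replacement for a real `TagSpecs` lookup, and `analyze` is not an existing function in `djls-template-ast`.

```rust
// Simplified, hypothetical stand-in for the plan's flat `SimpleNode`.
#[derive(Debug, Clone, PartialEq)]
pub enum SimpleNode {
    Tag { name: String },
    Text(String),
}

// Simplified, hypothetical stand-in for the rich, nested `Node`.
#[derive(Debug, PartialEq)]
pub enum Node {
    Text(String),
    Single { name: String },
    Container { name: String, children: Vec<Node> },
}

#[derive(Debug, PartialEq)]
pub enum SemanticError {
    UnclosedTag(String),
    UnexpectedClosingTag(String),
}

// Assumption: a tiny hard-coded table instead of a real `TagSpecs` lookup.
fn closing_tag_for(name: &str) -> Option<&'static str> {
    match name {
        "if" => Some("endif"),
        "for" => Some("endfor"),
        "block" => Some("endblock"),
        _ => None,
    }
}

fn is_closing_tag(name: &str) -> bool {
    matches!(name, "endif" | "endfor" | "endblock")
}

// One open container: (opening tag name, expected closer, children so far).
type Frame = (String, &'static str, Vec<Node>);

// Append a node to the innermost open container, or to the root if none is open.
fn push(stack: &mut Vec<Frame>, root: &mut Vec<Node>, node: Node) {
    match stack.last_mut() {
        Some((_, _, children)) => children.push(node),
        None => root.push(node),
    }
}

// Fold the flat list into a nested tree, collecting semantic errors on the way.
pub fn analyze(flat: &[SimpleNode]) -> (Vec<Node>, Vec<SemanticError>) {
    let mut stack: Vec<Frame> = Vec::new();
    let mut root: Vec<Node> = Vec::new();
    let mut errors = Vec::new();

    for node in flat {
        match node {
            SimpleNode::Text(text) => push(&mut stack, &mut root, Node::Text(text.clone())),
            SimpleNode::Tag { name } => {
                if let Some(closer) = closing_tag_for(name) {
                    // Opening container tag: start collecting its children.
                    stack.push((name.clone(), closer, Vec::new()));
                } else if is_closing_tag(name) {
                    let matches_top = stack
                        .last()
                        .map_or(false, |(_, expected, _)| *expected == name.as_str());
                    if matches_top {
                        let (open, _, children) = stack.pop().expect("stack is non-empty");
                        push(&mut stack, &mut root, Node::Container { name: open, children });
                    } else {
                        errors.push(SemanticError::UnexpectedClosingTag(name.clone()));
                    }
                } else {
                    push(&mut stack, &mut root, Node::Single { name: name.clone() });
                }
            }
        }
    }

    // Anything still open was never closed: report it, but keep its children.
    while let Some((open, _, children)) = stack.pop() {
        errors.push(SemanticError::UnclosedTag(open.clone()));
        push(&mut stack, &mut root, Node::Container { name: open, children });
    }
    (root, errors)
}
```

Calling `analyze` on the flat sequence text, `if`, text, `endif` yields a text node followed by one `Container` holding the inner text. A stray `endif` inside an open `for` is reported as an unexpected closing tag, and the `for` itself is reported as unclosed while still appearing in the tree. Keeping the unclosed container in the output is one possible recovery choice, not something the plan prescribes.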
diff --git a/crates/djls-template-ast/src/error.rs b/crates/djls-template-ast/src/error.rs
@@ -44,9 +44,7 @@ impl From<std::io::Error> for TemplateError {
 impl TemplateError {
     pub fn span(&self) -> Option<Span> {
         match self {
-            TemplateError::Validation(AstError::InvalidTagStructure { span, .. }) => {
-                Some(*span)
-            }
+            TemplateError::Validation(AstError::InvalidTagStructure { span, .. }) => Some(*span),
             _ => None,
         }
     }
diff --git a/crates/djls-template-ast/src/lib.rs b/crates/djls-template-ast/src/lib.rs
@@ -4,7 +4,6 @@ mod lexer;
 mod parser;
 mod tagspecs;
 mod tokens;
-mod validator;
 
 use ast::NodeList;
 pub use error::{to_lsp_diagnostic, QuickFix, TemplateError};
diff --git a/crates/djls-template-ast/src/validator.rs b/crates/djls-template-ast/src/validator.rs
