Description
This is part 2 of our efforts to support this functionality properly. Part 1 is done in #396
This is almost certainly possible and it looks like a major, but fun, enhancement. I asked Claude 4 Sonnet for an effort estimate:
Current Architecture Analysis
The plugin currently has several characteristics that make partial rebuilds challenging:
Global Collection State: The `TraceableCollection` is completely rebuilt on every build cycle (see `315:324:mlx/traceability/traceability.py`), clearing all items and relationships.
Cross-Document Dependencies: Items can reference each other across documents through relationships like `depends_on`, `implements`, etc. When relationships change, multiple documents may need updates.
Automatic Reverse Relations: The plugin automatically creates reverse relationships (e.g., if A `depends_on` B, then B automatically gets `impacts_on` A). This cross-document coupling is a major challenge for partial rebuilds.
Processing Pipeline: The plugin uses a three-stage process:
- Parse and collect items during document reading
- Consistency checking and cross-reference resolution (`197:230:mlx/traceability/traceability.py`)
- Node replacement during document resolution (`233:262:mlx/traceability/traceability.py`)
Effort Assessment for Partial Rebuild Support
Medium Effort (2-4 weeks) - "Good Enough" Solution
This approach would provide basic partial rebuild support with some limitations:
Key Changes Required:
Persistent Item Storage: Instead of clearing `env.traceability_collection` on every build, implement incremental updates:

```python
# Instead of:
env.traceability_collection = TraceableCollection()

# Use:
env.traceability_collection.update_from_changed_docs(changed_docs)
```

Dependency Tracking: Track which documents contain items that reference each other:

```python
# Add to TraceableCollection
self.doc_dependencies = {}  # doc -> set of docs that reference items in this doc
```

Smart Invalidation: When a document changes, invalidate only documents that:
- Contain items referenced by the changed document
- Reference items in the changed document
- Display matrices/lists that include items from the changed document
Relationship Scope Tracking: Track which relationships cross document boundaries to minimize invalidation scope.
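The dependency-tracking and scope-tracking ideas above could be combined roughly as follows. This is only a sketch; `DocDependencyTracker` and its methods are hypothetical names for illustration, not part of the plugin's actual API:

```python
from collections import defaultdict

class DocDependencyTracker:
    """Hypothetical helper: records which relationships cross document
    boundaries so that invalidation can stay as narrow as possible."""

    def __init__(self):
        # doc -> set of docs whose items it references
        self.doc_dependencies = defaultdict(set)

    def add_relation(self, source_doc, target_doc):
        # Only relations that cross a document boundary widen the
        # invalidation scope; intra-document relations cost nothing.
        if source_doc != target_doc:
            self.doc_dependencies[source_doc].add(target_doc)

    def docs_to_invalidate(self, changed_doc):
        # A changed document invalidates itself plus every document
        # that references items inside it.
        affected = {changed_doc}
        for doc, deps in self.doc_dependencies.items():
            if changed_doc in deps:
                affected.add(doc)
        return affected
```

For example, if `req.rst` references items in `design.rst`, a change to `design.rst` would invalidate both documents, while an unrelated document stays untouched.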
Limitations of this approach:
- Item ID changes might not be fully handled
- Some edge cases with complex relationship chains might require full rebuilds
- Matrix/list directives would need conservative invalidation
High Effort (1-2 months) - Comprehensive Solution
This would provide robust partial rebuild support:
Additional Changes:
Fine-grained Dependency Graph: Build a complete dependency graph tracking:
- Item-to-item relationships across documents
- Which directives depend on which items
- Transitive dependency chains
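Resolving transitive chains over such a graph could look like the sketch below; the graph shape and function name are assumptions for illustration, not existing plugin code:

```python
def transitive_dependents(graph, changed_item):
    """Return all items transitively affected by a change.

    `graph` maps an item ID to the set of item IDs that directly
    depend on it. A simple worklist traversal handles arbitrary
    chain depth and is safe in the presence of cycles.
    """
    affected = set()
    stack = [changed_item]
    while stack:
        item = stack.pop()
        for dependent in graph.get(item, set()):
            if dependent not in affected:
                affected.add(dependent)
                stack.append(dependent)
    return affected
```

With `{'A': {'B'}, 'B': {'C'}}`, a change to A affects both B and C, which is exactly the behavior needed to invalidate documents at the end of long relationship chains.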
Change Detection: Implement sophisticated change detection:

```python
class ItemChangeTracker:
    def detect_changes(self, old_item, new_item):
        # Detect ID changes, attribute changes, relationship changes
        # Return the set of affected documents
        ...
```

Incremental Cross-Reference Resolution: Instead of global consistency checks, implement incremental updates to cross-references.
Document Markup Caching: Cache the processed directive nodes and only regenerate when dependencies change.
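One way such a cache could work is sketched below, keyed on a fingerprint of the items a directive depends on. `DirectiveCache` and everything in it are hypothetical names, not part of the plugin:

```python
import hashlib

class DirectiveCache:
    """Hypothetical sketch: reuse rendered directive output until the
    fingerprint of its dependencies changes."""

    def __init__(self):
        self._cache = {}  # key -> (fingerprint, rendered output)

    @staticmethod
    def fingerprint(dep_items):
        # Stable hash over the (item ID, content) pairs the directive
        # depends on; any change in the inputs changes the digest.
        h = hashlib.sha256()
        for item_id, content in sorted(dep_items):
            h.update(f'{item_id}:{content}'.encode())
        return h.hexdigest()

    def get_or_render(self, key, dep_items, render):
        fp = self.fingerprint(dep_items)
        cached = self._cache.get(key)
        if cached and cached[0] == fp:
            return cached[1]  # dependencies unchanged: reuse cached nodes
        result = render()  # dependencies changed: regenerate
        self._cache[key] = (fp, result)
        return result
```

The expensive `render` callback (in practice, rebuilding a matrix or list directive's doctree nodes) only runs when an item the directive depends on has actually changed.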
Recommended Approach
For your use case (colleagues wanting faster builds to check reStructuredText syntax), I'd recommend the Medium Effort approach because:
Pragmatic Trade-off: It provides significant speed improvements for most common scenarios while being implementable in a reasonable timeframe.
Fallback Strategy: You can implement a "force full rebuild" option for edge cases, similar to Sphinx's `-E` flag.
Incremental Implementation: You can start with basic document-level dependency tracking and refine it over time.
Implementation Strategy
Here's a high-level implementation plan:
```python
# 1. Modify initialize_environment to preserve the existing collection
def initialize_environment(app):
    env = app.builder.env
    if not hasattr(env, 'traceability_collection'):
        env.traceability_collection = TraceableCollection()
    # Only clear items from changed documents
    env.traceability_collection.update_for_build(env.found_docs, env.updated_docs)

# 2. Add dependency tracking to TraceableCollection
class TraceableCollection:
    def __init__(self):
        # ... existing code ...
        self.doc_item_map = {}    # doc -> set of item IDs
        self.item_doc_map = {}    # item ID -> doc
        self.cross_doc_refs = {}  # doc -> set of docs it references

# 3. Implement smart invalidation
def should_rebuild_doc(doc, changed_docs, collection):
    if doc in changed_docs:
        return True
    # Check if any referenced items changed
    for ref_doc in collection.cross_doc_refs.get(doc, set()):
        if ref_doc in changed_docs:
            return True
    return False
```

The key insight is that most traceability relationships are fairly localized within document clusters, so a smart invalidation strategy can provide significant speedups while maintaining correctness for the majority of use cases.
Originally posted by @JasperCraeghs in #385