Proper support for incremental builds and partial rebuilds #398

@JasperCraeghs

Description

This is part 2 of our effort to support this functionality properly. Part 1 was done in #396.

This is almost certainly possible and it looks like a major, but fun, enhancement. I asked Claude 4 Sonnet for an effort estimate:

Current Architecture Analysis

The plugin currently has several characteristics that make partial rebuilds challenging:

  1. Global Collection State: The TraceableCollection is completely rebuilt on every build cycle (see lines 315-324 of mlx/traceability/traceability.py), clearing all items and relationships.

  2. Cross-Document Dependencies: Items can reference each other across documents through relationships like depends_on, implements, etc. When relationships change, multiple documents may need updates.

  3. Automatic Reverse Relations: The plugin automatically creates reverse relationships (e.g., if A depends_on B, then B gets impacts_on A). This cross-document coupling is a major challenge for partial rebuilds; a minimal sketch of the mechanism follows this list.

  4. Processing Pipeline: The plugin uses a three-stage process:

    • Parse and collect items during document reading
    • Consistency check and cross-reference resolution (lines 197-230 of mlx/traceability/traceability.py)
    • Node replacement during document resolution (lines 233-262 of mlx/traceability/traceability.py)
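
To make the reverse-relation coupling from point 3 concrete, here is a minimal, self-contained model of the mechanism; the names RELATION_PAIRS and add_relation are illustrative, not the plugin's actual API:

# Minimal model of automatic reverse relations (illustrative names)
RELATION_PAIRS = {'depends_on': 'impacts_on', 'implements': 'implemented_by'}

relations = {}  # item ID -> {relation name -> set of target item IDs}

def add_relation(source, relation, target):
    """Store a forward relation and automatically add its reverse."""
    relations.setdefault(source, {}).setdefault(relation, set()).add(target)
    reverse = RELATION_PAIRS[relation]
    relations.setdefault(target, {}).setdefault(reverse, set()).add(source)

add_relation('A', 'depends_on', 'B')
# Only A's document declared the relation, yet B now carries
# 'impacts_on': {'A'} - so a change in A's document can affect B's document.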

Effort Assessment for Partial Rebuild Support

Medium Effort (2-4 weeks) - "Good Enough" Solution

This approach would provide basic partial rebuild support with some limitations:

Key Changes Required:

  1. Persistent Item Storage: Instead of clearing env.traceability_collection on every build, implement incremental updates (see the sketch after this list):

    # Instead of: env.traceability_collection = TraceableCollection()
    # Use: env.traceability_collection.update_from_changed_docs(changed_docs)
  2. Dependency Tracking: Track which documents contain items that reference each other:

    # Add to TraceableCollection
    self.doc_dependencies = {}  # doc -> set of docs that reference items in this doc
  3. Smart Invalidation: When a document changes, invalidate only documents that:

    • Contain items referenced by the changed document
    • Reference items in the changed document
    • Display matrices/lists that include items from the changed document
  4. Relationship Scope Tracking: Track which relationships cross document boundaries to minimize invalidation scope.
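
A minimal sketch of what update_from_changed_docs from point 1 could look like, assuming the collection keeps a doc_item_map (doc -> set of item IDs, as in the implementation plan below) and stores items in a self.items dict; both are assumptions, not confirmed against the current code:

class TraceableCollection:
    # ... existing attributes, plus doc_item_map: doc -> set of item IDs ...

    def update_from_changed_docs(self, changed_docs):
        """Drop only the items that were defined in changed documents.

        Items from unchanged documents survive the build cycle; the changed
        documents re-register their items during the read phase.
        """
        for doc in changed_docs:
            for item_id in self.doc_item_map.pop(doc, set()):
                self.items.pop(item_id, None)  # self.items is an assumption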

Limitations of this approach:

  • Item ID changes might not be fully handled
  • Some edge cases with complex relationship chains might require full rebuilds
  • Matrix/list directives would need conservative invalidation

High Effort (1-2 months) - Comprehensive Solution

This would provide robust partial rebuild support:

Additional Changes:

  1. Fine-grained Dependency Graph: Build a complete dependency graph (sketched after this list) tracking:

    • Item-to-item relationships across documents
    • Which directives depend on which items
    • Transitive dependency chains
  2. Change Detection: Implement sophisticated change detection:

    class ItemChangeTracker:
        def detect_changes(self, old_item, new_item):
            # Detect ID changes, attribute changes and relationship changes,
            # then return the set of affected documents
            affected_docs = set()
            ...  # compare old_item and new_item field by field here
            return affected_docs
  3. Incremental Cross-Reference Resolution: Instead of global consistency checks, implement incremental updates to cross-references.

  4. Document Markup Caching: Cache the processed directive nodes and only regenerate when dependencies change.
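
As a sketch of the fine-grained dependency graph from point 1, adjacency sets plus a breadth-first walk are enough to collect the transitively affected documents (all names and data here are illustrative):

from collections import deque

# doc -> docs whose items reference items in this doc (illustrative data)
reverse_deps = {
    'design/a': {'design/b'},
    'design/b': {'reports/matrix'},
}

def affected_docs(changed_doc):
    """Collect every document reachable through reverse dependencies."""
    affected, queue = set(), deque([changed_doc])
    while queue:
        for dependent in reverse_deps.get(queue.popleft(), set()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

print(affected_docs('design/a'))  # {'design/b', 'reports/matrix'}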

Recommended Approach

For your use case (colleagues wanting faster builds to check reStructuredText syntax), I'd recommend the Medium Effort approach because:

  1. Pragmatic Trade-off: It provides significant speed improvements for most common scenarios while being implementable in a reasonable timeframe.

  2. Fallback Strategy: You can implement a "force full rebuild" option for edge cases, similar to Sphinx's -E flag (see the sketch after this list).

  3. Incremental Implementation: You can start with basic document-level dependency tracking and refine it over time.
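
The fallback from point 2 could be exposed as a plugin configuration value. A minimal sketch, where the option name traceability_full_rebuild is hypothetical (app.add_config_value itself is standard Sphinx):

def setup(app):
    # Hypothetical escape hatch: when True, keep today's behaviour of
    # rebuilding the whole collection on every build
    app.add_config_value('traceability_full_rebuild', False, 'env')

def initialize_environment(app):
    env = app.builder.env
    if app.config.traceability_full_rebuild or not hasattr(env, 'traceability_collection'):
        env.traceability_collection = TraceableCollection()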

Implementation Strategy

Here's a high-level implementation plan:

# 1. Modify initialize_environment to preserve existing collection
def initialize_environment(app):
    env = app.builder.env
    if not hasattr(env, 'traceability_collection'):
        env.traceability_collection = TraceableCollection()
    # Only clear items from changed documents
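    # (env.updated_docs is illustrative; Sphinx exposes outdated documents
    # via env.get_outdated_files() or the 'env-get-outdated' event)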
    env.traceability_collection.update_for_build(env.found_docs, env.updated_docs)

# 2. Add dependency tracking to TraceableCollection
class TraceableCollection:
    def __init__(self):
        # ... existing code ...
        self.doc_item_map = {}  # doc -> set of item IDs
        self.item_doc_map = {}  # item ID -> doc
        self.cross_doc_refs = {}  # doc -> set of docs it references

# 3. Implement smart invalidation
def should_rebuild_doc(doc, changed_docs, collection):
    if doc in changed_docs:
        return True
    # Check if any referenced items changed
    for ref_doc in collection.cross_doc_refs.get(doc, set()):
        if ref_doc in changed_docs:
            return True
    return False
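
To tie should_rebuild_doc into Sphinx's own rebuild decision, a handler for the standard 'env-get-outdated' event can return the extra documents to re-read; the handler body below is a sketch under the same assumptions as the plan above:

# 4. Hook the invalidation into Sphinx's rebuild decision
def env_get_outdated(app, env, added, changed, removed):
    collection = env.traceability_collection
    changed_docs = set(added) | set(changed) | set(removed)
    return [doc for doc in env.found_docs - changed_docs
            if should_rebuild_doc(doc, changed_docs, collection)]

def setup(app):
    app.connect('env-get-outdated', env_get_outdated)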

The key insight is that most traceability relationships are fairly localized within document clusters, so a smart invalidation strategy can provide significant speedups while maintaining correctness for the majority of use cases.

Originally posted by @JasperCraeghs in #385
