
docs(design): add maintainer decision layer roadmap #35

Draft
frankekn wants to merge 5 commits into pwrdrvr:main from frankekn:frank/docs-maintainer-decision-roadmap

frankekn commented Mar 27, 2026

Summary

Add a design roadmap for a maintainer decision layer above ghcrawl's existing search and clustering pipeline.

Why

ghcrawl already does repository-wide semantic grouping well. The remaining gap is maintainer-facing decision support: helping answer which nearby PR is the strongest base, which variant is likely superseded, and which semantic neighbors should stay excluded.

This PR does not introduce code or change current runtime behavior. It documents a clean additive direction so future feature work can converge on one architecture instead of growing as disconnected heuristics.

Changes

  • add docs/designs/maintainer-decision-layer.md
  • extend docs/PLAN.md with a dedicated maintainer decision-analysis phase

Design Intent

The proposed direction is additive:

  • keep the current search and cluster pipeline
  • reuse clusters and semantic neighbors as candidate retrieval
  • add a reusable decision-analysis layer above retrieval
  • expose that layer through analyze-pr, triage, API, and future UI surfaces

Architecture

flowchart LR
    A[Existing ghcrawl pipeline] --> B[Neighbors / Clusters]
    B --> C[Candidate retrieval]
    C --> D[Decision analysis]
    D --> E[Maintainer outputs]

    E --> E1[analyze-pr]
    E --> E2[triage]
    E --> E3[API]

    style D fill:#173042,stroke:#7dd3fc,stroke-width:2px,color:#ffffff
    style E fill:#3a2812,stroke:#fbbf24,stroke-width:2px,color:#ffffff

Initial Scoring Direction

One thing I wanted to make explicit in this roadmap is that the future decision layer should not be “more clustering”. It should be a second-stage score model that combines the strongest maintainer-facing signals:

  • semantic similarity
  • linked issue overlap
  • path relevance
  • companion test relevance
  • state / recency
  • unrelated churn penalty

In other words:

  • clusters and semantic neighbors are the recall layer
  • the weighted score model is the decision layer
  • explicit roles such as best_base, superseded_candidate, and excluded_neighbor are the maintainer-facing result

That distinction is the main reason this proposal exists. The goal is not to treat cluster membership itself as the final answer. The goal is to reuse the current retrieval and clustering pipeline, then add a maintainer-oriented decision pass on top of it.
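
As a minimal sketch of what a second-stage weighted score could look like, the Python below combines the six signal families listed above into one number. Everything concrete here is an assumption made for illustration: the `CandidateFeatures` and `decision_score` names, the normalization of every feature to the range 0 to 1, and especially the weight values, which are placeholders rather than the profile used by claw-maintainer-tui or any planned ghcrawl default.

```python
from dataclasses import dataclass

# Hypothetical weights, for illustration only. Real values are left to
# implementation and fixture-driven tuning.
WEIGHTS = {
    "semantic_similarity": 0.30,
    "linked_issue_overlap": 0.25,
    "path_relevance": 0.20,
    "companion_test_relevance": 0.10,
    "state_recency": 0.15,
}
CHURN_PENALTY_WEIGHT = 0.20  # hypothetical


@dataclass(frozen=True)
class CandidateFeatures:
    """One candidate PR's features, each assumed to be normalized to [0, 1]."""
    semantic_similarity: float
    linked_issue_overlap: float
    path_relevance: float
    companion_test_relevance: float
    state_recency: float
    unrelated_churn: float  # 0 = focused diff, 1 = mostly unrelated changes


def decision_score(f: CandidateFeatures) -> float:
    """Weighted sum of positive signals minus a churn penalty, clamped to [0, 1]."""
    positive = sum(w * getattr(f, name) for name, w in WEIGHTS.items())
    score = positive - CHURN_PENALTY_WEIGHT * f.unrelated_churn
    return max(0.0, min(1.0, score))


# A semantically closer neighbor can still score lower once the
# maintainer-facing signals are taken into account.
focused = CandidateFeatures(0.90, 1.0, 0.8, 0.5, 0.7, 0.0)
noisy = CandidateFeatures(0.95, 0.0, 0.2, 0.0, 0.7, 0.9)
```

With these placeholder weights, `noisy` is the nearer semantic neighbor but ranks below `focused`, which is the behavior that separates a decision pass from "more clustering".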

flowchart LR
    A[semantic similarity]
    B[linked issue overlap]
    C[path relevance]
    D[companion test relevance]
    E[state and recency]
    F[unrelated churn penalty]

    A --> G[decision score]
    B --> G
    C --> G
    D --> G
    E --> G
    F --> G

    G --> H[best_base]
    G --> I[same_cluster_candidate]
    G --> J[superseded_candidate]
    G --> K[excluded_neighbor]

    style G fill:#173042,stroke:#7dd3fc,stroke-width:2px,color:#ffffff

The updated design doc keeps the score model concrete at the level that matters for a roadmap:

  • explicit feature families
  • retrieval provenance separated from decision role
  • a deterministic latest-snapshot boundary for v1
  • adjacent decision tables later instead of coupling decisions into cluster snapshots
  • early fixture-based evaluation before tuning widens
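
To illustrate the second point in the list above, here is a hedged sketch in which retrieval provenance and decision role are separate fields, so that how a candidate was found never determines what it is called. The role names come from this proposal; the `Provenance` values, the `ScoredCandidate` shape, the 0.3 threshold, and the rule that a closed candidate above the threshold counts as superseded are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class Provenance(Enum):
    """How retrieval surfaced the candidate (hypothetical values)."""
    CLUSTER_MEMBER = "cluster_member"
    SEMANTIC_NEIGHBOR = "semantic_neighbor"


class Role(Enum):
    """Maintainer-facing decision roles named in the roadmap."""
    BEST_BASE = "best_base"
    SAME_CLUSTER_CANDIDATE = "same_cluster_candidate"
    SUPERSEDED_CANDIDATE = "superseded_candidate"
    EXCLUDED_NEIGHBOR = "excluded_neighbor"


@dataclass(frozen=True)
class ScoredCandidate:
    pr_number: int
    provenance: Provenance
    score: float   # output of the decision score model, in [0, 1]
    is_open: bool


def assign_roles(
    candidates: Iterable[ScoredCandidate], exclude_below: float = 0.3
) -> Dict[int, Role]:
    """Map each PR number to a role. The rules are illustrative assumptions."""
    # Sort by score, breaking ties by PR number so the output is deterministic.
    ranked = sorted(candidates, key=lambda c: (-c.score, c.pr_number))
    roles: Dict[int, Role] = {}
    best_base_taken = False
    for c in ranked:
        if c.score < exclude_below:
            roles[c.pr_number] = Role.EXCLUDED_NEIGHBOR
        elif c.is_open and not best_base_taken:
            roles[c.pr_number] = Role.BEST_BASE
            best_base_taken = True
        elif not c.is_open:
            # Assumption: a closed variant above the threshold was overtaken.
            roles[c.pr_number] = Role.SUPERSEDED_CANDIDATE
        else:
            roles[c.pr_number] = Role.SAME_CLUSTER_CANDIDATE
    return roles
```

Note that `provenance` is carried along but never read by `assign_roles`: a cluster member can end up excluded and a plain semantic neighbor can end up as a candidate.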

Exact score weights are intentionally left to implementation and fixture-driven tuning rather than being treated as roadmap truth. The weighting approach itself is not hypothetical: claw-maintainer-tui already runs a weighted maintainer decision model today, and the first ghcrawl implementation can start from that profile before retuning against local fixtures.
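
A rough sketch of what fixture-driven tuning could mean in practice: each fixture records a small candidate set and the PR a maintainer would pick as the base, and a weight profile is judged by how often its top-scoring candidate matches. The fixture layout, the feature names, and both weight profiles below are invented for illustration and are not taken from claw-maintainer-tui or ghcrawl.

```python
from typing import Dict, List, Mapping

Features = Mapping[str, float]


def pick_best_base(weights: Mapping[str, float], candidates: Mapping[int, Features]) -> int:
    """Return the PR number with the highest weighted score (ties go to the lower number)."""
    def score(pr: int) -> float:
        return sum(w * candidates[pr].get(name, 0.0) for name, w in weights.items())
    return min(candidates, key=lambda pr: (-score(pr), pr))


def evaluate(weights: Mapping[str, float], fixtures: List[dict]) -> float:
    """Fraction of fixtures where the profile picks the expected best base."""
    if not fixtures:
        raise ValueError("at least one fixture is required")
    hits = sum(
        1 for fx in fixtures
        if pick_best_base(weights, fx["candidates"]) == fx["expected_best_base"]
    )
    return hits / len(fixtures)


# Hypothetical fixtures: PR numbers and feature values are made up.
FIXTURES: List[dict] = [
    {
        "candidates": {
            11: {"semantic_similarity": 0.9, "linked_issue_overlap": 0.0},
            12: {"semantic_similarity": 0.7, "linked_issue_overlap": 1.0},
        },
        "expected_best_base": 12,
    },
    {
        "candidates": {
            21: {"semantic_similarity": 0.8, "linked_issue_overlap": 1.0},
            22: {"semantic_similarity": 0.6, "linked_issue_overlap": 1.0},
        },
        "expected_best_base": 21,
    },
]

SIMILARITY_ONLY: Dict[str, float] = {"semantic_similarity": 1.0, "linked_issue_overlap": 0.0}
BLENDED: Dict[str, float] = {"semantic_similarity": 0.5, "linked_issue_overlap": 0.5}
```

On these two toy fixtures, `evaluate(SIMILARITY_ONLY, FIXTURES)` gets one of two right while `evaluate(BLENDED, FIXTURES)` gets both, which is the kind of comparison that would let a starting profile be retuned against local data.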

For clarity, this is not a “stale local data only” proposal. V1 should analyze the latest local repository snapshot, produced either by the existing explicit refresh or by the sync -> embed -> cluster pipeline. Freshness remains an explicit operational step, while the scoring pass itself stays free of hidden live GitHub or OpenAI fetches.
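
One way to picture that boundary is a scoring entry point that receives an immutable snapshot and nothing else, so the same snapshot always yields the same answer. The sketch below assumes a hypothetical `Snapshot` shape with precomputed per-neighbor scores; the `analyze_pr` function name mirrors the analyze-pr surface mentioned earlier, but its signature and the data layout are assumptions, not ghcrawl's actual storage model.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Latest local state after an explicit refresh (hypothetical shape)."""
    snapshot_id: str
    # PR number -> neighbor PR number -> decision score
    scores: Mapping[int, Mapping[int, float]]


@dataclass(frozen=True)
class Analysis:
    snapshot_id: str          # records which snapshot the result came from
    pr_number: int
    ranked_neighbors: Tuple[int, ...]


def analyze_pr(snapshot: Snapshot, pr_number: int) -> Analysis:
    """Pure function of the snapshot: no network calls, no clock, no hidden state."""
    neighbors = snapshot.scores.get(pr_number, {})
    # Sort by descending score; break ties by PR number for determinism.
    ranked = sorted(neighbors, key=lambda n: (-neighbors[n], n))
    return Analysis(snapshot.snapshot_id, pr_number, tuple(ranked))
```

Because the result carries the `snapshot_id`, a caller can tell how fresh an analysis is and decide whether to run another explicit refresh before asking again.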

Delivery Path

flowchart TD
    P0[Phase 0<br/>fix current contracts]
    P1[Phase 1<br/>reusable decision core]
    P2[Phase 2<br/>analyze-pr]
    P3[Phase 3<br/>triage and API reuse]
    P4[Phase 4<br/>decision artifacts on run state]

    P0 --> P1 --> P2 --> P3 --> P4

Non-Goals

  • no cluster-model replacement
  • no storage redesign in the first iteration
  • no embedding backend changes
  • no runtime behavior change in this PR

Testing

  • docs-only change
  • manually reviewed markdown structure and Mermaid diagrams for GitHub rendering
