docs(design): add maintainer decision layer roadmap by frankekn · Pull Request #35 · pwrdrvr/ghcrawl

frankekn · 2026-03-27T09:00:32Z

Summary

Add a design roadmap for a maintainer decision layer above ghcrawl's existing search and clustering pipeline.

Why

ghcrawl already does repository-wide semantic grouping well. The remaining gap is maintainer-facing decision support: helping answer which nearby PR is the strongest base, which variant is likely superseded, and which semantic neighbors should stay excluded.

This PR does not introduce code or change current runtime behavior. It documents a clean additive direction so future feature work can converge on one architecture instead of growing as disconnected heuristics.

Changes

add docs/designs/maintainer-decision-layer.md
extend docs/PLAN.md with a dedicated maintainer decision-analysis phase

Design Intent

The proposed direction is additive:

keep the current search and cluster pipeline
reuse clusters and semantic neighbors as candidate retrieval
add a reusable decision-analysis layer above retrieval
expose that layer through analyze-pr, triage, API, and future UI surfaces

Architecture

flowchart LR
    A[Existing ghcrawl pipeline] --> B[Neighbors / Clusters]
    B --> C[Candidate retrieval]
    C --> D[Decision analysis]
    D --> E[Maintainer outputs]

    E --> E1[analyze-pr]
    E --> E2[triage]
    E --> E3[API]

    style D fill:#173042,stroke:#7dd3fc,stroke-width:2px,color:#ffffff
    style E fill:#3a2812,stroke:#fbbf24,stroke-width:2px,color:#ffffff
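
As a rough illustration of the "reuse clusters and semantic neighbors as candidate retrieval" step, here is a minimal Python sketch. The names `Candidate` and `retrieve_candidates` are hypothetical and do not come from ghcrawl. The only idea shown is that cluster members and semantic neighbors are merged into one candidate set, with each candidate remembering which retrieval source produced it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate record; not an existing ghcrawl type."""

    pr_number: int
    provenance: frozenset[str]  # which retrieval source(s) surfaced this PR


def retrieve_candidates(
    target: int,
    cluster_members: list[int],
    semantic_neighbors: list[int],
) -> list[Candidate]:
    """Merge both retrieval sources into one deduplicated candidate list."""
    sources: dict[int, set[str]] = {}
    for pr in cluster_members:
        sources.setdefault(pr, set()).add("cluster")
    for pr in semantic_neighbors:
        sources.setdefault(pr, set()).add("neighbor")
    # The PR under analysis is never a candidate for itself.
    sources.pop(target, None)
    # Sort by PR number so the output order is deterministic.
    return [Candidate(pr, frozenset(src)) for pr, src in sorted(sources.items())]
```

Keeping provenance on the candidate, rather than deciding anything at this stage, leaves the decision layer free to assign roles later without touching retrieval.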

Initial Scoring Direction

One thing I wanted to make explicit in this roadmap is that the future decision layer should not be “more clustering”. It should be a second-stage score model that combines the strongest maintainer-facing signals:

semantic similarity
linked issue overlap
path relevance
companion test relevance
state / recency
unrelated churn penalty

In other words:

clusters and semantic neighbors are the recall layer
the weighted score model is the decision layer
explicit roles such as best_base, superseded_candidate, and excluded_neighbor are the maintainer-facing result

That distinction is the main reason this proposal exists. The goal is not to treat cluster membership itself as the final answer. The goal is to reuse the current retrieval and clustering pipeline, then add a maintainer-oriented decision pass on top of it.

flowchart LR
    A[semantic similarity]
    B[linked issue overlap]
    C[path relevance]
    D[companion test relevance]
    E[state and recency]
    F[unrelated churn penalty]

    A --> G[decision score]
    B --> G
    C --> G
    D --> G
    E --> G
    F --> G

    G --> H[best_base]
    G --> I[same_cluster_candidate]
    G --> J[superseded_candidate]
    G --> K[excluded_neighbor]

    style G fill:#173042,stroke:#7dd3fc,stroke-width:2px,color:#ffffff
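
The flow above could be sketched as a weighted sum followed by role assignment. This is only an illustration under stated assumptions: the weights, the exclusion threshold, the feature key names, and the rule that a closed non-best candidate counts as superseded are all placeholders invented for this example, since exact weights are left to implementation and fixture-driven tuning.

```python
from __future__ import annotations

# Placeholder weights for illustration only; NOT taken from the design doc.
WEIGHTS = {
    "semantic_similarity": 0.30,
    "linked_issue_overlap": 0.25,
    "path_relevance": 0.20,
    "companion_test_relevance": 0.10,
    "state_recency": 0.15,
}
CHURN_PENALTY_WEIGHT = 0.20  # placeholder as well


def decision_score(features: dict[str, float]) -> float:
    """Combine feature values (assumed to lie in [0, 1]) into one score."""
    positive = sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    penalty = CHURN_PENALTY_WEIGHT * features.get("unrelated_churn", 0.0)
    return positive - penalty


def assign_roles(
    scores: dict[int, float],
    is_open: dict[int, bool],
    exclude_below: float = 0.3,  # hypothetical threshold
) -> dict[int, str]:
    """Map each candidate PR number to a maintainer-facing role."""
    kept = {pr: s for pr, s in scores.items() if s >= exclude_below}
    roles = {pr: "excluded_neighbor" for pr in scores if pr not in kept}
    if not kept:
        return roles
    # Highest score wins; ties go to the lower PR number for determinism.
    best = max(kept, key=lambda pr: (kept[pr], -pr))
    for pr in kept:
        if pr == best:
            roles[pr] = "best_base"
        elif not is_open.get(pr, True):
            # Assumption: a closed, lower-scored variant is likely superseded.
            roles[pr] = "superseded_candidate"
        else:
            roles[pr] = "same_cluster_candidate"
    return roles
```

The point of the sketch is the separation of concerns: the score is a plain function of features, and roles are derived from scores, so neither step needs to know how the candidates were retrieved.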

The updated design doc keeps the score model concrete at the level that matters for a roadmap:

explicit feature families
retrieval provenance separated from decision role
a deterministic latest-snapshot boundary for v1
decision results stored later in adjacent tables rather than coupled into cluster snapshots
early fixture-based evaluation before tuning widens
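
To make the "fixture-based evaluation" item concrete, a small harness along the following lines could work. Everything here is hypothetical: the `Fixture` shape, the `evaluate` function, and the idea of scoring only the `best_base` pick are assumptions made for this sketch, not part of the design doc.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Fixture:
    """Hypothetical hand-labeled case: one target PR and its candidates."""

    name: str
    target_pr: int
    candidate_features: dict[int, dict[str, float]]
    expected_best_base: Optional[int]


def evaluate(
    fixtures: list[Fixture],
    predict_best_base: Callable[[Fixture], Optional[int]],
) -> tuple[float, list[str]]:
    """Return (accuracy, names of failing fixtures) for a predictor."""
    failures = [
        fx.name
        for fx in fixtures
        if predict_best_base(fx) != fx.expected_best_base
    ]
    if not fixtures:
        return 0.0, failures
    accuracy = (len(fixtures) - len(failures)) / len(fixtures)
    return accuracy, failures
```

A harness like this gives any weight change a before-and-after number, and the list of failing fixture names shows which cases regressed.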

Exact score weights are intentionally left to implementation and fixture-driven tuning rather than being treated as roadmap truth. The weighting approach itself is not hypothetical: claw-maintainer-tui already runs a weighted maintainer decision model today, and the first ghcrawl implementation can start from that profile before retuning against local fixtures.

For clarity, this is not a “stale local data only” proposal. V1 should analyze the latest local repository snapshot produced by the existing explicit refresh or sync -> embed -> cluster pipeline. Freshness remains an explicit operational step, while the scoring pass itself stays free of hidden live GitHub or OpenAI fetches.
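
One way to picture the snapshot boundary is as a pure function over an immutable input. In the sketch below, `LocalSnapshot` and `analyze` are hypothetical names, and a single similarity value stands in for the full feature set. The function performs no network or file access, so the same snapshot always produces the same output.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LocalSnapshot:
    """Hypothetical view of the latest local data for one target PR."""

    snapshot_id: str
    similarity: Mapping[int, float]  # candidate PR number -> similarity


def analyze(snapshot: LocalSnapshot, limit: int = 3) -> dict:
    """Rank candidates using only data already present in the snapshot."""
    ranked = sorted(
        snapshot.similarity.items(),
        key=lambda item: (-item[1], item[0]),  # score desc, then PR number
    )
    return {
        # Echo the snapshot id so an output can be traced to its input.
        "snapshot_id": snapshot.snapshot_id,
        "ranked": [pr for pr, _ in ranked[:limit]],
    }
```

Refreshing the data then stays a separate, explicit step: a new sync produces a new snapshot, and the analysis is simply run again against it.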

Delivery Path

flowchart TD
    P0[Phase 0<br/>fix current contracts]
    P1[Phase 1<br/>reusable decision core]
    P2[Phase 2<br/>analyze-pr]
    P3[Phase 3<br/>triage and API reuse]
    P4[Phase 4<br/>decision artifacts on run state]

    P0 --> P1 --> P2 --> P3 --> P4

Non-Goals

no cluster-model replacement
no storage redesign in the first iteration
no embedding backend changes
no runtime behavior change in this PR

Testing

docs-only change
manually reviewed markdown structure and Mermaid diagrams for GitHub rendering

frankekn added 5 commits March 27, 2026 16:57

docs: add maintainer decision layer roadmap

a2c3fe1

docs: add roadmap diagrams

177926e

docs: add decision scoring proposal

6bd3440

docs: refine decision-layer contract

06a50d1

docs: clarify roadmap boundaries and diagram contrast

fd0416c

frankekn wants to merge 5 commits into pwrdrvr:main from frankekn:frank/docs-maintainer-decision-roadmap
