Labels: enhancement (New feature or request), ✨ feature request
# 🦆 Duck Enhancement Proposal

## 💡 The Problem
Silent LLM API updates are a real risk: one day your prompt works perfectly, the next it rambles or refuses innocuous requests. No MCP server currently offers "CI for LLM behavior".
## Proposed Solution

```javascript
// Store a test case
duck_regression_add({
  name: "code_review_format",
  prompt: "Review this function: function add(a,b) { return a+b }",
  provider: "openai",
  expected_behavior: {
    contains: ["return type", "parameter types"],
    not_contains: ["error", "cannot"],
    max_length: 500,
    sentiment: "constructive"
  }
})

// Run regression tests
duck_regression_run({
  suite: "code_review", // or "all"
  threshold: 0.8        // 80% similarity to baseline
})

// Returns
{
  passed: 4,
  failed: 1,
  drifted: [
    {
      name: "code_review_format",
      baseline_date: "2025-01-15",
      similarity: 0.62,
      changes: ["Now includes emoji", "Missing type suggestions"],
      recommendation: "Update baseline or investigate model change"
    }
  ]
}
```

## 🦆 Duck Use Cases
- Detect when a provider silently updates their model
- Ensure prompts still work after config changes
- CI/CD integration for prompt engineering
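As a minimal sketch of how the rule-based portion of `expected_behavior` could be evaluated (the `ExpectedBehavior` type and `checkRules` helper are illustrative names, not part of any existing codebase; the `sentiment` rule is omitted since it would require a classifier):

```typescript
// Illustrative only: evaluate the deterministic rules from expected_behavior
// against a model response, returning a list of human-readable failures.
interface ExpectedBehavior {
  contains?: string[];     // phrases that must appear
  not_contains?: string[]; // phrases that must not appear
  max_length?: number;     // maximum response length in characters
}

function checkRules(response: string, expected: ExpectedBehavior): string[] {
  const failures: string[] = [];
  const lower = response.toLowerCase();
  for (const phrase of expected.contains ?? []) {
    if (!lower.includes(phrase.toLowerCase())) {
      failures.push(`missing required phrase: "${phrase}"`);
    }
  }
  for (const phrase of expected.not_contains ?? []) {
    if (lower.includes(phrase.toLowerCase())) {
      failures.push(`contains forbidden phrase: "${phrase}"`);
    }
  }
  if (expected.max_length !== undefined && response.length > expected.max_length) {
    failures.push(`length ${response.length} exceeds max ${expected.max_length}`);
  }
  return failures;
}
```

An empty failure list means the response still satisfies every deterministic rule; any entries become the `changes` reported for a drifted test.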
## Implementation

- `src/services/regression.ts` - test storage and comparison
- `src/tools/duck-regression.ts` - add/run/list/baseline tools
- Storage: JSON file in `~/.mcp-rubber-duck/regression/`
- Comparison: semantic similarity + rule-based checks
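In practice the semantic-similarity score would likely come from comparing embedding vectors. As a self-contained illustration of how a similarity score feeds the drift threshold, here is a crude token-overlap (Jaccard) stand-in; `jaccardSimilarity` and `hasDrifted` are hypothetical names, not real project APIs:

```typescript
// Crude stand-in for semantic similarity: Jaccard overlap of word sets.
// A real implementation would compare embeddings instead.
function jaccardSimilarity(a: string, b: string): number {
  const tokenize = (s: string) => new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  const setA = tokenize(a);
  const setB = tokenize(b);
  const intersection = [...setA].filter((t) => setB.has(t)).length;
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 1 : intersection / union; // identical-empty counts as a match
}

// A test "drifts" when similarity to its stored baseline falls below threshold.
function hasDrifted(baseline: string, current: string, threshold = 0.8): boolean {
  return jaccardSimilarity(baseline, current) < threshold;
}
```

Whatever scoring function is used, the key design point is the same: compare the current response against a stored baseline and flag the test when the score drops below the suite's threshold.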
## Research Backing

- Evidently AI - LLM monitoring patterns
- Prompt regression testing
- "Silent LLM API updates" are a documented risk