
feat: Duck Regression Test - CI for LLM behavior drift detection #43

@nesquikm


πŸ¦† Duck Enhancement Proposal

πŸ’‘ The Problem

Silent LLM API updates are a real risk. One day your prompt works perfectly; the next day it rambles or refuses innocuous requests. No existing MCP server offers "CI for LLM behavior".

πŸš€ Proposed Solution

```js
// Store a test case
duck_regression_add({
  name: "code_review_format",
  prompt: "Review this function: function add(a,b) { return a+b }",
  provider: "openai",
  expected_behavior: {
    contains: ["return type", "parameter types"],
    not_contains: ["error", "cannot"],
    max_length: 500,
    sentiment: "constructive"
  }
})

// Run regression tests
duck_regression_run({
  suite: "code_review",  // or "all"
  threshold: 0.8  // 80% similarity to baseline
})

// Returns
{
  passed: 4,
  failed: 1,
  drifted: [
    {
      name: "code_review_format",
      baseline_date: "2025-01-15",
      similarity: 0.62,
      changes: ["Now includes emoji", "Missing type suggestions"],
      recommendation: "Update baseline or investigate model change"
    }
  ]
}
```
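The `expected_behavior` rules above could be evaluated with a small rule checker. This is a minimal sketch, assuming a shape like the proposed tool input; `ExpectedBehavior` and `checkBehavior` are illustrative names, not the actual tool API, and the `sentiment` field is omitted since it would need a classifier:

```typescript
// Hypothetical rule-based half of duck_regression_run:
// returns a list of human-readable failures (empty = pass).
interface ExpectedBehavior {
  contains?: string[];      // phrases the response must include
  not_contains?: string[];  // phrases the response must avoid
  max_length?: number;      // upper bound on response length (chars)
}

function checkBehavior(response: string, expected: ExpectedBehavior): string[] {
  const failures: string[] = [];
  const lower = response.toLowerCase();
  for (const phrase of expected.contains ?? []) {
    if (!lower.includes(phrase.toLowerCase())) {
      failures.push(`missing expected phrase: "${phrase}"`);
    }
  }
  for (const phrase of expected.not_contains ?? []) {
    if (lower.includes(phrase.toLowerCase())) {
      failures.push(`found forbidden phrase: "${phrase}"`);
    }
  }
  if (expected.max_length !== undefined && response.length > expected.max_length) {
    failures.push(`response length ${response.length} exceeds ${expected.max_length}`);
  }
  return failures;
}
```

Collecting failures rather than returning a boolean lets the run report explain *why* a case drifted, which feeds the `changes` array in the example result above.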

πŸ¦† Duck Use Cases

  • Detect when provider silently updates their model
  • Ensure prompts still work after config changes
  • CI/CD integration for prompt engineering

πŸ“‹ Implementation

  1. src/services/regression.ts - Test storage and comparison
  2. src/tools/duck-regression.ts - add/run/list/baseline tools
  3. Storage: JSON file in ~/.mcp-rubber-duck/regression/
  4. Comparison: Semantic similarity + rule-based checks
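As a stand-in for the semantic-similarity score in step 4, here is a simple token-level Jaccard overlap between the baseline response and a new response. A real implementation would likely use embeddings; this only illustrates how a 0..1 score can be compared against the `threshold` parameter of `duck_regression_run` (both function names here are hypothetical):

```typescript
// Token-overlap (Jaccard) similarity: |A ∩ B| / |A ∪ B| over word sets.
function jaccardSimilarity(baseline: string, current: string): number {
  const tokens = (s: string) =>
    new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const a = tokens(baseline);
  const b = tokens(current);
  let overlap = 0;
  for (const t of a) {
    if (b.has(t)) overlap++;
  }
  const union = a.size + b.size - overlap;
  return union === 0 ? 1 : overlap / union;
}

// A case is flagged as drifted when similarity falls below the threshold.
function hasDrifted(baseline: string, current: string, threshold = 0.8): boolean {
  return jaccardSimilarity(baseline, current) < threshold;
}
```

Jaccard is cheap and dependency-free, but it misses paraphrases (same meaning, different words), which is why step 4 pairs the similarity score with the rule-based checks.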

🌟 Research Backing
