The Upgrade Purge Strategy is an active lifecycle management system for the wisdom database. It treats "Wisdom" like a high-performance cache rather than a cold storage archive. The key insight: lessons are often just band-aids for a model's weaknesses, and when you upgrade the base model, many of those band-aids become redundant.
- AI models continuously improve (GPT-3.5 → GPT-4 → GPT-4.5, etc.)
- Each upgrade fixes many of the model's previous weaknesses
- Lessons learned to compensate for old weaknesses become obsolete
- Without lifecycle management, the wisdom database grows forever with stale lessons
When upgrading the underlying model:
- Audit: Take the "Failure Scenarios" that generated old lessons
- Test: Run them against the New Model without the extra context
- Purge: If the New Model solves it natively, DELETE the old lesson
- Result: Database gets smaller and more specialized over time
- Removes redundant lessons automatically
- Keeps only edge cases the new model can't handle
- More efficient use of context window
- No manual cleanup required
- Automatically adapts to model improvements
- Continuous refinement over time
- Only specialized lessons remain
- Clearer guidance for the agent
- Faster context loading
- Smaller context = lower token costs
- Faster inference times
- More maintainable system
┌─────────────────────────────────────────────────────────┐
│ NORMAL OPERATION (Learning) │
├─────────────────────────────────────────────────────────┤
│ Agent encounters failures → Learns lessons │
│ Wisdom database grows: 10 → 30 → 50 lessons │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ MODEL UPGRADE (Purge Trigger) │
├─────────────────────────────────────────────────────────┤
│ You: "I'm upgrading from GPT-3.5 to GPT-4" │
│ System: "Let me audit your wisdom database..." │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ AUDIT PHASE │
├─────────────────────────────────────────────────────────┤
│ For each lesson in database: │
│ 1. Extract the original failure scenario │
│ 2. Test against GPT-4 WITHOUT the lesson │
│ 3. Score the new model's performance │
│ │
│ Results: │
│ ✅ Lesson 1: Score 0.9 → REDUNDANT (can purge) │
│ ✅ Lesson 2: Score 0.85 → REDUNDANT (can purge) │
│ ⚠️ Lesson 3: Score 0.6 → CRITICAL (keep it) │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ PURGE PHASE │
├─────────────────────────────────────────────────────────┤
│ Delete redundant lessons (1, 2) │
│ Keep critical lessons (3) │
│ Update wisdom database │
│ Record purge in history │
│ │
│ Result: 50 lessons → 20 lessons (60% reduction) ✅ │
└─────────────────────────────────────────────────────────┘
from model_upgrade import ModelUpgradeManager
# Initialize the upgrade manager
manager = ModelUpgradeManager()
# Define baseline instructions for the new model
baseline_instructions = """You are a helpful AI assistant. Your goal is to
provide accurate and useful responses to user queries. You have access to
tools that you can use to help answer questions. Always think step-by-step
and provide clear, concise answers."""
# Perform upgrade audit and purge
report = manager.perform_upgrade(
new_model="gpt-4o", # The new model you're upgrading to
baseline_instructions=baseline_instructions,
score_threshold=0.8, # Threshold for "solved natively"
auto_purge=True, # Automatically purge redundant lessons
verbose=True # Show detailed progress
)
# Review results
print(f"Purged: {report['purge_results']['purged_count']} lessons")
print(f"Remaining: {report['purge_results']['remaining_count']} lessons")# Step 1: Audit only (review before purging)
audit_results = manager.audit_wisdom_database(
new_model="gpt-4o",
baseline_instructions=baseline_instructions,
score_threshold=0.8,
verbose=True
)
# Step 2: Review recommendations
print(f"Redundant lessons: {len(audit_results['redundant_lessons'])}")
print(f"Critical lessons: {len(audit_results['critical_lessons'])}")
# Step 3: Manually purge after review
purge_results = manager.purge_redundant_lessons(
audit_results=audit_results,
verbose=True
)The MemorySystem now stores query and response alongside each improvement:
memory.update_instructions(
new_instructions="...",
critique="...",
query="What is 15 * 24?", # Original query that failed
response="Let me calculate..." # Agent's failed response
)This context is essential for the audit phase to test scenarios.
The ObserverAgent automatically captures this context during learning:
self.wisdom.update_instructions(
new_instructions,
analysis["critique"],
query=event.query, # From telemetry
response=event.agent_response # From telemetry
)The upgrade purge also cleans up the prioritization framework's safety corrections, removing corrections that are no longer needed.
The system maintains an upgrade log (model_upgrade_log.json) that tracks:
- When upgrades occurred
- Which model was used
- How many lessons were purged
- Historical reduction over time
{
"current_model": "gpt-4o",
"upgrades": [
{
"timestamp": "2024-01-15T10:30:00",
"new_model": "gpt-4o",
"purged_count": 30,
"remaining_count": 20
}
]
}The score_threshold parameter (default: 0.8) determines when a lesson is considered redundant:
- Lower threshold (0.6): More aggressive purging, keeps fewer lessons
- Higher threshold (0.9): More conservative, keeps more lessons
- Recommended: 0.8 (80% quality threshold)
Provide clear, concise baseline instructions that represent the new model's foundation. These should NOT include specific lessons - just the core agent behavior.
- Start: 0 lessons
- Learn from failures: +50 lessons
- End: 50 lessons
- Start: 50 lessons
- Learn from failures: +30 lessons
- End: 80 lessons
- Start: 80 lessons
- Audit: Test all lessons against GPT-4
- Purge: Remove 40 redundant lessons
- End: 40 lessons ✅ (50% reduction)
- Start: 40 lessons
- Learn from failures: +15 lessons
- End: 55 lessons
- Start: 55 lessons
- Audit: Test all lessons against GPT-4.5
- Purge: Remove 30 redundant lessons
- End: 25 lessons ✅ (45% reduction)
Key Insight: The database continuously refines itself, maintaining only the most specialized edge cases.
Run the upgrade purge whenever you change your base model (even minor version updates).
For production systems, use auto_purge=False initially to review audit results before purging.
Keep a backup of system_instructions.json before running the purge, just in case.
Track the purge_history in your wisdom database to understand lifecycle trends.
- Non-critical systems: Use lower threshold (0.7) for aggressive purging
- Critical systems: Use higher threshold (0.9) to be conservative
Run the test suite to validate the upgrade purge functionality:
python test_model_upgrade.pyTry the example demonstration:
python example_upgrade_purge.pyYou can customize the evaluation logic by subclassing ModelUpgradeManager:
class CustomUpgradeManager(ModelUpgradeManager):
def _evaluate_response(self, query: str, agent_response: str) -> float:
# Custom evaluation logic
# Return score 0.0 - 1.0
passThis is normal if:
- The new model isn't significantly better
- Your lessons are already specialized
- Threshold is too high
This suggests:
- New model is much better
- Threshold might be too low
- Lessons were too generic
Audit time scales with:
- Number of lessons
- Model response time
- Network latency
Consider batching or caching evaluations for large databases.
Potential improvements to the upgrade purge strategy:
- Semantic Similarity: Use embeddings to detect similar failure patterns
- Batch Processing: Test multiple scenarios in parallel
- A/B Testing: Compare new model with/without lessons
- Cost Analysis: Calculate cost savings from purging
- Automatic Triggers: Auto-detect model upgrades and suggest purges
- README.md - Main documentation
- ARCHITECTURE_DECOUPLED.md - Decoupled architecture
- PRIORITIZATION_FRAMEWORK.md - Context prioritization
- model_upgrade.py - Implementation source code
- test_model_upgrade.py - Test suite
- example_upgrade_purge.py - Usage example
Remember: Treat your wisdom database like a high-performance cache, not a cold storage archive. Purge regularly to keep it lean, focused, and effective!