
Commit 4160a06

update the docs
1 parent 9a5bd06 commit 4160a06


6 files changed (+1541, -81 lines changed)


β€Ždocs/guides/gepa-optimization.mdβ€Ž

Lines changed: 24 additions & 0 deletions
@@ -167,6 +167,30 @@ super agent optimize assistant_microsoft --auto medium --framework microsoft --r
167167
super agent optimize research_agent_deepagents --auto medium --framework deepagents --reflection-lm ollama:llama3.1:8b # DeepAgents
168168
```
169169

170+
**💡 About Reflection Models**
171+
172+
The `--reflection-lm` parameter specifies which model GEPA uses to analyze evaluation results and suggest prompt improvements. We typically recommend using a **smaller, faster model** for reflection:
173+
174+
**Why use a smaller reflection model (e.g., llama3.1:8b)?**
175+
- ✅ **Speed**: GEPA runs the reflection model many times (10-50+ iterations). Smaller models make optimization 5-10x faster
176+
- ✅ **Resources**: Reduces memory and compute requirements significantly
177+
- ✅ **Good Enough**: The reflection task (analyzing results, suggesting improvements) is simpler than the agent's actual task
178+
179+
**Example:**
180+
```bash
181+
# Your agent uses gpt-oss:20b (20B parameters)
182+
# But reflection uses llama3.1:8b (8B parameters) - much faster!
183+
super agent optimize my_agent --auto medium --reflection-lm ollama:llama3.1:8b
184+
```
185+
186+
**You can use a larger reflection model if needed:**
187+
```bash
188+
# For more sophisticated prompt improvements (slower)
189+
super agent optimize my_agent --auto medium --reflection-lm ollama:gpt-oss:120b
190+
```
191+
192+
---
193+
170194
**Step 3: Evaluate & Deploy**
171195

172196
```bash

β€Ždocs/guides/multi-framework.mdβ€Ž

Lines changed: 12 additions & 3 deletions
@@ -82,6 +82,13 @@ super agent evaluate my_agent
8282
# 4. Optimize with GEPA (works on ALL frameworks!)
8383
super agent optimize my_agent --auto medium --framework <framework> --reflection-lm ollama:llama3.1:8b
8484

85+
# 💡 Why --reflection-lm ollama:llama3.1:8b?
86+
# The reflection model runs many times during optimization to analyze results
87+
# and suggest improvements. Using a smaller, faster model (8b vs 20b/70b):
88+
# ✅ Speeds up optimization 5-10x
89+
# ✅ Reduces memory/resource usage
90+
# ✅ Provides good enough reflections (simpler task than the actual agent)
91+
8592
# 5. Re-evaluate
8693
super agent evaluate my_agent # automatically loads optimized weights
8794

@@ -609,8 +616,10 @@ spec:
609616
- [Evaluation & Testing](evaluation-testing.md)
610617
- [SuperSpec DSL](superspec.md)
611618
619+
### Tutorials
620+
621+
- [**OpenAI SDK + GEPA Optimization Tutorial**](../tutorials/openai-sdk-gepa-optimization.md) - Complete step-by-step guide to building custom agents with native OpenAI SDK patterns and optimizing them with GEPA
622+
612623
---
613624
614-
**Status**: All 6 frameworks production-ready ✅
615-
**GEPA Support**: Universal optimization across all frameworks ✅
616-
**Documentation**: Complete ✅
625+
Ready to build your own optimized agent? Start with the [OpenAI SDK + GEPA Tutorial](../tutorials/openai-sdk-gepa-optimization.md)!

β€Ždocs/guides/openai-sdk-integration.mdβ€Ž

Lines changed: 147 additions & 74 deletions
@@ -213,15 +213,20 @@ super agent evaluate assistant_openai
213213

214214
**Expected Results:**
215215
```
216+
πŸ” Evaluating assistant_openai...
217+
Testing 4 BDD scenarios:
218+
219+
✅ OpenAI Agents SDK initialized with Ollama: gpt-oss:20b
216220
✅ Simple greeting: PASS
217221
✅ Question answering: PASS
218222
✅ Explanation request: PASS
219223
✅ Math question: PASS
220224
221225
Overall: 4/4 PASS (100.0%)
222-
🏆 Quality Gate: 🎉 EXCELLENT
223226
```
224227

228+
**Note**: Results depend on your model, hardware, and BDD scenario complexity. The agent loads optimized instructions automatically if available.
229+
225230
### Step 5: Optimize
226231

227232
```bash
@@ -497,21 +502,29 @@ pip install openai-agents
497502
OpenAI Agents SDK has one main optimizable variable:
498503
- **`instructions`**: The agent's system prompt
499504

500-
### Optimization Process
505+
### How GEPA Optimizes OpenAI SDK Agents
506+
507+
GEPA optimizes the **instructions** field by:
508+
509+
1. **Analyzing BDD test scenarios** to understand success criteria
510+
2. **Generating variations** of the instructions prompt
511+
3. **Testing each variation** against your evaluation scenarios
512+
4. **Selecting the best performer** based on pass rate
513+
514+
**Example transformation:**
501515

502-
**Before:**
503516
```yaml
517+
# Original (from playbook)
504518
persona:
505519
role: Helpful AI Assistant
506520
goal: Provide clear responses
507521
508522
→ instructions = "Helpful AI Assistant\nGoal: Provide clear responses"
509-
→ Baseline: Good performance (results vary by hardware/model)
510523
```
511524

512-
**After GEPA:**
513-
```
514-
instructions = "You are a Helpful AI Assistant.
525+
```yaml
526+
# After GEPA optimization
527+
→ instructions = "You are a Helpful AI Assistant.
515528
516529
When answering questions:
517530
1. Read the question carefully
@@ -520,33 +533,41 @@ When answering questions:
520533
4. Be concise but complete
521534
522535
Goal: Provide clear, helpful responses that directly address the user's query."
523-
524-
→ Optimized: Improved performance (results vary by hardware/model)
525536
```
526537

538+
GEPA typically expands the instructions to be more explicit and structured, which can improve agent behavior consistency.
539+
527540
---
528541

529-
## 📈 Performance Results
542+
## 📈 Performance Characteristics
530543

531544
### Baseline Performance
532545

533-
**Task:** General question answering
534-
**Model:** Ollama gpt-oss:20b
546+
**Task:** General question answering
547+
**Model:** Ollama gpt-oss:20b
535548
**Framework:** OpenAI Agents SDK
536549

537-
| Scenario | Baseline | After GEPA |
538-
|----------|----------|------------|
539-
| Simple greeting | ✅ PASS | ✅ PASS |
540-
| Question answering | ✅ PASS | ✅ PASS |
541-
| Explanation request | ✅ PASS | ✅ PASS |
542-
| Math question | ✅ PASS | ✅ PASS |
543-
| **Overall** | **100.0%** 🏆 | **100.0%** |
550+
OpenAI SDK typically achieves good baseline performance with local Ollama models. Results will vary based on:
551+
- Your hardware capabilities (RAM, CPU/GPU)
552+
- Model size and quality (8b vs 20b vs 120b)
553+
- BDD scenario complexity
554+
- Temperature and other model parameters
555+
556+
### Framework Comparison
544557

545-
**Key Insight:** OpenAI SDK achieves perfect baseline with Ollama!
558+
**OpenAI SDK strengths:**
559+
- Clean, simple API makes agents easier to understand
560+
- Works seamlessly with Ollama (no function-calling limitations)
561+
- Good baseline performance out of the box
546562

547-
This is significantly better than:
548-
- DSPy: 37.5% baseline (improves to ~55% with GEPA)
549-
- DeepAgents: Cannot test with Ollama (LangChain limitation)
563+
**DSPy strengths:**
564+
- More optimization targets (all signatures, not just instructions)
565+
- Better for focused, well-defined tasks
566+
- Greater improvement potential through optimization
567+
568+
**DeepAgents limitations:**
569+
- Requires cloud models (Claude/GPT-4) due to LangChain function-calling requirements
570+
- Cannot be tested with Ollama
550571

551572
---
552573

@@ -697,25 +718,27 @@ spec:
697718

698719
---
699720

700-
## 📊 Performance Benchmarks
721+
## 📊 Framework Trade-offs
722+
723+
### Model Support Comparison
701724

702-
### Baseline Comparison (Same BDD Scenarios)
725+
| Framework | Local Models (Ollama) | Cloud Models | Optimization Targets |
726+
|-----------|----------------------|--------------|---------------------|
727+
| **OpenAI SDK** | ✅ Full support | ✅ Yes | Instructions only |
728+
| **DSPy** | ✅ Full support | ✅ Yes | Multiple signatures |
729+
| **DeepAgents** | ❌ Limited* | ✅ Yes | System prompt |
703730

704-
| Framework | Model | Performance | Cost | Speed |
705-
|-----------|-------|-------------|------|-------|
706-
| **OpenAI SDK** | llama3.1:8b | Good | Free | Fast |
707-
| **DSPy** | llama3.1:8b | Good | Free | Fast |
708-
| **DSPy** | gpt-4 | 85% | $$$ | Medium |
709-
| **DeepAgents** | Claude | N/A | $$ | Medium |
731+
*DeepAgents has LangChain function-calling limitations with local models
710732

711-
### After GEPA Optimization
733+
### Cost & Development Speed
712734

713-
| Framework | Baseline | After GEPA | Improvement |
714-
|-----------|----------|------------|-------------|
715-
| **OpenAI SDK** | High | High | Moderate improvement |
716-
| **DSPy** | Good | Better | Significant improvement (results vary) |
735+
| Framework | Development Complexity | Ollama Cost | Cloud Cost |
736+
|-----------|----------------------|-------------|------------|
737+
| **OpenAI SDK** | Low (simple API) | Free | Variable |
738+
| **DSPy** | Medium (more concepts) | Free | Variable |
739+
| **DeepAgents** | High (planning graphs) | N/A | Variable |
717740

718-
**Key Insight:** OpenAI SDK achieves better baseline with Ollama!
741+
**Note:** Actual performance depends on your specific use case, model choice, and BDD scenarios. Always evaluate with your own data.
719742

720743
---
721744

@@ -812,32 +835,42 @@ This is based on the official OpenAI Agents SDK example for Ollama!
812835

813836
---
814837

815-
## 🎉 Success Stories
816-
817-
### Baseline Performance
818-
819-
**"Great results on the first evaluation!"**
838+
## 🎯 The SuperOptiX Multi-Framework Advantage
820839

821-
With simple, clear BDD scenarios and gpt-oss:20b model, the OpenAI SDK achieved perfect baseline performance. This demonstrates:
840+
### One Playbook, Multiple Frameworks
822841

823-
- Quality of OpenAI SDK design
824-
- Power of gpt-oss model
825-
- SuperOptiX multi-framework flexibility
826-
827-
### The SuperOptiX Advantage
828-
829-
**One playbook, three frameworks, all optimizable:**
842+
SuperOptiX allows you to write your agent specification once and compile to any supported framework:
830843

831844
```bash
832-
# Try with different frameworks
845+
# Same playbook, different frameworks
833846
super agent compile my_agent --framework dspy
834847
super agent compile my_agent --framework openai
835848
super agent compile my_agent --framework deepagents
836849
837-
# Same GEPA optimization works for all!
850+
# GEPA optimization works across all frameworks
838851
super agent optimize my_agent --auto medium
839852
```
840853

854+
### When to Use Each Framework
855+
856+
**Choose OpenAI SDK when:**
857+
- You want simple, straightforward agent design
858+
- You're using Ollama for local development
859+
- You need fast prototyping and iteration
860+
- Your use case is simple to moderate complexity
861+
862+
**Choose DSPy when:**
863+
- You need maximum optimization flexibility
864+
- You want to optimize multiple components (signatures)
865+
- You have well-defined, focused tasks
866+
- You want proven optimization improvements
867+
868+
**Choose DeepAgents when:**
869+
- You need complex planning capabilities
870+
- You're using cloud models (Claude/GPT-4)
871+
- You need filesystem context management
872+
- Your task requires sophisticated multi-step reasoning
873+
841874
---
842875

843876
## 💡 Tips & Best Practices
@@ -878,23 +911,26 @@ scenarios:
878911

879912
## ❓ FAQ
880913

881-
**Q: Why use OpenAI SDK instead of DSPy?**
882-
A: OpenAI SDK has simpler API and works well with Ollama out of the box. Use DSPy for maximum optimization flexibility. Performance varies by hardware and model.
914+
**Q: Why use OpenAI SDK instead of DSPy?**
915+
A: OpenAI SDK has a simpler, more straightforward API. It works well with Ollama out of the box. Choose DSPy when you need to optimize multiple components (signatures) or want maximum optimization flexibility.
916+
917+
**Q: Does it work with Ollama?**
918+
A: Yes! OpenAI SDK has full Ollama support. Unlike DeepAgents (which has LangChain function-calling limitations), OpenAI SDK works seamlessly with local models.
883919

884-
**Q: Does it work with Ollama?**
885-
A: Yes! Perfectly! Unlike DeepAgents, OpenAI SDK has no function-calling limitations.
920+
**Q: Can I use cloud models?**
921+
A: Yes! Configure your playbook with `provider: openai` and set the `OPENAI_API_KEY` environment variable. Supports OpenAI, Anthropic, and other providers.
886922

887-
**Q: Can I use cloud models?**
888-
A: Yes! Set `model: gpt-4.1` and `OPENAI_API_KEY` environment variable.
923+
**Q: Does GEPA optimize OpenAI SDK agents?**
924+
A: Yes! Universal GEPA optimizes the `instructions` field. While OpenAI SDK has fewer optimization targets than DSPy (which optimizes all signatures), GEPA can still improve performance by refining the agent instructions.
889925

890-
**Q: Does GEPA optimize OpenAI SDK agents?**
891-
A: Yes! Universal GEPA optimizes the `instructions` field just like any other framework.
926+
**Q: Can I use tools with OpenAI SDK agents?**
927+
A: Yes! Define tools in your playbook under `tools.specific_tools` and implement them using the `@function_tool` decorator in your pipeline code.
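For illustration, here is a rough sketch of what a `@function_tool` tool looks like in native OpenAI Agents SDK code; the `word_count` tool and the agent instructions are made-up examples, not part of the demo agent:

```python
from agents import Agent, Runner, function_tool

@function_tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

# This sketch uses the SDK's default OpenAI backend (needs OPENAI_API_KEY);
# point the model at Ollama as shown elsewhere in this guide to stay local.
agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant. Use tools when they help.",
    tools=[word_count],
)

result = Runner.run_sync(agent, "How many words are in 'to be or not to be'?")
print(result.final_output)
```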
892928

893-
**Q: Can I use tools?**
894-
A: Yes! Define tools in playbook and implement with `@function_tool` decorator.
929+
**Q: What about multi-agent workflows?**
930+
A: OpenAI SDK supports multi-agent patterns through `handoffs`, where one agent can delegate to another. This is similar to CrewAI's crew concept but with a simpler API.
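A minimal sketch of the handoff pattern (the agent names and instructions below are hypothetical):

```python
from agents import Agent, Runner

# Specialist agent that handles one kind of request.
billing_agent = Agent(
    name="Billing agent",
    instructions="You answer billing questions concisely.",
)

# Triage agent that can delegate to the specialist via handoffs.
triage_agent = Agent(
    name="Triage agent",
    instructions="If the question is about billing, hand off to the Billing agent.",
    handoffs=[billing_agent],
)

result = Runner.run_sync(triage_agent, "Why was I charged twice this month?")
print(result.final_output)
```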
895931

896-
**Q: What about multi-agent?**
897-
A: Use `handoffs` for agent delegation. Works similar to CrewAI's crew concept.
932+
**Q: How does performance compare to other frameworks?**
933+
A: Performance varies by use case, model, and hardware. OpenAI SDK typically has good baseline performance with Ollama. Run your own evaluations with `super agent evaluate` to measure performance for your specific use case.
898934

899935
---
900936

@@ -907,21 +943,58 @@ A: Use `handoffs` for agent delegation. Works similar to CrewAI's crew concept.
907943

908944
---
909945

910-
## 🎊 Achievement Unlocked!
946+
## 🌐 Multi-Framework Summary
911947

912-
**SuperOptiX now supports THREE frameworks:**
913-
1. ✅ DSPy (Ollama compatible, max optimization)
914-
2. ✅ DeepAgents (planning & complexity, Claude/GPT-4 only)
915-
3. ✅ OpenAI SDK (simple & powerful, great Ollama support)
948+
**SuperOptiX supports 6 agent frameworks:**
949+
1. ✅ DSPy (maximum optimization, Ollama compatible)
950+
2. ✅ OpenAI SDK (simple API, excellent Ollama support)
951+
3. ✅ CrewAI (multi-agent teams, role-based collaboration)
952+
4. ✅ Google ADK (Gemini integration)
953+
5. ✅ Microsoft (Azure OpenAI, enterprise)
954+
6. ✅ DeepAgents (complex planning, Claude/GPT-4)
916955

917-
**All with:**
956+
**All frameworks share:**
918957
- Same SuperSpec YAML format
919-
- Same CLI workflow
920-
- Same GEPA optimization
921-
- Framework-specific strengths!
958+
- Same CLI workflow (`compile`, `evaluate`, `optimize`, `run`)
959+
- Same GEPA optimization engine
960+
- Framework-specific strengths preserved
961+
962+
**Learn more:** See the [Multi-Framework Guide](multi-framework.md) for comprehensive comparisons and examples.
963+
964+
---
965+
966+
## 🚀 Getting Started
967+
968+
Ready to try OpenAI SDK with SuperOptiX?
969+
970+
```bash
971+
# Pull the demo agent
972+
super agent pull assistant_openai
973+
974+
# Start with Ollama (free, local)
975+
super agent run assistant_openai --goal "Hello!"
976+
```
922977

923978
---
924979

925-
*Try it now: `super agent pull assistant_openai` and experience great performance with Ollama!* 🚀
980+
## 📖 Next Steps
981+
982+
Want to build your own custom agent with native OpenAI SDK patterns and optimize it with GEPA?
983+
984+
### 🔧 [OpenAI SDK + GEPA Optimization Tutorial](../tutorials/openai-sdk-gepa-optimization.md)
985+
986+
This comprehensive step-by-step tutorial teaches you how to:
987+
988+
✅ Write agents using **official OpenAI Agents SDK patterns** (Agent, Runner, OpenAIChatCompletionsModel)
989+
✅ Integrate your native SDK code with **SuperOptiX** for GEPA compatibility
990+
✅ Define **BDD test scenarios** for measurable evaluation metrics
991+
✅ Run **GEPA optimization** to automatically improve agent prompts
992+
✅ Implement **automatic optimization loading** for production deployment
993+
994+
**Example project:** Code Reviewer Agent that detects security vulnerabilities
995+
996+
**Time:** 30-45 minutes | **Difficulty:** Intermediate | **Prerequisites:** Python, Ollama
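For orientation before you start, the native Agent/Runner/OpenAIChatCompletionsModel pattern the tutorial builds on looks roughly like this; a minimal sketch where the local endpoint and model tag are assumptions based on the Ollama setup used elsewhere in this guide:

```python
from openai import AsyncOpenAI
from agents import Agent, Runner, OpenAIChatCompletionsModel

# Point an OpenAI-compatible client at a local Ollama server (default port 11434).
ollama_client = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",  # the field GEPA optimizes
    model=OpenAIChatCompletionsModel(model="gpt-oss:20b", openai_client=ollama_client),
)

result = Runner.run_sync(agent, "Hello!")
print(result.final_output)
```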
997+
998+
👉 **[Start the tutorial now](../tutorials/openai-sdk-gepa-optimization.md)**
926999

9271000
