An AI-Assisted Development Case Study
Using Claude Opus 4.5 with Claude Code
What is ElevenLabs and how we approached the SDK
ElevenLabs is an AI audio platform for realistic audio generation
- Text-to-Speech - Convert text to realistic speech with multiple voices
- Speech-to-Text - Transcribe audio with speaker diarization
- Speech-to-Speech - Real-time voice conversion
- Sound Effects - Generate sound effects from text descriptions
- Music Composition - Generate music from text prompts
- Voice Design - Create custom AI voices with specific characteristics
- Real-Time APIs - WebSocket streaming + Twilio phone integration
Goal: Build a comprehensive Go SDK for AI audio and voice agents
| Category | Services |
|---|---|
| Core Audio | Text-to-Speech, Speech-to-Text, Sound Effects, Music |
| Voice | Voices, Voice Design, Models, Speech-to-Speech |
| Processing | Audio Isolation, Forced Alignment, Text-to-Dialogue |
| Content | Projects, Pronunciation, Dubbing |
| Real-Time | WebSocket TTS, WebSocket STT, Twilio, Phone Numbers |
| Utility | History, User |
OpenAPI Spec: 204 operations (~54K lines) | Generated Code: ~330K lines
Output: 44+ Go source files (~8K lines handwritten) + 19 test files
go-elevenlabs/
├── client.go # Main client with service accessors
├── texttospeech.go # Text-to-Speech service wrapper
├── speechtotext.go # Speech-to-Text + real-time STT
├── speechtospeech.go # Voice conversion service
├── websockettts.go # Real-time TTS streaming
├── websocketstt.go # Real-time STT streaming
├── twilio.go # Twilio + phone integration
├── music.go # Music composition + stem separation
├── ttsscript/ # TTS script authoring package
├── voices/ # Voice reference package
├── internal/api/ # ogen-generated API client (~330K lines)
└── docs/ # MkDocs documentation site (32 pages)
- Type-safe, no reflection
- Handles optional/nullable fields correctly
- Generated from OpenAPI spec (54K lines)
- Clean, idiomatic Go interface
- Hides ogen complexity from users
- Provides simplified method signatures
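The wrapper idea in the bullets above can be sketched as follows. This is a minimal illustration, not the SDK's actual code: `OptString`, `generatedTTSRequest`, and `toGenerated` are hypothetical stand-ins that only imitate the shape of generated optional types, and the user-facing `TTSRequest` here is reduced to three fields.

```go
package main

import "fmt"

// OptString is a hypothetical stand-in for a generated optional string type.
type OptString struct {
	Value string
	Set   bool
}

// NewOptString returns an OptString that is marked as set.
func NewOptString(v string) OptString { return OptString{Value: v, Set: true} }

// generatedTTSRequest imitates the shape of a generated request body (assumed).
type generatedTTSRequest struct {
	Text    string
	ModelID OptString
}

// TTSRequest is a simplified, user-facing request with plain Go fields.
type TTSRequest struct {
	VoiceID string
	Text    string
	ModelID string // empty means "leave unset and use the API default"
}

// toGenerated converts the user-facing request into the generated shape,
// setting optional fields only when the caller provided a value.
func toGenerated(r *TTSRequest) generatedTTSRequest {
	out := generatedTTSRequest{Text: r.Text}
	if r.ModelID != "" {
		out.ModelID = NewOptString(r.ModelID)
	}
	return out
}

func main() {
	g := toGenerated(&TTSRequest{VoiceID: "voice-id", Text: "Hello"})
	fmt.Println("model set:", g.ModelID.Set)

	g = toGenerated(&TTSRequest{VoiceID: "voice-id", Text: "Hello", ModelID: "eleven_multilingual_v2"})
	fmt.Println("model set:", g.ModelID.Set, "value:", g.ModelID.Value)
}
```

Callers only ever see plain strings; the optional wrapper types stay inside the conversion function.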
client, err := elevenlabs.NewClient(
	elevenlabs.WithAPIKey("your-api-key"),
	elevenlabs.WithTimeout(5 * time.Minute),
)
Features, API Coverage, Testing & Documentation
Audio Generation
- Text-to-Speech
- Sound Effects
- Music
Transcription
- Speech-to-Text
- Forced Alignment
Voice
- Voices
- Voice Design
- Models
- Speech-to-Speech
Processing
- Audio Isolation
- Text-to-Dialogue
Real-Time
- WebSocket TTS ⚡
- WebSocket STT ⚡
- Twilio Integration
- Phone Numbers
Content
- Projects, Dubbing
- Pronunciation
- History, User
| Coverage | Categories | Methods |
|---|---|---|
| Full ✓ | TTS, STT, S2S, Voices, Models, History, User, SFX, Alignment, Isolation, Dialogue, Music, Pronunciation | ~55 |
| Partial ✓ | Voice Design, Projects, Dubbing, Phone/Twilio | ~20 |
| Not Covered ✗ | PVC, ConvAI, Knowledge Base, Workspace, MCP | ~129 |
- Core audio features: Fully covered (TTS, STT, Music, S2S)
- Real-time streaming: WebSocket TTS + STT for voice agents
- Phone integration: Twilio calls + phone number management
- Enterprise features: Not yet covered (Conversational AI agents)
Documentation: Full coverage page with method-level details
// Simple usage
audio, err := client.TextToSpeech().Simple(ctx, voiceID, "Hello world!")
// Full control
resp, err := client.TextToSpeech().Generate(ctx, &elevenlabs.TTSRequest{
	VoiceID: "21m00Tcm4TlvDq8ikWAM",
	Text:    "Hello with custom settings!",
	ModelID: "eleven_multilingual_v2",
	VoiceSettings: &elevenlabs.VoiceSettings{
		Stability:       0.6,
		SimilarityBoost: 0.8,
		Style:           0.1,
		SpeakerBoost:    true,
	},
	OutputFormat: "mp3_44100_192",
})
// Streaming for real-time playback
stream, err := client.TextToSpeech().GenerateStream(ctx, request)
// Generate multi-speaker conversation
audio, err := client.TextToDialogue().Simple(ctx, []elevenlabs.DialogueInput{
	{Text: "Welcome to the show!", VoiceID: hostVoice},
	{Text: "Thanks for having me.", VoiceID: guestVoice},
	{Text: "Let's dive into today's topic.", VoiceID: hostVoice},
})
// With timestamps for video sync
resp, err := client.TextToDialogue().GenerateWithTimestamps(ctx, &elevenlabs.DialogueRequest{
	Inputs: dialogueInputs,
})
for _, seg := range resp.VoiceSegments {
	fmt.Printf("Speaker %s: %.2fs - %.2fs\n", seg.VoiceID, seg.StartTime, seg.EndTime)
}
Use cases: Podcasts, audiobooks, educational content, demos
| Package | Test Files | Key Tests |
|---|---|---|
| Core SDK | 10 files | Client, TTS, Voices, Models, History |
| New Services | 6 files | STT, Alignment, Isolation, Dialogue, VoiceDesign, Music |
| Utilities | 1 file | Pronunciation rules, PLS export |
- Validation Tests: Required fields, value ranges
- Service Tests: Service accessibility and initialization
- Response Tests: Struct initialization and field access
$ go test ./...
ok github.com/agentplexus/go-elevenlabs 0.270s
$ golangci-lint run
0 issues
- Getting Started: Installation, configuration, quick start
- Services (15 pages): All implemented services with examples
- Guides: LMS courses, pronunciation rules, TTS script authoring
- Utilities: voices, ttsscript, retryhttp docs
- API Reference: Client, errors, coverage page
- voices/: Pre-made voice constants and metadata
- ttsscript/: Multilingual script authoring
- mogo retryhttp: HTTP retry with exponential backoff
- All 204 API methods categorized
- Method-level coverage status with ✓/✗
- SDK method mapping
Service Docs Include:
- Basic usage examples
- Full options with all parameters
- Response structures
- Multiple use case examples
- Best practices
script, _ := ttsscript.LoadScript("course.json")
compiler := ttsscript.NewCompiler()
segments, _ := compiler.Compile(script, "en")
jobs := formatter.Format(segments)
// Use constants instead of IDs
audio, _ := client.TextToSpeech().Simple(
	ctx, voices.Rachel, text)
import "github.com/grokify/mogo/net/http/retryhttp"
rt := retryhttp.NewWithOptions(
	retryhttp.WithMaxRetries(3),
	retryhttp.WithInitialBackoff(1*time.Second),
	retryhttp.WithLogger(slog.Default()),
)
client, _ := elevenlabs.NewClient(
	elevenlabs.WithHTTPClient(rt.Client()),
)
// Auto-retry on 429, 5xx + injectable logging
Claude Opus 4.5 performance, insights & lessons learned
| Setting | Value |
|---|---|
| Model | Claude Opus 4.5 (claude-opus-4-5-20251101) |
| Context | Extended (with summarization) |
| Tools | Full Claude Code toolset |
- Iterative implementation with immediate testing
- Parallel file reads and writes for efficiency
- Todo tracking for complex multi-step tasks
- Continuous golangci-lint validation
- ogen Type Handling
  - OptString, OptNilString
  - OptInt, OptNilInt
  - OptFloat64, OptNilFloat64
  - Complex oneOf response types
- Wrapper Service Design
  - Clean interface over generated code
  - Simplified method signatures
  - Consistent validation patterns
- Documentation Generation
  - 15 service documentation pages
  - Comprehensive code examples
  - Best practices sections
  - API coverage analysis
- Test Coverage
  - Validation tests
  - Service accessibility tests
  - Response struct tests
- Issue: Various OptXxx and OptNilXxx types
  - Solution: Careful use of NewOptString() vs NewOptNilString()
- Issue: API returns different response types
  - Solution: Type switches to handle variants
    switch r := resp.(type) {
    case *api.TextToSpeechOK:
        return r.Data, nil
    default:
        return nil, &APIError{Message: "unexpected response"}
    }
- Issue: 330K lines of generated code
  - Solution: Targeted grep searches for method signatures
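The difference between the optional and optional-nullable types mentioned above comes down to how many states a field can be in. The sketch below uses a locally defined `OptNilString` that only mirrors the general shape of such generated types (a value plus `Set` and `Null` flags); `describe` is a hypothetical helper added for illustration.

```go
package main

import "fmt"

// OptNilString is a local stand-in for a generated optional-nullable string.
// It can be absent (Set == false), explicitly null, or carry a value.
type OptNilString struct {
	Value string
	Set   bool
	Null  bool
}

// NewOptNilString returns a set, non-null value.
func NewOptNilString(v string) OptNilString {
	return OptNilString{Value: v, Set: true}
}

// describe names the state a field is in; absence is checked before null.
func describe(o OptNilString) string {
	switch {
	case !o.Set:
		return "absent"
	case o.Null:
		return "null"
	default:
		return "value:" + o.Value
	}
}

func main() {
	fmt.Println(describe(OptNilString{}))                      // field omitted entirely
	fmt.Println(describe(OptNilString{Set: true, Null: true})) // field sent as null
	fmt.Println(describe(NewOptNilString("en")))               // field sent with a value
}
```

A plain optional type has only the first and third states, which is why choosing the wrong constructor for a field leads to a compile-time type mismatch rather than a silent bug.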
- Wrapper services provide clean interfaces over generated code
- Document coverage explicitly - helps users understand what's available
- Test validation thoroughly - required fields, value ranges, error messages
- Write docs alongside code - service docs created with implementation
- Use todo tracking - essential for multi-file parallel tasks
A production-ready Go SDK with 15 services, comprehensive documentation, and full test coverage
Deliverables, future work & resources
| Deliverable | Status |
|---|---|
| 19 Service Wrappers | ✅ Complete |
| Real-Time Services | ✅ WebSocket TTS/STT, Twilio |
| ogen API Client | ✅ Complete (204 methods) |
| Test Suite | ✅ Complete (19 test files) |
| MkDocs Documentation | ✅ Complete (32 pages) |
| API Coverage Page | ✅ Complete |
Repository: github.com/agentplexus/go-elevenlabs
- Conversational AI Agents: Full agent management and conversations
- Professional Voice Cloning: Train custom voices with samples
- Voice Library: Discover and share community voices
- Knowledge Base / RAG: Document management for agent context
- Workspace Management: Enterprise team features
- Open for contributions
- Issues and PRs welcome
- MIT License
- Repository: github.com/agentplexus/go-elevenlabs
- Documentation: agentplexus.github.io/go-elevenlabs
- ElevenLabs: elevenlabs.io/docs
- Go Package: pkg.go.dev/github.com/agentplexus/go-elevenlabs
- GitHub: @agentplexus
A Go SDK for AI Audio Generation
Built with Claude Opus 4.5 + Claude Code