marp	true
theme	vibeminds
paginate	true
style	/* Mermaid diagram styling */ .mermaid-container { display: flex; justify-content: center; align-items: center; width: 100%; margin: 0.5em 0; } .mermaid { text-align: center; } .mermaid svg { max-height: 280px; width: auto; } .mermaid .node rect, .mermaid .node polygon { rx: 5px; ry: 5px; } .mermaid .nodeLabel { padding: 0 10px; } /* Two-column layout */ .columns { display: flex; gap: 40px; align-items: flex-start; } .column-left { flex: 1; } .column-right { flex: 1; } .column-left .mermaid svg { min-height: 400px; height: auto; max-height: 500px; } /* Section divider slides */ section.section-divider { display: flex; flex-direction: column; justify-content: center; align-items: center; text-align: center; background: linear-gradient(135deg, #1a1a3e 0%, #4a3f8a 50%, #2d2d5a 100%); } section.section-divider h1 { font-size: 3.5em; margin-bottom: 0.2em; } section.section-divider h2 { font-size: 1.5em; color: #b39ddb; font-weight: 400; } section.section-divider p { font-size: 1.1em; color: #9575cd; margin-top: 1em; }

Building go-elevenlabs

A Go SDK for AI Audio Generation

An AI-Assisted Development Case Study

Using Claude Opus 4.5 with Claude Code

Section 1

Introduction & Overview

What is ElevenLabs and how we approached the SDK

What is ElevenLabs? 🎙️

ElevenLabs is an AI audio platform for realistic audio generation

Text-to-Speech - Convert text to realistic speech with multiple voices
Speech-to-Text - Transcribe audio with speaker diarization
Speech-to-Speech - Voice conversion in real time
Sound Effects - Generate sound effects from text descriptions
Music Composition - Generate music from text prompts
Voice Design - Create custom AI voices with specific characteristics
Real-Time APIs - WebSocket streaming + Twilio phone integration

Goal: Build a comprehensive Go SDK for AI audio and voice agents

Project Scope 📋

Category	Services
Core Audio	Text-to-Speech, Speech-to-Text, Sound Effects, Music
Voice	Voices, Voice Design, Models, Speech-to-Speech
Processing	Audio Isolation, Forced Alignment, Text-to-Dialogue
Content	Projects, Pronunciation, Dubbing
Real-Time	WebSocket TTS, WebSocket STT, Twilio, Phone Numbers
Utility	History, User

OpenAPI Spec: 204 operations (~54K lines) | Generated Code: ~330K lines

Output: 44+ Go source files (~8K lines handwritten) + 19 test files

Architecture Overview 🏗️

go-elevenlabs/
├── client.go              # Main client with service accessors
├── texttospeech.go        # Text-to-Speech service wrapper
├── speechtotext.go        # Speech-to-Text + real-time STT
├── speechtospeech.go      # Voice conversion service
├── websockettts.go        # Real-time TTS streaming
├── websocketstt.go        # Real-time STT streaming
├── twilio.go              # Twilio + phone integration
├── music.go               # Music composition + stem separation
├── ttsscript/             # TTS script authoring package
├── voices/                # Voice reference package
├── internal/api/          # ogen-generated API client (~330K lines)
└── docs/                  # MkDocs documentation site (32 pages)

Key Design Decisions 🎯

1. ogen for API Client Generation

Type-safe, no reflection
Handles optional/nullable fields correctly
Generated from OpenAPI spec (54K lines)

2. Wrapper Services Pattern

Clean, idiomatic Go interface
Hides ogen complexity from users
Provides simplified method signatures
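The wrapper idea can be illustrated with a minimal sketch. Everything here is hypothetical: `genClient`, `genRequest`, and `optString` merely stand in for ogen-generated code, and the default model ID is an assumption, not the SDK's documented behavior.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// optString, genRequest, and genClient stand in for ogen-generated
// types (hypothetical names; the real ones live in internal/api).
type optString struct {
	Value string
	Set   bool
}

type genRequest struct {
	Text    string
	ModelID optString
}

type genClient struct{}

// TextToSpeech fakes the generated call so the sketch runs offline.
func (genClient) TextToSpeech(ctx context.Context, voiceID string, req *genRequest) ([]byte, error) {
	return []byte("audio:" + voiceID + ":" + req.Text), nil
}

// TextToSpeechService is the hand-written wrapper users interact with.
type TextToSpeechService struct {
	api genClient
}

// Simple validates input and fills in generated types on the caller's behalf.
func (s *TextToSpeechService) Simple(ctx context.Context, voiceID, text string) ([]byte, error) {
	if voiceID == "" {
		return nil, errors.New("voice ID is required")
	}
	if text == "" {
		return nil, errors.New("text is required")
	}
	req := &genRequest{
		Text:    text,
		ModelID: optString{Value: "eleven_multilingual_v2", Set: true}, // assumed default
	}
	return s.api.TextToSpeech(ctx, voiceID, req)
}

func main() {
	svc := &TextToSpeechService{}
	audio, err := svc.Simple(context.Background(), "voice-1", "Hello world!")
	fmt.Println(len(audio), err)
}
```

Callers pass plain strings and never touch the optional wrapper types, which is the point of the pattern.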

3. Functional Options Pattern

client, err := elevenlabs.NewClient(
    elevenlabs.WithAPIKey("your-api-key"),
    elevenlabs.WithTimeout(5 * time.Minute),
)
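For readers new to the pattern, this is roughly how such options can be wired up. It is a sketch under assumptions: the `Client` fields, the 30-second default, and the required-key check are illustrative and may not match the SDK's internals.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Client is a hypothetical stand-in for the SDK client.
type Client struct {
	apiKey  string
	timeout time.Duration
}

// Option mutates a Client during construction.
type Option func(*Client)

// WithAPIKey sets the API key used for authentication.
func WithAPIKey(key string) Option {
	return func(c *Client) { c.apiKey = key }
}

// WithTimeout overrides the default request timeout.
func WithTimeout(d time.Duration) Option {
	return func(c *Client) { c.timeout = d }
}

// NewClient applies defaults first, then each option in order.
func NewClient(opts ...Option) (*Client, error) {
	c := &Client{timeout: 30 * time.Second} // assumed default
	for _, opt := range opts {
		opt(c)
	}
	if c.apiKey == "" {
		return nil, errors.New("API key is required") // assumed validation
	}
	return c, nil
}

func main() {
	c, err := NewClient(
		WithAPIKey("your-api-key"),
		WithTimeout(5*time.Minute),
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.timeout) // prints "5m0s"
}
```

New settings can be added later as extra `WithXxx` functions without changing the `NewClient` signature.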

Section 2

Implementation Deep Dive

Features, API Coverage, Testing & Documentation

19 Services Implemented ✨

Audio Generation

Text-to-Speech
Sound Effects
Music

Transcription

Speech-to-Text
Forced Alignment

Voice

Voices
Voice Design
Models
Speech-to-Speech

Processing

Audio Isolation
Text-to-Dialogue

Real-Time

WebSocket TTS ⚡
WebSocket STT ⚡
Twilio Integration
Phone Numbers

Content

Projects, Dubbing
Pronunciation
History, User

API Coverage 📊

Coverage	Categories	Methods
Full ✓	TTS, STT, S2S, Voices, Models, History, User, SFX, Alignment, Isolation, Dialogue, Music, Pronunciation	~55
Partial ✓	Voice Design, Projects, Dubbing, Phone/Twilio	~20
Not Covered ✗	PVC, ConvAI, Knowledge Base, Workspace, MCP	~129

Coverage Highlights

Core audio features: Fully covered (TTS, STT, Music, S2S)
Real-time streaming: WebSocket TTS + STT for voice agents
Phone integration: Twilio calls + phone number management
Enterprise features: Not yet covered (Conversational AI agents)

Documentation: Full coverage page with method-level details

Example: Text-to-Speech 💻

// Simple usage
audio, err := client.TextToSpeech().Simple(ctx, voiceID, "Hello world!")

// Full control
resp, err := client.TextToSpeech().Generate(ctx, &elevenlabs.TTSRequest{
    VoiceID: "21m00Tcm4TlvDq8ikWAM",
    Text:    "Hello with custom settings!",
    ModelID: "eleven_multilingual_v2",
    VoiceSettings: &elevenlabs.VoiceSettings{
        Stability:       0.6,
        SimilarityBoost: 0.8,
        Style:           0.1,
        SpeakerBoost:    true,
    },
    OutputFormat: "mp3_44100_192",
})

// Streaming for real-time playback
stream, err := client.TextToSpeech().GenerateStream(ctx, request)

Example: Text-to-Dialogue 🎭

// Generate multi-speaker conversation
audio, err := client.TextToDialogue().Simple(ctx, []elevenlabs.DialogueInput{
    {Text: "Welcome to the show!", VoiceID: hostVoice},
    {Text: "Thanks for having me.", VoiceID: guestVoice},
    {Text: "Let's dive into today's topic.", VoiceID: hostVoice},
})

// With timestamps for video sync
resp, err := client.TextToDialogue().GenerateWithTimestamps(ctx, &elevenlabs.DialogueRequest{
    Inputs: dialogueInputs,
})

for _, seg := range resp.VoiceSegments {
    fmt.Printf("Speaker %s: %.2fs - %.2fs\n", seg.VoiceID, seg.StartTime, seg.EndTime)
}

Use cases: Podcasts, audiobooks, educational content, demos

Testing Strategy 🧪

Test Coverage

Package	Test Files	Key Tests
Core SDK	10 files	Client, TTS, Voices, Models, History
New Services	6 files	STT, Alignment, Isolation, Dialogue, VoiceDesign, Music
Utilities	1 file	Pronunciation rules, PLS export

Test Types

Validation Tests: Required fields, value ranges
Service Tests: Service accessibility and initialization
Response Tests: Struct initialization and field access
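A validation test of the first kind might look like the table-driven sketch below. The `TTSRequest` and `VoiceSettings` types are trimmed-down, hypothetical versions of the SDK types, and the 0 to 1 range for voice settings is an assumption. In a real test file the loop would sit inside a `func TestValidate(t *testing.T)` and use `t.Run`.

```go
package main

import (
	"errors"
	"fmt"
)

// VoiceSettings and TTSRequest are hypothetical, reduced versions of the SDK types.
type VoiceSettings struct {
	Stability       float64
	SimilarityBoost float64
}

type TTSRequest struct {
	VoiceID       string
	Text          string
	VoiceSettings *VoiceSettings
}

// inUnitRange reports whether v lies in [0, 1].
func inUnitRange(v float64) bool { return v >= 0 && v <= 1 }

// Validate checks required fields first, then value ranges.
func (r *TTSRequest) Validate() error {
	switch {
	case r.VoiceID == "":
		return errors.New("voice ID is required")
	case r.Text == "":
		return errors.New("text is required")
	case r.VoiceSettings != nil && !inUnitRange(r.VoiceSettings.Stability):
		return errors.New("stability must be between 0 and 1")
	case r.VoiceSettings != nil && !inUnitRange(r.VoiceSettings.SimilarityBoost):
		return errors.New("similarity boost must be between 0 and 1")
	}
	return nil
}

func main() {
	cases := []struct {
		name    string
		req     TTSRequest
		wantErr bool
	}{
		{"valid", TTSRequest{VoiceID: "v", Text: "hi"}, false},
		{"missing voice ID", TTSRequest{Text: "hi"}, true},
		{"missing text", TTSRequest{VoiceID: "v"}, true},
		{"stability too high", TTSRequest{VoiceID: "v", Text: "hi",
			VoiceSettings: &VoiceSettings{Stability: 1.5}}, true},
	}
	for _, tc := range cases {
		err := tc.req.Validate()
		if (err != nil) != tc.wantErr {
			fmt.Printf("FAIL %s: got error %v\n", tc.name, err)
			continue
		}
		fmt.Printf("ok   %s\n", tc.name)
	}
}
```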

$ go test ./...
ok  github.com/agentplexus/go-elevenlabs    0.270s

$ golangci-lint run
0 issues

Documentation Created 📚

MkDocs Site Structure (32 pages)

Getting Started: Installation, configuration, quick start
Services (15 pages): All implemented services with examples
Guides: LMS courses, pronunciation rules, TTS script authoring
Utilities: voices, ttsscript, retryhttp docs
API Reference: Client, errors, coverage page

Utility Packages

voices/: Pre-made voice constants and metadata
ttsscript/: Multilingual script authoring
mogo retryhttp: HTTP retry with exponential backoff

Coverage Page

All 204 API methods categorized
Method-level coverage status with ✓/✗
SDK method mapping

Documentation Flow 📖

flowchart LR
    A["📚 Docs Home"] --> B["🚀 Getting Started"]
    A --> C["⚙️ Services (15)"]
    A --> D["📋 API Reference"]
    A --> E["📖 Guides"]
    A --> F["💡 Examples"]
    D --> G["✓/✗ Coverage"]
    style A fill:#667eea,stroke:#764ba2,color:#fff
    style B fill:#667eea,stroke:#764ba2,color:#fff
    style C fill:#667eea,stroke:#764ba2,color:#fff
    style D fill:#667eea,stroke:#764ba2,color:#fff
    style E fill:#667eea,stroke:#764ba2,color:#fff
    style F fill:#667eea,stroke:#764ba2,color:#fff
    style G fill:#764ba2,stroke:#667eea,color:#fff

Service Docs Include:

Basic usage examples
Full options with all parameters
Response structures
Multiple use case examples
Best practices

Utility Packages 📦

ttsscript - Script Authoring

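// Load the script, compile it for one language, then format the segments into TTS jobs
script, _ := ttsscript.LoadScript("course.json")
compiler := ttsscript.NewCompiler()
segments, _ := compiler.Compile(script, "en")
// formatter is a ttsscript formatter created beforehand (setup omitted on this slide)
jobs := formatter.Format(segments)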

voices - Voice Reference

// Use constants instead of IDs
audio, _ := client.TextToSpeech().Simple(
    ctx, voices.Rachel, text)

retryhttp - Retry Transport

import "github.com/grokify/mogo/net/http/retryhttp"

rt := retryhttp.NewWithOptions(
    retryhttp.WithMaxRetries(3),
    retryhttp.WithInitialBackoff(1*time.Second),
    retryhttp.WithLogger(slog.Default()),
)
client, _ := elevenlabs.NewClient(
    elevenlabs.WithHTTPClient(rt.Client()),
)
// Auto-retry on 429, 5xx + injectable logging
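The retry behavior described in that comment can be approximated in a few lines. This is an illustrative sketch of exponential backoff with a cap, not mogo's actual implementation; `shouldRetry`, `backoff`, and the 30-second cap are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// shouldRetry reports whether a status code is worth retrying:
// 429 (rate limited) or any 5xx server error.
func shouldRetry(status int) bool {
	return status == http.StatusTooManyRequests || (status >= 500 && status <= 599)
}

// backoff returns the delay before retry number attempt (0-based).
// The delay doubles on each attempt and never exceeds max.
func backoff(initial, max time.Duration, attempt int) time.Duration {
	d := initial
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

func main() {
	// With a 1s initial backoff and 3 retries, the waits are 1s, 2s, 4s.
	for attempt := 0; attempt < 3; attempt++ {
		fmt.Println(attempt, backoff(time.Second, 30*time.Second, attempt))
	}
	fmt.Println(shouldRetry(429), shouldRetry(503), shouldRetry(404))
}
```

Production retry transports usually add random jitter and honor a `Retry-After` header when the server sends one; both are left out here for brevity.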

Section 3

AI-Assisted Development

Claude Opus 4.5 performance, insights & lessons learned

Claude Opus 4.5 DevEx 🧠

Session Configuration

Setting	Value
Model	Claude Opus 4.5 (`claude-opus-4-5-20251101`)
Context	Extended (with summarization)
Tools	Full Claude Code toolset

Development Approach

Iterative implementation with immediate testing
Parallel file reads and writes for efficiency
Todo tracking for complex multi-step tasks
Continuous golangci-lint validation

Session Statistics 📊

Source Analysis

Category	Count
OpenAPI Spec	54K lines
Generated Code	330K lines
API Methods	204

Output Created

Category	Count
Go Source Files	44+
Handwritten Code	~8K lines
Test Files	19
Doc Pages	32
Services	19
Utility Packages	2 (+mogo)

What Claude Opus 4.5 Handled Well 💪

ogen Type Handling

OptString, OptNilString
OptInt, OptNilInt
OptFloat64, OptNilFloat64
Complex oneOf response types

Wrapper Service Design

Clean interface over generated code
Simplified method signatures
Consistent validation patterns

Documentation Generation

15 service documentation pages
Comprehensive code examples
Best practices sections
API coverage analysis

Test Coverage

Validation tests
Service accessibility tests
Response struct tests

Challenges & Solutions 🔧

Challenge 1: ogen Optional Types

Issue: ogen generates separate OptXxx (optional) and OptNilXxx (optional and nullable) types
Solution: Careful use of NewOptString() vs NewOptNilString()
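The difference between the two families is easier to see in code. The types below are local mimics modeled on the shape ogen typically generates (a value plus `Set`, and `Null` for nullable fields); the real definitions live in `internal/api`, and `wireForm` is a hypothetical helper for illustration.

```go
package main

import "fmt"

// OptString mirrors an ogen optional: a value plus a Set flag.
type OptString struct {
	Value string
	Set   bool
}

// NewOptString returns an optional that is set to v.
func NewOptString(v string) OptString { return OptString{Value: v, Set: true} }

// OptNilString adds a Null flag for fields that are optional and nullable.
type OptNilString struct {
	Value string
	Set   bool
	Null  bool
}

// NewOptNilString returns a nullable optional that is set to v (not null).
func NewOptNilString(v string) OptNilString { return OptNilString{Value: v, Set: true} }

// wireForm reports how each state would appear in a JSON request body.
func wireForm(o OptNilString) string {
	switch {
	case !o.Set:
		return "(field omitted)"
	case o.Null:
		return "null"
	default:
		return fmt.Sprintf("%q", o.Value)
	}
}

func main() {
	fmt.Println(wireForm(OptNilString{}))                      // unset
	fmt.Println(wireForm(OptNilString{Set: true, Null: true})) // explicit null
	fmt.Println(wireForm(NewOptNilString("en")))               // concrete value
	fmt.Println(NewOptString("en").Set)
}
```

Passing the wrong constructor is a compile error rather than a runtime surprise, but the field's declared type has to be checked in the generated code each time.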

Challenge 2: oneOf Response Types

Issue: API returns different response types
Solution: Type switches to handle variants

switch r := resp.(type) {
case *api.TextToSpeechOK:
    return r.Data, nil
default:
    return nil, &APIError{Message: "unexpected response"}
}
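A self-contained version of the same idea, extended with an error variant, is sketched below. The `ttsRes` interface and its variants are hypothetical stand-ins for generated oneOf types, and mapping the validation variant to status 422 is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ttsRes is a sealed interface in the style of generated oneOf responses.
type ttsRes interface{ isTTSRes() }

type ttsOK struct{ Data []byte }
type ttsUnprocessable struct{ Detail string }

func (*ttsOK) isTTSRes()            {}
func (*ttsUnprocessable) isTTSRes() {}

// APIError is a simplified error type for this sketch.
type APIError struct {
	StatusCode int
	Message    string
}

func (e *APIError) Error() string {
	return fmt.Sprintf("api error %d: %s", e.StatusCode, e.Message)
}

// unwrap converts a response variant into audio bytes or a typed error.
func unwrap(resp ttsRes) ([]byte, error) {
	switch r := resp.(type) {
	case *ttsOK:
		return r.Data, nil
	case *ttsUnprocessable:
		return nil, &APIError{StatusCode: 422, Message: r.Detail}
	default:
		return nil, &APIError{Message: "unexpected response"}
	}
}

func main() {
	_, err := unwrap(&ttsUnprocessable{Detail: "text is empty"})
	var apiErr *APIError
	if errors.As(err, &apiErr) {
		fmt.Println(apiErr.StatusCode, apiErr.Message) // prints "422 text is empty"
	}
}
```

Returning a typed error lets callers use `errors.As` to branch on the status code instead of parsing messages.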

Challenge 3: Large Generated Codebase

Issue: 330K lines of generated code
Solution: Targeted grep searches for method signatures

Key Takeaways 💡

AI-Assisted SDK Development Insights

Wrapper services provide clean interfaces over generated code
Document coverage explicitly - helps users understand what's available
Test validation thoroughly - required fields, value ranges, error messages
Write docs alongside code - service docs created with implementation
Use todo tracking - essential for multi-file parallel tasks

Result

A production-ready Go SDK with 19 services, comprehensive documentation, and full test coverage

Section 4

Conclusion

Deliverables, future work & resources

Project Deliverables 📦

Deliverable	Status
19 Service Wrappers	✅ Complete
Real-Time Services	✅ WebSocket TTS/STT, Twilio
ogen API Client	✅ Complete (204 methods)
Test Suite	✅ Complete (19 test files)
MkDocs Documentation	✅ Complete (32 pages)
API Coverage Page	✅ Complete

Repository: github.com/agentplexus/go-elevenlabs

Future Enhancements 🔮

Priority APIs to Add

Conversational AI Agents: Full agent management and conversations
Professional Voice Cloning: Train custom voices with samples
Voice Library: Discover and share community voices
Knowledge Base / RAG: Document management for agent context
Workspace Management: Enterprise team features

Community

Open for contributions
Issues and PRs welcome
MIT License

Resources 🔗

Contact

GitHub: @agentplexus

Thank You 🙏

go-elevenlabs

A Go SDK for AI Audio Generation

Built with Claude Opus 4.5 + Claude Code
