Audify now supports commercial LLM APIs in addition to local Ollama models. This allows you to use powerful cloud-based models like DeepSeek, Claude, GPT-4, and Gemini for generating audiobook scripts.
- DeepSeek - Fast and cost-effective API
- Anthropic Claude - High-quality reasoning and long context
- OpenAI GPT - Industry-standard models
- Google Gemini - Google's latest AI models
Create a .keys file in the project root directory with your API keys:
cp .keys.example .keysEdit the .keys file and add your API keys:
DEEPSEEK=sk-your-deepseek-api-key-here
ANTHROPIC=sk-ant-your-anthropic-api-key-here
OPENAI=sk-your-openai-api-key-here
GEMINI=your-google-api-key-here
# Or use GOOGLE (both names are equivalent):
# GOOGLE=your-google-api-key-here
Note: For Google Gemini, you can use either GEMINI or GOOGLE as the key name in your .keys file. Both are supported and equivalent. Similarly, when setting environment variables, both GEMINI_API_KEY and GOOGLE_API_KEY will work.
Important Security Notes:
- The
.keysfile is already in.gitignoreto prevent accidental commits - Never commit API keys to version control
- Keep your API keys secure and private
- You can also set API keys as environment variables (e.g.,
DEEPSEEK_API_KEY)
- Visit https://platform.deepseek.com/
- Sign up for an account
- Generate an API key from your dashboard
- DeepSeek offers competitive pricing and good performance
- Visit https://console.anthropic.com/
- Create an account
- Generate an API key
- Claude excels at reasoning and has large context windows
- Visit https://platform.openai.com/
- Sign up and add billing information
- Generate an API key from the API section
- Models include GPT-4, GPT-4-turbo, GPT-3.5-turbo
- Visit https://ai.google.dev/
- Get started with Gemini API
- Create an API key
- Gemini offers various models including Gemini Pro
To use a commercial API, prefix the model name with api: when using the -m or --llm-model option:
# Using DeepSeek Chat
python -m audify.create_audiobook mybook.epub -m "api:deepseek/deepseek-chat"
# Using DeepSeek R1 (reasoning model)
python -m audify.create_audiobook mybook.epub -m "api:deepseek/deepseek-reasoner"# Using Claude 3 Sonnet
python -m audify.create_audiobook mybook.epub -m "api:anthropic/claude-3-sonnet-20240229"
# Using Claude 3.5 Sonnet (latest)
python -m audify.create_audiobook mybook.epub -m "api:anthropic/claude-3-5-sonnet-20240620"
# Using Claude 3 Opus (most capable)
python -m audify.create_audiobook mybook.epub -m "api:anthropic/claude-3-opus-20240229"# Using GPT-4
python -m audify.create_audiobook mybook.epub -m "api:openai/gpt-4"
# Using GPT-4 Turbo
python -m audify.create_audiobook mybook.epub -m "api:openai/gpt-4-turbo-preview"
# Using GPT-3.5 Turbo (faster, cheaper)
python -m audify.create_audiobook mybook.epub -m "api:openai/gpt-3.5-turbo"# Using Gemini Pro
python -m audify.create_audiobook mybook.epub -m "api:gemini/gemini-pro"
# Using Gemini 1.5 Pro
python -m audify.create_audiobook mybook.epub -m "api:gemini/gemini-1.5-pro"# Create audiobook using DeepSeek with Spanish translation
python -m audify.create_audiobook mybook.epub \
-m "api:deepseek/deepseek-chat" \
-l en \
-t es \
-v af_bella \
--save-scripts
# Create audiobook from PDF using Claude
python -m audify.create_audiobook document.pdf \
-m "api:anthropic/claude-3-sonnet-20240229" \
-l en \
-v af_sarah
# Process directory of books using GPT-4
python -m audify.create_audiobook ./books/ \
-m "api:openai/gpt-4-turbo-preview" \
-l en \
-o ./output| Feature | Ollama (Local) | Commercial APIs |
|---|---|---|
| Cost | Free | Pay per token |
| Privacy | Complete privacy | Data sent to cloud |
| Speed | Depends on hardware | Generally faster |
| Quality | Varies by model | State-of-the-art |
| Internet | Not required | Required |
| Setup | Install Ollama + models | Just API key |
- Claude 3 Opus: Best reasoning and quality
- GPT-4: Reliable and well-tested
- Claude 3.5 Sonnet: Great balance of quality and speed
- DeepSeek Chat: Very cost-effective
- GPT-3.5 Turbo: Fast and affordable
- Gemini Pro: Good balance
- DeepSeek R1: Specialized reasoning model
- Claude 3 Opus: Excellent reasoning
- GPT-4: Strong reasoning capabilities
Error: API key issue. Please ensure your API key is properly configured
Solution: Check that:
- The
.keysfile exists in the project root - The API key is properly formatted (e.g.,
DEEPSEEK=sk-...) - No extra spaces around the
=sign - The key name matches the service (DEEPSEEK, ANTHROPIC, OPENAI, GEMINI)
Error: Could not connect to commercial API
Solution:
- Check your internet connection
- Verify the API key is valid
- Check if the API service is experiencing downtime
- Ensure you have API credits/billing enabled
Error: Failed to generate audiobook script
Solution:
- Verify the model name is correct (check API documentation)
- Ensure you're using the
api:prefix - Some models may require specific API access tiers
Instead of using a .keys file, you can set environment variables:
# Bash/Zsh
export DEEPSEEK_API_KEY="sk-your-key"
export ANTHROPIC_API_KEY="sk-ant-your-key"
export OPENAI_API_KEY="sk-your-key"
# For Google Gemini, use either GOOGLE_API_KEY or GEMINI_API_KEY (both work):
export GOOGLE_API_KEY="your-key"
# export GEMINI_API_KEY="your-key" # Alternative, equivalent to above
# Then run normally
python -m audify.create_audiobook mybook.epub -m "api:deepseek/deepseek-chat"Commercial APIs charge based on token usage (input + output). Audiobook generation can use significant tokens due to:
- Processing entire chapters
- Generating narrative scripts
- Multiple chapters per book
Tips to minimize costs:
- Use more cost-effective models like DeepSeek or GPT-3.5
- Limit
--max-chaptersfor testing - Monitor your API usage on the provider's dashboard
- Consider using Ollama for testing, then commercial APIs for final output
The implementation uses LiteLLM to provide a unified interface across different API providers. This means:
- Consistent API regardless of provider
- Easy to add new providers
- Automatic retry and error handling
- Unified token counting
Model names are passed directly to LiteLLM, so any model supported by LiteLLM can be used with the api: prefix.