Using Commercial APIs with Audify

Audify now supports commercial LLM APIs in addition to local Ollama models. This allows you to use powerful cloud-based models like DeepSeek, Claude, GPT-4, and Gemini for generating audiobook scripts.

Supported APIs

DeepSeek - Fast and cost-effective API
Anthropic Claude - High-quality reasoning and long context
OpenAI GPT - Industry-standard models
Google Gemini - Google's latest AI models

Setup

1. Create a `.keys` File

Create a .keys file in the project root directory with your API keys:

cp .keys.example .keys

Edit the .keys file and add your API keys:

DEEPSEEK=sk-your-deepseek-api-key-here
ANTHROPIC=sk-ant-your-anthropic-api-key-here
OPENAI=sk-your-openai-api-key-here
GEMINI=your-google-api-key-here
# Or use GOOGLE (both names are equivalent):
# GOOGLE=your-google-api-key-here

Note: For Google Gemini, you can use either GEMINI or GOOGLE as the key name in your .keys file. Both are supported and equivalent. Similarly, when setting environment variables, both GEMINI_API_KEY and GOOGLE_API_KEY will work.

Important Security Notes:

The .keys file is already in .gitignore to prevent accidental commits
Never commit API keys to version control
Keep your API keys secure and private
You can also set API keys as environment variables (e.g., DEEPSEEK_API_KEY)

2. Get API Keys

DeepSeek

Visit https://platform.deepseek.com/
Sign up for an account
Generate an API key from your dashboard
DeepSeek offers competitive pricing and good performance

Anthropic Claude

Visit https://console.anthropic.com/
Create an account
Generate an API key
Claude excels at reasoning and has large context windows

OpenAI

Visit https://platform.openai.com/
Sign up and add billing information
Generate an API key from the API section
Models include GPT-4, GPT-4-turbo, GPT-3.5-turbo

Google Gemini

Visit https://ai.google.dev/
Get started with Gemini API
Create an API key
Gemini offers various models including Gemini Pro

Usage

To use a commercial API, prefix the model name with api: when using the -m or --llm-model option:

DeepSeek Examples

# Using DeepSeek Chat
python -m audify.create_audiobook mybook.epub -m "api:deepseek/deepseek-chat"

# Using DeepSeek R1 (reasoning model)
python -m audify.create_audiobook mybook.epub -m "api:deepseek/deepseek-reasoner"

Claude Examples

# Using Claude 3 Sonnet
python -m audify.create_audiobook mybook.epub -m "api:anthropic/claude-3-sonnet-20240229"

# Using Claude 3.5 Sonnet (latest)
python -m audify.create_audiobook mybook.epub -m "api:anthropic/claude-3-5-sonnet-20240620"

# Using Claude 3 Opus (most capable)
python -m audify.create_audiobook mybook.epub -m "api:anthropic/claude-3-opus-20240229"

OpenAI Examples

# Using GPT-4
python -m audify.create_audiobook mybook.epub -m "api:openai/gpt-4"

# Using GPT-4 Turbo
python -m audify.create_audiobook mybook.epub -m "api:openai/gpt-4-turbo-preview"

# Using GPT-3.5 Turbo (faster, cheaper)
python -m audify.create_audiobook mybook.epub -m "api:openai/gpt-3.5-turbo"

Google Gemini Examples

# Using Gemini Pro
python -m audify.create_audiobook mybook.epub -m "api:gemini/gemini-pro"

# Using Gemini 1.5 Pro
python -m audify.create_audiobook mybook.epub -m "api:gemini/gemini-1.5-pro"

Complete Example

# Create audiobook using DeepSeek with Spanish translation
python -m audify.create_audiobook mybook.epub \
  -m "api:deepseek/deepseek-chat" \
  -l en \
  -t es \
  -v af_bella \
  --save-scripts

# Create audiobook from PDF using Claude
python -m audify.create_audiobook document.pdf \
  -m "api:anthropic/claude-3-sonnet-20240229" \
  -l en \
  -v af_sarah

# Process directory of books using GPT-4
python -m audify.create_audiobook ./books/ \
  -m "api:openai/gpt-4-turbo-preview" \
  -l en \
  -o ./output

Comparing Ollama vs Commercial APIs

Feature	Ollama (Local)	Commercial APIs
Cost	Free	Pay per token
Privacy	Complete privacy	Data sent to cloud
Speed	Depends on hardware	Generally faster
Quality	Varies by model	State-of-the-art
Internet	Not required	Required
Setup	Install Ollama + models	Just API key

Model Recommendations

For Best Quality

Claude 3 Opus: Best reasoning and quality
GPT-4: Reliable and well-tested
Claude 3.5 Sonnet: Great balance of quality and speed

For Best Value

DeepSeek Chat: Very cost-effective
GPT-3.5 Turbo: Fast and affordable
Gemini Pro: Good balance

For Reasoning Tasks

DeepSeek R1: Specialized reasoning model
Claude 3 Opus: Excellent reasoning
GPT-4: Strong reasoning capabilities

Troubleshooting

API Key Not Found

Error: API key issue. Please ensure your API key is properly configured

Solution: Check that:

The .keys file exists in the project root
The API key is properly formatted (e.g., DEEPSEEK=sk-...)
No extra spaces around the = sign
The key name matches the service (DEEPSEEK, ANTHROPIC, OPENAI, GEMINI)

Connection Error

Error: Could not connect to commercial API

Solution:

Check your internet connection
Verify the API key is valid
Check if the API service is experiencing downtime
Ensure you have API credits/billing enabled

Invalid Model Name

Error: Failed to generate audiobook script

Solution:

Verify the model name is correct (check API documentation)
Ensure you're using the api: prefix
Some models may require specific API access tiers

Environment Variables (Alternative to .keys file)

Instead of using a .keys file, you can set environment variables:

# Bash/Zsh
export DEEPSEEK_API_KEY="sk-your-key"
export ANTHROPIC_API_KEY="sk-ant-your-key"
export OPENAI_API_KEY="sk-your-key"
# For Google Gemini, use either GOOGLE_API_KEY or GEMINI_API_KEY (both work):
export GOOGLE_API_KEY="your-key"
# export GEMINI_API_KEY="your-key"  # Alternative, equivalent to above

# Then run normally
python -m audify.create_audiobook mybook.epub -m "api:deepseek/deepseek-chat"

Cost Considerations

Commercial APIs charge based on token usage (input + output). Audiobook generation can use significant tokens due to:

Processing entire chapters
Generating narrative scripts
Multiple chapters per book

Tips to minimize costs:

Use more cost-effective models like DeepSeek or GPT-3.5
Limit --max-chapters for testing
Monitor your API usage on the provider's dashboard
Consider using Ollama for testing, then commercial APIs for final output

Technical Details

The implementation uses LiteLLM to provide a unified interface across different API providers. This means:

Consistent API regardless of provider
Easy to add new providers
Automatic retry and error handling
Unified token counting

Model names are passed directly to LiteLLM, so any model supported by LiteLLM can be used with the api: prefix.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using Commercial APIs with Audify

Supported APIs

Setup

1. Create a `.keys` File

2. Get API Keys

DeepSeek

Anthropic Claude

OpenAI

Google Gemini

Usage

DeepSeek Examples

Claude Examples

OpenAI Examples

Google Gemini Examples

Complete Example

Comparing Ollama vs Commercial APIs

Model Recommendations

For Best Quality

For Best Value

For Reasoning Tasks

Troubleshooting

API Key Not Found

Connection Error

Invalid Model Name

Environment Variables (Alternative to .keys file)

Cost Considerations

Technical Details

FilesExpand file tree

COMMERCIAL_APIS.md

Latest commit

History

COMMERCIAL_APIS.md

File metadata and controls

Using Commercial APIs with Audify

Supported APIs

Setup

1. Create a .keys File

2. Get API Keys

DeepSeek

Anthropic Claude

OpenAI

Google Gemini

Usage

DeepSeek Examples

Claude Examples

OpenAI Examples

Google Gemini Examples

Complete Example

Comparing Ollama vs Commercial APIs

Model Recommendations

For Best Quality

For Best Value

For Reasoning Tasks

Troubleshooting

API Key Not Found

Connection Error

Invalid Model Name

Environment Variables (Alternative to .keys file)

Cost Considerations

Technical Details

1. Create a `.keys` File