|
6 | 6 | [](https://codecov.io/gh/fishaudio/fish-audio-python) |
7 | 7 | [](https://github.com/fishaudio/fish-audio-python/blob/main/LICENSE) |
8 | 8 |
|
9 | | -The official Python library for the Fish Audio API. |
| 9 | +The official Python library for the Fish Audio API |
10 | 10 |
|
11 | | -## Notice: New API Available |
| 11 | +**Documentation:** [Python SDK Guide](https://docs.fish.audio/developer-guide/sdk-guide/python/) | [API Reference](https://docs.fish.audio/api-reference/sdk/python/) |
12 | 12 |
|
13 | | -The SDK now includes a modern `fishaudio` API with improved ergonomics, better type safety, and enhanced features. |
14 | | - |
15 | | -For new projects, use the `fishaudio` module. For existing projects using the legacy API, see the [Legacy SDK section](#legacy-sdk) below |
16 | | - |
17 | | -## API Documentation |
18 | | - |
19 | | -For complete documentation and API reference, visit the [Python SDK Guide](https://docs.fish.audio/developer-guide/sdk-guide/python/) and [API Reference](https://docs.fish.audio/api-reference/sdk/python/). |
| 13 | +> **Note:** If you're using the legacy `fish_audio_sdk` API, see the [migration guide](https://docs.fish.audio/archive/python-sdk-legacy/migration-guide) to upgrade. |
20 | 14 |
|
21 | 15 | ## Installation |
22 | 16 |
|
23 | | -This package is available on PyPI: |
24 | | - |
25 | 17 | ```bash |
26 | 18 | pip install fish-audio-sdk |
| 19 | + |
| 20 | +# With audio playback utilities |
| 21 | +pip install fish-audio-sdk[utils] |
27 | 22 | ``` |
28 | 23 |
|
29 | | -You may install from source by running the following command in the repository root: |
| 24 | +## Authentication |
| 25 | + |
| 26 | +Get your API key from [fish.audio/app/api-keys](https://fish.audio/app/api-keys): |
30 | 27 |
|
31 | 28 | ```bash |
32 | | -python -m pip install . |
| 29 | +export FISH_API_KEY=your_api_key_here |
33 | 30 | ``` |
34 | 31 |
|
35 | | -## Usage |
36 | | - |
37 | | -The client will need to be configured with an API key, which you can obtain from [Fish Audio](https://fish.audio/app/api-keys). |
| 32 | +Or provide directly: |
38 | 33 |
|
39 | 34 | ```python |
40 | 35 | from fishaudio import FishAudio |
41 | 36 |
|
42 | | -client = FishAudio() # Automatically reads from the FISH_API_KEY environment variable |
43 | | - |
44 | | -client = FishAudio(api_key="your-api-key") # Or provide the API key directly |
| 37 | +client = FishAudio(api_key="your_api_key") |
45 | 38 | ``` |
46 | 39 |
|
47 | | -The SDK provides [text-to-speech](#text-to-speech), [voice cloning](#instant-voice-cloning), [speech recognition](#speech-recognition-asr), and [voice management](#voice-management) capabilities. |
48 | | - |
49 | | -### Text-to-Speech |
50 | | - |
51 | | -Convert text to natural-sounding speech with support for multiple voices, formats, and real-time streaming. |
| 40 | +## Quick Start |
52 | 41 |
|
53 | | -#### Basic |
| 42 | +**Synchronous:** |
54 | 43 |
|
55 | 44 | ```python |
56 | 45 | from fishaudio import FishAudio |
57 | | -from fishaudio.utils import save, play |
| 46 | +from fishaudio.utils import play, save |
58 | 47 |
|
59 | 48 | client = FishAudio() |
60 | 49 |
|
61 | | -audio = client.tts.convert(text="Hello, world!") # Default voice and settings |
62 | | -play(audio) # Play audio directly |
| 50 | +# Generate audio |
| 51 | +audio = client.tts.convert(text="Hello, world!") |
63 | 52 |
|
64 | | -audio = client.tts.convert(text="Welcome to Fish Audio SDK!") |
65 | | -save(audio, "output.mp3") # You can also save to a file |
| 53 | +# Play or save |
| 54 | +play(audio) |
| 55 | +save(audio, "output.mp3") |
| 56 | +``` |
| 57 | + |
| 58 | +**Asynchronous:** |
| 59 | + |
| 60 | +```python |
| 61 | +import asyncio |
| 62 | +from fishaudio import AsyncFishAudio |
| 63 | +from fishaudio.utils import play, save |
| 64 | + |
| 65 | +async def main(): |
| 66 | + client = AsyncFishAudio() |
| 67 | + audio = await client.tts.convert(text="Hello, world!") |
| 68 | + play(audio) |
| 69 | + save(audio, "output.mp3") |
| 70 | + |
| 71 | +asyncio.run(main()) |
66 | 72 | ``` |
67 | 73 |
|
68 | | -#### With Reference Voice |
| 74 | +## Core Features |
69 | 75 |
|
70 | | -Use a reference voice ID to ensure consistent voice characteristics across generations: |
| 76 | +### Text-to-Speech |
| 77 | + |
| 78 | +**With custom voice:** |
71 | 79 |
|
72 | 80 | ```python |
73 | | -# Use an existing voice by ID |
| 81 | +# Use a specific voice by ID |
74 | 82 | audio = client.tts.convert( |
75 | | - text="This will sound like the reference voice!", |
76 | | - reference_id="802e3bc2b27e49c2995d23ef70e6ac89" # Energetic Male |
| 83 | + text="Custom voice", |
| 84 | + reference_id="802e3bc2b27e49c2995d23ef70e6ac89" |
77 | 85 | ) |
78 | 86 | ``` |
79 | 87 |
|
80 | | -#### Instant Voice Cloning |
81 | | - |
82 | | -Immediately clone a voice from a short audio sample: |
| 88 | +**With speed control:** |
83 | 89 |
|
84 | 90 | ```python |
85 | | -# Clone a voice from audio sample |
86 | | -with open("reference.wav", "rb") as f: |
87 | | - audio = client.tts.convert( |
88 | | - text="This will sound like the reference voice!", |
89 | | - reference_audio=f.read(), |
90 | | - reference_text="Transcription of the reference audio" |
91 | | - ) |
| 91 | +audio = client.tts.convert( |
| 92 | + text="Speaking faster!", |
| 93 | + speed=1.5 # 1.5x speed |
| 94 | +) |
92 | 95 | ``` |
93 | 96 |
|
94 | | -#### Streaming Audio Chunks |
| 97 | +**Reusable configuration:** |
| 98 | + |
| 99 | +```python |
| 100 | +from fishaudio.types import TTSConfig, Prosody |
95 | 101 |
|
96 | | -For processing audio chunks as they're generated: |
| 102 | +config = TTSConfig( |
| 103 | + prosody=Prosody(speed=1.2, volume=-5), |
| 104 | + reference_id="933563129e564b19a115bedd57b7406a", |
| 105 | + format="wav", |
| 106 | + latency="balanced" |
| 107 | +) |
| 108 | + |
| 109 | +# Reuse across generations |
| 110 | +audio1 = client.tts.convert(text="First message", config=config) |
| 111 | +audio2 = client.tts.convert(text="Second message", config=config) |
| 112 | +``` |
| 113 | + |
| 114 | +**Chunk-by-chunk processing:** |
97 | 115 |
|
98 | 116 | ```python |
99 | | -# Stream and process audio chunks |
100 | | -for chunk in client.tts.stream(text="Long text content..."): |
101 | | - # Process each chunk as it arrives |
| 117 | +# Stream and process chunks as they arrive |
| 118 | +for chunk in client.tts.stream(text="Long content..."): |
102 | 119 | send_to_websocket(chunk) |
103 | 120 |
|
104 | 121 | # Or collect all chunks |
105 | 122 | audio = client.tts.stream(text="Hello!").collect() |
106 | 123 | ``` |
107 | 124 |
|
108 | | -#### Real-time WebSocket Streaming |
| 125 | +[Learn more](https://docs.fish.audio/developer-guide/sdk-guide/python/text-to-speech) |
109 | 126 |
|
110 | | -For low-latency bidirectional streaming where you send text chunks and receive audio in real-time: |
| 127 | +### Speech-to-Text |
111 | 128 |
|
112 | 129 | ```python |
113 | | -from fishaudio import FishAudio |
114 | | -from fishaudio.utils import play |
| 130 | +# Transcribe audio |
| 131 | +with open("audio.wav", "rb") as f: |
| 132 | + result = client.asr.transcribe(audio=f.read(), language="en") |
115 | 133 |
|
116 | | -client = FishAudio() |
| 134 | +print(result.text) |
| 135 | + |
| 136 | +# Access timestamped segments |
| 137 | +for segment in result.segments: |
| 138 | + print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}") |
| 139 | +``` |
| 140 | + |
| 141 | +[Learn more](https://docs.fish.audio/developer-guide/sdk-guide/python/speech-to-text) |
| 142 | + |
| 143 | +### Real-time Streaming |
| 144 | + |
| 145 | +Stream dynamically generated text for conversational AI and live applications: |
| 146 | + |
| 147 | +**Synchronous:** |
117 | 148 |
|
118 | | -# Stream text chunks and receive audio in real-time |
| 149 | +```python |
119 | 150 | def text_chunks(): |
120 | 151 | yield "Hello, " |
121 | 152 | yield "this is " |
122 | | - yield "streaming audio!" |
| 153 | + yield "streaming!" |
123 | 154 |
|
124 | 155 | audio_stream = client.tts.stream_websocket(text_chunks(), latency="balanced") |
125 | 156 | play(audio_stream) |
126 | 157 | ``` |
127 | 158 |
|
128 | | -### Speech Recognition (ASR) |
129 | | - |
130 | | -To transcribe audio to text: |
| 159 | +**Asynchronous:** |
131 | 160 |
|
132 | 161 | ```python |
133 | | -from fishaudio import FishAudio |
134 | | - |
135 | | -client = FishAudio() |
| 162 | +async def text_chunks(): |
| 163 | + yield "Hello, " |
| 164 | + yield "this is " |
| 165 | + yield "streaming!" |
136 | 166 |
|
137 | | -# Transcribe audio to text |
138 | | -with open("audio.wav", "rb") as f: |
139 | | - result = client.asr.transcribe(audio=f.read()) |
140 | | - print(result.text) |
| 167 | +audio_stream = await client.tts.stream_websocket(text_chunks(), latency="balanced") |
| 168 | +play(audio_stream) |
141 | 169 | ``` |
142 | 170 |
|
143 | | -### Voice Management |
| 171 | +[Learn more](https://docs.fish.audio/developer-guide/sdk-guide/python/websocket) |
144 | 172 |
|
145 | | -Manage voice references and list available voices. |
| 173 | +### Voice Cloning |
146 | 174 |
|
147 | | -```python |
148 | | -from fishaudio import FishAudio |
| 175 | +**Instant cloning:** |
149 | 176 |
|
150 | | -client = FishAudio() |
151 | | - |
152 | | -# List available voices |
153 | | -voices = client.voices.list(language="en", tags="male") |
154 | | - |
155 | | -# Get a specific voice by ID |
156 | | -voice = client.voices.get(voice_id="802e3bc2b27e49c2995d23ef70e6ac89") |
| 177 | +```python |
| 178 | +from fishaudio.types import ReferenceAudio |
157 | 179 |
|
158 | | -# Create a custom voice |
159 | | -with open("voice_sample.wav", "rb") as f: |
160 | | - new_voice = client.voices.create( |
161 | | - title="My Custom Voice", |
162 | | - voices=[f.read()], |
163 | | - description="My cloned voice" |
| 180 | +# Clone voice on-the-fly |
| 181 | +with open("reference.wav", "rb") as f: |
| 182 | + audio = client.tts.convert( |
| 183 | + text="Cloned voice speaking", |
| 184 | + references=[ReferenceAudio( |
| 185 | + audio=f.read(), |
| 186 | + text="Text spoken in reference" |
| 187 | + )] |
164 | 188 | ) |
165 | 189 | ``` |
166 | 190 |
|
167 | | -### Async Usage |
168 | | - |
169 | | -You can also use the SDK in asynchronous applications: |
| 191 | +**Persistent voice models:** |
170 | 192 |
|
171 | 193 | ```python |
172 | | -import asyncio |
173 | | -from fishaudio import AsyncFishAudio |
174 | | - |
175 | | -async def main(): |
176 | | - client = AsyncFishAudio() |
177 | | - |
178 | | - audio = await client.tts.convert(text="Async text-to-speech!") |
179 | | - # Process audio... |
| 194 | +# Create voice model for reuse |
| 195 | +with open("voice_sample.wav", "rb") as f: |
| 196 | + voice = client.voices.create( |
| 197 | + title="My Voice", |
| 198 | + voices=[f.read()], |
| 199 | + description="Custom voice clone" |
| 200 | + ) |
180 | 201 |
|
181 | | -asyncio.run(main()) |
| 202 | +# Use the created model |
| 203 | +audio = client.tts.convert( |
| 204 | + text="Using my saved voice", |
| 205 | + reference_id=voice.id |
| 206 | +) |
182 | 207 | ``` |
183 | 208 |
|
184 | | -### Account |
| 209 | +[Learn more](https://docs.fish.audio/developer-guide/sdk-guide/python/voice-cloning) |
185 | 210 |
|
186 | | -Check your remaining API credits, usage, and account details: |
| 211 | +## Resource Clients |
187 | 212 |
|
188 | | -```python |
189 | | -from fishaudio import FishAudio |
| 213 | +| Resource | Description | Key Methods | |
| 214 | +|----------|-------------|-------------| |
| 215 | +| `client.tts` | Text-to-speech | `convert()`, `stream()`, `stream_websocket()` | |
| 216 | +| `client.asr` | Speech recognition | `transcribe()` | |
| 217 | +| `client.voices` | Voice management | `list()`, `get()`, `create()`, `update()`, `delete()` | |
| 218 | +| `client.account` | Account info | `get_credits()`, `get_package()` | |
190 | 219 |
|
191 | | -client = FishAudio() |
192 | | -credits = client.account.get_credits() |
193 | | -print(f"Remaining credits: {credits.credit}") |
194 | | -``` |
195 | | - |
196 | | - |
197 | | -### Optional Dependencies |
| 220 | +## Error Handling |
198 | 221 |
|
199 | | -For audio playback utilities to help with playing and saving audio files, install the `utils` extra: |
| 222 | +```python |
| 223 | +from fishaudio.exceptions import ( |
| 224 | + AuthenticationError, |
| 225 | + RateLimitError, |
| 226 | + ValidationError, |
| 227 | + FishAudioError |
| 228 | +) |
200 | 229 |
|
201 | | -```bash |
202 | | -pip install fish-audio-sdk[utils] |
| 230 | +try: |
| 231 | + audio = client.tts.convert(text="Hello!") |
| 232 | +except AuthenticationError: |
| 233 | + print("Invalid API key") |
| 234 | +except RateLimitError: |
| 235 | + print("Rate limit exceeded") |
| 236 | +except ValidationError as e: |
| 237 | + print(f"Invalid request: {e}") |
| 238 | +except FishAudioError as e: |
| 239 | + print(f"API error: {e}") |
203 | 240 | ``` |
204 | 241 |
|
205 | | -## Legacy SDK |
206 | | - |
207 | | -The legacy `fish_audio_sdk` module continues to be supported for existing projects: |
| 242 | +## Resources |
208 | 243 |
|
209 | | -```python |
210 | | -from fish_audio_sdk import Session |
211 | | - |
212 | | -session = Session("your_api_key") |
213 | | -``` |
| 244 | +- **Documentation:** [SDK Guide](https://docs.fish.audio/developer-guide/sdk-guide/python/) | [API Reference](https://docs.fish.audio/api-reference/sdk/python/) |
| 245 | +- **Package:** [PyPI](https://pypi.org/project/fish-audio-sdk/) | [GitHub](https://github.com/fishaudio/fish-audio-python) |
| 246 | +- **Legacy SDK:** [Documentation](https://docs.fish.audio/archive/python-sdk-legacy) | [Migration Guide](https://docs.fish.audio/archive/python-sdk-legacy/migration-guide) |
214 | 247 |
|
215 | | -For complete legacy SDK documentation, see the [Legacy API Documentation](https://docs.fish.audio/archive/python-sdk-legacy). |
| 248 | +## License |
216 | 249 |
|
217 | | -We recommend migrating to the new `fishaudio` module - see our [Migration Guide](https://docs.fish.audio/archive/python-sdk-legacy/migration-guide) for assistance. |
| 250 | +This project is licensed under the Apache-2.0 License - see the [LICENSE](LICENSE) file for details. |
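
The Authentication section added in this diff shows two ways to supply the key: the `FISH_API_KEY` environment variable or the `api_key` argument. A minimal sketch of that lookup follows, using only the standard library. The helper `resolve_api_key` is hypothetical and not part of the SDK, and the precedence (explicit argument over environment) is an assumption, since the README does not state the order.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return an API key from the argument or the FISH_API_KEY variable.

    Hypothetical helper for illustration; not part of the fishaudio SDK.
    Assumes an explicit key takes precedence over the environment.
    """
    env = os.environ if env is None else env
    key = explicit or env.get("FISH_API_KEY", "")
    if not key.strip():
        # Fail with a clear message when neither source provides a key.
        raise RuntimeError(
            "No API key found: pass one explicitly or export FISH_API_KEY"
        )
    return key.strip()
```

The result could then be passed as `FishAudio(api_key=resolve_api_key())`, so that a missing key surfaces before any request is made.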
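
The Error Handling section in this diff catches `RateLimitError` and prints a message. Where a retry is preferable, a backoff loop along these lines is one option. This is a sketch under assumptions: `RateLimitError` below is a local stand-in class so the example runs without the SDK, and `call_with_backoff` and its parameters are hypothetical names, not SDK functions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for fishaudio.exceptions.RateLimitError."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle it
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

With the SDK installed, the stand-in class would be replaced by the import from `fishaudio.exceptions`, and the call wrapped as `call_with_backoff(lambda: client.tts.convert(text="Hello!"))`.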