Commit f7328a4
Fix server (STT, TTS) (#460)
* Add python-multipart dependency to pyproject.toml for multipart form handling
* Refactor tokenizer usage in Whisper model files
- Removed direct imports and references to the Tokenizer class in favor of using model methods for tokenizer retrieval.
- Updated function signatures to accept a generic tokenizer parameter instead of a specific Tokenizer type.
- Enhanced the `detect_language` and `get_suppress_tokens` functions to utilize the model's tokenizer method, improving flexibility and reducing dependencies.
- Deleted unused tokenizer asset files to streamline the codebase.
* Add instruct and verbose fields to SpeechRequest model
- Updated the SpeechRequest model to include an optional 'instruct' field for additional instructions.
- Added a 'verbose' field to control output verbosity during audio generation.
- Refactored the generate_audio function to utilize the new fields, enhancing flexibility in audio processing.
* Add TranscriptionRequest model and update stt_transcriptions function
- Introduced a new TranscriptionRequest model to encapsulate parameters for audio transcription, including fields for language, verbosity, and streaming options.
- Updated the stt_transcriptions function to utilize the new model, enhancing parameter management and flexibility.
- Implemented logic to handle streaming results and filter generation parameters based on the model's signature, improving the transcription process.
* Add streaming transcription support and refactor STT model handling
- Introduced a new `generate_transcription_stream` function to handle streaming transcription, yielding results in real-time and managing temporary file cleanup.
- Updated the `stt_transcriptions` endpoint to utilize the new streaming function, enhancing the responsiveness of audio transcription.
- Refactored the `Qwen3ASRModel` to support streaming transcriptions, including a new `StreamingResult` data class for structured output.
- Improved model remapping logic in `get_model_category` to prioritize explicit remapping matches, enhancing model loading flexibility.
- Suppressed warnings during tokenizer loading in both `Qwen3ASRModel` and `ForcedAlignerModel` to improve user experience during model initialization.
* format
* fix tests
* Enhance chunk processing in Qwen3ASRModel with progress indication
- Added a tqdm progress bar to the chunk processing loop in the Qwen3ASRModel, improving user feedback during audio processing.
- The progress bar is configurable based on verbosity and the number of chunks, enhancing the overall user experience.
* Refactor audio chunk parameters in Qwen3ASRModel and related functions
- Renamed `max_chunk_sec` and `min_chunk_sec` to `chunk_duration` and `min_chunk_duration` respectively for clarity and consistency across the codebase.
- Updated function signatures and internal logic to reflect the new parameter names, ensuring proper handling of audio chunk durations.
- Adjusted test cases to align with the updated parameter names, maintaining test integrity.
* format
1 parent be5676f commit f7328a4
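Two ideas from the message above, filtering generation parameters against a callable's signature and cleaning up a temporary file once a streaming transcription finishes, can be sketched as follows. This is a minimal illustration and not the code from this commit: `filter_kwargs`, `stream_with_cleanup`, and `fake_generate` are hypothetical names, and `fake_generate` merely stands in for a model's generate method.

```python
import inspect
import os
import tempfile
from typing import Any, Callable, Dict, Iterator


def filter_kwargs(func: Callable[..., Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the entries of `params` that `func` accepts by keyword."""
    sig = inspect.signature(func)
    # A **kwargs parameter means the callee accepts any keyword argument.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()):
        return dict(params)
    allowed = {
        name
        for name, p in sig.parameters.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in params.items() if k in allowed}


def stream_with_cleanup(generate: Callable[..., Iterator[str]],
                        audio_bytes: bytes, **params: Any) -> Iterator[str]:
    """Write audio to a temp file, yield results as they arrive, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio_bytes)
        # Pass only the parameters this particular callable understands.
        for piece in generate(path, **filter_kwargs(generate, params)):
            yield piece
    finally:
        # Runs on normal exhaustion, on error, and when the consumer closes the generator.
        if os.path.exists(path):
            os.remove(path)


def fake_generate(audio_path: str, language: str = "en",
                  chunk_duration: float = 30.0) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming generate method."""
    for i in range(2):
        yield f"{language}:{i}"


if __name__ == "__main__":
    request = {"language": "fr", "stream": True, "verbose": False}
    for text in stream_with_cleanup(fake_generate, b"\x00\x01", **request):
        print(text)
```

Placing the deletion in a `finally` block inside the generator means the temporary file is removed even if the client disconnects mid-stream, provided the generator is closed or garbage-collected.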
File tree
16 files changed, +419 -100917 lines changed
- mlx_audio
- stt
- models
- qwen3_asr
- whisper
- assets
- tests
- tests