feat: add Mistral audio transcription adapter #3968
Conversation
Add `MistralAdapter` implementing both `RealtimeSttAdapter` and `BatchSttAdapter`:

- Realtime: WebSocket at `/v1/audio/transcriptions/realtime` with base64 PCM audio, `session.update` for audio format config, and parsing of `transcription.text.delta`, `transcription.segment`, and `error` events
- Batch: Multipart POST to `/v1/audio/transcriptions` with `verbose_json` response format and segment-level timestamps
- Register Mistral in `AdapterKind`, `Provider`, and public exports

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
Provider::Gladia => GladiaAdapter.build_ws_url(api_base, params, channels),
Provider::ElevenLabs => ElevenLabsAdapter.build_ws_url(api_base, params, channels),
Provider::Mistral => MistralAdapter.build_ws_url(api_base, params, channels),
}
🚩 Proxy path forwards raw binary audio, but Mistral expects base64 JSON text messages
The transcribe-proxy relay handler at crates/transcribe-proxy/src/relay/handler.rs:273-284 forwards client binary WebSocket messages directly to the upstream as binary frames. However, Mistral's realtime WebSocket API expects audio to arrive as JSON text messages with base64-encoded PCM data (the input_audio.append format defined at crates/owhisper-client/src/adapter/mistral/live.rs:56-63).
This means the proxy path (build_proxy_with_adapter in crates/transcribe-proxy/src/routes/streaming/hyprnote.rs:123) will not work correctly for Mistral — binary audio from clients would be forwarded as-is rather than wrapped in base64 JSON.
However, this is the same pre-existing limitation that affects the OpenAI adapter (which also requires base64 JSON via input_audio_buffer.append). Both providers are wired into the proxy dispatch without any binary-to-text audio transformation. The direct client path (ListenClient) handles this correctly via audio_to_message(). If the proxy is not actually used for these providers in production, this is a non-issue — but worth confirming.
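The transformation the proxy path is missing is essentially what `audio_to_message()` already does on the direct client path. A minimal std-only sketch of that wrapping — note the `"type"`/`"audio"` field names are assumptions based on the `input_audio.append` format described above, not verified against Mistral's wire protocol:

```rust
// Sketch only (not the actual owhisper-client code): wrap a raw PCM frame
// in the base64 JSON text message Mistral's realtime endpoint expects.

// Base64 alphabet (RFC 4648, standard, with padding).
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Minimal std-only base64 encoder, to keep the sketch dependency-free.
fn base64_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = u32::from(b[0]) << 16 | u32::from(b[1]) << 8 | u32::from(b[2]);
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Turn a binary PCM frame into the JSON text frame the upstream expects.
/// Field names ("type", "audio") are hypothetical here.
fn audio_to_message(pcm: &[u8]) -> String {
    format!(r#"{{"type":"input_audio.append","audio":"{}"}}"#, base64_encode(pcm))
}

fn main() {
    let frame: Vec<u8> = vec![0x01, 0x02, 0x03, 0x04];
    // → {"type":"input_audio.append","audio":"AQIDBA=="}
    println!("{}", audio_to_message(&frame));
}
```

If the proxy is meant to support these providers, a transform of this shape applied in the relay handler before forwarding client binary frames would close the gap for both Mistral and OpenAI.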
Summary
Adds a `MistralAdapter` to `owhisper-client` implementing both `RealtimeSttAdapter` (WebSocket) and `BatchSttAdapter` (HTTP).

- Realtime (`live.rs`): Connects to `wss://api.mistral.ai/v1/audio/transcriptions/realtime` using the model `voxtral-mini-transcribe-realtime-2602`. Sends base64-encoded PCM audio via `input_audio.append` JSON messages. Parses `transcription.text.delta` (interim) and `transcription.segment` (final, with timestamps) events.
- Batch (`batch.rs`): Multipart POST to `/v1/audio/transcriptions` with `verbose_json` response format and segment-level timestamps. Default model is `voxtral-mini-latest`.
- Registration: `Mistral` variant added to `Provider`, `AdapterKind`, and public exports.

The WebSocket protocol was derived from Mistral's Python SDK source, not official WebSocket API docs.
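Putting the event names above together, a realtime session might look like the following exchange. This is a sketch only: the exact field layouts are assumptions, since the protocol was reverse-engineered from the SDK rather than taken from documented API specs.

```jsonc
// client → server: configure the audio format (session.update)
{ "type": "session.update", "session": { "audio_format": { "encoding": "pcm", "sample_rate": 16000 } } }

// client → server: base64-encoded PCM chunk (input_audio.append)
{ "type": "input_audio.append", "audio": "<base64 PCM bytes>" }

// server → client: interim hypothesis
{ "type": "transcription.text.delta", "delta": "hello wor" }

// server → client: final segment with segment-level timestamps
{ "type": "transcription.segment", "text": "hello world", "start": 0.0, "end": 1.2 }
```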
Review & Testing Checklist for Human
- The WebSocket protocol events (`input_audio.append`, `input_audio.end`, `session.update`, `transcription.segment`, `transcription.text.delta`) were reverse-engineered from the Python SDK. Run `test_build_single`/`test_build_dual` with a real `MISTRAL_API_KEY` to confirm the handshake and event flow work end-to-end.
- `verbose_json` response shape: `MistralBatchResponse` expects `{ model, text, language, segments: [{ text, start, end }] }`. Run `test_mistral_transcribe` with a real key against a known audio file to confirm deserialization succeeds and segments populate correctly.
- `transcription.segment` events only provide segment-level `start`/`end`. Word timestamps are estimated by dividing the segment duration evenly across words. Verify this approximation is acceptable for downstream consumers (transcript UI, word highlighting, etc.).
- `language_support_live`/`batch` returns `Supported { quality: NoData }` unconditionally (same pattern as the OpenAI adapter). Confirm this is the desired behavior, or whether Mistral has a known supported-language list.

Notes
- Live API tests are `#[ignore]`-gated on `MISTRAL_API_KEY`; only unit tests for JSON parsing run in CI.
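For reviewers checking the word-timestamp approximation called out in the checklist, the even-split estimate can be sketched as follows. Names and types here are illustrative, not the crate's actual definitions:

```rust
// Sketch of the even-split word-timestamp approximation: a
// transcription.segment only carries segment-level start/end, so each
// word is assigned an equal slice of the segment's duration.

#[derive(Debug)]
struct Word {
    text: String,
    start: f64,
    end: f64,
}

/// Distribute a segment's [start, end] interval evenly across its words.
fn estimate_word_timestamps(text: &str, start: f64, end: f64) -> Vec<Word> {
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.is_empty() {
        return Vec::new();
    }
    let step = (end - start) / words.len() as f64;
    words
        .iter()
        .enumerate()
        .map(|(i, w)| Word {
            text: (*w).to_string(),
            start: start + step * i as f64,
            end: start + step * (i + 1) as f64,
        })
        .collect()
}

fn main() {
    // A 2.0s segment with four words: each word spans 0.5s.
    for w in estimate_word_timestamps("hello from the adapter", 1.0, 3.0) {
        println!("{:?}", w);
    }
}
```

The obvious caveat, as the checklist notes, is that real speech is not evenly paced, so word-highlighting UIs built on these values will drift within long segments.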