feat: add Mistral audio transcription adapter #3968

Merged

yujonglee merged 5 commits into main from devin/1771061713-mistral-adapter on Feb 14, 2026
Conversation


devin-ai-integration bot commented Feb 14, 2026

Summary

Adds a MistralAdapter to owhisper-client implementing both RealtimeSttAdapter (WebSocket) and BatchSttAdapter (HTTP).

Realtime (live.rs): Connects to wss://api.mistral.ai/v1/audio/transcriptions/realtime using the model voxtral-mini-transcribe-realtime-2602. Sends base64-encoded PCM audio via input_audio.append JSON messages. Parses transcription.text.delta (interim) and transcription.segment (final, with timestamps) events.
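
The append frame described above can be sketched in std-only Rust. The helper names and the exact JSON schema are assumptions inferred from this PR's description (the real adapter lives in live.rs), and the base64 encoder is inlined only to keep the sketch self-contained:

```rust
/// Minimal standard base64 encoder (std-only, for illustration).
fn base64_encode(data: &[u8]) -> String {
    const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    for chunk in data.chunks(3) {
        let n = (chunk[0] as u32) << 16
            | (*chunk.get(1).unwrap_or(&0) as u32) << 8
            | *chunk.get(2).unwrap_or(&0) as u32;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Build the `input_audio.append` text frame carrying one PCM chunk.
/// Field names follow the PR description; the exact schema is an assumption.
fn input_audio_append(pcm: &[u8]) -> String {
    format!(
        r#"{{"type":"input_audio.append","audio":"{}"}}"#,
        base64_encode(pcm)
    )
}

fn main() {
    println!("{}", input_audio_append(&[0, 1, 2, 3]));
    // → {"type":"input_audio.append","audio":"AAECAw=="}
}
```

In the real client, each such frame would be sent as a WebSocket text message, with `input_audio.end` signaling end of stream.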

Batch (batch.rs): Multipart POST to /v1/audio/transcriptions with verbose_json response format and segment-level timestamps. Default model is voxtral-mini-latest.
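
Based on the fields the adapter deserializes, a verbose_json response is expected to look like the following (all values illustrative):

```json
{
  "model": "voxtral-mini-latest",
  "text": "hello world",
  "language": "en",
  "segments": [
    { "text": "hello world", "start": 0.0, "end": 1.2 }
  ]
}
```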

Registration: Mistral variant added to Provider, AdapterKind, and public exports.

The WebSocket protocol was derived from Mistral's Python SDK source, not official WebSocket API docs.

Review & Testing Checklist for Human

  • Verify the WebSocket message format against the live Mistral API: the event types (input_audio.append, input_audio.end, session.update, transcription.segment, transcription.text.delta) were reverse-engineered from the Python SDK. Run test_build_single / test_build_dual with a real MISTRAL_API_KEY to confirm the handshake and event flow work end-to-end.
  • Verify the batch verbose_json response shape: MistralBatchResponse expects { model, text, language, segments: [{ text, start, end }] }. Run test_mistral_transcribe with a real key against a known audio file to confirm deserialization succeeds and segments populate correctly.
  • Word timestamps are interpolated, not from the API: both batch and realtime transcription.segment events provide only segment-level start/end. Word timestamps are estimated by dividing the segment duration evenly across its words. Verify this approximation is acceptable for downstream consumers (transcript UI, word highlighting, etc.).
  • Language support claims all languages are supported: language_support_live/batch return Supported { quality: NoData } unconditionally (the same pattern as the OpenAI adapter). Confirm this is the desired behavior, or whether Mistral publishes a supported-language list.
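
The interpolation flagged in the checklist can be sketched as follows (the function name and signature are illustrative, not the PR's actual code):

```rust
/// Sketch of even-division word-timestamp interpolation: the API only
/// returns segment-level [start, end], so each word is assigned an equal
/// slice of the segment's duration.
fn interpolate_word_timestamps(text: &str, start: f64, end: f64) -> Vec<(String, f64, f64)> {
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.is_empty() {
        return Vec::new();
    }
    let step = (end - start) / words.len() as f64;
    words
        .iter()
        .enumerate()
        .map(|(i, w)| (w.to_string(), start + i as f64 * step, start + (i + 1) as f64 * step))
        .collect()
}

fn main() {
    // A 2-second segment with 4 words → each word gets 0.5 s.
    for (w, s, e) in interpolate_word_timestamps("hello brave new world", 0.0, 2.0) {
        println!("{w}: {s:.2}-{e:.2}");
    }
}
```

Note that this assumes uniform speaking rate within a segment, which is why the checklist asks whether the approximation is acceptable for word-level highlighting.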

Notes



Add MistralAdapter implementing both RealtimeSttAdapter and BatchSttAdapter:

- Realtime: WebSocket at /v1/audio/transcriptions/realtime with base64 PCM audio,
  session.update for audio format config, and parsing of transcription.text.delta,
  transcription.segment, and error events
- Batch: Multipart POST to /v1/audio/transcriptions with verbose_json response format
  and segment-level timestamps
- Register Mistral in AdapterKind, Provider, and public exports

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

netlify bot commented Feb 14, 2026

Deploy Preview for hyprnote canceled.

🔨 Latest commit: c0718e0
🔍 Latest deploy log: https://app.netlify.com/projects/hyprnote/deploys/69904a21b11e7900088086c8


netlify bot commented Feb 14, 2026

Deploy Preview for hyprnote-storybook canceled.

🔨 Latest commit: c0718e0
🔍 Latest deploy log: https://app.netlify.com/projects/hyprnote-storybook/deploys/69904a21950cfb0008f4f192


🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

devin-ai-integration bot and others added 2 commits February 14, 2026 09:43
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 potential issues.

View 4 additional findings in Devin Review.


Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

View 5 additional findings in Devin Review.


…rustfmt

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
@yujonglee yujonglee merged commit 4c97631 into main Feb 14, 2026
24 of 25 checks passed
@yujonglee yujonglee deleted the devin/1771061713-mistral-adapter branch February 14, 2026 10:13

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 8 additional findings in Devin Review.


Comment on lines 57 to 60
Provider::Gladia => GladiaAdapter.build_ws_url(api_base, params, channels),
Provider::ElevenLabs => ElevenLabsAdapter.build_ws_url(api_base, params, channels),
Provider::Mistral => MistralAdapter.build_ws_url(api_base, params, channels),
}


🚩 Proxy path forwards raw binary audio, but Mistral expects base64 JSON text messages

The transcribe-proxy relay handler at crates/transcribe-proxy/src/relay/handler.rs:273-284 forwards client binary WebSocket messages directly to the upstream as binary frames. However, Mistral's realtime WebSocket API expects audio to arrive as JSON text messages with base64-encoded PCM data (the input_audio.append format defined at crates/owhisper-client/src/adapter/mistral/live.rs:56-63).

This means the proxy path (build_proxy_with_adapter in crates/transcribe-proxy/src/routes/streaming/hyprnote.rs:123) will not work correctly for Mistral — binary audio from clients would be forwarded as-is rather than wrapped in base64 JSON.

However, this is the same pre-existing limitation that affects the OpenAI adapter (which also requires base64 JSON via input_audio_buffer.append). Both providers are wired into the proxy dispatch without any binary-to-text audio transformation. The direct client path (ListenClient) handles this correctly via audio_to_message(). If the proxy is not actually used for these providers in production, this is a non-issue — but worth confirming.
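
A std-only sketch of the transformation the proxy path would need, using a locally defined stand-in for the WebSocket message enum (all names here are hypothetical, not the relay's actual types):

```rust
/// Local stand-in for a WebSocket message enum (e.g. tungstenite's
/// `Message`), defined here only so the sketch is self-contained.
enum WsMessage {
    Binary(Vec<u8>),
    Text(String),
}

/// Minimal standard base64 encoder (std-only, for illustration).
fn base64_encode(data: &[u8]) -> String {
    const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    for chunk in data.chunks(3) {
        let n = (chunk[0] as u32) << 16
            | (*chunk.get(1).unwrap_or(&0) as u32) << 8
            | *chunk.get(2).unwrap_or(&0) as u32;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Hypothetical relay-side fix: wrap client binary audio frames in the
/// base64 `input_audio.append` text frame before forwarding upstream,
/// instead of passing raw binary through.
fn to_upstream(msg: WsMessage) -> WsMessage {
    match msg {
        WsMessage::Binary(pcm) => WsMessage::Text(format!(
            r#"{{"type":"input_audio.append","audio":"{}"}}"#,
            base64_encode(&pcm)
        )),
        other => other,
    }
}

fn main() {
    if let WsMessage::Text(t) = to_upstream(WsMessage::Binary(vec![0, 1, 2, 3])) {
        println!("{t}");
        // → {"type":"input_audio.append","audio":"AAECAw=="}
    }
}
```

The OpenAI path would need the same shape with its own event name (input_audio_buffer.append), so a per-provider wrapper hook in the relay dispatch is one possible design.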


