feat: add DashScope adapter for Qwen3-ASR real-time speech recognition #3967
Merged
feat: add DashScope adapter for Qwen3-ASR real-time STT
Summary
Adds a new `DashScopeAdapter` for Alibaba Cloud's DashScope platform (Model Studio), supporting the Qwen3-ASR real-time speech recognition model via WebSocket. Batch transcription is stubbed as a no-op per request.

DashScope's real-time API uses a WebSocket protocol very similar to OpenAI's Realtime API (same event names, such as `session.created`, `input_audio_buffer.speech_started`, and `conversation.item.input_audio_transcription.completed`), with DashScope-specific differences in the session config structure and a `.text` event instead of OpenAI's `.delta` for streaming transcripts.

Provider naming rationale: "DashScope" is the API platform name (like "OpenAI"), while "Qwen3-ASR" is just a model within it. Both the China (`dashscope.aliyuncs.com`) and International (`dashscope-intl.aliyuncs.com`) endpoints use the same protocol; the adapter handles both via URL-based region detection, defaulting to International.

Files changed:

- `adapter/dashscope/mod.rs` — `DashScopeAdapter` struct, URL building, language support
- `adapter/dashscope/live.rs` — `RealtimeSttAdapter` impl (WebSocket session config, audio encoding, response parsing)
- `adapter/dashscope/batch.rs` — no-op `BatchSttAdapter` impl
- `providers.rs` — new `Provider::DashScope` variant with all required provider metadata
- `adapter/mod.rs` — new `AdapterKind::DashScope`, wired into language support and provider mapping
- `lib.rs` — export `DashScopeAdapter`

Review & Testing Checklist for Human
- `initial_message` sends a `session.update` with `modalities`, `transcription` (containing `model`, `language`, `input_audio_format`, `input_sample_rate`), and `turn_detection`. This was inferred from the DashScope SDK sample code, but the exact wire format has not been validated against a live connection. The Java SDK wraps this through `OmniRealtimeConfig` / `updateSession()`; the actual JSON shape may differ.
- Finalization sends `input_audio_buffer.commit` (same as OpenAI). The DashScope SDK sample calls `conversation.endSession()`, which may send a different message (the code comment mentions `session.finish`). Needs verification.
- `.text` vs `.delta`: DashScope uses `conversation.item.input_audio_transcription.text` for streaming deltas (vs OpenAI's `.delta`). Verify this matches real server responses.
- `aliyuncs.com` breadth: the domain match will capture all `*.aliyuncs.com` subdomains (including non-DashScope services like OSS). Consider whether this needs to be narrowed.

Recommended test plan: connect to `wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime` with a valid `DASHSCOPE_API_KEY` and stream audio to verify that the session handshake and transcription events work as expected. The ignored tests in `live.rs` can be run with `cargo test -p owhisper-client -- --ignored test_build_single`.

Notes
- Batch transcription is stubbed as a no-op per request, but still satisfies the `BatchSttAdapter` trait signature.
- The provider is registered with `NoData` quality (same as the OpenAI/Fireworks adapters for new integrations).
- Link to Devin run: https://app.devin.ai/sessions/87af043c71fa4b2bb3d49717a3235510
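As a concrete reference for the wire-format questions in the checklist, here is a minimal standalone sketch of the assumed `session.update` payload and the event-type distinction. The function names, the model string, and the exact JSON shape are illustrative assumptions from this PR description, not validated adapter code:

```rust
/// Build the `session.update` message the adapter is believed to send on
/// connect. The JSON shape is inferred from the DashScope SDK samples and
/// has NOT been validated against a live connection (see checklist).
fn build_session_update(model: &str, language: &str, sample_rate: u32) -> String {
    format!(
        r#"{{"type":"session.update","session":{{"modalities":["text"],"transcription":{{"model":"{model}","language":"{language}","input_audio_format":"pcm","input_sample_rate":{sample_rate}}},"turn_detection":null}}}}"#
    )
}

/// Classify incoming event types per the checklist: DashScope is believed to
/// use `.text` for streaming deltas (where OpenAI's Realtime API uses
/// `.delta`) and `.completed` for the final transcript.
fn classify_event(event_type: &str) -> &'static str {
    match event_type {
        "conversation.item.input_audio_transcription.text" => "partial",
        "conversation.item.input_audio_transcription.completed" => "final",
        "session.created" => "session",
        "input_audio_buffer.speech_started" => "speech-start",
        _ => "ignored",
    }
}

fn main() {
    // "qwen3-asr" is a placeholder model id, not a confirmed DashScope name.
    let msg = build_session_update("qwen3-asr", "en", 16000);
    println!("{msg}");
    println!("{}", classify_event("conversation.item.input_audio_transcription.text"));
}
```

Running this against a live endpoint (per the test plan above) is the only way to confirm whether the server accepts this shape or rejects it.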
Requested by: @yujonglee
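For reviewers weighing the `aliyuncs.com` breadth question, this is a sketch of the URL-based region detection described in the summary. The enum and function names are illustrative, not the adapter's actual API; it deliberately matches the two specific DashScope hosts rather than all `*.aliyuncs.com` subdomains, which is the narrowing the checklist suggests considering:

```rust
#[derive(Debug, PartialEq)]
enum Region {
    China,
    International,
}

/// Pick the DashScope region from a configured base URL, defaulting to
/// International as the PR description says the adapter does. Note the
/// adapter's current match is broader (any `*.aliyuncs.com` host), which
/// the checklist flags as possibly capturing non-DashScope services.
fn detect_region(base_url: &str) -> Region {
    if base_url.contains("dashscope-intl.aliyuncs.com") {
        Region::International
    } else if base_url.contains("dashscope.aliyuncs.com") {
        Region::China
    } else {
        // Unknown hosts fall back to the International endpoint.
        Region::International
    }
}

fn main() {
    println!(
        "{:?}",
        detect_region("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
    );
}
```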