-
Notifications
You must be signed in to change notification settings - Fork 467
feat: add Gladia realtime STT adapter #2115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-Authored-By: yujonglee <[email protected]>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
✅ Deploy Preview for hyprnote-storybook ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
✅ Deploy Preview for hyprnote ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
📝 WalkthroughWalkthroughAdds a Gladia realtime STT adapter: WebSocket URL/auth construction (with API-key and proxy support), an HTTP init handshake to obtain a session URL, native multi-channel support, comprehensive serde types for Gladia messages, transcript parsing into internal StreamResponse events, and trait implementations plus tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Local as Local Client
participant Adapter as GladiaAdapter
participant InitAPI as Gladia REST Init
participant WS as Gladia WebSocket
participant Parser as Adapter Parser
Local->>Adapter: build_ws_url_with_api_key(api_base, params, channels, api_key)
Adapter->>InitAPI: HTTP POST /v2/live (GladiaConfig JSON)
alt init success
InitAPI-->>Adapter: 200 OK with session_url
Adapter->>WS: Open WebSocket to session_url (or built URL)
loop audio frames
Local->>WS: send audio frames (binary)
WS-->>Parser: JSON messages (Transcript, SpeechStart, SpeechEnd, Error, EndSession)
Parser-->>Local: StreamResponse events (partials/finals, metadata)
end
WS-->>Local: EndSession
else init failure
InitAPI-->>Adapter: error response
Adapter-->>Local: StreamResponse::Error
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/gladia/live.rs (1)
80-81: Consider handling serialization failure gracefully.While
serde_json::to_stringrarely fails for simple structs, using.unwrap()could panic in edge cases. Consider logging and returningNoneon failure for robustness.- let json = serde_json::to_string(&cfg).unwrap(); - Some(Message::Text(json.into())) + match serde_json::to_string(&cfg) { + Ok(json) => Some(Message::Text(json.into())), + Err(e) => { + tracing::error!(error = ?e, "gladia_config_serialization_failed"); + None + } + }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
owhisper/owhisper-client/src/adapter/gladia/live.rs(1 hunks)owhisper/owhisper-client/src/adapter/gladia/mod.rs(1 hunks)owhisper/owhisper-client/src/adapter/mod.rs(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/adapter/gladia/mod.rs (1)
owhisper/owhisper-client/src/adapter/mod.rs (4)
host_matches(102-107)build_proxy_ws_url(117-138)extract_query_params(84-88)is_local_host(80-82)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Redirect rules - hyprnote-storybook
- GitHub Check: Header rules - hyprnote
- GitHub Check: Header rules - hyprnote-storybook
- GitHub Check: Pages changed - hyprnote-storybook
- GitHub Check: Pages changed - hyprnote
- GitHub Check: fmt
- GitHub Check: Devin
🔇 Additional comments (6)
owhisper/owhisper-client/src/adapter/mod.rs (1)
6-6: LGTM!The module declaration and re-export follow the established pattern for other adapters in this file.
Also applies to: 18-18
owhisper/owhisper-client/src/adapter/gladia/mod.rs (1)
10-12: Verify: is_supported_languages always returns true.This always returns
trueregardless of the languages provided. If this is intentional (Gladia supports all languages), consider adding a brief comment. Otherwise, implement actual language validation.owhisper/owhisper-client/src/adapter/gladia/live.rs (4)
10-30: LGTM!The provider identity, multichannel support flag, and WebSocket URL construction with query parameter preservation are correctly implemented.
128-157: LGTM!Configuration structs are well-designed with appropriate
skip_serializing_ifannotations and theuntaggedenum for polymorphic language configuration.
159-196: LGTM!The
GladiaMessageenum with tagged discriminant and#[serde(other)]for unknown messages provides robust message parsing with good forward compatibility.
302-339: LGTM!Tests are appropriately marked as
#[ignore]since they require aGLADIA_API_KEYenvironment variable and make actual API calls. The test structure follows the patterns established by other adapters.
- Add build_ws_url_with_api_key method to RealtimeSttAdapter trait - Use ureq for blocking POST request to get session token - Fix language_config format to use object with languages array - Return None for build_auth_header since token is in URL Co-Authored-By: yujonglee <[email protected]>
Co-Authored-By: yujonglee <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
owhisper/owhisper-client/src/adapter/gladia/live.rs (2)
148-171: Channel count inTerminalResponseis still hardcoded to 1.
StreamResponse::TerminalResponse { channels: 1, .. }doesn’t reflect actual channel count for multichannel sessions; it should ideally come from the same channel configuration used when opening the session (or from session metadata), with a sane fallback.This has already been flagged in a prior review; reiterating here for visibility.
301-357:channel_indextotal is hardcoded to 1; doesn’t match multichannel semantics.
channel_index: vec![channel_idx, 1]hardcodes the “total channels” element to 1, even though this adapter advertises native multichannel support and sends achannelsvalue in the init body. Ideally the total should reflect the actual configured number of channels (e.g., from session state or the init config), or this limitation should be explicitly documented.This is the same issue previously called out in earlier review feedback.
🧹 Nitpick comments (5)
owhisper/owhisper-client/Cargo.toml (1)
16-16: Consider whether you really need bothreqwestandureq.You now pull in two HTTP clients (
reqwestandureq), which increases binary size and maintenance surface. If you don’t have a strong reason to keep both (e.g., sync vs async constraints), it may be cleaner to standardize on one.owhisper/owhisper-client/src/adapter/gladia/live.rs (4)
32-125: Be explicit about behavior whenapi_keyis missing or init fails.Right now, if there’s no
api_keyor the/v2/liveinit fails, this method returnsNone, and the caller silently falls back tobuild_ws_url, which for Gladia likely won’t work anyway. That makes configuration or network errors harder to diagnose.Consider either:
- Treating a missing key as a hard configuration error for direct Gladia usage (e.g., log a clear error and perhaps return Some(fallback_url) only for proxy cases), or
- Returning a URL only when init succeeds and making the fallback path clearly “best-effort / not supported for Gladia” in comments.
This will make failures easier to reason about when things go wrong in production.
95-121: Blockingureqcall in URL builder may be surprising.
build_ws_url_with_api_keyperforms a synchronous HTTP request withureqas part of building the WebSocket URL. IfListenClientBuilder::build_*is called on an async runtime’s main thread, this can block the reactor and add connection latency.If possible, consider:
- Moving the
/v2/liveinit into an async path using your existing HTTP client (reqwest) and passing the resulting URL into the builder, or- At least documenting that this call is blocking and should be run off the main async task if that’s a concern for your consumers.
306-343: Consider logging hygiene for transcript payloads.
tracing::debug!(transcript = %utterance.text, ...)logs full user transcripts, which may contain sensitive content. For production builds, that can be problematic unless logs are tightly controlled.You might want to:
- Redact or truncate transcript text in logs, or
- Guard this with a feature flag or higher log level used only in local debugging.
361-397: Ignored integration tests are a good start; consider documenting how to run them.The ignored tests for single and dual Gladia clients look useful for manual verification. It could help future maintainers to:
- Add a short comment explaining these require
GLADIA_API_KEYand real network access, and/or- Provide a brief note in your README or dev docs describing how to opt-in to running these tests.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
owhisper/owhisper-client/Cargo.toml(1 hunks)owhisper/owhisper-client/src/adapter/gladia/live.rs(1 hunks)owhisper/owhisper-client/src/adapter/mod.rs(3 hunks)owhisper/owhisper-client/src/lib.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- owhisper/owhisper-client/src/adapter/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: fmt
- GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
- GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
- GitHub Check: desktop_ci (macos, depot-macos-14)
- GitHub Check: Devin
🔇 Additional comments (1)
owhisper/owhisper-client/src/lib.rs (1)
72-78: URL construction change looks correct and remains backward compatible.Using
build_ws_url_with_api_key(..., api_key)with a fallback tobuild_ws_urlcleanly extends adapters that need an API-key-aware handshake (like Gladia) while keeping existing adapters unaffected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
owhisper/owhisper-client/Cargo.toml (1)
18-18: Consider pinning to a more specific version.The current constraint
"2"allows any version from 2.0.0 to 2.x.x. For reproducibility and to avoid surprise breaking changes in minor/patch updates, consider pinning to a specific version (e.g.,"2.1.0"or the latest stable 2.x release).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
owhisper/owhisper-client/Cargo.toml(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
- GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
- GitHub Check: desktop_ci (macos, depot-macos-14)
- GitHub Check: Devin
…index Co-Authored-By: yujonglee <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
owhisper/owhisper-client/src/adapter/gladia/live.rs (1)
167-175: Hardcodedchannels: 1in TerminalResponse.The
channelsfield is hardcoded to1, which will be incorrect for multichannel sessions. Since the adapter supports native multichannel (line 15-17), the terminal response should reflect the actual channel count.Consider passing the channel count through the adapter state or session context:
GladiaMessage::EndSession { id } => { tracing::debug!(session_id = %id, "gladia_session_ended"); + // TODO: Retrieve actual channel count from session state vec![StreamResponse::TerminalResponse { request_id: id, created: String::new(), duration: 0.0, - channels: 1, + channels: 1, // FIXME: Should be actual channel count from session }] }owhisper/owhisper-client/src/adapter/gladia/mod.rs (1)
22-54: Multiple panic points on invalid input.The function uses
.expect()at lines 27, 36, and 52, which will panic if URLs are malformed. Sinceapi_basecomes from user configuration, this could crash the application.Consider returning a
Resultor using fallback to default URL:- pub(crate) fn build_ws_url_from_base(api_base: &str) -> (url::Url, Vec<(String, String)>) { + pub(crate) fn build_ws_url_from_base(api_base: &str) -> Result<(url::Url, Vec<(String, String)>), String> { if api_base.is_empty() { return ( format!("wss://{}{}", DEFAULT_API_HOST, WS_PATH) - .parse() - .expect("invalid_default_ws_url"), + .parse() + .map_err(|e| format!("invalid_default_ws_url: {}", e))?, Vec::new(), ); } if let Some(proxy_result) = super::build_proxy_ws_url(api_base) { - return proxy_result; + return Ok(proxy_result); } - let parsed: url::Url = api_base.parse().expect("invalid_api_base"); + let parsed: url::Url = api_base.parse().map_err(|e| format!("invalid_api_base: {}", e))?; let existing_params = super::extract_query_params(&parsed); let host = parsed.host_str().unwrap_or(DEFAULT_API_HOST); let scheme = if super::is_local_host(host) { "ws" } else { "wss" }; let host_with_port = match parsed.port() { Some(port) => format!("{host}:{port}"), None => host.to_string(), }; let url: url::Url = format!("{scheme}://{host_with_port}{WS_PATH}") - .parse() - .expect("invalid_ws_url"); - (url, existing_params) + .parse() + .map_err(|e| format!("invalid_ws_url: {}", e))?; + Ok((url, existing_params)) }
🧹 Nitpick comments (2)
owhisper/owhisper-client/src/adapter/gladia/mod.rs (2)
18-20: Consider more specific host validation.The substring check
host.contains("gladia.io")could match malicious domains like"gladia.io.attacker.com". Consider checking for exact match or proper domain suffix.Apply this diff for more secure host matching:
- pub(crate) fn is_gladia_host(host: &str) -> bool { - host.contains("gladia.io") - } + pub(crate) fn is_gladia_host(host: &str) -> bool { + host == "gladia.io" || host == "api.gladia.io" || host.ends_with(".gladia.io") + }
10-12: Document the language support assumption or add validation if needed.Gladia's Live WebSocket API supports 100+ languages, making the unconditional
truereturn reasonable in practice. However, the underscore-prefixed parameter suggests the languages aren't being validated. Consider whether this is intentional (trusting Gladia's broad support) and document it, or add explicit language validation if there are actual restrictions. For context,DeepgramAdapter::is_supported_languagesvalidates against aSUPPORTED_LANGUAGESconstant, offering a more defensive approach.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
owhisper/owhisper-client/src/adapter/gladia/live.rs(1 hunks)owhisper/owhisper-client/src/adapter/gladia/mod.rs(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
owhisper/owhisper-client/src/adapter/gladia/mod.rs (1)
owhisper/owhisper-client/src/adapter/mod.rs (4)
host_matches(112-117)build_proxy_ws_url(127-148)extract_query_params(94-98)is_local_host(90-92)
owhisper/owhisper-client/src/adapter/gladia/live.rs (1)
owhisper/owhisper-client/src/adapter/gladia/mod.rs (1)
build_ws_url_from_base(22-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Redirect rules - hyprnote-storybook
- GitHub Check: Header rules - hyprnote
- GitHub Check: Header rules - hyprnote-storybook
- GitHub Check: Pages changed - hyprnote
- GitHub Check: Pages changed - hyprnote-storybook
- GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
- GitHub Check: desktop_ci (macos, depot-macos-14)
- GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
- GitHub Check: fmt
- GitHub Check: Devin
🔇 Additional comments (8)
owhisper/owhisper-client/src/adapter/gladia/mod.rs (2)
1-7: LGTM!The module structure and struct definition are clean and appropriate for the adapter pattern.
57-101: Tests look good but need update if error handling changes.The test coverage is comprehensive, covering various URL formats including empty, standard, with port, proxy, and localhost. However, if
build_ws_url_from_baseis changed to returnResult(as suggested above), these tests will need to be updated to unwrap the Result.owhisper/owhisper-client/src/adapter/gladia/live.rs (6)
1-9: LGTM!Imports are clean and appropriate for the implementation.
19-30: Depends on panic-pronebuild_ws_url_from_base.This method calls
Self::build_ws_url_from_basewhich can panic on invalid URLs (flagged in mod.rs). Ensure that issue is resolved to prevent runtime crashes.
32-129: LGTM! Two-step authentication flow correctly implemented.The function properly implements Gladia's recommended flow: (1) POST to
/v2/livewith API key to obtain session URL, (2) return the session URL for client connection. Error handling with tracing logs is appropriate.
131-150: LGTM!The trait methods are correctly implemented for Gladia's protocol. Authentication via session URL (not headers), no keep-alive or initial messages needed, and proper finalization message.
192-303: LGTM!The data structures are well-designed with appropriate serde attributes for JSON serialization/deserialization. The use of
#[serde(default)], tagged enums, and fallback variants demonstrates good API integration practices.
369-406: LGTM!Tests are properly structured and marked with
#[ignore]since they require a real API key. Good coverage of both single and dual-channel scenarios.
…ranscripts Co-Authored-By: yujonglee <[email protected]>
Summary
Adds a new
GladiaAdapterfor real-time speech-to-text using Gladia's WebSocket API. This implementation handles Gladia's two-step initialization flow:/v2/livewith audio config to get a session tokenKey changes:
GladiaAdapterstruct implementingRealtimeSttAdaptertraitbuild_ws_url_with_api_keyoptional method toRealtimeSttAdaptertrait (with default fallback tobuild_ws_url)ureqfor the blocking POST request (chosen overreqwest::blockingto avoid tokio runtime conflicts)Updates since last revision
Addressed review feedback:
live.rsandmod.rsnow preserve non-standard ports when constructing URLs fromapi_base(e.g.,http://localhost:8080orhttps://api.gladia.io:8443)session_channels()registry that:parse_transcriptfor accuratechannel_indexEndSessionforTerminalResponse.channelsReview & Testing Checklist for Human
InitResponse.idmatchesEndSession.idandTranscriptMessage.session_id. Run with debug logging to confirm these IDs match in real Gladia traffic.GLADIA_API_KEY=<key> cargo test -p owhisper-client gladia -- --ignored --nocapturebuild_ws_url_with_api_keymethod has a default implementation. Confirm no regressions in other adapters (Deepgram, AssemblyAI, Soniox).Mutex<HashMap>- verify this is appropriate for concurrent session handling.Recommended test plan:
Notes
#[ignore]since they require a real API keyureqwas necessary becausereqwest::blockingcreates its own tokio runtime which panics when called from within an async test contextEndSessionto prevent unbounded growthRequested by: @yujonglee ([email protected])
Devin Session: https://app.devin.ai/sessions/f1fae0e50661471abba69597a3ad8b93