feat(owhisper): add OpenAI Realtime API adapter #2126
Conversation
Add OpenAI adapter for real-time speech-to-text transcription using the OpenAI Realtime API. The adapter implements the `RealtimeSttAdapter` trait and supports:

- WebSocket connection to wss://api.openai.com/v1/realtime
- Session configuration for transcription mode
- Parsing of transcription events (completed, delta, failed)
- Server-side VAD for turn detection

Note: The API configuration is still being finalized as there are two session types (realtime vs transcription) with different schemas.

Co-Authored-By: yujonglee <[email protected]>
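For illustration, the session configuration above is sent as a JSON message over the WebSocket right after connecting. The sketch below is only an assumption of the shape involved; the exact event type and field names come from OpenAI's Realtime docs, not from this PR, and as the note says the schema is still being finalized:

```json
{
  "type": "transcription_session.update",
  "session": {
    "input_audio_format": "pcm16",
    "input_audio_transcription": {
      "model": "gpt-4o-transcribe"
    },
    "turn_detection": {
      "type": "server_vad"
    }
  }
}
```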
🤖 Devin AI Engineer: I'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
✅ Deploy Preview for hyprnote ready!
✅ Deploy Preview for hyprnote-storybook ready!
📝 Walkthrough

Adds a new OpenAI realtime STT adapter: registers the module and enum variant, detects OpenAI hosts, builds WebSocket URLs and auth, implements the WS lifecycle (initialization, finalize, parsing), and converts OpenAI realtime events into the client's transcript StreamResponse formats.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant OpenAI_WS as "OpenAI Realtime WS"
    participant Adapter as "OpenAIAdapter"
    participant Processor as "Transcript Builder"
    Client->>Adapter: create connection (api_base, model, token)
    Adapter->>OpenAI_WS: open wss URL + Authorization header
    Adapter->>OpenAI_WS: send initial session.update (JSON)
    Client->>OpenAI_WS: stream audio / send audio frames
    OpenAI_WS-->>Adapter: events (session, input_audio_buffer.commit, transcript delta/complete, errors)
    Adapter->>Processor: parse events -> build_transcript_response
    Processor-->>Adapter: StreamResponse objects
    Adapter->>Client: deliver StreamResponse (transcript, timing, confidence)
    Client->>OpenAI_WS: send finalize (input_audio_buffer.commit)
    OpenAI_WS-->>Adapter: final events / commit ack
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms; you can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Actionable comments posted: 1
🧹 Nitpick comments (2)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
40-40: Consider graceful error handling instead of panicking on invalid URL.

Using `expect()` will cause a panic if `api_base` is malformed. Other URL construction paths in the codebase use `parse().ok()?` patterns. Consider returning an `Option` or `Result` to handle invalid input gracefully.

```diff
- let parsed: url::Url = api_base.parse().expect("invalid_api_base");
+ let parsed: url::Url = match api_base.parse() {
+     Ok(url) => url,
+     Err(_) => {
+         let model = model.unwrap_or("gpt-4o-transcribe");
+         return (
+             format!("wss://{}{}", DEFAULT_WS_HOST, WS_PATH)
+                 .parse()
+                 .expect("invalid_default_ws_url"),
+             vec![("model".to_string(), model.to_string())],
+         );
+     }
+ };
```

owhisper/owhisper-client/src/adapter/openai/live.rs (1)
290-295: Consider documenting lack of word-level timing.

The words are built without timing information, so `calculate_time_span` will likely return zeros. If OpenAI's API doesn't provide word-level timing, a brief comment here would clarify this is intentional.
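To make the zero-span concern concrete, here is a std-only sketch. The `Word` type and `calculate_time_span` below are hypothetical simplifications, not the client's real types: words built from an OpenAI transcript carry no timestamps, so any span computed over them degenerates to zero.

```rust
/// Hypothetical simplified word type; the adapter's real Word struct
/// lives in the owhisper client and has more fields.
pub struct Word {
    pub text: String,
    pub start: f64,
    pub end: f64,
}

/// Simplified span calculation over a slice of words. With all
/// start/end values defaulted to 0.0 (no word-level timing from the
/// API), this returns (0.0, 0.0).
pub fn calculate_time_span(words: &[Word]) -> (f64, f64) {
    let start = words
        .iter()
        .map(|w| w.start)
        .fold(f64::INFINITY, f64::min);
    let end = words.iter().map(|w| w.end).fold(0.0_f64, f64::max);
    if start.is_finite() { (start, end) } else { (0.0, 0.0) }
}

/// Build words from a transcript string without timing information,
/// mirroring the situation the review comment describes.
pub fn words_without_timing(transcript: &str) -> Vec<Word> {
    transcript
        .split_whitespace()
        .map(|t| Word { text: t.to_string(), start: 0.0, end: 0.0 })
        .collect()
}
```

A short comment at the call site, as the reviewer suggests, would make clear that the zero span is intentional rather than a bug.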
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- owhisper/owhisper-client/src/adapter/mod.rs (4 hunks)
- owhisper/owhisper-client/src/adapter/openai/live.rs (1 hunks)
- owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
owhisper/owhisper-client/src/adapter/mod.rs (5)
- owhisper/owhisper-client/src/adapter/openai/mod.rs (1): is_host (14-16)
- owhisper/owhisper-client/src/adapter/gladia/mod.rs (1): is_host (16-18)
- owhisper/owhisper-client/src/adapter/assemblyai/mod.rs (1): is_host (12-14)
- owhisper/owhisper-client/src/adapter/soniox/mod.rs (1): is_host (15-17)
- owhisper/owhisper-client/src/adapter/fireworks/mod.rs (1): is_host (15-17)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
- owhisper/owhisper-client/src/adapter/mod.rs (4): host_matches (114-119), build_proxy_ws_url (129-150), extract_query_params (96-100), set_scheme_from_host (82-90)
⏰ Context from checks skipped due to timeout of 90000ms. (8)
- GitHub Check: Redirect rules - hyprnote-storybook
- GitHub Check: Header rules - hyprnote-storybook
- GitHub Check: Pages changed - hyprnote-storybook
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: fmt
- GitHub Check: Devin
🔇 Additional comments (9)
owhisper/owhisper-client/src/adapter/mod.rs (2)
7-7: LGTM! The module declaration and public re-export follow the established pattern used by other adapters (argmax, assemblyai, deepgram, etc.).
Also applies to: 20-20
169-169: LGTM! The `OpenAI` variant and host detection logic are properly integrated into the adapter selection flow, following the existing pattern.
Also applies to: 188-189
owhisper/owhisper-client/src/adapter/openai/mod.rs (2)
6-56: LGTM! The adapter structure and URL building logic are well-implemented. The `is_openai_host` pattern is consistent with other adapters, and the URL construction properly handles proxy URLs, localhost, and direct OpenAI connections.
59-106: Good test coverage. The unit tests effectively cover the main URL construction scenarios including empty base, explicit model, proxy routing, and localhost handling.
owhisper/owhisper-client/src/adapter/openai/live.rs (5)
10-35: LGTM! The provider identification, URL building, and authorization header implementation are correct and follow established patterns.
82-87: LGTM! The finalization message correctly signals the end of audio input with the commit event.
89-165: LGTM! The response parsing is well-structured with appropriate logging levels and graceful handling of unknown events via the `#[serde(other)]` fallback.
233-266: LGTM! The `OpenAIEvent` enum with tagged union deserialization and `#[serde(other)]` fallback provides robust event parsing with forward compatibility for new event types.
319-358: LGTM! Integration tests are properly marked as `#[ignore]` since they require an API key and external service access. They serve as good documentation of adapter usage.
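The tagged-union parsing praised above (233-266) can be sketched without serde as a plain match on the event's `type` string. This is a hypothetical std-only simplification: the real adapter derives this dispatch with serde's tagged enums, and the event-type strings below are assumptions based on OpenAI's Realtime event naming, not taken from the PR.

```rust
/// Hypothetical simplification of the adapter's OpenAIEvent enum. The
/// real code uses #[serde(tag = "type")] with a #[serde(other)] variant
/// so unknown event types deserialize to a fallback instead of failing
/// the stream.
#[derive(Debug, PartialEq)]
pub enum OpenAIEvent {
    TranscriptDelta,
    TranscriptCompleted,
    TranscriptFailed,
    Error,
    Unknown,
}

pub fn classify(event_type: &str) -> OpenAIEvent {
    match event_type {
        // Assumed event names; check OpenAI's Realtime API reference.
        "conversation.item.input_audio_transcription.delta" => OpenAIEvent::TranscriptDelta,
        "conversation.item.input_audio_transcription.completed" => OpenAIEvent::TranscriptCompleted,
        "conversation.item.input_audio_transcription.failed" => OpenAIEvent::TranscriptFailed,
        "error" => OpenAIEvent::Error,
        // The #[serde(other)]-style fallback: event types added to the
        // API after this adapter was written stay non-fatal.
        _ => OpenAIEvent::Unknown,
    }
}
```

The catch-all arm is what buys the forward compatibility the review highlights: a new server event degrades to `Unknown` rather than aborting deserialization.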
Co-Authored-By: yujonglee <[email protected]>
…rdcoded value
- Add DEFAULT_MODEL constant to avoid drift between URL and session config
- Use params.model with fallback to DEFAULT_MODEL in initial_message
- Ensures WebSocket URL model and TranscriptionConfig model stay consistent

Co-Authored-By: yujonglee <[email protected]>
Actionable comments posted: 3
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
22-56: Consider returning Result for more robust error handling.

The function uses `expect()` on lines 40 and 50, which will panic if `api_base` contains an invalid URL. Consider returning `Result<(url::Url, Vec<(String, String)>), Error>` to handle invalid input gracefully and allow callers to decide how to handle errors.

Example refactor:
```rust
pub(crate) fn build_ws_url_from_base(
    api_base: &str,
    model: Option<&str>,
) -> Result<(url::Url, Vec<(String, String)>), url::ParseError> {
    if api_base.is_empty() {
        let model = model.unwrap_or("gpt-4o-transcribe");
        return Ok((
            format!("wss://{}{}", DEFAULT_WS_HOST, WS_PATH)
                .parse()
                .expect("invalid_default_ws_url"),
            vec![("model".to_string(), model.to_string())],
        ));
    }

    if let Some(proxy_result) = super::build_proxy_ws_url(api_base) {
        return Ok(proxy_result);
    }

    let parsed: url::Url = api_base.parse()?;
    let mut existing_params = super::extract_query_params(&parsed);

    if !existing_params.iter().any(|(k, _)| k == "model") {
        let model = model.unwrap_or("gpt-4o-transcribe");
        existing_params.push(("model".to_string(), model.to_string()));
    }

    let host = parsed.host_str().unwrap_or(DEFAULT_WS_HOST);
    let mut url: url::Url = format!("wss://{}{}", host, WS_PATH).parse()?;
    super::set_scheme_from_host(&mut url);

    Ok((url, existing_params))
}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/adapter/openai/mod.rs (2)
- owhisper/owhisper-client/src/adapter/mod.rs (4): host_matches (114-119), build_proxy_ws_url (129-150), extract_query_params (96-100), set_scheme_from_host (82-90)
- crates/pyannote-cloud/src/test_key.rs (1): test (12-24)
⏰ Context from checks skipped due to timeout of 90000ms. (5)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: fmt
- GitHub Check: Devin
🔇 Additional comments (1)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
1-4: LGTM: Module structure and constants look good.

The module declaration and constants are correctly defined. The WebSocket path `/v1/realtime` and host `api.openai.com` align with OpenAI's Realtime API endpoint structure.
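To illustrate the host-detection pattern those constants feed into, here is a hypothetical std-only sketch; the actual `is_host` helper (14-16 in mod.rs) may use a different signature and matching rule.

```rust
/// Hypothetical sketch of OpenAI host detection, in the spirit of the
/// per-adapter is_host helpers referenced in the review. Accepting the
/// apex host plus subdomains is an assumption, not the PR's exact rule.
pub fn is_openai_host(host: &str) -> bool {
    let host = host.to_ascii_lowercase();
    host == "api.openai.com" || host.ends_with(".openai.com")
}
```

The suffix check with a leading dot matters: it accepts subdomains of openai.com while rejecting look-alikes such as `openai.com.evil.example`.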
feat(owhisper): add OpenAI Realtime API adapter (WIP)
Summary
Adds an OpenAI adapter for real-time speech-to-text transcription using the OpenAI Realtime API. The adapter implements the `RealtimeSttAdapter` trait, following the existing patterns from the Deepgram, AssemblyAI, and Soniox adapters.

Changes:
- `OpenAIAdapter` struct in owhisper/owhisper-client/src/adapter/openai/
- WebSocket connection to wss://api.openai.com/v1/realtime
- `OpenAI` variant added to the `AdapterKind` enum with host detection

Review & Testing Checklist for Human
- The API configuration is still being finalized: there are two session types (`realtime` vs `transcription`) with different schemas. The current implementation uses the `transcription` type with a nested `audio.input.transcription` config, but this was rejected when using the `gpt-4o-transcribe` model ("not supported in realtime mode")
- Run the ignored integration test: `OPENAI_API_KEY="..." cargo test -p owhisper-client openai::live::tests::test_build_single --no-default-features -- --ignored --nocapture`
- The API may require an `OpenAI-Beta: realtime=v1` header, which the current trait doesn't support

Recommended test plan:
Notes
- `dprint check` was timing out during development
- Link to Devin run: https://app.devin.ai/sessions/7339864147664278bde01e14973cef04
Requested by: yujonglee (@yujonglee)