feat(owhisper): add OpenAI Realtime API adapter #2126
Conversation
Add OpenAI adapter for real-time speech-to-text transcription using the OpenAI Realtime API. The adapter implements the `RealtimeSttAdapter` trait and supports:

- WebSocket connection to wss://api.openai.com/v1/realtime
- Session configuration for transcription mode
- Parsing of transcription events (completed, delta, failed)
- Server-side VAD for turn detection

Note: The API configuration is still being finalized as there are two session types (realtime vs transcription) with different schemas.

Co-Authored-By: yujonglee <[email protected]>
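For illustration, the session configuration above is sent as a JSON message over the WebSocket right after connecting. The sketch below is only an assumption of the shape involved; the exact event type and field names come from OpenAI's Realtime docs, not from this PR, and as the note says the schema is still being finalized:

```json
{
  "type": "transcription_session.update",
  "session": {
    "input_audio_format": "pcm16",
    "input_audio_transcription": {
      "model": "gpt-4o-transcribe"
    },
    "turn_detection": {
      "type": "server_vad"
    }
  }
}
```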
🤖 Devin AI Engineer: I'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
✅ Deploy Preview for hyprnote ready!
✅ Deploy Preview for hyprnote-storybook ready!
📝 Walkthrough

Adds a new OpenAI realtime STT adapter: registers the module and enum variant, detects OpenAI hosts, builds WebSocket URLs and auth, implements the WS lifecycle (initialization, finalize, parsing), and converts OpenAI realtime events into the client's transcript StreamResponse formats.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant OpenAI_WS as "OpenAI Realtime WS"
    participant Adapter as "OpenAIAdapter"
    participant Processor as "Transcript Builder"
    Client->>Adapter: create connection (api_base, model, token)
    Adapter->>OpenAI_WS: open wss URL + Authorization header
    Adapter->>OpenAI_WS: send initial session.update (JSON)
    Client->>OpenAI_WS: stream audio / send audio frames
    OpenAI_WS-->>Adapter: events (session, input_audio_buffer.commit, transcript delta/complete, errors)
    Adapter->>Processor: parse events -> build_transcript_response
    Processor-->>Adapter: StreamResponse objects
    Adapter->>Client: deliver StreamResponse (transcript, timing, confidence)
    Client->>OpenAI_WS: send finalize (input_audio_buffer.commit)
    OpenAI_WS-->>Adapter: final events / commit ack
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms; you can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Actionable comments posted: 1
🧹 Nitpick comments (2)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
40-40: Consider graceful error handling instead of panicking on invalid URL.

Using `expect()` will cause a panic if `api_base` is malformed. Other URL construction paths in the codebase use `parse().ok()?` patterns. Consider returning an `Option` or `Result` to handle invalid input gracefully.

```diff
- let parsed: url::Url = api_base.parse().expect("invalid_api_base");
+ let parsed: url::Url = match api_base.parse() {
+     Ok(url) => url,
+     Err(_) => {
+         let model = model.unwrap_or("gpt-4o-transcribe");
+         return (
+             format!("wss://{}{}", DEFAULT_WS_HOST, WS_PATH)
+                 .parse()
+                 .expect("invalid_default_ws_url"),
+             vec![("model".to_string(), model.to_string())],
+         );
+     }
+ };
```

owhisper/owhisper-client/src/adapter/openai/live.rs (1)
290-295: Consider documenting lack of word-level timing.

The words are built without timing information, so `calculate_time_span` will likely return zeros. If OpenAI's API doesn't provide word-level timing, a brief comment here would clarify this is intentional.
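To make the zero-span concern concrete, here is a std-only sketch. The `Word` type and `calculate_time_span` below are hypothetical simplifications, not the client's real types: words built from an OpenAI transcript carry no timestamps, so any span computed over them degenerates to zero.

```rust
/// Hypothetical simplified word type; the adapter's real Word struct
/// lives in the owhisper client and has more fields.
pub struct Word {
    pub text: String,
    pub start: f64,
    pub end: f64,
}

/// Simplified span calculation over a slice of words. With all
/// start/end values defaulted to 0.0 (no word-level timing from the
/// API), this returns (0.0, 0.0).
pub fn calculate_time_span(words: &[Word]) -> (f64, f64) {
    let start = words
        .iter()
        .map(|w| w.start)
        .fold(f64::INFINITY, f64::min);
    let end = words.iter().map(|w| w.end).fold(0.0_f64, f64::max);
    if start.is_finite() { (start, end) } else { (0.0, 0.0) }
}

/// Build words from a transcript string without timing information,
/// mirroring the situation the review comment describes.
pub fn words_without_timing(transcript: &str) -> Vec<Word> {
    transcript
        .split_whitespace()
        .map(|t| Word { text: t.to_string(), start: 0.0, end: 0.0 })
        .collect()
}
```

A short comment at the call site, as the reviewer suggests, would make clear that the zero span is intentional rather than a bug.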
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- owhisper/owhisper-client/src/adapter/mod.rs (4 hunks)
- owhisper/owhisper-client/src/adapter/openai/live.rs (1 hunks)
- owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
owhisper/owhisper-client/src/adapter/mod.rs (5)
- owhisper/owhisper-client/src/adapter/openai/mod.rs (1): is_host (14-16)
- owhisper/owhisper-client/src/adapter/gladia/mod.rs (1): is_host (16-18)
- owhisper/owhisper-client/src/adapter/assemblyai/mod.rs (1): is_host (12-14)
- owhisper/owhisper-client/src/adapter/soniox/mod.rs (1): is_host (15-17)
- owhisper/owhisper-client/src/adapter/fireworks/mod.rs (1): is_host (15-17)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
- owhisper/owhisper-client/src/adapter/mod.rs (4): host_matches (114-119), build_proxy_ws_url (129-150), extract_query_params (96-100), set_scheme_from_host (82-90)
⏰ Context from checks skipped due to timeout of 90000ms. (8)
- GitHub Check: Redirect rules - hyprnote-storybook
- GitHub Check: Header rules - hyprnote-storybook
- GitHub Check: Pages changed - hyprnote-storybook
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: fmt
- GitHub Check: Devin
🔇 Additional comments (9)
owhisper/owhisper-client/src/adapter/mod.rs (2)
7-7: LGTM! The module declaration and public re-export follow the established pattern used by other adapters (argmax, assemblyai, deepgram, etc.).
Also applies to: 20-20
169-169: LGTM! The `OpenAI` variant and host detection logic are properly integrated into the adapter selection flow, following the existing pattern.
Also applies to: 188-189
owhisper/owhisper-client/src/adapter/openai/mod.rs (2)
6-56: LGTM! The adapter structure and URL building logic are well-implemented. The `is_openai_host` pattern is consistent with other adapters, and the URL construction properly handles proxy URLs, localhost, and direct OpenAI connections.
59-106: Good test coverage. The unit tests effectively cover the main URL construction scenarios including empty base, explicit model, proxy routing, and localhost handling.
owhisper/owhisper-client/src/adapter/openai/live.rs (5)
10-35: LGTM! The provider identification, URL building, and authorization header implementation are correct and follow established patterns.
82-87: LGTM! The finalization message correctly signals the end of audio input with the commit event.
89-165: LGTM! The response parsing is well-structured with appropriate logging levels and graceful handling of unknown events via the `#[serde(other)]` fallback.
233-266: LGTM! The `OpenAIEvent` enum with tagged union deserialization and `#[serde(other)]` fallback provides robust event parsing with forward compatibility for new event types.
319-358: LGTM! Integration tests are properly marked as `#[ignore]` since they require an API key and external service access. They serve as good documentation of adapter usage.
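The tagged-union parsing praised above (233-266) can be sketched without serde as a plain match on the event's `type` string. This is a hypothetical std-only simplification: the real adapter derives this dispatch with serde's tagged enums, and the event-type strings below are assumptions based on OpenAI's Realtime event naming, not taken from the PR.

```rust
/// Hypothetical simplification of the adapter's OpenAIEvent enum. The
/// real code uses #[serde(tag = "type")] with a #[serde(other)] variant
/// so unknown event types deserialize to a fallback instead of failing
/// the stream.
#[derive(Debug, PartialEq)]
pub enum OpenAIEvent {
    TranscriptDelta,
    TranscriptCompleted,
    TranscriptFailed,
    Error,
    Unknown,
}

pub fn classify(event_type: &str) -> OpenAIEvent {
    match event_type {
        // Assumed event names; check OpenAI's Realtime API reference.
        "conversation.item.input_audio_transcription.delta" => OpenAIEvent::TranscriptDelta,
        "conversation.item.input_audio_transcription.completed" => OpenAIEvent::TranscriptCompleted,
        "conversation.item.input_audio_transcription.failed" => OpenAIEvent::TranscriptFailed,
        "error" => OpenAIEvent::Error,
        // The #[serde(other)]-style fallback: event types added to the
        // API after this adapter was written stay non-fatal.
        _ => OpenAIEvent::Unknown,
    }
}
```

The catch-all arm is what buys the forward compatibility the review highlights: a new server event degrades to `Unknown` rather than aborting deserialization.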
Co-Authored-By: yujonglee <[email protected]>
…rdcoded value
- Add DEFAULT_MODEL constant to avoid drift between URL and session config
- Use params.model with fallback to DEFAULT_MODEL in initial_message
- Ensures WebSocket URL model and TranscriptionConfig model stay consistent

Co-Authored-By: yujonglee <[email protected]>
Actionable comments posted: 3
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
22-56: Consider returning Result for more robust error handling.

The function uses `expect()` on lines 40 and 50, which will panic if `api_base` contains an invalid URL. Consider returning `Result<(url::Url, Vec<(String, String)>), Error>` to handle invalid input gracefully and allow callers to decide how to handle errors.

Example refactor:
```rust
pub(crate) fn build_ws_url_from_base(
    api_base: &str,
    model: Option<&str>,
) -> Result<(url::Url, Vec<(String, String)>), url::ParseError> {
    if api_base.is_empty() {
        let model = model.unwrap_or("gpt-4o-transcribe");
        return Ok((
            format!("wss://{}{}", DEFAULT_WS_HOST, WS_PATH)
                .parse()
                .expect("invalid_default_ws_url"),
            vec![("model".to_string(), model.to_string())],
        ));
    }

    if let Some(proxy_result) = super::build_proxy_ws_url(api_base) {
        return Ok(proxy_result);
    }

    let parsed: url::Url = api_base.parse()?;
    let mut existing_params = super::extract_query_params(&parsed);

    if !existing_params.iter().any(|(k, _)| k == "model") {
        let model = model.unwrap_or("gpt-4o-transcribe");
        existing_params.push(("model".to_string(), model.to_string()));
    }

    let host = parsed.host_str().unwrap_or(DEFAULT_WS_HOST);
    let mut url: url::Url = format!("wss://{}{}", host, WS_PATH).parse()?;
    super::set_scheme_from_host(&mut url);

    Ok((url, existing_params))
}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/adapter/openai/mod.rs (2)
- owhisper/owhisper-client/src/adapter/mod.rs (4): host_matches (114-119), build_proxy_ws_url (129-150), extract_query_params (96-100), set_scheme_from_host (82-90)
- crates/pyannote-cloud/src/test_key.rs (1): test (12-24)
⏰ Context from checks skipped due to timeout of 90000ms. (5)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: fmt
- GitHub Check: Devin
🔇 Additional comments (1)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
1-4: LGTM: Module structure and constants look good.

The module declaration and constants are correctly defined. The WebSocket path `/v1/realtime` and host `api.openai.com` align with OpenAI's Realtime API endpoint structure.
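To illustrate the host-detection pattern those constants feed into, here is a hypothetical std-only sketch; the actual `is_host` helper (14-16 in mod.rs) may use a different signature and matching rule.

```rust
/// Hypothetical sketch of OpenAI host detection, in the spirit of the
/// per-adapter is_host helpers referenced in the review. Accepting the
/// apex host plus subdomains is an assumption, not the PR's exact rule.
pub fn is_openai_host(host: &str) -> bool {
    let host = host.to_ascii_lowercase();
    host == "api.openai.com" || host.ends_with(".openai.com")
}
```

The suffix check with a leading dot matters: it accepts subdomains of openai.com while rejecting look-alikes such as `openai.com.evil.example`.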
feat(owhisper): add OpenAI Realtime API adapter (WIP)
Summary
Adds an OpenAI adapter for real-time speech-to-text transcription using the OpenAI Realtime API. The adapter implements the `RealtimeSttAdapter` trait, following the existing patterns from the Deepgram, AssemblyAI, and Soniox adapters.

Changes:
- `OpenAIAdapter` struct in owhisper/owhisper-client/src/adapter/openai/
- WebSocket connection to wss://api.openai.com/v1/realtime
- `OpenAI` variant added to the `AdapterKind` enum with host detection

Review & Testing Checklist for Human
- The API configuration is still being finalized: there are two session types (`realtime` vs `transcription`) with different schemas. The current implementation uses the `transcription` type with a nested `audio.input.transcription` config, but this was rejected when using the `gpt-4o-transcribe` model ("not supported in realtime mode")
- Run the ignored integration test: `OPENAI_API_KEY="..." cargo test -p owhisper-client openai::live::tests::test_build_single --no-default-features -- --ignored --nocapture`
- The API may require an `OpenAI-Beta: realtime=v1` header, which the current trait doesn't support

Recommended test plan:
Notes
- `dprint check` was timing out during development
- Link to Devin run: https://app.devin.ai/sessions/7339864147664278bde01e14973cef04
Requested by: yujonglee (@yujonglee)