Conversation

@yujonglee (Contributor) commented Dec 4, 2025

feat(owhisper): add OpenAI Realtime API adapter (WIP)

Summary

Adds an OpenAI adapter for real-time speech-to-text transcription using the OpenAI Realtime API. The adapter implements the RealtimeSttAdapter trait, following the existing patterns of the Deepgram, AssemblyAI, and Soniox adapters.

Changes:

  • New OpenAIAdapter struct in owhisper/owhisper-client/src/adapter/openai/
  • WebSocket connection to wss://api.openai.com/v1/realtime
  • Session configuration for transcription mode with server-side VAD (see the sketch below)
  • Event parsing for transcription completed, delta, and failed events
  • Added an OpenAI variant to the AdapterKind enum with host detection
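
For orientation, here is a rough sketch of the kind of session.update payload used to enable transcription mode with server-side VAD. The field names are assumptions, not a verified schema; the nested audio.input.transcription shape is exactly what the checklist below flags as unresolved (it was rejected for gpt-4o-transcribe during development).

```rust
use serde_json::json;

// Illustrative only: mirrors the shape this PR attempted, but the exact
// schema is unverified (see the review checklist).
fn initial_session_update(model: &str) -> serde_json::Value {
    json!({
        "type": "session.update",
        "session": {
            "type": "transcription",
            "audio": {
                "input": {
                    "transcription": { "model": model },
                    // "server_vad" is assumed here for server-side turn detection.
                    "turn_detection": { "type": "server_vad" }
                }
            }
        }
    })
}
```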

Review & Testing Checklist for Human

⚠️ This PR is NOT verified to work with the actual OpenAI API. During development, I encountered API schema mismatches that were not resolved:

  • Verify the session configuration schema - The OpenAI Realtime API has two session types (realtime vs transcription) with different schemas. The current implementation uses the transcription type with a nested audio.input.transcription config, but this was rejected when using the gpt-4o-transcribe model ("not supported in realtime mode").
  • Determine the correct model/endpoint combination - Clarify whether transcription-only use cases require a different endpoint or model.
  • Test with a real OpenAI API key - Run OPENAI_API_KEY="..." cargo test -p owhisper-client openai::live::tests::test_build_single --no-default-features -- --ignored --nocapture
  • Consider whether additional headers are needed - Reference implementations use an OpenAI-Beta: realtime=v1 header, which the current trait doesn't support (see the sketch after this list).
  • Review the event type names - Verify that the serde rename strings match OpenAI's actual event types.
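
If the OpenAI-Beta header turns out to be required, one way to attach it outside the current trait is to build the handshake request manually with tokio-tungstenite. This is a minimal sketch, assuming tokio-tungstenite with a TLS feature and anyhow for error plumbing; it is not what the adapter currently does.

```rust
use tokio_tungstenite::{connect_async, tungstenite::client::IntoClientRequest};

// Sketch only: shows attaching Authorization and OpenAI-Beta headers to the
// WebSocket handshake; the adapter trait in this PR has no hook for this.
async fn connect_realtime(api_key: &str) -> anyhow::Result<()> {
    let mut request = "wss://api.openai.com/v1/realtime?model=gpt-4o-transcribe"
        .into_client_request()?;
    let headers = request.headers_mut();
    headers.insert("authorization", format!("Bearer {api_key}").parse()?);
    headers.insert("openai-beta", "realtime=v1".parse()?);

    let (_ws_stream, _response) = connect_async(request).await?;
    Ok(())
}
```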

Recommended test plan:

  1. Review OpenAI Realtime API docs to confirm correct session configuration
  2. Fix the session config schema based on docs
  3. Run integration test with API key to verify end-to-end

Notes

  • The URL building unit tests pass, but integration tests fail due to API configuration issues
  • The adapter follows existing patterns closely (modeled after Soniox/AssemblyAI adapters)
  • dprint check was timing out during development

Link to Devin run: https://app.devin.ai/sessions/7339864147664278bde01e14973cef04
Requested by: yujonglee (@yujonglee)

Add OpenAI adapter for real-time speech-to-text transcription using
the OpenAI Realtime API. The adapter implements the RealtimeSttAdapter
trait and supports:

- WebSocket connection to wss://api.openai.com/v1/realtime
- Session configuration for transcription mode
- Parsing of transcription events (completed, delta, failed)
- Server-side VAD for turn detection

Note: The API configuration is still being finalized as there are
two session types (realtime vs transcription) with different schemas.

Co-Authored-By: yujonglee <[email protected]>
@devin-ai-integration (Contributor)

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@netlify netlify bot commented Dec 4, 2025

Deploy Preview for hyprnote ready!

🔨 Latest commit: e96ada0
🔍 Latest deploy log: https://app.netlify.com/projects/hyprnote/deploys/693221332c0c2600083f495d
😎 Deploy Preview: https://deploy-preview-2126--hyprnote.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

@netlify netlify bot commented Dec 4, 2025

Deploy Preview for hyprnote-storybook ready!

🔨 Latest commit: e96ada0
🔍 Latest deploy log: https://app.netlify.com/projects/hyprnote-storybook/deploys/693221333f3cf700083e1b4b
😎 Deploy Preview: https://deploy-preview-2126--hyprnote-storybook.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai coderabbitai bot commented Dec 4, 2025

📝 Walkthrough

Adds a new OpenAI realtime STT adapter: registers the module and enum variant, detects OpenAI hosts, builds WebSocket URLs and auth, implements WS lifecycle (initialization, finalize, parsing) and converts OpenAI realtime events into the client's transcript StreamResponse formats.

Changes

  • Adapter integration (owhisper/owhisper-client/src/adapter/mod.rs): Added mod openai; and pub use openai::*;, extended AdapterKind with OpenAI, and updated from_url_and_languages to detect OpenAI hosts and map them to AdapterKind::OpenAI before the other adapters (see the sketch below).
  • OpenAI adapter core (owhisper/owhisper-client/src/adapter/openai/mod.rs): New OpenAIAdapter with host detection (is_host/is_openai_host), an always-true language support predicate, constants (DEFAULT_WS_HOST, WS_PATH, DEFAULT_MODEL), and build_ws_url_from_base, which constructs WS URLs, preserves and merges query params, handles proxy/base variants, and ensures model/query defaults. Includes unit tests for URL/host scenarios.
  • OpenAI live WebSocket implementation (owhisper/owhisper-client/src/adapter/openai/live.rs): Implements RealtimeSttAdapter for OpenAI: WS URL construction, auth header generation, the initial session.update message, an input_audio_buffer.commit finalize message, no keep-alive, OpenAIEvent deserialization, parsing into transcript StreamResponses, and build_transcript_response for per-word timing/confidence. Test scaffolding included.
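
To make the first item concrete, here is a stripped-down sketch of the host-detection dispatch it describes. It assumes the url crate; the non-OpenAI hosts, variant set, and function signature are illustrative and differ from the real from_url_and_languages.

```rust
// Illustrative sketch; not the actual owhisper-client code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AdapterKind {
    Deepgram,
    AssemblyAI,
    Soniox,
    OpenAI,
}

fn detect_adapter(url: &url::Url) -> Option<AdapterKind> {
    let host = url.host_str()?;
    // OpenAI is checked before the other adapters, matching the order
    // described in the walkthrough above.
    if host == "api.openai.com" {
        Some(AdapterKind::OpenAI)
    } else if host.ends_with("deepgram.com") {
        Some(AdapterKind::Deepgram)
    } else if host.ends_with("assemblyai.com") {
        Some(AdapterKind::AssemblyAI)
    } else if host.ends_with("soniox.com") {
        Some(AdapterKind::Soniox)
    } else {
        None
    }
}
```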

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant OpenAI_WS as "OpenAI Realtime WS"
  participant Adapter as "OpenAIAdapter"
  participant Processor as "Transcript Builder"

  Client->>Adapter: create connection (api_base, model, token)
  Adapter->>OpenAI_WS: open wss URL + Authorization header
  Adapter->>OpenAI_WS: send initial session.update (JSON)
  Client->>OpenAI_WS: stream audio / send audio frames
  OpenAI_WS-->>Adapter: events (session, input_audio_buffer.commit, transcript delta/complete, errors)
  Adapter->>Processor: parse events -> build_transcript_response
  Processor-->>Adapter: StreamResponse objects
  Adapter->>Client: deliver StreamResponse (transcript, timing, confidence)
  Client->>OpenAI_WS: send finalize (input_audio_buffer.commit)
  OpenAI_WS-->>Adapter: final events / commit ack

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay special attention to OpenAI event enum deserialization, unknown-event fallback, and matching of event shapes.
  • Review timing and confidence calculations in build_transcript_response (WordBuilder and calculate_time_span usage).
  • Validate build_ws_url_from_base behavior for proxy scenarios, query-param preservation, and scheme conversion.
  • Inspect auth header construction for correctness and any potential secrets handling.


Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning - Docstring coverage is 45.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed - The title clearly and specifically describes the main change: adding an OpenAI Realtime API adapter. It is concise, directly related to the changeset, and follows conventional commit format.
  • Description check ✅ Passed - The description provides detailed context about the OpenAI adapter implementation, including the summary, changes made, review checklist, and known issues. It is comprehensive and directly related to the changeset.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d85ff4f and e96ada0.

📒 Files selected for processing (2)
  • owhisper/owhisper-client/src/adapter/openai/live.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • owhisper/owhisper-client/src/adapter/openai/mod.rs
  • owhisper/owhisper-client/src/adapter/openai/live.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Devin


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)

40-40: Consider graceful error handling instead of panicking on invalid URL.

Using expect() will cause a panic if api_base is malformed. Other URL construction paths in the codebase use parse().ok()? patterns. Consider returning an Option or Result to handle invalid input gracefully.

-        let parsed: url::Url = api_base.parse().expect("invalid_api_base");
+        let parsed: url::Url = match api_base.parse() {
+            Ok(url) => url,
+            Err(_) => {
+                let model = model.unwrap_or("gpt-4o-transcribe");
+                return (
+                    format!("wss://{}{}", DEFAULT_WS_HOST, WS_PATH)
+                        .parse()
+                        .expect("invalid_default_ws_url"),
+                    vec![("model".to_string(), model.to_string())],
+                );
+            }
+        };
owhisper/owhisper-client/src/adapter/openai/live.rs (1)

290-295: Consider documenting lack of word-level timing.

The words are built without timing information, so calculate_time_span will likely return zeros. If OpenAI's API doesn't provide word-level timing, a brief comment here would clarify this is intentional.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 301ae71 and fc485df.

📒 Files selected for processing (3)
  • owhisper/owhisper-client/src/adapter/mod.rs (4 hunks)
  • owhisper/owhisper-client/src/adapter/openai/live.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
owhisper/owhisper-client/src/adapter/mod.rs (5)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
  • is_host (14-16)
owhisper/owhisper-client/src/adapter/gladia/mod.rs (1)
  • is_host (16-18)
owhisper/owhisper-client/src/adapter/assemblyai/mod.rs (1)
  • is_host (12-14)
owhisper/owhisper-client/src/adapter/soniox/mod.rs (1)
  • is_host (15-17)
owhisper/owhisper-client/src/adapter/fireworks/mod.rs (1)
  • is_host (15-17)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
owhisper/owhisper-client/src/adapter/mod.rs (4)
  • host_matches (114-119)
  • build_proxy_ws_url (129-150)
  • extract_query_params (96-100)
  • set_scheme_from_host (82-90)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Redirect rules - hyprnote-storybook
  • GitHub Check: Header rules - hyprnote-storybook
  • GitHub Check: Pages changed - hyprnote-storybook
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: fmt
  • GitHub Check: Devin
🔇 Additional comments (9)
owhisper/owhisper-client/src/adapter/mod.rs (2)

7-7: LGTM!

The module declaration and public re-export follow the established pattern used by other adapters (argmax, assemblyai, deepgram, etc.).

Also applies to: 20-20


169-169: LGTM!

The OpenAI variant and host detection logic are properly integrated into the adapter selection flow, following the existing pattern.

Also applies to: 188-189

owhisper/owhisper-client/src/adapter/openai/mod.rs (2)

6-56: LGTM!

The adapter structure and URL building logic are well-implemented. The is_openai_host pattern is consistent with other adapters, and the URL construction properly handles proxy URLs, localhost, and direct OpenAI connections.


59-106: Good test coverage.

The unit tests effectively cover the main URL construction scenarios including empty base, explicit model, proxy routing, and localhost handling.

owhisper/owhisper-client/src/adapter/openai/live.rs (5)

10-35: LGTM!

The provider identification, URL building, and authorization header implementation are correct and follow established patterns.


82-87: LGTM!

The finalization message correctly signals the end of audio input with the commit event.
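
For reference, the finalize payload discussed here is a single JSON event. A minimal sketch, with the event name taken from the walkthrough and everything else illustrative:

```rust
// Sketch of the finalize message: signals that buffered audio should be committed.
fn finalize_message() -> serde_json::Value {
    serde_json::json!({ "type": "input_audio_buffer.commit" })
}
```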


89-165: LGTM!

The response parsing is well-structured with appropriate logging levels and graceful handling of unknown events via the #[serde(other)] fallback.


233-266: LGTM!

The OpenAIEvent enum with tagged union deserialization and #[serde(other)] fallback provides robust event parsing with forward compatibility for new event types.
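
A condensed illustration of the tagged-enum pattern described above. The event-name strings are assumptions; the PR checklist explicitly asks a reviewer to verify the serde rename values against OpenAI's documentation.

```rust
use serde::Deserialize;

// Internally tagged on "type"; unknown event types fall into Unknown via
// #[serde(other)] instead of failing deserialization.
#[derive(Debug, Deserialize)]
#[serde(tag = "type")]
enum OpenAIEvent {
    #[serde(rename = "conversation.item.input_audio_transcription.delta")]
    TranscriptionDelta { delta: String },
    #[serde(rename = "conversation.item.input_audio_transcription.completed")]
    TranscriptionCompleted { transcript: String },
    #[serde(rename = "conversation.item.input_audio_transcription.failed")]
    TranscriptionFailed,
    #[serde(other)]
    Unknown,
}
```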


319-358: LGTM!

Integration tests are properly marked as #[ignore] since they require an API key and external service access. They serve as good documentation of adapter usage.

devin-ai-integration bot and others added 2 commits December 4, 2025 23:59
…rdcoded value

- Add DEFAULT_MODEL constant to avoid drift between URL and session config
- Use params.model with fallback to DEFAULT_MODEL in initial_message
- Ensures WebSocket URL model and TranscriptionConfig model stay consistent

Co-Authored-By: yujonglee <[email protected]>
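
A small sketch of the fallback described in the commit message above. The params type and field are assumptions; only the constant value and the intent of keeping the URL and session config consistent come from this PR.

```rust
// Keeps the WebSocket URL and the session config pointed at the same model.
const DEFAULT_MODEL: &str = "gpt-4o-transcribe";

struct Params {
    model: Option<String>, // hypothetical shape of the adapter's params
}

fn session_model(params: &Params) -> &str {
    params.model.as_deref().unwrap_or(DEFAULT_MODEL)
}
```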
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)

22-56: Consider returning Result for more robust error handling.

The function uses expect() on lines 40 and 50, which will panic if api_base contains an invalid URL. Consider returning Result<(url::Url, Vec<(String, String)>), Error> to handle invalid input gracefully and allow callers to decide how to handle errors.

Example refactor:

pub(crate) fn build_ws_url_from_base(
    api_base: &str,
    model: Option<&str>,
) -> Result<(url::Url, Vec<(String, String)>), url::ParseError> {
    if api_base.is_empty() {
        let model = model.unwrap_or("gpt-4o-transcribe");
        return Ok((
            format!("wss://{}{}", DEFAULT_WS_HOST, WS_PATH)
                .parse()
                .expect("invalid_default_ws_url"),
            vec![("model".to_string(), model.to_string())],
        ));
    }

    if let Some(proxy_result) = super::build_proxy_ws_url(api_base) {
        return Ok(proxy_result);
    }

    let parsed: url::Url = api_base.parse()?;
    let mut existing_params = super::extract_query_params(&parsed);

    if !existing_params.iter().any(|(k, _)| k == "model") {
        let model = model.unwrap_or("gpt-4o-transcribe");
        existing_params.push(("model".to_string(), model.to_string()));
    }

    let host = parsed.host_str().unwrap_or(DEFAULT_WS_HOST);
    let mut url: url::Url = format!("wss://{}{}", host, WS_PATH).parse()?;

    super::set_scheme_from_host(&mut url);

    Ok((url, existing_params))
}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc485df and d85ff4f.

📒 Files selected for processing (1)
  • owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/adapter/openai/mod.rs (2)
owhisper/owhisper-client/src/adapter/mod.rs (4)
  • host_matches (114-119)
  • build_proxy_ws_url (129-150)
  • extract_query_params (96-100)
  • set_scheme_from_host (82-90)
crates/pyannote-cloud/src/test_key.rs (1)
  • test (12-24)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: fmt
  • GitHub Check: Devin
🔇 Additional comments (1)
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)

1-4: LGTM: Module structure and constants look good.

The module declaration and constants are correctly defined. The WebSocket path /v1/realtime and host api.openai.com align with OpenAI's Realtime API endpoint structure.

@yujonglee yujonglee merged commit 1bd6144 into main Dec 5, 2025
12 of 13 checks passed
@yujonglee yujonglee deleted the devin/1764892376-openai-realtime-adapter branch December 5, 2025 00:05