
Conversation

@yujonglee
Contributor

@yujonglee yujonglee commented Dec 4, 2025

feat(owhisper): add OpenAI batch transcription adapter

Summary

Implements BatchSttAdapter for OpenAI's speech-to-text API (Whisper), enabling batch transcription of audio files via the /v1/audio/transcriptions endpoint.

Key implementation details:

  • Uses verbose_json response format with word-level timestamps
  • Converts OpenAI's response format to the internal BatchResponse structure
  • Returns empty words Vec when word-level timestamps aren't available (rather than faking per-word data from segments)
  • Strips punctuation for the word field, preserving the original token in punctuated_word (a conversion sketch follows this list)
  • Supports common audio formats (mp3, wav, m4a, ogg, flac, webm)
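A minimal sketch of that conversion, with simplified type and field names that approximate the description above rather than the exact internal code:

// Illustrative conversion from OpenAI verbose_json words to the internal word shape.
struct OpenAIWord {
    word: String,
    start: f64,
    end: f64,
}

struct Word {
    word: String,
    punctuated_word: String,
    start: f64,
    end: f64,
    confidence: f64,
}

fn strip_punctuation(token: &str) -> String {
    let stripped: String = token.chars().filter(|c| !c.is_ascii_punctuation()).collect();
    // Punctuation-only tokens are kept as-is rather than emptied out.
    if stripped.is_empty() { token.to_string() } else { stripped }
}

fn convert_words(openai_words: Option<Vec<OpenAIWord>>) -> Vec<Word> {
    // When word-level timestamps are missing, return an empty Vec instead of
    // faking per-word data from segments.
    let Some(words) = openai_words else { return Vec::new() };
    words
        .into_iter()
        .map(|w| Word {
            word: strip_punctuation(&w.word),
            punctuated_word: w.word,
            start: w.start,
            end: w.end,
            // OpenAI does not report per-word confidence; 1.0 matches the other adapters.
            confidence: 1.0,
        })
        .collect()
}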

Updates since last revision

Addressed review feedback:

  • Fixed the fallback filename to use the actual file extension instead of a hardcoded .wav (sketched after this list)
  • word field now has punctuation stripped; punctuated_word contains the original token
  • Removed segment-to-word fallback; returns empty words Vec when word-level data unavailable
  • Kept confidence: 1.0 consistent with other adapters (Soniox, Fireworks, AssemblyAI use unwrap_or(1.0))
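For the filename fix, a small sketch of what the fallback could look like; the helper name is hypothetical and not taken from the PR:

use std::path::Path;

// Hypothetical helper: prefer the real file name; if it is missing, fall back to
// "audio.<ext>" using the actual extension rather than a hardcoded ".wav".
fn upload_filename(path: &Path) -> String {
    path.file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_else(|| {
            let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("wav");
            format!("audio.{ext}")
        })
}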

Review & Testing Checklist for Human

  • Verify the timestamp_granularities[] parameter format is correct for OpenAI's API (line 96)
  • Confirm empty words Vec is acceptable for downstream consumers when OpenAI doesn't return word-level timestamps
  • Review that strip_punctuation only handles ASCII punctuation; it may miss Unicode punctuation marks
  • Note: confidence is hardcoded to 1.0 and speaker is always None since OpenAI doesn't provide these

Recommended test plan: Run the ignored test with a valid OpenAI API key:

OPENAI_API_KEY="your-key" cargo test -p owhisper-client test_openai_transcribe -- --ignored --nocapture

Notes

Implement BatchSttAdapter for OpenAI's speech-to-text API:
- Add openai/mod.rs with OpenAIAdapter struct
- Add openai/batch.rs with transcribe_file implementation
- Support verbose_json response format with word-level timestamps
- Convert OpenAI response format to BatchResponse
- Include ignored test for manual verification with API key

Co-Authored-By: yujonglee <[email protected]>
@devin-ai-integration
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@netlify

netlify bot commented Dec 4, 2025

Deploy Preview for hyprnote-storybook ready!

🔨 Latest commit: 7327eb3
🔍 Latest deploy log: https://app.netlify.com/projects/hyprnote-storybook/deploys/693222e96cbc870008198feb
😎 Deploy Preview: https://deploy-preview-2125--hyprnote-storybook.netlify.app

@netlify

netlify bot commented Dec 4, 2025

Deploy Preview for hyprnote ready!

🔨 Latest commit: 7327eb3
🔍 Latest deploy log: https://app.netlify.com/projects/hyprnote/deploys/693222e981b6d2000800c29c
😎 Deploy Preview: https://deploy-preview-2125--hyprnote.netlify.app

@coderabbitai
Contributor

coderabbitai bot commented Dec 4, 2025

Warning

Rate limit exceeded

@devin-ai-integration[bot] has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 9 minutes and 37 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 6af1c0a and 7327eb3.

📒 Files selected for processing (1)
  • owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
📝 Walkthrough

Adds a new OpenAI adapter to the owhisper client: a public OpenAIAdapter type and a batch transcription implementation that reads audio files, builds a multipart request to OpenAI's /audio/transcriptions endpoint, authenticates with a Bearer token, parses verbose JSON responses, and converts them into the client's BatchResponse shape.

Changes

  • Adapter module export (owhisper/owhisper-client/src/adapter/mod.rs): Adds mod openai; and pub use openai::*; to expose the OpenAI adapter from the adapter module.
  • OpenAI adapter entry (owhisper/owhisper-client/src/adapter/openai/mod.rs): Introduces pub struct OpenAIAdapter (derives Clone, Default) and declares mod batch;.
  • OpenAI batch transcription implementation (owhisper/owhisper-client/src/adapter/openai/batch.rs): Implements BatchSttAdapter::transcribe_file (async boxed future) with file reading and filename resolution, MIME type mapping, multipart Form construction (model / response_format / timestamp_granularities / language), a POST to ${api_base}/audio/transcriptions with Bearer auth, handling of non-2xx responses, deserialization of OpenAIVerboseResponse (words/segments), and conversion into the internal Words/Alternatives/Channel/BatchResponse types, plus a test placeholder. A sketch of the request flow follows this list.
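A minimal sketch of that request path, assuming reqwest with the multipart feature enabled; the helper name, simplified fallbacks, and the whisper-1 model string are illustrative, not lifted from the PR:

// Illustrative only: build the multipart form and POST it with a Bearer token,
// mirroring the flow summarized above. Error handling is simplified.
use std::path::Path;

async fn post_transcription(
    client: &reqwest::Client,
    api_base: &str,
    api_key: &str,
    file_path: &Path,
) -> Result<String, Box<dyn std::error::Error>> {
    let bytes = tokio::fs::read(file_path).await?;
    let filename = file_path
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_default();

    let part = reqwest::multipart::Part::bytes(bytes)
        .file_name(filename)
        .mime_str("audio/mpeg")?; // the real code maps this from the file extension

    let form = reqwest::multipart::Form::new()
        .part("file", part)
        .text("model", "whisper-1")
        .text("response_format", "verbose_json")
        .text("timestamp_granularities[]", "word");

    let resp = client
        .post(format!("{api_base}/audio/transcriptions"))
        .bearer_auth(api_key)
        .multipart(form)
        .send()
        .await?;

    if !resp.status().is_success() {
        return Err(format!("unexpected status: {}", resp.status()).into());
    }
    Ok(resp.text().await?)
}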

Sequence Diagram(s)

sequenceDiagram
    actor Caller
    participant OpenAIAdapter
    participant FileSystem
    participant OpenAIAPI as "OpenAI API"
    participant Converter as "Response Converter"

    Caller->>OpenAIAdapter: transcribe_file(params, file_path, api_key, client, api_base)
    OpenAIAdapter->>FileSystem: async read file bytes & infer filename
    FileSystem-->>OpenAIAdapter: file bytes + filename
    OpenAIAdapter->>OpenAIAdapter: build multipart Form (file, model, response_format, timestamps, language?)
    OpenAIAdapter->>OpenAIAPI: POST /audio/transcriptions (multipart + Bearer)
    alt 2xx Success
        OpenAIAPI-->>OpenAIAdapter: verbose JSON
        OpenAIAdapter->>Converter: deserialize OpenAIVerboseResponse
        Converter-->>OpenAIAdapter: BatchResponse (words, timings, alternatives, metadata)
        OpenAIAdapter-->>Caller: BatchResponse
    else non-2xx
        OpenAIAPI-->>OpenAIAdapter: status + body
        OpenAIAdapter-->>Caller: Error::UnexpectedStatus(status, body)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Review async lifetime and BatchFuture correctness in the trait implementation.
  • Validate multipart form fields and MIME mapping logic (mime_type_from_extension) for audio edge cases.
  • Inspect OpenAIVerboseResponse deserialization and conversion to internal Word/Segment structures (punctuation stripping, timestamps, confidence handling).
  • Check error mapping for file IO, MIME resolution, and non-2xx HTTP responses.

Possibly related PRs

  • Refactor STT adapters #2059 — Adds BatchSttAdapter trait and refactors adapters to that interface; closely related as this PR implements an OpenAI-backed BatchSttAdapter.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title accurately summarizes the main change: adding an OpenAI batch transcription adapter to the owhisper project.
  • Description check: ✅ Passed. The description comprehensively explains the feature implementation, key details, updates, and includes testing guidance directly related to the changeset.


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1)

123-134: Consider using the mime_guess crate for more comprehensive MIME type detection.

The current implementation covers common audio formats adequately, but a dedicated crate would handle edge cases and additional formats automatically.
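For reference, a sketch of what that swap could look like with mime_guess (the function name mirrors the one discussed in this review; the crate would need to be added as a dependency):

use std::path::Path;

// Illustrative: let mime_guess resolve the MIME type from the extension,
// falling back to application/octet-stream for unknown formats.
fn mime_type_from_extension(path: &Path) -> String {
    mime_guess::from_path(path)
        .first_or_octet_stream()
        .essence_str()
        .to_string()
}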

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 301ae71 and 42a0b61.

📒 Files selected for processing (3)
  • owhisper/owhisper-client/src/adapter/mod.rs (2 hunks)
  • owhisper/owhisper-client/src/adapter/openai/batch.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: Devin
  • GitHub Check: fmt
🔇 Additional comments (9)
owhisper/owhisper-client/src/adapter/mod.rs (1)

7-7: LGTM! OpenAI adapter properly integrated.

The module declaration and re-export follow the established pattern used by other adapters in this file.

Also applies to: 20-20

owhisper/owhisper-client/src/adapter/openai/mod.rs (1)

1-4: LGTM! Clean adapter entry point.

The structure properly exposes the OpenAI adapter with appropriate derives and module organization.

owhisper/owhisper-client/src/adapter/openai/batch.rs (7)

1-14: LGTM! Clean imports and sensible defaults.

The imports cover all necessary dependencies, and the default values match OpenAI's API requirements.


15-27: LGTM! Trait implementation follows standard pattern.

The delegation to do_transcribe_file with proper future boxing is appropriate.


29-59: LGTM! Response models properly structured.

The data structures appropriately model OpenAI's verbose_json response format with proper serde attributes.


104-120: LGTM! Request handling is robust.

The Bearer token authentication, multipart form submission, and error handling for non-2xx responses are all properly implemented.


165-185: LGTM! Response structure assembly is clean.

The conversion to Alternatives, Channel, and BatchResponse with metadata properly packages the transcription results.


187-220: LGTM! Test structure is appropriate for integration testing.

The test is properly marked with #[ignore] for external API calls, validates response structure, and uses environment variables for credentials.


87-95: No issues found. The code correctly implements OpenAI's audio transcription API parameters:

  • timestamp_granularities[] with value "word" is the correct array notation for multipart forms
  • lang.iso639().code() returns ISO 639-1 two-letter codes (e.g., "en", "es", "zh") as required by OpenAI's language parameter

Both formats match OpenAI's API expectations.

- Fix fallback filename to use actual file extension instead of hardcoded .wav
- Strip punctuation for word field, keep original for punctuated_word
- Return empty words Vec when word-level timestamps unavailable (instead of multi-word segment entries)
- Keep confidence at 1.0 consistent with other adapters (Soniox, Fireworks, AssemblyAI)

Co-Authored-By: yujonglee <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1)

161-161: Hardcoded confidence of 1.0 remains unaddressed.

As flagged in previous reviews, OpenAI doesn't provide confidence scores. A value of 1.0 suggests certainty that isn't actually present. Consider using a documented sentinel value or adding a comment.

Also applies to: 173-173

🧹 Nitpick comments (3)
owhisper/owhisper-client/src/adapter/openai/batch.rs (3)

128-139: Consider adding support for mpeg and mpga extensions.

OpenAI's API also supports mpeg and mpga file extensions (both audio/mpeg). While the fallback to application/octet-stream should work, explicit mapping would be more robust.

         Some("mp3") => "audio/mpeg",
+        Some("mpeg") => "audio/mpeg",
+        Some("mpga") => "audio/mpeg",
         Some("mp4") => "audio/mp4",

141-144: Minor: strip_punctuation only handles ASCII punctuation.

For non-English transcriptions, OpenAI may return Unicode punctuation characters. The current ASCII-only approach should work for most cases but may leave punctuation attached to words in some languages.
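If broader coverage were ever needed, a Unicode-aware variant could keep only alphanumeric characters instead of filtering the ASCII punctuation set; this is a sketch, not the PR's implementation:

// Illustrative: drop any non-alphanumeric character (covers Unicode punctuation),
// while still preserving punctuation-only tokens as the adapter does.
fn strip_punctuation_unicode(token: &str) -> String {
    let stripped: String = token.chars().filter(|c| c.is_alphanumeric()).collect();
    if stripped.is_empty() { token.to_string() } else { stripped }
}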


193-226: Consider adding unit tests for helper functions.

The integration test is appropriately ignored for CI, but the helper functions (mime_type_from_extension, strip_punctuation, convert_response) would benefit from unit tests that don't require an API key.

Example test cases:

#[test]
fn test_strip_punctuation() {
    assert_eq!(strip_punctuation("hello,"), "hello");
    assert_eq!(strip_punctuation("..."), "...");  // all punctuation preserved
    assert_eq!(strip_punctuation(""), "");
}

#[test]
fn test_mime_type_from_extension() {
    assert_eq!(mime_type_from_extension(Path::new("audio.mp3")), "audio/mpeg");
    assert_eq!(mime_type_from_extension(Path::new("audio.unknown")), "application/octet-stream");
}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42a0b61 and 6af1c0a.

📒 Files selected for processing (1)
  • owhisper/owhisper-client/src/adapter/openai/batch.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (2)
owhisper/owhisper-client/src/adapter/mod.rs (1)
  • transcribe_file (72-79)
owhisper/owhisper-interface/src/batch.rs (2)
  • alternatives (69-73)
  • channel (85-89)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: fmt
  • GitHub Check: Devin
🔇 Additional comments (6)
owhisper/owhisper-client/src/adapter/openai/batch.rs (6)

15-27: LGTM!

The trait implementation correctly adapts the path and delegates to the async do_transcribe_file function with proper lifetime handling.


68-88: LGTM!

File name fallback now correctly uses the file extension, and the MIME type determination is appropriate for common audio formats.


146-169: Good improvements to word processing logic.

The implementation now correctly:

  • Strips punctuation for word while preserving it in punctuated_word
  • Handles punctuation-only tokens gracefully (lines 154-158)
  • Returns an empty vector instead of misusing segments as words

171-191: LGTM!

Response structure correctly maps OpenAI's output to the internal BatchResponse format with appropriate metadata.


98-100: Language code format is correct.

The code uses codes_iso_639::part_1::LanguageCode, which is the ISO 639-1 standard providing 2-letter language codes (e.g., "en", "es", "fr"). This matches OpenAI's API requirements exactly. The schema validation regex confirms the 2-letter format. No changes needed.


29-59: The struct definitions are correct and align with OpenAI's audio transcription API documentation. No changes needed.

  • OpenAIWord fields (word: String, start: f64, end: f64) match the documented word object structure
  • OpenAISegment fields correctly include seek: i32 as the frame offset, start/end: f64 for timestamps in seconds
  • OpenAIVerboseResponse structure properly reflects the verbose_json response format
  • The #[allow(dead_code)] attributes and #[serde(default)] decorators are appropriate

@yujonglee yujonglee merged commit c914b4f into main Dec 5, 2025
12 of 13 checks passed
@yujonglee yujonglee deleted the devin/1764890226-openai-batch-adapter branch December 5, 2025 00:13