feat(owhisper): add OpenAI batch transcription adapter #2125
Conversation
Implement BatchSttAdapter for OpenAI's speech-to-text API:

- Add openai/mod.rs with OpenAIAdapter struct
- Add openai/batch.rs with transcribe_file implementation
- Support verbose_json response format with word-level timestamps
- Convert OpenAI response format to BatchResponse
- Include ignored test for manual verification with API key

Co-Authored-By: yujonglee <[email protected]>
✅ Deploy Preview for hyprnote-storybook ready!

✅ Deploy Preview for hyprnote ready!
📝 Walkthrough

Adds a new OpenAI adapter to the owhisper client: a public `OpenAIAdapter` in `openai/mod.rs`, with a `transcribe_file` implementation in `openai/batch.rs` that posts audio to OpenAI's `/audio/transcriptions` endpoint and converts the `verbose_json` response (including word-level timestamps) into the internal `BatchResponse` format. The adapter is declared and re-exported from `adapter/mod.rs`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Caller
    participant OpenAIAdapter
    participant FileSystem
    participant OpenAIAPI as "OpenAI API"
    participant Converter as "Response Converter"

    Caller->>OpenAIAdapter: transcribe_file(params, file_path, api_key, client, api_base)
    OpenAIAdapter->>FileSystem: async read file bytes & infer filename
    FileSystem-->>OpenAIAdapter: file bytes + filename
    OpenAIAdapter->>OpenAIAdapter: build multipart Form (file, model, response_format, timestamps, language?)
    OpenAIAdapter->>OpenAIAPI: POST /audio/transcriptions (multipart + Bearer)
    alt 2xx Success
        OpenAIAPI-->>OpenAIAdapter: verbose JSON
        OpenAIAdapter->>Converter: deserialize OpenAIVerboseResponse
        Converter-->>OpenAIAdapter: BatchResponse (words, timings, alternatives, metadata)
        OpenAIAdapter-->>Caller: BatchResponse
    else non-2xx
        OpenAIAPI-->>OpenAIAdapter: status + body
        OpenAIAdapter-->>Caller: Error::UnexpectedStatus(status, body)
    end
```
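For orientation, here is a minimal sketch of the request flow the diagram describes, assuming a `reqwest` client (with the `multipart` feature) and simplified stand-ins for owhisper's parameter, response, and error types; the adapter's actual signatures may differ:

```rust
use std::path::Path;

// Hypothetical, simplified stand-in for the verbose_json payload.
#[derive(serde::Deserialize)]
struct OpenAIVerboseResponse {
    text: String,
}

async fn transcribe_file_sketch(
    client: &reqwest::Client,
    api_base: &str,
    api_key: &str,
    file_path: &Path,
) -> Result<OpenAIVerboseResponse, Box<dyn std::error::Error>> {
    // Read the audio bytes and infer a filename for the multipart part.
    let bytes = tokio::fs::read(file_path).await?;
    let file_name = file_path
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_else(|| "audio.wav".to_string());

    // Build the multipart form expected by /audio/transcriptions.
    let form = reqwest::multipart::Form::new()
        .part(
            "file",
            reqwest::multipart::Part::bytes(bytes).file_name(file_name),
        )
        .text("model", "whisper-1")
        .text("response_format", "verbose_json")
        .text("timestamp_granularities[]", "word");

    let resp = client
        .post(format!("{api_base}/audio/transcriptions"))
        .bearer_auth(api_key)
        .multipart(form)
        .send()
        .await?;

    // Non-2xx responses surface as an error carrying status and body.
    if !resp.status().is_success() {
        let status = resp.status();
        let body = resp.text().await?;
        return Err(format!("unexpected status {status}: {body}").into());
    }

    Ok(resp.json::<OpenAIVerboseResponse>().await?)
}
```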
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1)
123-134: Consider using the `mime_guess` crate for more comprehensive MIME type detection. The current implementation covers common audio formats adequately, but a dedicated crate would handle edge cases and additional formats automatically.
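A minimal sketch of what that swap could look like, assuming the helper keeps its current name (`from_path`, `first_or_octet_stream`, and `essence_str` are the `mime_guess`/`mime` crates' actual entry points):

```rust
use std::path::Path;

// Delegate extension-to-MIME mapping to mime_guess instead of a hand-rolled match.
fn mime_type_from_extension(path: &Path) -> String {
    mime_guess::from_path(path)
        .first_or_octet_stream() // falls back to application/octet-stream
        .essence_str()
        .to_string()
}
```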
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- owhisper/owhisper-client/src/adapter/mod.rs (2 hunks)
- owhisper/owhisper-client/src/adapter/openai/batch.rs (1 hunks)
- owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: Devin
- GitHub Check: fmt
🔇 Additional comments (9)
owhisper/owhisper-client/src/adapter/mod.rs (1)
7-7: LGTM! OpenAI adapter properly integrated. The module declaration and re-export follow the established pattern used by other adapters in this file.
Also applies to: 20-20
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
1-4: LGTM! Clean adapter entry point. The structure properly exposes the OpenAI adapter with appropriate derives and module organization.
owhisper/owhisper-client/src/adapter/openai/batch.rs (7)
1-14: LGTM! Clean imports and sensible defaults. The imports cover all necessary dependencies, and the default values match OpenAI's API requirements.

15-27: LGTM! Trait implementation follows standard pattern. The delegation to `do_transcribe_file` with proper future boxing is appropriate.

29-59: LGTM! Response models properly structured. The data structures appropriately model OpenAI's verbose_json response format with proper serde attributes.

104-120: LGTM! Request handling is robust. The Bearer token authentication, multipart form submission, and error handling for non-2xx responses are all properly implemented.

165-185: LGTM! Response structure assembly is clean. The conversion to `Alternatives`, `Channel`, and `BatchResponse` with metadata properly packages the transcription results.

187-220: LGTM! Test structure is appropriate for integration testing. The test is properly marked with `#[ignore]` for external API calls, validates response structure, and uses environment variables for credentials.

87-95: No issues found. The code correctly implements OpenAI's audio transcription API parameters:

- `timestamp_granularities[]` with value `"word"` is the correct array notation for multipart forms
- `lang.iso639().code()` returns ISO 639-1 two-letter codes (e.g., "en", "es", "zh") as required by OpenAI's language parameter

Both formats match OpenAI's API expectations.
- Fix fallback filename to use actual file extension instead of hardcoded .wav
- Strip punctuation for word field, keep original for punctuated_word
- Return empty words Vec when word-level timestamps unavailable (instead of multi-word segment entries)
- Keep confidence at 1.0 consistent with other adapters (Soniox, Fireworks, AssemblyAI)

Co-Authored-By: yujonglee <[email protected]>
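As a rough illustration of the first fix, a fallback name can keep the file's real extension rather than assuming `.wav`; the helper name here is hypothetical, not the adapter's exact code:

```rust
use std::path::Path;

// Build a multipart file name; if the path has no usable name, synthesize one
// from the real extension instead of hardcoding ".wav".
fn multipart_file_name(path: &Path) -> String {
    path.file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_else(|| {
            format!(
                "audio.{}",
                path.extension().and_then(|e| e.to_str()).unwrap_or("wav")
            )
        })
}
```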
Actionable comments posted: 0
♻️ Duplicate comments (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1)
161-161: Hardcoded confidence of 1.0 remains unaddressed. As flagged in previous reviews, OpenAI doesn't provide confidence scores. A value of `1.0` suggests certainty that isn't actually present. Consider using a documented sentinel value or adding a comment.

Also applies to: 173-173
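One lightweight way to follow that suggestion, sketched with a hypothetical constant name:

```rust
// OpenAI's verbose_json response carries no confidence scores, so this value
// is a documented placeholder, not a measurement. 1.0 matches what the
// Soniox, Fireworks, and AssemblyAI adapters use via unwrap_or(1.0).
const PLACEHOLDER_CONFIDENCE: f64 = 1.0;
```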
🧹 Nitpick comments (3)
owhisper/owhisper-client/src/adapter/openai/batch.rs (3)
128-139: Consider adding support for `mpeg` and `mpga` extensions. OpenAI's API also supports `mpeg` and `mpga` file extensions (both `audio/mpeg`). While the fallback to `application/octet-stream` should work, explicit mapping would be more robust.

```diff
     Some("mp3") => "audio/mpeg",
+    Some("mpeg") => "audio/mpeg",
+    Some("mpga") => "audio/mpeg",
     Some("mp4") => "audio/mp4",
```
141-144: Minor: `strip_punctuation` only handles ASCII punctuation. For non-English transcriptions, OpenAI may return Unicode punctuation characters (e.g., 「, 」, 。). The current ASCII-only approach should work for most cases but may leave punctuation attached to words in some languages.
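A Unicode-aware variant is a small change. This sketch trims any edge character that is not alphanumeric rather than only ASCII punctuation; how it should treat punctuation-only tokens is an assumption to check against the adapter's tests:

```rust
// Trim leading/trailing characters that are neither letters nor digits.
// char::is_alphanumeric is Unicode-aware, so 「, 」, and 。 are trimmed too.
fn strip_punctuation_unicode(word: &str) -> &str {
    word.trim_matches(|c: char| !c.is_alphanumeric())
}

// e.g. strip_punctuation_unicode("hello,") == "hello"
//      strip_punctuation_unicode("「こんにちは」") == "こんにちは"
```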
193-226: Consider adding unit tests for helper functions. The integration test is appropriately ignored for CI, but the helper functions (`mime_type_from_extension`, `strip_punctuation`, `convert_response`) would benefit from unit tests that don't require an API key.

Example test cases:

```rust
#[test]
fn test_strip_punctuation() {
    assert_eq!(strip_punctuation("hello,"), "hello");
    assert_eq!(strip_punctuation("..."), "..."); // all punctuation preserved
    assert_eq!(strip_punctuation(""), "");
}

#[test]
fn test_mime_type_from_extension() {
    assert_eq!(mime_type_from_extension(Path::new("audio.mp3")), "audio/mpeg");
    assert_eq!(mime_type_from_extension(Path::new("audio.unknown")), "application/octet-stream");
}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- owhisper/owhisper-client/src/adapter/openai/batch.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (2)
owhisper/owhisper-client/src/adapter/mod.rs (1)

- `transcribe_file` (72-79)

owhisper/owhisper-interface/src/batch.rs (2)

- `alternatives` (69-73)
- `channel` (85-89)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: fmt
- GitHub Check: Devin
🔇 Additional comments (6)
owhisper/owhisper-client/src/adapter/openai/batch.rs (6)
15-27: LGTM! The trait implementation correctly adapts the path and delegates to the async `do_transcribe_file` function with proper lifetime handling.

68-88: LGTM! File name fallback now correctly uses the file extension, and the MIME type determination is appropriate for common audio formats.

146-169: Good improvements to word processing logic. The implementation now correctly:

- Strips punctuation for `word` while preserving it in `punctuated_word`
- Handles punctuation-only tokens gracefully (lines 154-158)
- Returns an empty vector instead of misusing segments as words
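Roughly, that conversion looks like the following sketch; the `Word` fields and helper names are assumptions reconstructed from the review comments, not the adapter's exact code:

```rust
// Hypothetical mirror of owhisper's Word type, for illustration only.
struct Word {
    word: String,
    punctuated_word: String,
    start: f64,
    end: f64,
    confidence: f64,
    speaker: Option<u32>,
}

struct OpenAIWord {
    word: String,
    start: f64,
    end: f64,
}

fn strip_punctuation(word: &str) -> &str {
    let stripped = word.trim_matches(|c: char| c.is_ascii_punctuation());
    // Punctuation-only tokens (e.g. "...") are kept as-is.
    if stripped.is_empty() { word } else { stripped }
}

fn convert_words(words: &[OpenAIWord]) -> Vec<Word> {
    words
        .iter()
        .map(|w| Word {
            word: strip_punctuation(&w.word).to_string(),
            punctuated_word: w.word.clone(),
            start: w.start,
            end: w.end,
            confidence: 1.0, // OpenAI reports no confidence scores
            speaker: None,   // ...and no speaker labels
        })
        .collect()
}
```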
171-191: LGTM! Response structure correctly maps OpenAI's output to the internal `BatchResponse` format with appropriate metadata.

98-100: Language code format is correct. The code uses `codes_iso_639::part_1::LanguageCode`, which is the ISO 639-1 standard providing 2-letter language codes (e.g., "en", "es", "fr"). This matches OpenAI's API requirements exactly. The schema validation regex confirms the 2-letter format. No changes needed.

29-59: The struct definitions are correct and align with OpenAI's audio transcription API documentation. No changes needed.

- `OpenAIWord` fields (`word: String`, `start: f64`, `end: f64`) match the documented word object structure
- `OpenAISegment` fields correctly include `seek: i32` as the frame offset and `start`/`end: f64` for timestamps in seconds
- `OpenAIVerboseResponse` structure properly reflects the verbose_json response format
- The `#[allow(dead_code)]` attributes and `#[serde(default)]` decorators are appropriate
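For reference, a minimal sketch of those deserialization targets, assembled from the field names the review cites (OpenAI's verbose_json payload carries more fields, e.g. `task` and per-segment probabilities, than shown here):

```rust
use serde::Deserialize;

// Word-level timestamp entry from verbose_json.
#[derive(Deserialize)]
struct OpenAIWord {
    word: String,
    start: f64, // seconds
    end: f64,   // seconds
}

// Segment entry; `seek` is the decoder's frame offset into the audio.
#[derive(Deserialize)]
#[allow(dead_code)]
struct OpenAISegment {
    seek: i32,
    start: f64,
    end: f64,
    text: String,
}

// Top-level verbose_json response.
#[derive(Deserialize)]
struct OpenAIVerboseResponse {
    text: String,
    language: String,
    duration: f64,
    #[serde(default)]
    words: Vec<OpenAIWord>,
    #[serde(default)]
    segments: Vec<OpenAISegment>,
}
```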
feat(owhisper): add OpenAI batch transcription adapter
Summary
Implements `BatchSttAdapter` for OpenAI's speech-to-text API (Whisper), enabling batch transcription of audio files via the `/v1/audio/transcriptions` endpoint.

Key implementation details:

- Uses the `verbose_json` response format with word-level timestamps
- Converts OpenAI's response into the internal `BatchResponse` structure
- Returns an empty `words` Vec when word-level timestamps aren't available (rather than faking per-word data from segments)
- Strips punctuation for the `word` field, preserves the original in `punctuated_word`

Updates since last revision
Addressed review feedback:

- Fallback filename now uses the actual file extension instead of hardcoded `.wav`
- `word` field now has punctuation stripped; `punctuated_word` contains the original token
- Returns an empty `words` Vec when word-level data is unavailable
- Kept `confidence: 1.0` consistent with other adapters (Soniox, Fireworks, AssemblyAI use `unwrap_or(1.0)`)

Review & Testing Checklist for Human
- Verify the `timestamp_granularities[]` parameter format is correct for OpenAI's API (line 96)
- Confirm that an empty `words` Vec is acceptable for downstream consumers when OpenAI doesn't return word-level timestamps
- `strip_punctuation` only handles ASCII punctuation - may miss Unicode punctuation marks
- `confidence` is hardcoded to 1.0 and `speaker` is always `None`, since OpenAI doesn't provide these

Recommended test plan: Run the ignored test with a valid OpenAI API key (a plausible command follows).
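The exact command was not captured in the description above; a plausible invocation, assuming the test lives in the `owhisper-client` package and reads the key from `OPENAI_API_KEY`:

```sh
OPENAI_API_KEY=sk-... cargo test -p owhisper-client -- --ignored
```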
Notes