Refactor OpenAI adapter and add model support #2128
Conversation
- Remove dead code (`OpenAISegment` struct, `segments` field)
- Extract magic values to constants (VAD config, response formats)
- Add model-aware request building for batch API
  - whisper-1: uses verbose_json with word timestamps
  - gpt-4o-transcribe / gpt-4o-mini-transcribe: uses json format
- Update UI to show all OpenAI STT models
- Add model documentation comments

Co-Authored-By: yujonglee <[email protected]>
📝 Walkthrough

Adds support for two new OpenAI transcription models (gpt-4o-transcribe, gpt-4o-mini-transcribe) to the UI settings. Updates the OpenAI adapter to conditionally use word-level timestamps (whisper-1 only), centralizes Voice Activity Detection defaults, and sets a default transcription model constant.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Pre-merge checks: ✅ 3 checks passed
Actionable comments posted: 0
🧹 Nitpick comments (4)
apps/desktop/src/components/settings/ai/stt/shared.tsx (1)
47-53: New OpenAI model mappings and provider configuration look consistent
`displayModelId` now cleanly handles the two new model IDs, and the OpenAI provider's `models` list matches those IDs, so UI selection and labeling should work end-to-end with the new backend defaults. One tiny nit: if you care about consistency with other labels, you might prefer `"GPT-4o Mini Transcribe"` (capital "Mini"), but that's purely cosmetic.

Also applies to: 164-164
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
7-12: Consider unifying batch and realtime default model configuration

The documented model list and `DEFAULT_TRANSCRIPTION_MODEL = "gpt-4o-transcribe"` look good. Right now, though, the batch adapter still has its own `DEFAULT_MODEL = "whisper-1"`, so defaults differ between batch and realtime. If that's not intentional, it may be worth reusing this constant (or adding a short comment explaining why batch keeps a different default) to avoid future drift.
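If the two adapters are meant to share one default, the change could look roughly like the sketch below. This is a minimal illustration assuming the constant name from the review; the `effective_model` helper is hypothetical and not in the repo.

```rust
// In openai/mod.rs: single source of truth for the default model.
pub const DEFAULT_TRANSCRIPTION_MODEL: &str = "gpt-4o-transcribe";

// batch.rs could then drop its own DEFAULT_MODEL and resolve the model
// through a shared helper (hypothetical name):
pub fn effective_model(requested: Option<&str>) -> &str {
    requested.unwrap_or(DEFAULT_TRANSCRIPTION_MODEL)
}
```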
10-15: Good centralization of VAD configuration

Extracting the VAD literals into constants and wiring them into `TurnDetection` keeps the behavior the same while making future tuning or feature-flagging much easier. If you later want per-deployment or per-request control, these constants are a natural place to hook in configuration or `ListenParams` fields, but that can be deferred.

Also applies to: 87-90
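As a rough picture of what that centralization looks like: the sketch below is an assumption-heavy illustration, not the repo's code. The `TurnDetection` field names and the numeric defaults are guesses based on the shape of OpenAI's server-VAD settings.

```rust
// Assumed shape of the realtime turn-detection config; field names and
// values are illustrative only.
struct TurnDetection {
    threshold: f32,
    prefix_padding_ms: u32,
    silence_duration_ms: u32,
}

// Centralized VAD defaults (assumed values), in one place instead of
// scattered literals.
const VAD_THRESHOLD: f32 = 0.5;
const VAD_PREFIX_PADDING_MS: u32 = 300;
const VAD_SILENCE_DURATION_MS: u32 = 500;

fn default_turn_detection() -> TurnDetection {
    TurnDetection {
        threshold: VAD_THRESHOLD,
        prefix_padding_ms: VAD_PREFIX_PADDING_MS,
        silence_duration_ms: VAD_SILENCE_DURATION_MS,
    }
}
```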
owhisper/owhisper-client/src/adapter/openai/batch.rs (1)
14-21: Conditional verbose vs JSON response handling for word timestamps looks correct

The separation of response formats using `supports_word_timestamps` and the new constants (`RESPONSE_FORMAT_VERBOSE`, `RESPONSE_FORMAT_JSON`, `TIMESTAMP_GRANULARITY`) is clear: `whisper-1` keeps verbose JSON + word timestamps, while other models fall back to plain JSON. That matches the intent of only requesting `timestamp_granularities[]=word` where it's actually supported and avoids breaking newer models.

One minor design tweak to consider: if you decide that `"gpt-4o-transcribe"` should also be the default for batch, it might be worth reusing the shared default from `openai::mod` instead of keeping a separate `DEFAULT_MODEL` here, to reduce config drift.

Also applies to: 85-99
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- apps/desktop/src/components/settings/ai/stt/shared.tsx (2 hunks)
- owhisper/owhisper-client/src/adapter/openai/batch.rs (2 hunks)
- owhisper/owhisper-client/src/adapter/openai/live.rs (2 hunks)
- owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{ts,tsx}: Avoid creating a bunch of types/interfaces if they are not shared. Especially for function props, just inline them instead.
Never do manual state management for form/mutation. Use useForm (from tanstack-form) and useQuery/useMutation (from tanstack-query) instead for 99% of cases. Avoid patterns like setError.
If there are many classNames with conditional logic, use `cn` (import from `@hypr/utils`). It is similar to `clsx`. Always pass an array and split by logical grouping.
Use `motion/react` instead of `framer-motion`.
Files:
apps/desktop/src/components/settings/ai/stt/shared.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
- GitHub Check: fmt
- GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
- GitHub Check: Devin
🔇 Additional comments (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1)
119-123: Double-check JSON schema compatibility for non-verbose models

For non-`whisper-1` models you now send `response_format=json` but still deserialize into `OpenAIVerboseResponse` and then feed that into `convert_response`. Because all extra fields are optional and `words` has a `#[serde(default)]`, this should safely yield an empty `words` list while still using `text`/`language`.

It would be good to explicitly verify against the current OpenAI docs / a live response that `response_format=json` for `gpt-4o-transcribe` and `gpt-4o-mini-transcribe` indeed returns at least the `text` (and optionally `language`) fields at the top level so deserialization can't fail unexpectedly.

Also applies to: 149-172
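To make the deserialization argument concrete, here is a sketch of the pattern. The exact field set of `OpenAIVerboseResponse` is assumed from the review, not copied from the repo.

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct Word {
    word: String,
    start: f64,
    end: f64,
}

#[derive(Deserialize)]
struct OpenAIVerboseResponse {
    text: String,
    // Present in verbose_json; may be absent in plain json responses.
    language: Option<String>,
    // Falls back to an empty Vec when `words` is missing, which is the
    // case for response_format=json.
    #[serde(default)]
    words: Vec<Word>,
}
```

Because every verbose-only field is optional or defaulted, a plain `{"text": "..."}` payload deserializes cleanly into the same struct.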
Summary
This PR refactors the OpenAI STT adapter and adds support for the newer gpt-4o transcription models. Key changes:
Adapter refactoring:
- Removed dead code (the `OpenAISegment` struct and `segments` field that were never used)
- Extracted magic values to constants (VAD config, response formats)

Model-aware request building:
- `whisper-1`: uses `verbose_json` with word-level timestamps (existing behavior)
- `gpt-4o-transcribe` / `gpt-4o-mini-transcribe`: uses `json` format (no word timestamps, per OpenAI API limitations)

UI updates:

- Settings now list all OpenAI STT models, including `gpt-4o-transcribe` and `gpt-4o-mini-transcribe`
Review & Testing Checklist for Human
- Transcripts from the gpt-4o models will have an empty `words` array - verify downstream features (word highlighting, scrubber, etc.) handle this gracefully

Recommended test plan:
Notes
The gpt-4o models provide higher quality transcription but don't support word-level timestamps in the batch API. This is an OpenAI limitation, not a bug. Users who need word timestamps should use whisper-1.
Link to Devin run: https://app.devin.ai/sessions/02cc0fe084c3495db89f417c806c67f5
Requested by: yujonglee (@yujonglee)