Conversation

@yujonglee (Contributor) commented Dec 5, 2025

Summary

This PR refactors the OpenAI STT adapter and adds support for the newer gpt-4o transcription models. Key changes:

Adapter refactoring:

  • Removed dead code (OpenAISegment struct and segments field that were never used)
  • Extracted magic values to named constants (VAD config, response formats)

Model-aware request building:

  • whisper-1: Uses verbose_json with word-level timestamps (existing behavior)
  • gpt-4o-transcribe / gpt-4o-mini-transcribe: Uses json format (no word timestamps, per OpenAI API limitations)
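A rough sketch of what this conditional request building could look like in the batch adapter, assuming a reqwest multipart form; the constant and helper names follow the PR summary, but the exact signatures are illustrative rather than the actual code:

```rust
use reqwest::multipart::Form;

const RESPONSE_FORMAT_VERBOSE: &str = "verbose_json";
const RESPONSE_FORMAT_JSON: &str = "json";
const TIMESTAMP_GRANULARITY: &str = "word";

// Per the OpenAI batch API, only whisper-1 supports word-level timestamps.
fn supports_word_timestamps(model: &str) -> bool {
    model == "whisper-1"
}

// Pick the response format that matches the model's capabilities.
fn build_request_form(model: &str) -> Form {
    let form = Form::new().text("model", model.to_string());
    if supports_word_timestamps(model) {
        form.text("response_format", RESPONSE_FORMAT_VERBOSE)
            .text("timestamp_granularities[]", TIMESTAMP_GRANULARITY)
    } else {
        form.text("response_format", RESPONSE_FORMAT_JSON)
    }
}
```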

UI updates:

  • Added gpt-4o-transcribe and gpt-4o-mini-transcribe to the OpenAI provider model list
  • Added display names for the new models

Review & Testing Checklist for Human

  • Verify model selection flow: Select gpt-4o-transcribe in Settings → STT and confirm the adapter receives the correct model parameter during transcription
  • Test batch transcription with gpt-4o models: the response will have an empty words array; verify downstream features (word highlighting, scrubber, etc.) handle this gracefully
  • Confirm whisper-1 still works: Ensure the existing whisper-1 path with word timestamps is not regressed
  • Check realtime transcription: The realtime API already uses gpt-4o-transcribe by default - verify live transcription still works

Recommended test plan:

  1. Start a recording session with OpenAI provider + gpt-4o-transcribe model
  2. Verify live transcription works
  3. Stop recording and verify batch transcription completes (will have no word timestamps)
  4. Repeat with whisper-1 model and verify word timestamps are present

Notes

The gpt-4o models provide higher quality transcription but don't support word-level timestamps in the batch API. This is an OpenAI limitation, not a bug. Users who need word timestamps should use whisper-1.

Link to Devin run: https://app.devin.ai/sessions/02cc0fe084c3495db89f417c806c67f5
Requested by: yujonglee (@yujonglee)

- Remove dead code (OpenAISegment struct, segments field)
- Extract magic values to constants (VAD config, response formats)
- Add model-aware request building for batch API
  - whisper-1: uses verbose_json with word timestamps
  - gpt-4o-transcribe/gpt-4o-mini-transcribe: uses json format
- Update UI to show all OpenAI STT models
- Add model documentation comments

Co-Authored-By: yujonglee <[email protected]>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them.

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


netlify bot commented Dec 5, 2025

Deploy Preview for hyprnote-storybook ready!

  • 🔨 Latest commit: 1c98cf9
  • 🔍 Latest deploy log: https://app.netlify.com/projects/hyprnote-storybook/deploys/69323d058ddc340008fac378
  • 😎 Deploy Preview: https://deploy-preview-2128--hyprnote-storybook.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.


coderabbitai bot commented Dec 5, 2025

📝 Walkthrough

Adds support for two new OpenAI transcription models (gpt-4o-transcribe, gpt-4o-mini-transcribe) to the UI settings. Updates the OpenAI adapter to conditionally use word-level timestamps (whisper-1 only), centralizes Voice Activity Detection defaults, and sets a default transcription model constant.

Changes

  • UI Settings Model Registration (apps/desktop/src/components/settings/ai/stt/shared.tsx):
    Adds displayModelId handling for the new OpenAI transcription model identifiers (gpt-4o-transcribe → "GPT-4o Transcribe", gpt-4o-mini-transcribe → "GPT-4o mini Transcribe") and includes both models in the OpenAI provider's models array alongside whisper-1.
  • OpenAI Batch Adapter (owhisper/owhisper-client/src/adapter/openai/batch.rs):
    Introduces word-timestamp support detection with new constants (RESPONSE_FORMAT_VERBOSE, RESPONSE_FORMAT_JSON, TIMESTAMP_GRANULARITY) and a supports_word_timestamps() helper. Replaces unconditional verbose_json + word-timestamp requests with conditional logic: verbose_json with word-level granularity for whisper-1, standard json for other models. Removes the unused OpenAISegment struct and segments field from OpenAIVerboseResponse.
  • OpenAI Live Adapter VAD Configuration (owhisper/owhisper-client/src/adapter/openai/live.rs):
    Extracts Voice Activity Detection settings into named constants (VAD_DETECTION_TYPE, VAD_THRESHOLD, VAD_PREFIX_PADDING_MS, VAD_SILENCE_DURATION_MS) and replaces hardcoded values in the TurnDetection session update payload with constant references.
  • OpenAI Module Configuration (owhisper/owhisper-client/src/adapter/openai/mod.rs):
    Adds a DEFAULT_TRANSCRIPTION_MODEL constant set to "gpt-4o-transcribe", with comments documenting the available OpenAI STT models.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Word-timestamp support detection logic in batch.rs: verify the supports_word_timestamps() helper correctly identifies model capabilities and that conditional request building is correct.
  • VAD constant values in live.rs: confirm the extracted defaults match the original hardcoded values and are appropriate for production use.
  • Model registration consistency: ensure both new models are registered in settings UI and properly handled across adapter layers.

Possibly related PRs

  • fastrepl/hyprnote#2126: Modifies the same OpenAI adapter files (live.rs, mod.rs) for transcription model configuration and session settings.
  • fastrepl/hyprnote#2060: Updates the same settings UI file (apps/desktop/src/components/settings/ai/stt/shared.tsx) to register new STT models in displayModelId and PROVIDERS.
  • fastrepl/hyprnote#2047: Modifies the same settings UI file (apps/desktop/.../stt/shared.tsx) for model entry registration and displayModelId handling.

Pre-merge checks

✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title clearly summarizes the main changes: refactoring the OpenAI adapter and adding support for new gpt-4o transcription models.
  • Description check: ✅ Passed. The description is directly related to the changeset, providing detailed context about adapter refactoring, model-aware request building, UI updates, and testing guidance.
  • Docstring coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.



netlify bot commented Dec 5, 2025

Deploy Preview for hyprnote ready!

  • 🔨 Latest commit: 1c98cf9
  • 🔍 Latest deploy log: https://app.netlify.com/projects/hyprnote/deploys/69323d0588e4f50008267a74
  • 😎 Deploy Preview: https://deploy-preview-2128--hyprnote.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.


@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
apps/desktop/src/components/settings/ai/stt/shared.tsx (1)

47-53: New OpenAI model mappings and provider configuration look consistent

displayModelId now cleanly handles the two new model IDs, and the OpenAI provider’s models list matches those IDs, so UI selection and labeling should work end‑to‑end with the new backend defaults. One tiny nit: if you care about consistency with other labels, you might prefer "GPT-4o Mini Transcribe" (capital “Mini”), but that’s purely cosmetic.

Also applies to: 164-164

owhisper/owhisper-client/src/adapter/openai/mod.rs (1)

7-12: Consider unifying batch and realtime default model configuration

The documented model list and DEFAULT_TRANSCRIPTION_MODEL = "gpt-4o-transcribe" look good. Right now, though, the batch adapter still has its own DEFAULT_MODEL = "whisper-1", so defaults differ between batch and realtime. If that’s not intentional, it may be worth reusing this constant (or adding a short comment explaining why batch keeps a different default) to avoid future drift.
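For illustration, the suggested reuse could be as small as the following; the batch-side names here are assumptions, not the PR's actual code:

```rust
// In openai/mod.rs (added by this PR):
// pub const DEFAULT_TRANSCRIPTION_MODEL: &str = "gpt-4o-transcribe";

// In openai/batch.rs, a hypothetical resolver that reuses the shared
// constant instead of keeping a module-local DEFAULT_MODEL:
use super::DEFAULT_TRANSCRIPTION_MODEL;

fn resolve_model(requested: Option<&str>) -> &str {
    requested.unwrap_or(DEFAULT_TRANSCRIPTION_MODEL)
}
```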

owhisper/owhisper-client/src/adapter/openai/live.rs (1)

10-15: Good centralization of VAD configuration

Extracting the VAD literals into constants and wiring them into TurnDetection keeps the behavior the same while making future tuning or feature‑flagging much easier. If you later want per‑deployment or per‑request control, these constants are a natural place to hook in configuration or ListenParams fields, but that can be deferred.

Also applies to: 87-90
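
As a sketch, the extraction might look like this; the constant names come from the walkthrough, while the concrete values below are placeholders rather than the PR's actual defaults:

```rust
use serde_json::json;

const VAD_DETECTION_TYPE: &str = "server_vad";
const VAD_THRESHOLD: f64 = 0.5; // placeholder value
const VAD_PREFIX_PADDING_MS: u32 = 300; // placeholder value
const VAD_SILENCE_DURATION_MS: u32 = 500; // placeholder value

// The TurnDetection session update then references constants, not literals.
fn turn_detection() -> serde_json::Value {
    json!({
        "type": VAD_DETECTION_TYPE,
        "threshold": VAD_THRESHOLD,
        "prefix_padding_ms": VAD_PREFIX_PADDING_MS,
        "silence_duration_ms": VAD_SILENCE_DURATION_MS,
    })
}
```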

owhisper/owhisper-client/src/adapter/openai/batch.rs (1)

14-21: Conditional verbose vs JSON response handling for word timestamps looks correct

The separation of response formats using supports_word_timestamps and the new constants (RESPONSE_FORMAT_VERBOSE, RESPONSE_FORMAT_JSON, TIMESTAMP_GRANULARITY) is clear: whisper-1 keeps verbose JSON + word timestamps, while other models fall back to plain JSON. That matches the intent of only requesting timestamp_granularities[]=word where it’s actually supported and avoids breaking newer models.

One minor design tweak to consider: if you decide that "gpt-4o-transcribe" should also be the default for batch, it might be worth reusing the shared default from openai::mod instead of keeping a separate DEFAULT_MODEL here, to reduce config drift.

Also applies to: 85-99

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits 5a56b7a and 1c98cf9.

📒 Files selected for processing (4)
  • apps/desktop/src/components/settings/ai/stt/shared.tsx (2 hunks)
  • owhisper/owhisper-client/src/adapter/openai/batch.rs (2 hunks)
  • owhisper/owhisper-client/src/adapter/openai/live.rs (2 hunks)
  • owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

  • Avoid creating a bunch of types/interfaces if they are not shared. Especially for function props, just inline them instead.
  • Never do manual state management for form/mutation. Use useForm (from tanstack-form) and useQuery/useMutation (from tanstack-query) instead for 99% of cases. Avoid patterns like setError.
  • If there are many classNames with conditional logic, use cn (import from @hypr/utils). It is similar to clsx. Always pass an array and split by logical grouping.
  • Use motion/react instead of framer-motion.

Files:

  • apps/desktop/src/components/settings/ai/stt/shared.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
  • GitHub Check: fmt
  • GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
  • GitHub Check: Devin
🔇 Additional comments (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1)

119-123: Double‑check JSON schema compatibility for non‑verbose models

For non‑whisper-1 models you now send response_format=json but still deserialize into OpenAIVerboseResponse and then feed that into convert_response. Because all extra fields are optional and words has a #[serde(default)], this should safely yield an empty words list while still using text/language.

It would be good to explicitly verify against the current OpenAI docs / a live response that response_format=json for gpt-4o-transcribe and gpt-4o-mini-transcribe indeed returns at least the text (and optionally language) fields at the top level so deserialization can’t fail unexpectedly.

Also applies to: 149-172
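
A minimal sketch of the deserialization shape being discussed, assuming the field names mentioned in the comment; with language optional and words defaulted, a plain json response carrying only text should still deserialize:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct OpenAIVerboseResponse {
    text: String,
    // Absent from plain `json` responses, so it must be optional.
    language: Option<String>,
    // Defaults to an empty list when the response has no `words` field.
    #[serde(default)]
    words: Vec<OpenAIWord>,
}

#[derive(Deserialize)]
struct OpenAIWord {
    word: String,
    start: f64,
    end: f64,
}
```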

@yujonglee merged commit ae2a949 into main Dec 5, 2025
15 of 16 checks passed
@yujonglee deleted the devin/1764899169-refactor-openai-adapter branch December 5, 2025 02:07