Conversation

@yujonglee (Contributor) commented Dec 5, 2025

Summary

This PR refactors the OpenAI STT adapter and adds support for the newer gpt-4o transcription models. Key changes:

Adapter refactoring:

  • Removed dead code (OpenAISegment struct and segments field that were never used)
  • Extracted magic values to named constants (VAD config, response formats)

Model-aware request building:

  • whisper-1: Uses verbose_json with word-level timestamps (existing behavior)
  • gpt-4o-transcribe / gpt-4o-mini-transcribe: Uses json format (no word timestamps, per OpenAI API limitations)
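A rough sketch of what this conditional request building could look like in the batch adapter, assuming a reqwest multipart form; the constant and helper names follow the PR summary, but the exact signatures are illustrative rather than the actual code:

```rust
use reqwest::multipart::Form;

const RESPONSE_FORMAT_VERBOSE: &str = "verbose_json";
const RESPONSE_FORMAT_JSON: &str = "json";
const TIMESTAMP_GRANULARITY: &str = "word";

// Per the OpenAI batch API, only whisper-1 supports word-level timestamps.
fn supports_word_timestamps(model: &str) -> bool {
    model == "whisper-1"
}

// Pick the response format that matches the model's capabilities.
fn build_request_form(model: &str) -> Form {
    let form = Form::new().text("model", model.to_string());
    if supports_word_timestamps(model) {
        form.text("response_format", RESPONSE_FORMAT_VERBOSE)
            .text("timestamp_granularities[]", TIMESTAMP_GRANULARITY)
    } else {
        form.text("response_format", RESPONSE_FORMAT_JSON)
    }
}
```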

UI updates:

  • Added gpt-4o-transcribe and gpt-4o-mini-transcribe to the OpenAI provider model list
  • Added display names for the new models

Review & Testing Checklist for Human

  • Verify model selection flow: Select gpt-4o-transcribe in Settings → STT and confirm the adapter receives the correct model parameter during transcription
  • Test batch transcription with gpt-4o models: the response will have an empty words array; verify downstream features (word highlighting, scrubber, etc.) handle this gracefully
  • Confirm whisper-1 still works: Ensure the existing whisper-1 path with word timestamps is not regressed
  • Check realtime transcription: The realtime API already uses gpt-4o-transcribe by default - verify live transcription still works

Recommended test plan:

  1. Start a recording session with OpenAI provider + gpt-4o-transcribe model
  2. Verify live transcription works
  3. Stop recording and verify batch transcription completes (will have no word timestamps)
  4. Repeat with whisper-1 model and verify word timestamps are present

Notes

The gpt-4o models provide higher quality transcription but don't support word-level timestamps in the batch API. This is an OpenAI limitation, not a bug. Users who need word timestamps should use whisper-1.

Link to Devin run: https://app.devin.ai/sessions/02cc0fe084c3495db89f417c806c67f5
Requested by: yujonglee (@yujonglee)

- Remove dead code (OpenAISegment struct, segments field)
- Extract magic values to constants (VAD config, response formats)
- Add model-aware request building for batch API
  - whisper-1: uses verbose_json with word timestamps
  - gpt-4o-transcribe/gpt-4o-mini-transcribe: uses json format
- Update UI to show all OpenAI STT models
- Add model documentation comments

Co-Authored-By: yujonglee <[email protected]>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them.

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


netlify bot commented Dec 5, 2025

Deploy Preview for hyprnote-storybook ready!

  • 🔨 Latest commit: 1c98cf9
  • 🔍 Latest deploy log: https://app.netlify.com/projects/hyprnote-storybook/deploys/69323d058ddc340008fac378
  • 😎 Deploy Preview: https://deploy-preview-2128--hyprnote-storybook.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.


coderabbitai bot commented Dec 5, 2025

📝 Walkthrough

Adds support for two new OpenAI transcription models (gpt-4o-transcribe, gpt-4o-mini-transcribe) to the UI settings. Updates the OpenAI adapter to conditionally use word-level timestamps (whisper-1 only), centralizes Voice Activity Detection defaults, and sets a default transcription model constant.

Changes

  • UI Settings Model Registration (apps/desktop/src/components/settings/ai/stt/shared.tsx):
    Adds displayModelId handling for the new OpenAI transcription model identifiers (gpt-4o-transcribe → "GPT-4o Transcribe", gpt-4o-mini-transcribe → "GPT-4o mini Transcribe") and includes both models in the OpenAI provider's models array alongside whisper-1.
  • OpenAI Batch Adapter (owhisper/owhisper-client/src/adapter/openai/batch.rs):
    Introduces word-timestamp support detection with new constants (RESPONSE_FORMAT_VERBOSE, RESPONSE_FORMAT_JSON, TIMESTAMP_GRANULARITY) and a supports_word_timestamps() helper. Replaces unconditional verbose_json + word-timestamp requests with conditional logic: verbose_json with word-level granularity for whisper-1, standard json for other models. Removes the unused OpenAISegment struct and segments field from OpenAIVerboseResponse.
  • OpenAI Live Adapter VAD Configuration (owhisper/owhisper-client/src/adapter/openai/live.rs):
    Extracts Voice Activity Detection settings into named constants (VAD_DETECTION_TYPE, VAD_THRESHOLD, VAD_PREFIX_PADDING_MS, VAD_SILENCE_DURATION_MS) and replaces hardcoded values in the TurnDetection session update payload with constant references.
  • OpenAI Module Configuration (owhisper/owhisper-client/src/adapter/openai/mod.rs):
    Adds a DEFAULT_TRANSCRIPTION_MODEL constant set to "gpt-4o-transcribe", with comments documenting the available OpenAI STT models.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Word-timestamp support detection logic in batch.rs: verify the supports_word_timestamps() helper correctly identifies model capabilities and that conditional request building is correct.
  • VAD constant values in live.rs: confirm the extracted defaults match the original hardcoded values and are appropriate for production use.
  • Model registration consistency: ensure both new models are registered in settings UI and properly handled across adapter layers.

Possibly related PRs

  • fastrepl/hyprnote#2126: Modifies the same OpenAI adapter files (live.rs, mod.rs) for transcription model configuration and session settings.
  • fastrepl/hyprnote#2060: Updates the same settings UI file (apps/desktop/src/components/settings/ai/stt/shared.tsx) to register new STT models in displayModelId and PROVIDERS.
  • fastrepl/hyprnote#2047: Modifies the same settings UI file (apps/desktop/.../stt/shared.tsx) for model entry registration and displayModelId handling.

Pre-merge checks

✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title clearly summarizes the main changes: refactoring the OpenAI adapter and adding support for new gpt-4o transcription models.
  • Description check: ✅ Passed. The description is directly related to the changeset, providing detailed context about adapter refactoring, model-aware request building, UI updates, and testing guidance.
  • Docstring coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.



netlify bot commented Dec 5, 2025

Deploy Preview for hyprnote ready!

  • 🔨 Latest commit: 1c98cf9
  • 🔍 Latest deploy log: https://app.netlify.com/projects/hyprnote/deploys/69323d0588e4f50008267a74
  • 😎 Deploy Preview: https://deploy-preview-2128--hyprnote.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.


@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
apps/desktop/src/components/settings/ai/stt/shared.tsx (1)

47-53: New OpenAI model mappings and provider configuration look consistent

displayModelId now cleanly handles the two new model IDs, and the OpenAI provider’s models list matches those IDs, so UI selection and labeling should work end‑to‑end with the new backend defaults. One tiny nit: if you care about consistency with other labels, you might prefer "GPT-4o Mini Transcribe" (capital “Mini”), but that’s purely cosmetic.

Also applies to: 164-164

owhisper/owhisper-client/src/adapter/openai/mod.rs (1)

7-12: Consider unifying batch and realtime default model configuration

The documented model list and DEFAULT_TRANSCRIPTION_MODEL = "gpt-4o-transcribe" look good. Right now, though, the batch adapter still has its own DEFAULT_MODEL = "whisper-1", so defaults differ between batch and realtime. If that’s not intentional, it may be worth reusing this constant (or adding a short comment explaining why batch keeps a different default) to avoid future drift.
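For illustration, the suggested reuse could be as small as the following; the batch-side names here are assumptions, not the PR's actual code:

```rust
// In openai/mod.rs (added by this PR):
// pub const DEFAULT_TRANSCRIPTION_MODEL: &str = "gpt-4o-transcribe";

// In openai/batch.rs, a hypothetical resolver that reuses the shared
// constant instead of keeping a module-local DEFAULT_MODEL:
use super::DEFAULT_TRANSCRIPTION_MODEL;

fn resolve_model(requested: Option<&str>) -> &str {
    requested.unwrap_or(DEFAULT_TRANSCRIPTION_MODEL)
}
```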

owhisper/owhisper-client/src/adapter/openai/live.rs (1)

10-15: Good centralization of VAD configuration

Extracting the VAD literals into constants and wiring them into TurnDetection keeps the behavior the same while making future tuning or feature‑flagging much easier. If you later want per‑deployment or per‑request control, these constants are a natural place to hook in configuration or ListenParams fields, but that can be deferred.

Also applies to: 87-90
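
As a sketch, the extraction might look like this; the constant names come from the walkthrough, while the concrete values below are placeholders rather than the PR's actual defaults:

```rust
use serde_json::json;

const VAD_DETECTION_TYPE: &str = "server_vad";
const VAD_THRESHOLD: f64 = 0.5; // placeholder value
const VAD_PREFIX_PADDING_MS: u32 = 300; // placeholder value
const VAD_SILENCE_DURATION_MS: u32 = 500; // placeholder value

// The TurnDetection session update then references constants, not literals.
fn turn_detection() -> serde_json::Value {
    json!({
        "type": VAD_DETECTION_TYPE,
        "threshold": VAD_THRESHOLD,
        "prefix_padding_ms": VAD_PREFIX_PADDING_MS,
        "silence_duration_ms": VAD_SILENCE_DURATION_MS,
    })
}
```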

owhisper/owhisper-client/src/adapter/openai/batch.rs (1)

14-21: Conditional verbose vs JSON response handling for word timestamps looks correct

The separation of response formats using supports_word_timestamps and the new constants (RESPONSE_FORMAT_VERBOSE, RESPONSE_FORMAT_JSON, TIMESTAMP_GRANULARITY) is clear: whisper-1 keeps verbose JSON + word timestamps, while other models fall back to plain JSON. That matches the intent of only requesting timestamp_granularities[]=word where it’s actually supported and avoids breaking newer models.

One minor design tweak to consider: if you decide that "gpt-4o-transcribe" should also be the default for batch, it might be worth reusing the shared default from openai::mod instead of keeping a separate DEFAULT_MODEL here, to reduce config drift.

Also applies to: 85-99

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits 5a56b7a and 1c98cf9.

📒 Files selected for processing (4)
  • apps/desktop/src/components/settings/ai/stt/shared.tsx (2 hunks)
  • owhisper/owhisper-client/src/adapter/openai/batch.rs (2 hunks)
  • owhisper/owhisper-client/src/adapter/openai/live.rs (2 hunks)
  • owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

  • Avoid creating a bunch of types/interfaces if they are not shared. Especially for function props, just inline them instead.
  • Never do manual state management for form/mutation. Use useForm (from tanstack-form) and useQuery/useMutation (from tanstack-query) instead for 99% of cases. Avoid patterns like setError.
  • If there are many classNames with conditional logic, use cn (import from @hypr/utils). It is similar to clsx. Always pass an array and split by logical grouping.
  • Use motion/react instead of framer-motion.

Files:

  • apps/desktop/src/components/settings/ai/stt/shared.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
  • GitHub Check: fmt
  • GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
  • GitHub Check: Devin
🔇 Additional comments (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1)

119-123: Double‑check JSON schema compatibility for non‑verbose models

For non‑whisper-1 models you now send response_format=json but still deserialize into OpenAIVerboseResponse and then feed that into convert_response. Because all extra fields are optional and words has a #[serde(default)], this should safely yield an empty words list while still using text/language.

It would be good to explicitly verify against the current OpenAI docs / a live response that response_format=json for gpt-4o-transcribe and gpt-4o-mini-transcribe indeed returns at least the text (and optionally language) fields at the top level so deserialization can’t fail unexpectedly.

Also applies to: 149-172
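
A minimal sketch of the deserialization shape being discussed, assuming the field names mentioned in the comment; with language optional and words defaulted, a plain json response carrying only text should still deserialize:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct OpenAIVerboseResponse {
    text: String,
    // Absent from plain `json` responses, so it must be optional.
    language: Option<String>,
    // Defaults to an empty list when the response has no `words` field.
    #[serde(default)]
    words: Vec<OpenAIWord>,
}

#[derive(Deserialize)]
struct OpenAIWord {
    word: String,
    start: f64,
    end: f64,
}
```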

@yujonglee merged commit ae2a949 into main Dec 5, 2025
15 of 16 checks passed
@yujonglee deleted the devin/1764899169-refactor-openai-adapter branch December 5, 2025 02:07