
Conversation


blinkagent bot commented Jan 8, 2026

Summary

Replace the broken server-side OpenAI Whisper transcription with the browser-native Web Speech API.

Problem

The microphone button in the chat input was calling /api/speech-to-text, an endpoint that no longer exists. It was removed in commit c4200ec9 ("Remove all legacy Blink v1 code") while the microphone button UI was kept, so every transcription attempt hit a 404 and surfaced "Speech transcription failed. Please try again."

Solution

Instead of re-adding the server-side OpenAI Whisper integration, this PR uses the browser-native Web Speech API (SpeechRecognition), which:

  • Requires no API keys or backend calls
  • Streams results in real-time
  • Can work offline in browsers with on-device recognition (support varies by browser)
  • Uses the browser's locale for the recognition language (see the sketch below)
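
A minimal sketch of this setup, assuming a standard browser environment; the constructor lookup (including the webkit-prefixed fallback) and the variable names are illustrative rather than the exact code in this PR:

```ts
// Sketch only: feature-detect the Web Speech API and start listening.
// Chrome and Safari expose the constructor under the webkit prefix.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

if (SpeechRecognitionImpl) {
  const recognition = new SpeechRecognitionImpl();
  recognition.lang = navigator.language; // recognition language follows the browser locale
  recognition.continuous = true;         // keep listening until explicitly stopped
  recognition.interimResults = true;     // stream partial results while the user speaks
  recognition.start();
}
```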

Changes

  • Remove MediaRecorder-based audio capture and /api/speech-to-text fetch call
  • Implement SpeechRecognition API for real-time transcription
  • Auto-detect browser support and hide button on unsupported browsers
  • Use continuous mode with interim results for better UX
  • Handle common errors (permission denied, no mic, network issues); a handler sketch follows this list
  • Remove the isTranscribing state since transcription happens in real-time
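
The interim-result handling and error mapping could look roughly like the following sketch; wireRecognitionHandlers, setInput, and showError are hypothetical names standing in for the component's state update and error display:

```ts
// Sketch: attach result and error handlers to a SpeechRecognition instance.
function wireRecognitionHandlers(
  recognition: any,
  setInput: (text: string) => void,     // placeholder: updates the chat input field
  showError: (message: string) => void, // placeholder: shows an error to the user
) {
  recognition.onresult = (event: any) => {
    let finalText = "";
    let interimText = "";
    // Results arrive incrementally; each entry may be interim or final.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const transcript = event.results[i][0].transcript;
      if (event.results[i].isFinal) {
        finalText += transcript;
      } else {
        interimText += transcript;
      }
    }
    setInput(finalText || interimText);
  };

  recognition.onerror = (event: any) => {
    // Error codes come from the SpeechRecognitionErrorEvent spec.
    switch (event.error) {
      case "not-allowed":
        showError("Microphone access was denied.");
        break;
      case "audio-capture":
        showError("No microphone was found.");
        break;
      case "network":
        showError("Speech recognition needs a network connection.");
        break;
      default:
        showError("Speech recognition failed. Please try again.");
    }
  };
}
```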

Browser Support

Browser   Support
Chrome    ✅ Full
Edge      ✅ Full
Safari    ⚠️ Partial (requires a user gesture)
Firefox   ❌ Behind a flag

On unsupported browsers, the microphone button is hidden rather than showing a broken feature.
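
One way the support check and hiding could work; the #mic-button selector is a placeholder, and the actual component may do this through its rendering framework instead:

```ts
// Sketch: detect Web Speech API support and hide the mic button when absent.
const speechSupported =
  typeof window !== "undefined" &&
  Boolean((window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition);

const micButton = document.querySelector<HTMLButtonElement>("#mic-button");
if (micButton) {
  micButton.hidden = !speechSupported; // hide rather than show a broken control
}
```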

Testing

  1. Open the chat interface in Chrome/Edge
  2. Click the microphone button
  3. Allow microphone access if prompted
  4. Speak into the microphone
  5. Click the button again to stop
  6. The transcribed text should appear in the input field
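
The click-to-start, click-again-to-stop behavior from the steps above could be wired roughly like this (a sketch; toggleDictation and the listening flag are illustrative):

```ts
// Sketch: toggle recognition on successive clicks of the mic button.
let listening = false;

function toggleDictation(recognition: any) {
  if (listening) {
    recognition.stop();  // stop listening; pending final results still flush
  } else {
    recognition.start(); // the browser prompts for mic permission if needed
  }
  listening = !listening;
}
```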

Related

Slack thread: Investigation into why voice transcription was failing


vercel bot commented Jan 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project  Deployment  Review            Updated (UTC)
blink    Ready       Preview, Comment  Jan 8, 2026 6:15pm
