
Conversation


blinkagent bot commented Jan 8, 2026

Summary

Replace the broken server-side OpenAI Whisper transcription with the browser-native Web Speech API.

Problem

The microphone button in the chat input was calling /api/speech-to-text, an endpoint that no longer exists. It was removed in commit c4200ec9 ("Remove all legacy Blink v1 code") while the microphone button UI was kept, so every transcription attempt hit a 404 and surfaced "Speech transcription failed. Please try again."

Solution

Instead of re-adding the server-side OpenAI Whisper integration, this PR uses the browser-native Web Speech API (SpeechRecognition), which:

  • Requires no API keys or backend calls
  • Streams results in real-time
  • Can work offline in browsers with on-device recognition (support varies by browser)
  • Uses the browser's locale for the recognition language (see the sketch below)
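
A minimal sketch of this setup, assuming a standard browser environment; the constructor lookup (including the webkit-prefixed fallback) and the variable names are illustrative rather than the exact code in this PR:

```ts
// Sketch only: feature-detect the Web Speech API and start listening.
// Chrome and Safari expose the constructor under the webkit prefix.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

if (SpeechRecognitionImpl) {
  const recognition = new SpeechRecognitionImpl();
  recognition.lang = navigator.language; // recognition language follows the browser locale
  recognition.continuous = true;         // keep listening until explicitly stopped
  recognition.interimResults = true;     // stream partial results while the user speaks
  recognition.start();
}
```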

Changes

  • Remove MediaRecorder-based audio capture and /api/speech-to-text fetch call
  • Implement SpeechRecognition API for real-time transcription
  • Auto-detect browser support and hide button on unsupported browsers
  • Use continuous mode with interim results for better UX
  • Handle common errors (permission denied, no mic, network issues); a handler sketch follows this list
  • Remove the isTranscribing state since transcription happens in real-time
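
The interim-result handling and error mapping could look roughly like the following sketch; wireRecognitionHandlers, setInput, and showError are hypothetical names standing in for the component's state update and error display:

```ts
// Sketch: attach result and error handlers to a SpeechRecognition instance.
function wireRecognitionHandlers(
  recognition: any,
  setInput: (text: string) => void,     // placeholder: updates the chat input field
  showError: (message: string) => void, // placeholder: shows an error to the user
) {
  recognition.onresult = (event: any) => {
    let finalText = "";
    let interimText = "";
    // Results arrive incrementally; each entry may be interim or final.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const transcript = event.results[i][0].transcript;
      if (event.results[i].isFinal) {
        finalText += transcript;
      } else {
        interimText += transcript;
      }
    }
    setInput(finalText || interimText);
  };

  recognition.onerror = (event: any) => {
    // Error codes come from the SpeechRecognitionErrorEvent spec.
    switch (event.error) {
      case "not-allowed":
        showError("Microphone access was denied.");
        break;
      case "audio-capture":
        showError("No microphone was found.");
        break;
      case "network":
        showError("Speech recognition needs a network connection.");
        break;
      default:
        showError("Speech recognition failed. Please try again.");
    }
  };
}
```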

Browser Support

Browser   Support
Chrome    ✅ Full
Edge      ✅ Full
Safari    ⚠️ Partial (requires a user gesture)
Firefox   ❌ Behind a flag

On unsupported browsers, the microphone button is hidden rather than showing a broken feature.
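
One way the support check and hiding could work; the #mic-button selector is a placeholder, and the actual component may do this through its rendering framework instead:

```ts
// Sketch: detect Web Speech API support and hide the mic button when absent.
const speechSupported =
  typeof window !== "undefined" &&
  Boolean((window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition);

const micButton = document.querySelector<HTMLButtonElement>("#mic-button");
if (micButton) {
  micButton.hidden = !speechSupported; // hide rather than show a broken control
}
```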

Testing

  1. Open the chat interface in Chrome/Edge
  2. Click the microphone button
  3. Allow microphone access if prompted
  4. Speak into the microphone
  5. Click the button again to stop
  6. The transcribed text should appear in the input field
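
The click-to-start, click-again-to-stop behavior from the steps above could be wired roughly like this (a sketch; toggleDictation and the listening flag are illustrative):

```ts
// Sketch: toggle recognition on successive clicks of the mic button.
let listening = false;

function toggleDictation(recognition: any) {
  if (listening) {
    recognition.stop();  // stop listening; pending final results still flush
  } else {
    recognition.start(); // the browser prompts for mic permission if needed
  }
  listening = !listening;
}
```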

Related

Slack thread: Investigation into why voice transcription was failing


vercel bot commented Jan 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project  Deployment  Review            Updated (UTC)
blink    Ready       Preview, Comment  Jan 8, 2026 6:15pm
