Open
Labels
- area:backend (Backend/server code)
- area:frontend (Frontend/UI code)
- area:security (Security-sensitive behavior)
- area:ux (User experience and interaction design)
- enhancement (New feature or request)
Description
Problem
Typing long prompts/commands on mobile is slow. The app currently has no voice dictation path.
Proposal
Add an audio dictation feature that lets users record speech in the mobile UI, upload audio to the server, and make the resulting transcript immediately usable.
Core Requirements
- Capture audio in the browser and upload the recording to the backend, which persists it to a known location under the app's working directory or a configured temp/work directory
- Expose the transcript in the UI so the text can be copied with one tap and/or inserted into the compose row or the active terminal input flow
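The record, upload, and transcript-ready steps above could be modeled as a small state reducer on the client. This is only a sketch: every type and function name here is hypothetical, and it assumes the actual capture would go through something like the browser's MediaRecorder API, which is not shown.

```typescript
// Hypothetical UI state for the dictation flow; none of these names come from the app.
type DictationState =
  | { kind: "idle" }
  | { kind: "recording"; startedAtMs: number }
  | { kind: "uploading"; durationMs: number }
  | { kind: "ready"; transcript: string }
  | { kind: "failed"; reason: string };

type DictationEvent =
  | { type: "record"; nowMs: number }
  | { type: "stop"; nowMs: number }
  | { type: "cancel" }
  | { type: "uploaded"; transcript: string }
  | { type: "error"; reason: string };

function reduceDictation(state: DictationState, event: DictationEvent): DictationState {
  switch (event.type) {
    case "record":
      // Ignore repeated taps while a recording or upload is in progress.
      // Starting from "ready" or "failed" discards the previous result.
      return state.kind === "recording" || state.kind === "uploading"
        ? state
        : { kind: "recording", startedAtMs: event.nowMs };
    case "stop":
      // Stopping hands the finished recording to the upload step.
      return state.kind === "recording"
        ? { kind: "uploading", durationMs: event.nowMs - state.startedAtMs }
        : state;
    case "cancel":
      // Cancel drops an in-progress recording or pending upload.
      return state.kind === "recording" || state.kind === "uploading"
        ? { kind: "idle" }
        : state;
    case "uploaded":
      // The transcript becomes available for copy/insert actions.
      return state.kind === "uploading"
        ? { kind: "ready", transcript: event.transcript }
        : state;
    case "error":
      return state.kind === "uploading"
        ? { kind: "failed", reason: event.reason }
        : state;
  }
}
```

Keeping the flow in a pure function like this would let the copy and insert buttons render only in the `ready` state, and makes the transitions easy to unit test without a microphone.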
Bonus
- Server-side automatic transcription (e.g. Whisper or compatible STT backend)
- Return transcript progressively or as soon as available
Scope
- Frontend: record/stop/cancel UI and upload action
- Backend: authenticated upload endpoint or WebSocket message handling
- Storage policy: filename format, retention/cleanup, max size/duration guardrails
- Transcription adapter abstraction so local Whisper/remote STT can be swapped
- UI affordances: copy transcript, insert into compose, optional direct injection into the terminal
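One possible shape for the transcription adapter abstraction is an interface plus a registry keyed by a config value. The names below (`TranscriptionAdapter`, `FixedTranscriptAdapter`, `createAdapterRegistry`) are illustrative assumptions, and the fixed-transcript adapter is a stand-in for tests rather than a real Whisper or remote STT integration.

```typescript
// Hypothetical adapter contract; a local Whisper wrapper or a remote STT
// client would each implement this interface.
interface TranscriptionAdapter {
  readonly name: string;
  transcribe(audio: Uint8Array, mimeType: string): Promise<string>;
}

// Stand-in backend for tests and local development: always returns the
// transcript it was constructed with and ignores the audio.
class FixedTranscriptAdapter implements TranscriptionAdapter {
  readonly name = "fixed";
  private readonly transcript: string;

  constructor(transcript: string) {
    this.transcript = transcript;
  }

  async transcribe(_audio: Uint8Array, _mimeType: string): Promise<string> {
    return this.transcript;
  }
}

type AdapterFactory = () => TranscriptionAdapter;

// Registry keyed by a backend name (for example, from server config) so the
// upload handler never depends on a concrete STT implementation.
function createAdapterRegistry(factories: Map<string, AdapterFactory>) {
  return {
    resolve(backend: string): TranscriptionAdapter {
      const factory = factories.get(backend);
      if (!factory) {
        throw new Error(`Unknown transcription backend: ${backend}`);
      }
      return factory();
    },
  };
}
```

With this shape, swapping backends would be a config change: the upload handler calls `registry.resolve(configuredBackend)` and awaits `transcribe`, regardless of which implementation is behind it.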
Security / Privacy
- Respect existing auth boundaries
- Limit upload size and accepted MIME types
- Document whether recordings are transient or persisted, and cleanup behavior
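The size and MIME type limits could be enforced by a small validation function that runs before anything is written to disk. The sketch below is an assumption-laden example: the limit values, the allowed type list, and all names are placeholders, since the issue does not specify them.

```typescript
// Illustrative policy; the actual limits and accepted types are not defined in this issue.
interface UploadPolicy {
  maxBytes: number;
  maxDurationMs: number;
  allowedMimeTypes: readonly string[];
}

const EXAMPLE_POLICY: UploadPolicy = {
  maxBytes: 10 * 1024 * 1024, // 10 MiB, placeholder value
  maxDurationMs: 2 * 60 * 1000, // 2 minutes, placeholder value
  allowedMimeTypes: ["audio/webm", "audio/ogg", "audio/mp4", "audio/wav"],
};

interface UploadMeta {
  mimeType: string;
  sizeBytes: number; // should be measured server-side, not taken from the client
  durationMs?: number; // optional; client-reported values are untrusted hints
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateUpload(meta: UploadMeta, policy: UploadPolicy): ValidationResult {
  // Strip parameters such as ";codecs=opus" before comparing against the allow list.
  const [rawType = ""] = meta.mimeType.split(";");
  const baseType = rawType.trim().toLowerCase();

  if (!policy.allowedMimeTypes.includes(baseType)) {
    return { ok: false, reason: `unsupported type: ${baseType || "(none)"}` };
  }
  if (!Number.isInteger(meta.sizeBytes) || meta.sizeBytes <= 0) {
    return { ok: false, reason: "empty or invalid size" };
  }
  if (meta.sizeBytes > policy.maxBytes) {
    return { ok: false, reason: "file too large" };
  }
  if (meta.durationMs !== undefined && meta.durationMs > policy.maxDurationMs) {
    return { ok: false, reason: "recording too long" };
  }
  return { ok: true };
}
```

Returning a reason string rather than throwing would let the endpoint map each failure to a clear client-facing error, which also gives the validation-failure test something concrete to assert on.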
Acceptance Criteria
- User can record audio on mobile and successfully upload it
- Uploaded file lands in configured server-side path with predictable naming
- Transcript appears in UI and can be copied/injected into compose row
- Optional terminal injection path is explicit (not accidental)
- Tests cover upload validation + transcript-to-UI path (at least one happy path + one validation failure)
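For the "predictable naming" criterion, one option is to derive the filename from a UTC timestamp, a sanitized identifier, and an extension chosen from the validated MIME type. The `dictation-` prefix, the extension table, and the function name are assumptions made for illustration only.

```typescript
// Hypothetical mapping from accepted MIME types to file extensions.
const EXTENSION_BY_MIME = new Map<string, string>([
  ["audio/webm", "webm"],
  ["audio/ogg", "ogg"],
  ["audio/mp4", "m4a"],
  ["audio/wav", "wav"],
]);

function buildRecordingFilename(recordedAt: Date, id: string, mimeType: string): string {
  // Compact UTC timestamp, e.g. 20240102T030405Z, so names sort chronologically.
  const stamp = recordedAt
    .toISOString()
    .replace(/[-:]/g, "")
    .replace(/\.\d{3}Z$/, "Z");

  // Keep only [a-z0-9] so the id cannot contain path separators or "..".
  const safeId = id.toLowerCase().replace(/[^a-z0-9]/g, "");
  if (safeId.length === 0) {
    throw new Error("id must contain at least one alphanumeric character");
  }

  const ext = EXTENSION_BY_MIME.get(mimeType.toLowerCase());
  if (ext === undefined) {
    throw new Error(`no extension known for type: ${mimeType}`);
  }

  return `dictation-${stamp}-${safeId}.${ext}`;
}

// Example: a recording made at 2024-01-02 03:04:05 UTC.
const exampleName = buildRecordingFilename(
  new Date(Date.UTC(2024, 0, 2, 3, 4, 5)),
  "A1b2/../c3",
  "audio/webm"
);
// exampleName === "dictation-20240102T030405Z-a1b2c3.webm"
```

Because the server builds the name itself and never uses a client-supplied filename, the result stays inside the configured directory, and a retention job could parse the timestamp to decide what to clean up.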