Add audio dictation flow (record/upload + transcription + compose injection) #49

@guysmoilov

Problem

Typing long prompts/commands on mobile is slow. The app currently has no voice dictation path.

Proposal

Add an audio dictation feature that lets users record speech in the mobile UI, upload audio to the server, and make the resulting transcript immediately usable.

Core Requirements

  • Capture audio in the browser and upload the recording to the backend (persist it to a known location under the app CWD or a configured temp/work dir)
  • Expose the transcript in the UI so the text is one-tap copyable and/or insertable into the compose row or the active terminal input flow
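
To make "insertable into compose row" concrete, here is a minimal sketch of the insertion step as a pure function. It assumes the compose row is a plain text input exposing its value and selection range; `insertTranscript` and its parameters are hypothetical names, not existing code in the app.

```typescript
// Hypothetical helper: splice a transcript into the compose text at the
// current selection and report where the caret should land afterwards.
interface ComposeState {
  value: string;
  caret: number;
}

function insertTranscript(
  current: string,
  transcript: string,
  selStart: number,
  selEnd: number,
): ComposeState {
  // Clamp the selection so out-of-range values cannot corrupt the text.
  const start = Math.max(0, Math.min(selStart, current.length));
  const end = Math.max(start, Math.min(selEnd, current.length));

  const before = current.slice(0, start);
  const after = current.slice(end);
  const text = transcript.trim();

  // Add a single space on either side only when the neighbouring text
  // does not already provide whitespace.
  const lead = before.length > 0 && !/\s$/.test(before) ? " " : "";
  const trail = after.length > 0 && !/^\s/.test(after) ? " " : "";

  const inserted = lead + text + trail;
  return { value: before + inserted + after, caret: before.length + inserted.length };
}
```

Keeping this free of DOM access means the same function could back both the "insert into compose" button and a unit test; the UI layer would only read the selection and write the result back.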

Bonus

  • Server-side automatic transcription (e.g. Whisper or compatible STT backend)
  • Return transcript progressively or as soon as available

Scope

  • Frontend: record/stop/cancel UI and upload action
  • Backend: authenticated upload endpoint or WS message handling
  • Storage policy: filename format, retention/cleanup, max size/duration guardrails
  • Transcription adapter abstraction so local Whisper/remote STT can be swapped
  • UI affordances: copy transcript, insert into compose, optional direct inject into terminal
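
One possible shape for the transcription adapter abstraction, as a sketch only. The `TranscriptionAdapter` interface, `makeFakeAdapter`, and `selectAdapter` are hypothetical names; a real Whisper or remote STT adapter would implement the same interface, and its internals are deliberately not shown here.

```typescript
// Hypothetical contract that every STT backend would implement.
interface TranscriptionAdapter {
  readonly name: string;
  // Takes the server-side path of a stored recording, resolves to plain text.
  transcribe(audioPath: string): Promise<string>;
}

// Stand-in adapter for tests and local development: always returns the
// canned text and never touches the audio file.
function makeFakeAdapter(cannedText: string): TranscriptionAdapter {
  return {
    name: "fake",
    transcribe: async (_audioPath: string) => cannedText,
  };
}

// Look up the adapter named in configuration so callers never depend on a
// specific backend. Unknown names fail loudly instead of silently falling back.
function selectAdapter(
  adapters: Map<string, TranscriptionAdapter>,
  configured: string,
): TranscriptionAdapter {
  const adapter = adapters.get(configured);
  if (adapter === undefined) {
    throw new Error(`Unknown transcription adapter: ${configured}`);
  }
  return adapter;
}
```

With this shape, swapping local Whisper for a remote service would be a configuration change plus one new entry in the map, and the upload handler would only ever call `transcribe`.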

Security / Privacy

  • Respect existing auth boundaries
  • Limit upload size and accepted MIME types
  • Document whether recordings are transient or persisted, and cleanup behavior
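
A minimal sketch of the size and MIME guardrails, assuming the server sees the request's content type and byte length before persisting anything. The 25 MiB cap and the allow-list below are placeholder assumptions, not values decided in this issue, and `validateUpload` is a hypothetical name.

```typescript
// Result type: either accepted, or rejected with a reason the API can return.
type UploadCheck = { ok: true } | { ok: false; reason: string };

// Assumed limits; the real values belong in configuration.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;
const ALLOWED_MIME_TYPES = ["audio/webm", "audio/ogg", "audio/mp4", "audio/wav"];

function validateUpload(contentType: string, sizeBytes: number): UploadCheck {
  // Browsers often send parameters such as "audio/webm; codecs=opus",
  // so compare only the base type.
  const [rawType = ""] = contentType.split(";");
  const baseType = rawType.trim().toLowerCase();

  if (!ALLOWED_MIME_TYPES.includes(baseType)) {
    return { ok: false, reason: `unsupported content type: ${baseType || "(none)"}` };
  }
  if (!Number.isInteger(sizeBytes) || sizeBytes <= 0) {
    return { ok: false, reason: "empty or invalid upload size" };
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: `upload exceeds ${MAX_UPLOAD_BYTES} bytes` };
  }
  return { ok: true };
}
```

The declared content type is client-controlled, so this check is a first filter only; it does not replace the existing auth check or any inspection of the file contents.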

Acceptance Criteria

  • User can record audio on mobile and successfully upload it
  • Uploaded file lands in configured server-side path with predictable naming
  • Transcript appears in UI and can be copied/injected into compose row
  • Optional terminal injection path is explicit (not accidental)
  • Tests cover upload validation + transcript-to-UI path (at least one happy path + one validation failure)
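
For the "predictable naming" criterion, one option is a UTC timestamp plus a short ID, with the extension derived from the validated MIME type. This is a sketch under assumptions: the `dictation-` prefix, the extension table, and `buildRecordingFilename` are all hypothetical and not an agreed format.

```typescript
// Assumed mapping from accepted MIME types to file extensions.
const EXTENSION_BY_MIME = new Map<string, string>([
  ["audio/webm", "webm"],
  ["audio/ogg", "ogg"],
  ["audio/mp4", "m4a"],
  ["audio/wav", "wav"],
]);

function buildRecordingFilename(now: Date, id: string, mimeType: string): string {
  // "2024-01-02T03:04:05.000Z" -> "20240102T030405Z" (sortable, no ":" or "-").
  const stamp = now
    .toISOString()
    .replace(/[-:]/g, "")
    .replace(/\.\d{3}Z$/, "Z");

  // Keep only alphanumerics so the ID can never contain path separators.
  const safeId = id.replace(/[^a-zA-Z0-9]/g, "");
  if (safeId.length === 0) {
    throw new Error("recording id must contain at least one alphanumeric character");
  }

  const extension = EXTENSION_BY_MIME.get(mimeType);
  if (extension === undefined) {
    throw new Error(`no extension known for MIME type: ${mimeType}`);
  }
  return `dictation-${stamp}-${safeId}.${extension}`;
}
```

For example, a recording received at 03:04:05 UTC on 2 January 2024 with ID `ab12` and type `audio/webm` would be stored as `dictation-20240102T030405Z-ab12.webm`. Names sort chronologically, which also simplifies retention cleanup by age.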

Labels

  • area:backend (Backend/server code)
  • area:frontend (Frontend/UI code)
  • area:security (Security-sensitive behavior)
  • area:ux (User experience and interaction design)
  • enhancement (New feature or request)