A powerful Raycast extension that converts speech to text and formats the output using OpenAI's advanced language models. Perfect for quick dictation, note-taking, and professional communication with intelligent text formatting.
Note: This project is a fork of Advanced Speech to Text by Iñaki Tajes. Special thanks to Iñaki for creating the original extension that served as the foundation for this enhanced version.
📖 Russian version
- Instant Recording: Auto-starts recording when opened for immediate use
- High-Quality Transcription: Uses OpenAI's Whisper-1 and GPT-4o Transcribe models
- Smart Text Formatting: Format transcriptions using OpenAI's language models (GPT-4o, GPT-4o-mini)
- Multiple Output Formats: Original, Email, Slack, Report, Task List, and Translation modes
- Flexible Paste Behavior: Four modes - paste only, copy & paste, copy only, or show with action buttons
- Recording History: View and manage past recordings with transcriptions
- Transcription History: View history of transcribed text messages with configurable storage limit
- Custom Prompts: Use custom prompt files for transcription context and formatting
- Keyboard Shortcuts: Comprehensive hotkeys for efficient workflow
- Multi-Language Support: Supports 11 languages with auto-detection
- OpenAI-Compatible APIs: Configurable base URL for alternative providers
- Install the extension from the Raycast Store
- Install SoX (Sound eXchange) for audio recording:
# Using Homebrew
brew install sox
# Using MacPorts
sudo port install sox
- Configure your OpenAI API key in Raycast preferences
OpenAI API Key: For both speech transcription and text formatting
- Get your key from OpenAI Platform
- Single API key handles all operations (transcription and formatting)
- SoX: Required for audio recording
- macOS: Raycast extension platform
- Microphone: For audio input
- Open Raycast and type "Dictate"
- Recording starts automatically
- Speak your message
- Press Enter to stop and transcribe
- Text behavior depends on paste preference setting
While recording, you can choose different output formats using keyboard shortcuts:
- Original (Enter): Raw transcription with configurable paste behavior
- Email (Cmd+E): Professional email format with greetings and structure
- Slack (Cmd+S): Clean, casual message formatting
- Report (Cmd+R): Structured task/project reporting format
- Task (Cmd+T): Structured task list with checkboxes and action items
- Translation (Cmd+L): Translate to English or improve English text
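As a rough illustration of how a selected format could turn into a formatting request, the sketch below maps each format to its shortcut and a system instruction, then builds a body in the shape of an OpenAI Chat Completions request. The `FORMATS` table, the instruction wording, and `buildFormattingRequest` are hypothetical names chosen for this example and are not taken from the extension's source.

```typescript
// Hypothetical sketch: the table, instruction texts, and helper names are assumptions.
type OutputFormat = "original" | "email" | "slack" | "report" | "task" | "translation";

interface FormatSpec {
  shortcut: string; // shortcut as listed in this README
  instruction: string | null; // null means the raw transcription is used as-is
}

const FORMATS: Record<OutputFormat, FormatSpec> = {
  original: { shortcut: "Enter", instruction: null },
  email: { shortcut: "Cmd+E", instruction: "Rewrite the text as a professional email with a greeting and clear structure." },
  slack: { shortcut: "Cmd+S", instruction: "Rewrite the text as a clean, casual Slack message." },
  report: { shortcut: "Cmd+R", instruction: "Rewrite the text as a structured task/project report." },
  task: { shortcut: "Cmd+T", instruction: "Rewrite the text as a task list with checkboxes and action items." },
  translation: { shortcut: "Cmd+L", instruction: "Translate the text to English, or improve it if it is already English." },
};

interface ChatRequestBody {
  model: string;
  temperature: number;
  messages: { role: "system" | "user"; content: string }[];
}

// Returns null for the "original" format, where no formatting pass is needed.
function buildFormattingRequest(
  format: OutputFormat,
  transcript: string,
  model: string,
  temperature: number,
): ChatRequestBody | null {
  const { instruction } = FORMATS[format];
  if (instruction === null) return null;
  return {
    model,
    temperature,
    messages: [
      { role: "system", content: instruction },
      { role: "user", content: transcript },
    ],
  };
}
```

For example, `buildFormattingRequest("email", text, "gpt-4o", 0.2)` yields a two-message body, while the `"original"` format returns `null` so the caller can skip the formatting call entirely.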
Configure how text is handled after transcription:
- Paste only: Direct paste to active field (no clipboard copy)
- Copy and paste: Copy to clipboard then paste to active field
- Copy only: Copy to clipboard without automatic pasting
- Show with action buttons: Display result with manual action selection
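The four modes above can be thought of as a small routing table from preference to clipboard steps. The following is a minimal sketch under that assumption; the `PasteBehavior` values and `planOutputSteps` are illustrative names, not the extension's actual preference keys.

```typescript
// Hypothetical sketch: preference values and step names are assumptions.
type PasteBehavior = "paste" | "copyAndPaste" | "copy" | "showActions";
type OutputStep = "copy" | "paste" | "showResult";

// Translate the user's preference into the ordered steps to perform.
function planOutputSteps(behavior: PasteBehavior): OutputStep[] {
  switch (behavior) {
    case "paste":
      return ["paste"]; // direct paste, clipboard untouched
    case "copyAndPaste":
      return ["copy", "paste"]; // copy first, then paste into the active field
    case "copy":
      return ["copy"]; // clipboard only, no automatic paste
    case "showActions":
      return ["showResult"]; // let the user pick an action manually
  }
}
```

Keeping the decision as a pure function makes each mode easy to test without touching the real clipboard.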
During Recording:
- Enter: Stop and transcribe with original format
- Cmd+E: Stop and format as Email
- Cmd+S: Stop and format for Slack
- Cmd+T: Stop and format as Task List
- Cmd+R: Stop and format as Report
- Cmd+L: Stop and translate to English
- Cmd+.: Cancel recording
In Results View:
- Enter: Paste to selected field (primary action)
- Cmd+C: Copy to clipboard
- Cmd+V: Copy and paste
- Cmd+Shift+V: Paste only
- Cmd+Shift+C: Copy only
- Cmd+E: Format as Email
- Cmd+S: Format for Slack
- Cmd+T: Format as Task List
- Cmd+R: Format as Report
- Cmd+L: Translate to English
- Cmd+O: Use original transcription
- Cmd+N: New recording
- Access via "Recording History" command
- View past recordings with metadata (duration, size, word count)
- Re-transcribe audio files with current settings
- Copy transcriptions to clipboard
- Automatic file cleanup after 24 hours
- Access via "Transcription History" command
- View history of the last transcribed messages (original text only)
- Configurable storage limit (0=disabled, 5-100 messages, unlimited)
- Copy transcriptions to clipboard
- Delete individual entries or clear all history
- Persistent storage using Raycast LocalStorage
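One way the storage limit could be applied when a new transcription arrives is sketched below. It assumes entries are kept newest-first and that a limit of 0 simply skips saving; `HistoryEntry` and `addToHistory` are hypothetical names, and the actual persistence step (serializing to Raycast LocalStorage) is left out.

```typescript
// Hypothetical sketch: entry shape and helper name are assumptions.
interface HistoryEntry {
  id: string;
  text: string;
  createdAt: number; // epoch milliseconds
}

// limit: 0 = disabled, -1 = unlimited, otherwise keep only the newest `limit` entries.
function addToHistory(history: HistoryEntry[], entry: HistoryEntry, limit: number): HistoryEntry[] {
  if (limit === 0) return history; // disabled: the new entry is not stored
  const next = [entry, ...history]; // newest first
  return limit < 0 ? next : next.slice(0, limit);
}
```

The returned array is what would then be written back to storage, so trimming happens in one place.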
- Model: Text field for any OpenAI transcription model (default: whisper-1)
- Examples: whisper-1, gpt-4o-transcribe
- Language: Auto-detect or specify language (English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Russian)
- Temperature: Control transcription creativity (0, 0.2, 0.5, 0.8, 1.0)
- Context File: Custom prompt file to improve accuracy for specific terms
- Formatting Model: Text field for any OpenAI language model (default: gpt-4o)
- Examples: gpt-4o, gpt-4o-mini, gpt-3.5-turbo
- Base URL: OpenAI-compatible API endpoint (default: https://api.openai.com/v1)
- Custom Prompts: Override default formatting prompts with custom files
- Paste Behavior: Choose how text is handled after processing
- Paste only, Copy and paste, Copy only, Show with action buttons
- Transcription History Limit: Configure how many transcriptions to store
- Options: Disabled (0), 5-100 messages, Unlimited (-1)
- Default: 10 messages
- Automatically saves original transcriptions to local storage
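To make the settings above concrete, here is a small sketch of how the defaults and a custom base URL might resolve into request endpoints. The `/audio/transcriptions` and `/chat/completions` paths are the standard OpenAI API routes; `ApiSettings`, `DEFAULT_SETTINGS`, and `resolveEndpoints` are hypothetical names for illustration only.

```typescript
// Hypothetical sketch: setting names and helper are assumptions; defaults come from this README.
interface ApiSettings {
  baseUrl: string;
  transcriptionModel: string;
  formattingModel: string;
}

const DEFAULT_SETTINGS: ApiSettings = {
  baseUrl: "https://api.openai.com/v1",
  transcriptionModel: "whisper-1",
  formattingModel: "gpt-4o",
};

function resolveEndpoints(overrides: Partial<ApiSettings> = {}) {
  const settings: ApiSettings = { ...DEFAULT_SETTINGS, ...overrides };
  // Strip trailing slashes so "…/v1/" and "…/v1" behave the same.
  const base = settings.baseUrl.replace(/\/+$/, "");
  return {
    settings,
    transcriptionUrl: `${base}/audio/transcriptions`,
    chatUrl: `${base}/chat/completions`,
  };
}
```

An OpenAI-compatible provider would then only need a different `baseUrl`, for example `resolveEndpoints({ baseUrl: "http://localhost:8080/v1/" })`.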
You can provide custom prompt files for better transcription accuracy and formatting:
Improves speech recognition for specific names, terms, or abbreviations:
## Prompt
Context for speech recognition:
- Company: Acme Corporation
- Names: John Smith, Sarah Johnson
- Technical terms: Kubernetes, PostgreSQL
- Abbreviations: API, CI/CD, SLA
Override default formatting with custom prompts:
## Prompt
Transform the following text into a casual team update email.
Keep it friendly and conversational, focusing on progress and next steps.
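Both example files place their text under a `## Prompt` heading. A loader could plausibly extract that section as sketched below; the rule used here (take everything after `## Prompt` up to the next `## ` heading, or fall back to the whole file) is an assumption for illustration, and `extractPrompt` is a hypothetical helper name.

```typescript
// Hypothetical sketch: the parsing rule is an assumption, not the extension's documented behavior.
function extractPrompt(fileContent: string): string {
  const lines = fileContent.split(/\r?\n/);
  const start = lines.findIndex((line) => /^##\s+Prompt\s*$/i.test(line));
  // No "## Prompt" heading: treat the whole file as the prompt.
  if (start === -1) return fileContent.trim();
  const rest = lines.slice(start + 1);
  // Stop at the next second-level heading, if there is one.
  const end = rest.findIndex((line) => /^##\s+/.test(line));
  return (end === -1 ? rest : rest.slice(0, end)).join("\n").trim();
}
```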
- Dictate Command (src/dictate.tsx): Main recording and transcription workflow with state management
- Recording History (src/recording-history.tsx): History management interface for audio files
- Transcription History (src/transcription-history.tsx): History management interface for transcribed text
- Audio Recorder Hook (src/hooks/useAudioRecorder.ts): SoX integration for recording
- Uses SoX (Sound eXchange), a cross-platform audio tool, for recording
- Records in WAV format (16kHz, mono, 16-bit) optimized for Whisper
- Automatic SoX detection across common installation paths: /usr/local/bin/sox, /opt/homebrew/bin/sox, /usr/bin/sox
- Audio validation (1KB minimum, 25MB maximum) and temporary file management
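The recording details above can be sketched as three small helpers: locating the SoX binary, building arguments for a 16kHz mono 16-bit WAV, and checking the file size. This is an illustration only; the helper names are hypothetical, the argument list is one plausible SoX invocation rather than the extension's exact command, and 25MB is taken to mean 25 × 1024 × 1024 bytes.

```typescript
// Hypothetical sketch: helper names and the exact SoX arguments are assumptions.
const SOX_CANDIDATES = ["/usr/local/bin/sox", "/opt/homebrew/bin/sox", "/usr/bin/sox"];

// `exists` is injected (e.g. fs.existsSync) so the lookup stays testable.
function findSox(exists: (path: string) => boolean): string | null {
  return SOX_CANDIDATES.find((candidate) => exists(candidate)) ?? null;
}

// -d reads from the default input device; -r, -c, -b set the output's
// sample rate, channel count, and bit depth.
function buildSoxArgs(outputPath: string): string[] {
  return ["-d", "-r", "16000", "-c", "1", "-b", "16", outputPath];
}

const MIN_AUDIO_BYTES = 1024; // 1KB minimum
const MAX_AUDIO_BYTES = 25 * 1024 * 1024; // 25MB maximum (assumed binary megabytes)

type AudioSizeCheck = "ok" | "too-short" | "too-large";

function validateAudioSize(bytes: number): AudioSizeCheck {
  if (bytes < MIN_AUDIO_BYTES) return "too-short";
  if (bytes > MAX_AUDIO_BYTES) return "too-large";
  return "ok";
}
```

With these in place, a recorder could spawn the path returned by `findSox` with `buildSoxArgs(tempFile)` and call `validateAudioSize` before uploading.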
- OpenAI: Single provider for both transcription and text formatting
- Configurable base URL supports OpenAI-compatible APIs
- Robust error handling for API rate limits, authentication, and quota issues
- Temperature setting affects both transcription and formatting
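As an illustration of the error handling mentioned above, the sketch below classifies a failed response by HTTP status. It relies on OpenAI's documented conventions (401 for authentication failures, 429 for rate limits, with the `insufficient_quota` error code for exhausted credits); `ApiErrorKind` and `classifyApiError` are hypothetical names, and other providers behind a custom base URL may use different codes.

```typescript
// Hypothetical sketch: type and function names are assumptions.
type ApiErrorKind = "authentication" | "rate-limit" | "quota" | "server" | "unknown";

function classifyApiError(status: number, errorCode?: string): ApiErrorKind {
  if (status === 401) return "authentication"; // missing or invalid API key
  if (status === 429) {
    // OpenAI uses 429 for both throttling and exhausted credits.
    return errorCode === "insufficient_quota" ? "quota" : "rate-limit";
  }
  if (status >= 500) return "server";
  return "unknown";
}

// Only throttling and transient server errors are worth retrying.
function isRetryable(kind: ApiErrorKind): boolean {
  return kind === "rate-limit" || kind === "server";
}
```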
- Temporary audio files stored in ~/Library/Application Support/com.raycast.macos/extensions/dictation-ai/temp
- Automatic cleanup of files older than 24 hours
- Transcription history stored persistently using Raycast LocalStorage
- File validation and size limit enforcement
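The 24-hour cleanup rule can be expressed as a pure selection step, separate from the actual file deletion. The sketch below assumes each file's modification time is already known; `TempFileInfo` and `selectExpiredFiles` are hypothetical names.

```typescript
// Hypothetical sketch: type and function names are assumptions.
interface TempFileInfo {
  path: string;
  modifiedAtMs: number; // last modification time, epoch milliseconds
}

const MAX_AUDIO_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours

// Returns the files that are older than the allowed age and should be deleted.
function selectExpiredFiles(
  files: TempFileInfo[],
  nowMs: number,
  maxAgeMs: number = MAX_AUDIO_AGE_MS,
): TempFileInfo[] {
  return files.filter((file) => nowMs - file.modifiedAtMs > maxAgeMs);
}
```

A caller would list the temp directory, pass the entries with `Date.now()`, and remove whatever is returned.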
Advanced clipboard management with multiple paste behaviors:
- Direct paste with fallback to copy
- Smart routing based on user preferences
- HUD notifications for user feedback
- Error handling with graceful degradation
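"Direct paste with fallback to copy" might look roughly like the sketch below. The clipboard operations are injected through a small interface so the logic can run outside Raycast; in the extension they would presumably be backed by `Clipboard.paste` and `Clipboard.copy` from `@raycast/api`. `ClipboardPort`, `PasteOutcome`, and `pasteWithFallback` are hypothetical names.

```typescript
// Hypothetical sketch: interface and function names are assumptions.
type PasteOutcome = "pasted" | "copied" | "failed";

interface ClipboardPort {
  paste(text: string): Promise<void>;
  copy(text: string): Promise<void>;
}

// Try a direct paste first; if that fails, degrade to copying so the text is not lost.
async function pasteWithFallback(text: string, clipboard: ClipboardPort): Promise<PasteOutcome> {
  try {
    await clipboard.paste(text);
    return "pasted";
  } catch {
    try {
      await clipboard.copy(text);
      return "copied";
    } catch {
      return "failed";
    }
  }
}
```

The returned outcome is enough for the caller to choose a matching HUD message, such as confirming the paste or telling the user the text was copied instead.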
npm run dev # Start Raycast development mode
npm run build # Build the extension
npm run lint # Run ESLint
npm run fix-lint # Fix ESLint issues automatically
npm run test # Run Jest tests
npm run test:watch # Run Jest tests in watch mode
npm run test:coverage # Run Jest tests with coverage report
npm run publish # Publish to Raycast Store
src/
├── dictate.tsx # Main dictation command
├── recording-history.tsx # Audio file history management
├── transcription-history.tsx # Text transcription history management
├── hooks/
│ └── useAudioRecorder.ts # Audio recording logic
├── utils/
│ ├── audio.ts # SoX integration & file management
│ ├── openai.ts # OpenAI API integration
│ ├── formatters.ts # Text formatting with OpenAI models
│ ├── clipboard.ts # Clipboard operations with paste behaviors
│ ├── errors.ts # Centralized error handling
│ ├── prompts.ts # Custom prompt loading utilities
│ ├── history.ts # Transcription history storage
│ └── time.ts # Time/size formatting utilities
├── types.ts # TypeScript definitions
└── constants.ts # Configuration constants
SoX not installed
# Install via Homebrew
brew install sox
Recording not working
- Check microphone permissions in System Settings (System Preferences on older macOS versions)
- Ensure SoX is installed and accessible in PATH
- Try different microphone input sources
- Verify audio input levels
API Errors
- Verify OpenAI API key has sufficient credits
- Check API key permissions for audio and chat endpoints
- Ensure internet connection is stable
- Try different base URL if using alternative provider
File Size Limits
- Maximum audio file size: 25MB (OpenAI limit)
- Minimum audio size: 1KB for validation
- Files are automatically validated before transcription
- Long recordings may hit size limits
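To give a feel for when a long recording would reach the limit, the arithmetic can be worked out from the recording format (16kHz, mono, 16-bit). The sketch assumes uncompressed PCM WAV with a 44-byte header and treats 25MB as 25 × 1024 × 1024 bytes; `maxRecordingSeconds` is a hypothetical helper.

```typescript
// Hypothetical sketch: assumes uncompressed PCM WAV with a canonical 44-byte header.
const WAV_HEADER_BYTES = 44;

function maxRecordingSeconds(
  limitBytes: number,
  sampleRateHz: number = 16000,
  channels: number = 1,
  bitsPerSample: number = 16,
): number {
  // 16000 samples/s * 1 channel * 2 bytes = 32000 bytes per second
  const bytesPerSecond = sampleRateHz * channels * (bitsPerSample / 8);
  return Math.floor((limitBytes - WAV_HEADER_BYTES) / bytesPerSecond);
}

const limitSeconds = maxRecordingSeconds(25 * 1024 * 1024); // 819 seconds
```

Under these assumptions a single recording tops out at about 819 seconds, a little over 13 and a half minutes.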
Paste/Clipboard Issues
- Check app permissions for clipboard access
- Try different paste behavior settings
- Verify target application accepts clipboard input
- Use "Show with action buttons" mode for manual control
- Sox is not installed: Install SoX using Homebrew or MacPorts
- Audio file exceeds 25MB limit: Recording too long, try shorter sessions
- OpenAI API key is required: Configure API key in preferences
- Transcription failed: Check API key, credits, and internet connection
- Audio file is too short: Ensure minimum 1KB file size
- Paste Error: Target application may not accept clipboard input
- Audio files are temporarily stored locally and automatically deleted after 24 hours
- No storage of voice recordings beyond the 24-hour cleanup window
- Transcription history stored locally using Raycast's secure LocalStorage
- API keys are stored securely in Raycast preferences
- All API calls use HTTPS encryption
- No telemetry or usage tracking
- Local processing with external API calls only for transcription/formatting
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Update documentation as needed
- Submit a pull request
- Transcription History: New command to view and manage history of transcribed text messages
- Configurable History Limit: Set storage limit from 0 (disabled) to unlimited
- Unified OpenAI Integration: Removed OpenRouter dependency, now uses only OpenAI for all operations
- Flexible Model Configuration: Text fields for custom model selection
- Enhanced Paste Behavior: Four distinct modes including "show with actions"
- Comprehensive Keyboard Shortcuts: Full hotkey support for efficient workflow
- OpenAI-Compatible APIs: Configurable base URL for alternative providers
- Improved Error Handling: Better error messages and recovery options