RayCast - Dictation AI

A powerful Raycast extension that converts speech to text and formats the output using OpenAI's advanced language models. Perfect for quick dictation, note-taking, and professional communication with intelligent text formatting.

Note: This project is a fork of Advanced Speech to Text by Iñaki Tajes. Special thanks to Iñaki for creating the original extension that served as the foundation for this enhanced version.

📖 Russian version

Features

Instant Recording: Auto-starts recording when opened for immediate use
High-Quality Transcription: Uses OpenAI's Whisper-1 and GPT-4o Transcribe models
Smart Text Formatting: Format transcriptions using OpenAI's language models (GPT-4o, GPT-4o-mini)
Multiple Output Formats: Original, Email, Slack, Report, Task List, and Translation modes
Flexible Paste Behavior: Four modes - paste only, copy & paste, copy only, or show with action buttons
Recording History: View and manage past recordings with transcriptions
Transcription History: View history of transcribed text messages with configurable storage limit
Custom Prompts: Use custom prompt files for transcription context and formatting
Keyboard Shortcuts: Comprehensive hotkeys for efficient workflow
Multi-Language Support: Supports 11 languages with auto-detection
OpenAI-Compatible APIs: Configurable base URL for alternative providers

Installation

Install the extension from the Raycast Store

Install SoX (Sound eXchange) for audio recording:

# Using Homebrew
brew install sox

# Using MacPorts
sudo port install sox

Configure your OpenAI API key in Raycast preferences

Setup

Required API Key

OpenAI API Key: For both speech transcription and text formatting

Get your key from OpenAI Platform
Single API key handles all operations (transcription and formatting)

System Requirements

SoX: Required for audio recording
macOS: Raycast extension platform
Microphone: For audio input

Usage

Basic Dictation

Open Raycast and type "Dictate"
Recording starts automatically
Speak your message
Press Enter to stop and transcribe
Text behavior depends on paste preference setting

Format Modes

While recording, you can choose different output formats using keyboard shortcuts:

Original (Enter): Raw transcription with configurable paste behavior
Email (Cmd+E): Professional email format with greetings and structure
Slack (Cmd+S): Clean, casual message formatting
Report (Cmd+R): Structured task/project reporting format
Task (Cmd+T): Structured task list with checkboxes and action items
Translation (Cmd+L): Translate to English or improve English text
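
The shortcut-to-mode mapping above can be pictured as a small lookup. This is only a sketch: the names FormatMode, KeyPress, and modeForKeyPress are hypothetical and are not taken from the extension's source.

```typescript
// Hypothetical types for illustration; the extension's own names may differ.
type FormatMode = "original" | "email" | "slack" | "report" | "task" | "translation";

interface KeyPress {
  key: string;  // for example "enter", "e", "s"
  cmd: boolean; // true when the Command key is held
}

// Cmd+<letter> shortcuts as listed in this README.
const CMD_SHORTCUTS = new Map<string, FormatMode>([
  ["e", "email"],
  ["s", "slack"],
  ["r", "report"],
  ["t", "task"],
  ["l", "translation"],
]);

// Returns the format mode for a key press, or undefined if the
// key press is not one of the documented shortcuts.
function modeForKeyPress(press: KeyPress): FormatMode | undefined {
  const key = press.key.toLowerCase();
  if (!press.cmd) {
    // Plain Enter keeps the raw transcription.
    return key === "enter" ? "original" : undefined;
  }
  return CMD_SHORTCUTS.get(key);
}
```

For instance, `modeForKeyPress({ key: "e", cmd: true })` yields "email", while an unlisted combination yields undefined.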

Paste Behavior Options

Configure how text is handled after transcription:

Paste only: Direct paste to active field (no clipboard copy)
Copy and paste: Copy to clipboard then paste to active field
Copy only: Copy to clipboard without automatic pasting
Show with action buttons: Display result with manual action selection
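
One way to think about the four modes is as a mapping from the preference value to an ordered list of steps. The sketch below assumes hypothetical preference values ("paste", "copyAndPaste", "copy", "showActions") and step names; the real preference keys are not documented here.

```typescript
// Hypothetical preference values and step names, for illustration only.
type PasteBehavior = "paste" | "copyAndPaste" | "copy" | "showActions";
type Step = "copyToClipboard" | "pasteToActiveField" | "showResultView";

// Translate the configured behavior into the steps to perform, in order.
function stepsFor(behavior: PasteBehavior): Step[] {
  switch (behavior) {
    case "paste":
      return ["pasteToActiveField"]; // clipboard is left untouched
    case "copyAndPaste":
      return ["copyToClipboard", "pasteToActiveField"];
    case "copy":
      return ["copyToClipboard"];
    case "showActions":
      return ["showResultView"]; // the user picks an action manually
  }
}
```

Keeping the decision in one pure function makes each mode easy to unit test without touching the clipboard.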

Keyboard Shortcuts

During Recording:

Enter - Stop and transcribe with original format
Cmd+E - Stop and format as Email
Cmd+S - Stop and format for Slack
Cmd+T - Stop and format as Task List
Cmd+R - Stop and format as Report
Cmd+L - Stop and translate to English
Cmd+. - Cancel recording

In Results View:

Enter - Paste to selected field (primary action)
Cmd+C - Copy to clipboard
Cmd+V - Copy and paste
Cmd+Shift+V - Paste only
Cmd+Shift+C - Copy only
Cmd+E - Format as Email
Cmd+S - Format for Slack
Cmd+T - Format as Task List
Cmd+R - Format as Report
Cmd+L - Translate to English
Cmd+O - Use original transcription
Cmd+N - New recording

Recording History

Access via "Recording History" command
View past recordings with metadata (duration, size, word count)
Re-transcribe audio files with current settings
Copy transcriptions to clipboard
Automatic file cleanup after 24 hours
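
The metadata shown for each recording (duration, size, word count) could be derived with helpers along these lines. This is a minimal sketch; the function names and exact output formats are assumptions, not the extension's actual utilities.

```typescript
// Hypothetical helpers; output formats are illustrative choices.

// Format a duration in seconds as "m:ss".
function formatDuration(totalSeconds: number): string {
  const seconds = Math.max(0, Math.round(totalSeconds));
  const minutes = Math.floor(seconds / 60);
  const rest = seconds % 60;
  return `${minutes}:${rest < 10 ? "0" : ""}${rest}`;
}

// Format a byte count using binary units (1 KB = 1024 bytes).
function formatSize(bytes: number): string {
  if (bytes < 1024) return `${bytes} B`;
  if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`;
  return `${(bytes / (1024 * 1024)).toFixed(1)} MB`;
}

// Count whitespace-separated words in a transcription.
function countWords(text: string): number {
  return text
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0).length;
}
```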

Transcription History

Access via "Transcription History" command
View history of the last transcribed messages (original text only)
Configurable storage limit (0=disabled, 5-100 messages, unlimited)
Copy transcriptions to clipboard
Delete individual entries or clear all history
Persistent storage using Raycast LocalStorage

Configuration

Transcription Settings

Model: Text field for any OpenAI transcription model (default: whisper-1)
- Examples: whisper-1, gpt-4o-transcribe
Language: Auto-detect or specify language (English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Russian)
Temperature: Control sampling randomness; lower values give more deterministic output (0, 0.2, 0.5, 0.8, 1.0)
Context File: Custom prompt file to improve accuracy for specific terms
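
As a rough illustration of how these settings might translate into a request, the sketch below builds the form fields for OpenAI's audio transcription endpoint (model, language, prompt, temperature). The TranscriptionSettings shape, the "auto" sentinel, and the function name are assumptions; the audio file itself would be attached separately as the multipart file field.

```typescript
// Hypothetical settings shape; "auto" is an assumed sentinel for auto-detect.
interface TranscriptionSettings {
  model: string;          // for example "whisper-1"
  language: string;       // ISO 639-1 code such as "en", or "auto"
  temperature: number;
  contextPrompt?: string; // contents of the optional context file
}

// Build the text fields of the multipart transcription request.
function transcriptionFields(settings: TranscriptionSettings): Record<string, string> {
  const fields: Record<string, string> = {
    model: settings.model,
    temperature: String(settings.temperature),
  };
  // Omitting "language" lets the API detect the language itself.
  if (settings.language !== "auto") {
    fields.language = settings.language;
  }
  // Only send a prompt when the context file has real content.
  const prompt = settings.contextPrompt ? settings.contextPrompt.trim() : "";
  if (prompt !== "") {
    fields.prompt = prompt;
  }
  return fields;
}
```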

Formatting Settings

Formatting Model: Text field for any OpenAI language model (default: gpt-4o)
- Examples: gpt-4o, gpt-4o-mini, gpt-3.5-turbo
Base URL: OpenAI-compatible API endpoint (default: https://api.openai.com/v1)
Custom Prompts: Override default formatting prompts with custom files
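
To show how the formatting model and base URL fit together, here is a minimal sketch that assembles a chat completions request against an OpenAI-compatible endpoint. The function and option names are hypothetical; only the /chat/completions path, the Bearer header, and the model/messages/temperature body fields come from the OpenAI API.

```typescript
// Hypothetical request builder; names are illustrative.
interface FormattingRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

interface FormattingOptions {
  baseUrl: string;     // for example "https://api.openai.com/v1"
  apiKey: string;
  model: string;       // for example "gpt-4o"
  prompt: string;      // default or custom formatting prompt
  text: string;        // the raw transcription
  temperature: number;
}

function buildFormattingRequest(opts: FormattingOptions): FormattingRequest {
  // Tolerate a trailing slash in the configured base URL.
  const base = opts.baseUrl.replace(/\/+$/, "");
  return {
    url: `${base}/chat/completions`,
    headers: {
      Authorization: `Bearer ${opts.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: opts.model,
      temperature: opts.temperature,
      messages: [
        { role: "system", content: opts.prompt },
        { role: "user", content: opts.text },
      ],
    }),
  };
}
```

The result can be sent with any HTTP client, for example `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. Switching providers then only requires changing the base URL and model name.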

Paste Behavior

Paste Behavior: Choose how text is handled after processing
- Paste only, Copy and paste, Copy only, Show with action buttons

History Settings

Transcription History Limit: Configure how many transcriptions to store
- Options: Disabled (0), 5-100 messages, Unlimited (-1)
- Default: 10 messages
- Automatically saves original transcriptions to local storage
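
The limit semantics (0 disabled, a positive count, -1 unlimited) can be sketched as a pure function applied before the list is written to storage. The HistoryEntry shape and function name are hypothetical, and the choice to leave existing entries untouched when history is disabled is an assumption.

```typescript
// Hypothetical entry shape, for illustration only.
interface HistoryEntry {
  id: string;
  text: string;
  createdAt: number; // milliseconds since the epoch
}

// Add a new entry and apply the configured limit.
//   limit === 0  -> history disabled: the new entry is not saved (assumption)
//   limit  <  0  -> unlimited
//   limit  >  0  -> keep only the newest `limit` entries
function addToHistory(history: HistoryEntry[], entry: HistoryEntry, limit: number): HistoryEntry[] {
  if (limit === 0) {
    return history;
  }
  const updated = [entry, ...history]; // newest first
  return limit < 0 ? updated : updated.slice(0, limit);
}
```

The returned array would then be serialized (for example with JSON.stringify) before being saved.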

Custom Prompts

You can provide custom prompt files for better transcription accuracy and formatting:

Transcription Context File

Improves speech recognition for specific names, terms, or abbreviations:

## Prompt
Context for speech recognition:
- Company: Acme Corporation
- Names: John Smith, Sarah Johnson
- Technical terms: Kubernetes, PostgreSQL
- Abbreviations: API, CI/CD, SLA

Custom Formatting Prompts

Override default formatting with custom prompts:

## Prompt
Transform the following text into a casual team update email.
Keep it friendly and conversational, focusing on progress and next steps.
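
Both prompt files above place their text under a "## Prompt" heading. A loader might extract that section roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and falling back to the whole file when the heading is missing is a guess about sensible behavior, not documented behavior.

```typescript
// Hypothetical loader helper; the fallback behavior is an assumption.
function extractPrompt(fileContents: string): string {
  const lines = fileContents.split(/\r?\n/);
  // Find the "## Prompt" heading (case-insensitive).
  const start = lines.findIndex((line) => /^##\s+Prompt\s*$/i.test(line));
  if (start === -1) {
    // No heading: treat the whole file as the prompt.
    return fileContents.trim();
  }
  const rest = lines.slice(start + 1);
  // Stop at the next second-level heading, if there is one.
  const end = rest.findIndex((line) => /^##\s+/.test(line));
  const section = end === -1 ? rest : rest.slice(0, end);
  return section.join("\n").trim();
}
```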

Technical Architecture

Core Components

Dictate Command (src/dictate.tsx): Main recording and transcription workflow with state management
Recording History (src/recording-history.tsx): History management interface for audio files
Transcription History (src/transcription-history.tsx): History management interface for transcribed text
Audio Recorder Hook (src/hooks/useAudioRecorder.ts): SoX integration for recording

Audio System

Uses SoX (Sound eXchange) for cross-platform recording
Records in WAV format (16kHz, mono, 16-bit) optimized for Whisper
Automatic SoX detection across common installation paths: /usr/local/bin/sox, /opt/homebrew/bin/sox, /usr/bin/sox
Audio validation (1KB minimum, 25MB maximum) and temporary file management
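
The path detection and recording format described above can be sketched as two small functions. The names are hypothetical, and the file-existence check is injected so the sketch stays free of Node imports; in practice it could be something like fs.existsSync. The SoX options used (-d for the default input device, -r, -c, -b for rate, channels, and bit depth) are standard SoX flags.

```typescript
// Candidate locations as listed in this README.
const SOX_CANDIDATES = ["/usr/local/bin/sox", "/opt/homebrew/bin/sox", "/usr/bin/sox"];

// Return the first candidate path for which `exists` reports true.
// `exists` is injected (hypothetical signature) to keep this testable.
function findSox(
  exists: (path: string) => boolean,
  candidates: string[] = SOX_CANDIDATES,
): string | undefined {
  return candidates.find((path) => exists(path));
}

// Arguments for recording from the default input device to a
// 16 kHz, mono, 16-bit WAV file.
function recordingArgs(outputPath: string): string[] {
  return ["-d", "-r", "16000", "-c", "1", "-b", "16", outputPath];
}
```

The resulting path and arguments could then be handed to a process spawner, which is equivalent to running `sox -d -r 16000 -c 1 -b 16 recording.wav` in a terminal.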

API Integration

OpenAI: Single provider for both transcription and text formatting
Configurable base URL supports OpenAI-compatible APIs
Robust error handling for API rate limits, authentication, and quota issues
Temperature setting affects both transcription and formatting
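
Error handling for rate limits, authentication, and quota typically starts from the HTTP status code. The sketch below is an assumption about how such a mapping could look, not the extension's actual messages; it relies on OpenAI returning 401 for invalid keys and 429 for both rate limiting and exhausted quota, with the error code "insufficient_quota" distinguishing the latter.

```typescript
// Hypothetical mapping from an API failure to a user-facing message.
function describeApiError(status: number, errorCode?: string): string {
  if (status === 401) {
    return "Authentication failed: check the OpenAI API key in preferences.";
  }
  if (status === 429) {
    return errorCode === "insufficient_quota"
      ? "Quota exceeded: check the credits on your OpenAI account."
      : "Rate limit reached: wait a moment and try again.";
  }
  if (status >= 500) {
    return "The API is temporarily unavailable: try again later.";
  }
  return `Request failed with status ${status}.`;
}
```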

File Management

Temporary audio files stored in ~/Library/Application Support/com.raycast.macos/extensions/dictation-ai/temp
Automatic cleanup of files older than 24 hours
Transcription history stored persistently using Raycast LocalStorage
File validation and size limit enforcement
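
The 24-hour cleanup rule amounts to comparing each file's modification time with the current time. A minimal sketch, assuming a hypothetical TempFile shape populated from file-system stats:

```typescript
// Hypothetical description of a temporary audio file.
interface TempFile {
  path: string;
  modifiedAtMs: number; // last modification time, ms since the epoch
}

const MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours

// Return the paths of files older than the maximum age.
function filesToDelete(files: TempFile[], nowMs: number, maxAgeMs: number = MAX_AGE_MS): string[] {
  return files
    .filter((file) => nowMs - file.modifiedAtMs > maxAgeMs)
    .map((file) => file.path);
}
```

Separating the selection from the actual deletion keeps the age rule testable without touching the disk.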

Clipboard Operations

Advanced clipboard management with multiple paste behaviors:

Direct paste with fallback to copy
Smart routing based on user preferences
HUD notifications for user feedback
Error handling with graceful degradation
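
"Direct paste with fallback to copy" can be sketched as follows. The paste and copy operations are injected as hypothetical async functions; in a Raycast extension they would presumably wrap the clipboard API, and the returned outcome could drive the HUD message.

```typescript
// Hypothetical outcome type, for illustration only.
type PasteOutcome = "pasted" | "copied" | "failed";

// Try to paste directly; if that fails, fall back to copying so the
// text is not lost; report which path succeeded.
async function pasteWithFallback(
  text: string,
  paste: (text: string) => Promise<void>,
  copy: (text: string) => Promise<void>,
): Promise<PasteOutcome> {
  try {
    await paste(text);
    return "pasted";
  } catch {
    try {
      await copy(text);
      return "copied";
    } catch {
      return "failed";
    }
  }
}
```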

Development

Commands

npm run dev          # Start Raycast development mode
npm run build        # Build the extension
npm run lint         # Run ESLint
npm run fix-lint     # Fix ESLint issues automatically
npm run test         # Run Jest tests
npm run test:watch   # Run Jest tests in watch mode
npm run test:coverage # Run Jest tests with coverage report
npm run publish      # Publish to Raycast Store

Project Structure

src/
├── dictate.tsx              # Main dictation command
├── recording-history.tsx    # Audio file history management
├── transcription-history.tsx # Text transcription history management
├── hooks/
│   └── useAudioRecorder.ts  # Audio recording logic
├── utils/
│   ├── audio.ts            # SoX integration & file management
│   ├── openai.ts           # OpenAI API integration
│   ├── formatters.ts       # Text formatting with OpenAI models
│   ├── clipboard.ts        # Clipboard operations with paste behaviors
│   ├── errors.ts           # Centralized error handling
│   ├── prompts.ts          # Custom prompt loading utilities
│   ├── history.ts          # Transcription history storage
│   └── time.ts             # Time/size formatting utilities
├── types.ts                # TypeScript definitions
└── constants.ts            # Configuration constants

Troubleshooting

Common Issues

SoX not installed

# Install via Homebrew
brew install sox

Recording not working

Check microphone permissions in System Settings (System Preferences on older macOS versions)
Ensure SoX is installed and accessible in PATH
Try different microphone input sources
Verify audio input levels

API Errors

Verify OpenAI API key has sufficient credits
Check API key permissions for audio and chat endpoints
Ensure internet connection is stable
Try different base URL if using alternative provider

File Size Limits

Maximum audio file size: 25MB (OpenAI limit)
Minimum audio size: 1KB for validation
Files are automatically validated before transcription
Long recordings may hit size limits
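
To make the limits concrete: at 16 kHz, mono, 16-bit, a WAV recording grows by 16000 × 2 × 1 = 32000 bytes per second, so a 25MB cap is reached after roughly 819 seconds, a little under 14 minutes. The sketch below encodes that arithmetic and the size check; it assumes 1KB = 1024 bytes and 25MB = 25 × 1024 × 1024 bytes, ignores the small WAV header, and uses hypothetical function names.

```typescript
const MIN_BYTES = 1024;             // 1KB minimum (assumed binary units)
const MAX_BYTES = 25 * 1024 * 1024; // 25MB maximum (assumed binary units)

// 16000 samples/s * 2 bytes per sample * 1 channel
const BYTES_PER_SECOND = 16000 * 2 * 1;

type Validation = { ok: true } | { ok: false; reason: string };

// Check a recording's size before sending it for transcription.
function validateAudioSize(bytes: number): Validation {
  if (bytes < MIN_BYTES) {
    return { ok: false, reason: "Audio file is too short" };
  }
  if (bytes > MAX_BYTES) {
    return { ok: false, reason: "Audio file exceeds 25MB limit" };
  }
  return { ok: true };
}

// Approximate longest recording, in whole seconds, that fits the cap.
function maxRecordingSeconds(): number {
  return Math.floor(MAX_BYTES / BYTES_PER_SECOND);
}
```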

Paste/Clipboard Issues

Check app permissions for clipboard access
Try different paste behavior settings
Verify target application accepts clipboard input
Use "Show with action buttons" mode for manual control

Error Messages

Sox is not installed: Install SoX using Homebrew or MacPorts
Audio file exceeds 25MB limit: Recording too long, try shorter sessions
OpenAI API key is required: Configure API key in preferences
Transcription failed: Check API key, credits, and internet connection
Audio file is too short: The recording is under the 1KB minimum; record a longer clip
Paste Error: Target application may not accept clipboard input

Privacy & Security

Audio files are temporarily stored locally and automatically deleted after 24 hours
No storage of voice recordings beyond the 24-hour cleanup window
Transcription history stored locally using Raycast's secure LocalStorage
API keys are stored securely in Raycast preferences
All API calls use HTTPS encryption
No telemetry or usage tracking
Local processing with external API calls only for transcription/formatting

License

MIT License - see LICENSE file for details.

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Update documentation as needed
Submit a pull request

Support

Changelog

Recent Updates

Transcription History: New command to view and manage history of transcribed text messages
Configurable History Limit: Set storage limit from 0 (disabled) to unlimited
Unified OpenAI Integration: Removed OpenRouter dependency, now uses only OpenAI for all operations
Flexible Model Configuration: Text fields for custom model selection
Enhanced Paste Behavior: Four distinct modes including "show with actions"
Comprehensive Keyboard Shortcuts: Full hotkey support for efficient workflow
OpenAI-Compatible APIs: Configurable base URL for alternative providers
Improved Error Handling: Better error messages and recovery options
