A stunning, modern macOS application for transcribing audio files using OpenAI's Whisper model with CoreML acceleration. Features a beautiful liquid-glass UI design with real-time transcription streaming.
- 🎵 Multi-Format Support: MP3, WAV, M4A, FLAC
- 🚀 Real-Time Transcription: Stream transcription output as it's being processed
- 💾 Auto-Save: Automatically saves transcriptions as .txt files
- 📝 Subtitle Export: Generate .srt subtitle files with timestamps
- 🖱️ Drag & Drop: Simply drag audio files into the app
- 📁 File Picker: Browse and select files with native macOS dialog
- ⚡ CoreML Acceleration: Automatic GPU acceleration on Apple Silicon
- 🔧 Configurable Threads: Adjust CPU threads for optimal performance
- 🎯 Multiple Models: Support for base, small, medium, and large Whisper variants
- 🔄 Background Processing: Non-blocking UI during transcription
- 🎨 Glassmorphic Design: Beautiful semi-transparent panels with blur effects
- 🌈 Animated Gradients: Smooth, dynamic background animations
- 📊 Live Waveform: Real-time audio waveform visualization during processing
- 🌓 Dark Mode: Full support for macOS dark and light modes
- 📱 Responsive Layout: Adaptive interface that scales beautifully
- ✨ Smooth Animations: Spring-based transitions and hover effects
- macOS 26.0 (Tahoe) or later
- Apple Silicon (M1/M2/M3)
- Xcode 15.0 or later
The app requires two main resources in the Resources folder:
Your precompiled `whisper-cli` binary should already be at:

```
Vocal Prism/Resources/whisper-cli
```
The app will automatically make it executable on first launch using chmod +x.
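A first-launch permission fix like this can be done with `FileManager` (a sketch of the idea only; the app's actual implementation may differ):

```swift
import Foundation

/// Ensure the bundled whisper-cli binary is executable.
/// Lookup and error handling here are illustrative, not the app's exact code.
func makeBinaryExecutable() throws {
    guard let binaryURL = Bundle.main.url(forResource: "whisper-cli", withExtension: nil) else {
        fatalError("whisper-cli not found in bundle")
    }
    // 0o755 = rwxr-xr-x, i.e. the equivalent of `chmod +x`
    try FileManager.default.setAttributes(
        [.posixPermissions: 0o755],
        ofItemAtPath: binaryURL.path
    )
}
```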
Place your Whisper model files in the Resources folder:
- `Vocal Prism/Resources/ggml-base.en.bin`
- `Vocal Prism/Resources/ggml-small.en.bin` (optional)
- `Vocal Prism/Resources/ggml-medium.en.bin` (optional)
- `Vocal Prism/Resources/ggml-large.bin` (optional)
Currently included:
- ✅ `ggml-base.en.bin` (English-only base model)
- ✅ `ggml-base.en-encoder.mlmodelc` (CoreML encoder)
- Open `Vocal Prism.xcodeproj` in Xcode
- Right-click on the `Resources` folder in the Project Navigator
- Select "Add Files to 'Vocal Prism'..."
- Select your model files
- Important: In the dialog, ensure:
  - ✅ "Copy items if needed" is checked
  - ✅ Target membership includes "Vocal Prism"
  - ✅ "Create folder references" is selected (not groups)
Check that resources are properly included:
- Select the project in Xcode
- Go to the "Vocal Prism" target
- Click "Build Phases"
- Expand "Copy Bundle Resources"
- Verify the following are listed:
  - `whisper-cli`
  - `ggml-base.en.bin`
  - Any other model files you added
- Select the "Vocal Prism" target
- Go to "Signing & Capabilities"
- Select your development team
- Xcode will automatically handle provisioning
Verify in Build Settings:
- macOS Deployment Target: `26.0` or later
- Select your Mac as the run destination
- Press `Cmd + R` or click the Run button
- The app will launch and verify all resources on first run
- **Launch the App**
  - The main screen shows a drop zone with an animated gradient background
- **Select Audio File**
  - Drag and drop an audio file onto the drop zone, OR
  - Click "Select File" to browse
- **Configure Settings** (Optional)
  - Click the "Settings" button in the top right
  - Adjust:
    - CPU Threads: more threads = faster (but higher CPU usage)
    - Model Variant: larger models = more accurate (but slower)
    - Timestamps: include time markers in the transcription
    - SRT Export: generate subtitle files
- **Start Transcription**
  - Click "Start Transcription"
  - Watch the live waveform animation
  - See the transcription text appear in real time
  - Monitor progress with the animated progress bar
- **View & Export Results**
  - The transcription automatically saves to a `.txt` file next to the audio file
  - Click "Copy" to copy the text to the clipboard
  - Click "Save" to manually save to a custom location
  - Click "Export SRT" for subtitle files
- `Esc`: Close the settings panel
- Standard macOS text selection works in the transcription view
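For reference, the exported `.srt` files follow the standard SubRip format: a sequence number, a `start --> end` timestamp line, and the text. The content below is made up purely for illustration:

```
1
00:00:00,000 --> 00:00:04,500
Welcome to the show.

2
00:00:04,500 --> 00:00:09,200
Today we're talking about on-device transcription.
```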
```
Vocal Prism/
├── Vocal_PrismApp.swift      # App entry point, window configuration
├── ContentView.swift         # Main UI with state management
├── WhisperEngine.swift       # Transcription engine & process management
├── GlassmorphicViews.swift   # Reusable glass UI components
└── Resources/
    ├── whisper-cli           # Precompiled Whisper executable
    └── ggml-*.bin            # Whisper model files
```
- Manages the `whisper-cli` process lifecycle
- Streams stdout/stderr in real time
- Publishes progress and transcription updates
- Handles CoreML acceleration flags
- Auto-saves transcriptions
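The auto-save step could be as simple as writing the text next to the source file (a sketch; the function and parameter names are assumptions, not the engine's actual API):

```swift
import Foundation

/// Save the finished transcription as a .txt file beside the audio file.
/// e.g. /Music/interview.m4a → /Music/interview.txt
func autoSave(transcript: String, besideAudioAt audioURL: URL) throws {
    let textURL = audioURL.deletingPathExtension().appendingPathExtension("txt")
    try transcript.write(to: textURL, atomically: true, encoding: .utf8)
}
```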
- `GlassBackgroundModifier`: frosted glass effect
- `AnimatedGradientBackground`: dynamic gradient animations
- `GlassButton`: interactive glassmorphic buttons
- `GlassProgressBar`: animated progress indicator
- `WaveformView`: real-time audio visualization
- `DropZoneView`: drag-and-drop file receiver
- State management for transcription workflow
- File handling (picker & drag-drop)
- Settings panel
- Real-time transcription display with auto-scroll
The app automatically enables CoreML acceleration by passing the `-ml 1` flag to `whisper-cli`. This leverages:
- Apple Neural Engine (ANE)
- GPU Metal acceleration
- Optimized inference on Apple Silicon
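A minimal sketch of how the engine might launch `whisper-cli` with these flags (function name and structure are assumptions, not the app's exact code):

```swift
import Foundation

/// Launch the bundled whisper-cli with the model, input file,
/// thread count, and the CoreML flag described above.
func launchWhisper(modelPath: String, audioPath: String, threads: Int) throws -> Process {
    let process = Process()
    process.executableURL = Bundle.main.url(forResource: "whisper-cli", withExtension: nil)
    process.arguments = [
        "-m", modelPath,        // model file
        "-f", audioPath,        // input audio
        "-t", String(threads),  // CPU threads
        "-ml", "1"              // CoreML acceleration
    ]
    let pipe = Pipe()
    process.standardOutput = pipe
    // Output is streamed via the pipe's readabilityHandler,
    // as shown in the real-time streaming section.
    try process.run()
    return process
}
```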
Transcription output is captured in real time using a `readabilityHandler`:

```swift
outputHandle.readabilityHandler = { handle in
    let data = handle.availableData
    if data.count > 0, let output = String(data: data, encoding: .utf8) {
        // Update UI immediately
    }
}
```

The glass effect is achieved through layered blur and opacity:

```swift
.background(.ultraThinMaterial)
.overlay(/* gradient border */)
.shadow(...)
```

All animations use spring physics for natural motion:

```swift
.animation(.spring(response: 0.5, dampingFraction: 0.8))
```

The app adapts to system appearance:
- Light Mode: Soft blue-purple gradients
- Dark Mode: Deep blue-purple gradients
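Putting the blur, border, and shadow layers together, a reusable glass container might look like this (a sketch of the technique, not the app's actual `GlassBackgroundModifier`):

```swift
import SwiftUI

/// Illustrative frosted-glass panel: material background,
/// gradient border overlay, and a soft shadow.
struct GlassPanel: ViewModifier {
    func body(content: Content) -> some View {
        content
            .padding()
            .background(.ultraThinMaterial, in: RoundedRectangle(cornerRadius: 20))
            .overlay(
                RoundedRectangle(cornerRadius: 20)
                    .strokeBorder(
                        LinearGradient(
                            colors: [.white.opacity(0.5), .clear],
                            startPoint: .topLeading,
                            endPoint: .bottomTrailing
                        ),
                        lineWidth: 1
                    )
            )
            .shadow(color: .black.opacity(0.2), radius: 12, y: 6)
    }
}
```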
- Download Whisper models from the official source
- Place them in the `Resources/` folder
- Name them following the pattern `ggml-{size}.{language}.bin`
- Update the `WhisperEngine.ModelVariant` enum if needed
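The `ModelVariant` enum mentioned above might be extended along these lines (case names and the `url` helper are assumptions; the real enum in `WhisperEngine.swift` may differ):

```swift
import Foundation

/// Hypothetical sketch of the model registry.
enum ModelVariant: String, CaseIterable {
    case baseEN = "ggml-base.en.bin"
    case smallEN = "ggml-small.en.bin"
    case mediumEN = "ggml-medium.en.bin"
    case large = "ggml-large.bin"

    /// Resolve the model file inside the app bundle.
    var url: URL? {
        Bundle.main.url(forResource: rawValue, withExtension: nil)
    }
}
```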
Edit the `WhisperEngine.transcribe()` method to add flags:

```swift
var arguments = [
    "-m", modelPath,
    "-f", audioFile.path,
    "-t", String(options.threads),
    "--custom-flag" // Add here
]
```

Available flags:
- `-l`: Language (e.g., "en", "es", "fr")
- `-tr`: Translate to English
- `-nt`: No timestamps
- `-ml`: CoreML acceleration (0 or 1)
- `-otxt`, `-osrt`, `-ovtt`: Output formats
For fastest transcription:
- Use the `base.en` model
- Set threads to the number of performance cores (4-8)
- Enable CoreML acceleration
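On Apple Silicon, the performance-core count can be read via sysctl, which could drive a sensible default for the thread setting (a sketch; the app's actual default logic is unknown):

```swift
import Foundation

/// Number of performance cores on Apple Silicon, falling back to
/// the active processor count if the sysctl key is unavailable.
func performanceCoreCount() -> Int {
    var count: Int32 = 0
    var size = MemoryLayout<Int32>.size
    if sysctlbyname("hw.perflevel0.physicalcpu", &count, &size, nil, 0) == 0, count > 0 {
        return Int(count)
    }
    return ProcessInfo.processInfo.activeProcessorCount
}
```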
For best accuracy:
- Use the `large` model
- Increase threads
- Enable timestamps for better context
Solution: Ensure resources are properly added to the target:
- Check Build Phases → Copy Bundle Resources
- Verify whisper-cli is listed
- Clean the build folder (`Cmd + Shift + K`)
- Rebuild
Solution: The app handles this automatically, but if it persists:

```bash
chmod +x "Vocal Prism/Resources/whisper-cli"
```

Solutions:
- Increase thread count in settings
- Use smaller model (base vs large)
- Close other CPU-intensive apps
- Ensure CoreML is enabled
Reason: Some whisper-cli versions buffer output
Solution: Update to the latest whisper.cpp build with `--no-prints` support
Check:
- Code signing is configured
- All resources are bundled
- Console for specific errors: Window → Show Console
- Select "Any Mac" as destination
- Product → Archive
- Distribute App → Direct Distribution
- Export and notarize for distribution
```
Vocal Prism.app/
└── Contents/
    ├── MacOS/
    │   └── Vocal Prism (executable)
    └── Resources/
        ├── whisper-cli
        ├── ggml-base.en.bin
        └── ggml-base.en-encoder.mlmodelc/
```
For distribution outside the Mac App Store:

```bash
xcrun notarytool submit "Vocal Prism.app" --wait
xcrun stapler staple "Vocal Prism.app"
```

- Audio playback within the app
- Multi-file batch processing
- Live microphone transcription
- Translation to other languages
- Speaker diarization
- Custom vocabulary/prompts
- Cloud sync for transcriptions
- Keyboard shortcuts for common actions
This project is provided as-is for educational and personal use.
- whisper.cpp: MIT License
- Whisper Models: MIT License (OpenAI)
- OpenAI for the Whisper model
- whisper.cpp community for the excellent C++ implementation
- Apple for SwiftUI and CoreML frameworks
For issues or questions:
- Check the Troubleshooting section
- Review whisper.cpp documentation
- Verify all resources are properly bundled
Built with ❤️ using SwiftUI and Whisper
Enjoy transcribing with style! 🎙️✨