Vocal Prism - AI-Powered Audio Transcription for macOS

A stunning, modern macOS application for transcribing audio files using OpenAI's Whisper model with CoreML acceleration. Features a beautiful liquid-glass UI design with real-time transcription streaming.


✨ Features

Core Functionality

  • 🎵 Multi-Format Support: MP3, WAV, M4A, FLAC
  • 🚀 Real-Time Transcription: Stream transcription output as it's being processed
  • 💾 Auto-Save: Automatically saves transcriptions as .txt files
  • 📝 Subtitle Export: Generate .srt subtitle files with timestamps
  • 🖱️ Drag & Drop: Simply drag audio files into the app
  • 📁 File Picker: Browse and select files with native macOS dialog

Performance

  • ⚡️ CoreML Acceleration: Automatic GPU acceleration on Apple Silicon
  • 🔧 Configurable Threads: Adjust CPU threads for optimal performance
  • 🎯 Multiple Models: Support for base, small, medium, and large Whisper variants
  • 🔄 Background Processing: Non-blocking UI during transcription

UI/UX

  • 🎨 Glassmorphic Design: Beautiful semi-transparent panels with blur effects
  • 🌈 Animated Gradients: Smooth, dynamic background animations
  • 📊 Live Waveform: Real-time audio waveform visualization during processing
  • 🌓 Dark Mode: Full support for macOS dark and light modes
  • 📱 Responsive Layout: Adaptive interface that scales beautifully
  • ✨ Smooth Animations: Spring-based transitions and hover effects

📋 Requirements

  • macOS 26.0 (Tahoe) or later
  • Apple Silicon (M1/M2/M3)
  • Xcode 26.0 or later (the macOS 26 SDK ships with Xcode 26)

🚀 Installation & Setup

1. Bundle Resources

The app requires two main resources in the Resources folder:

whisper-cli executable

Your precompiled whisper-cli binary should already be in:

Vocal Prism/Resources/whisper-cli

The app will automatically make it executable on first launch using chmod +x.
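
A minimal sketch of that first-launch step using FileManager (the exact code in the app may differ):

// Equivalent of chmod +x: set POSIX permissions on the bundled binary.
if let cliPath = Bundle.main.path(forResource: "whisper-cli", ofType: nil) {
    try? FileManager.default.setAttributes([.posixPermissions: 0o755],
                                           ofItemAtPath: cliPath)
}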

Whisper Models

Place your Whisper model files in the Resources folder:

Vocal Prism/Resources/ggml-base.en.bin
Vocal Prism/Resources/ggml-small.en.bin (optional)
Vocal Prism/Resources/ggml-medium.en.bin (optional)
Vocal Prism/Resources/ggml-large.bin (optional)

Currently included:

  • ggml-base.en.bin (English-only base model)
  • ggml-base.en-encoder.mlmodelc (CoreML encoder)
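
At runtime the app must resolve these bundled files; a minimal lookup sketch, assuming the filenames above and standard Bundle APIs:

// Resolve the bundled CLI and default model (names match the files above).
guard let cliURL = Bundle.main.url(forResource: "whisper-cli", withExtension: nil),
      let modelURL = Bundle.main.url(forResource: "ggml-base.en", withExtension: "bin") else {
    fatalError("whisper-cli or ggml-base.en.bin missing from the app bundle")
}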

Adding Models to Xcode

  1. Open Vocal Prism.xcodeproj in Xcode
  2. Right-click on the Resources folder in the Project Navigator
  3. Select "Add Files to 'Vocal Prism'..."
  4. Select your model files
  5. Important: In the dialog, ensure:
    • ✅ "Copy items if needed" is checked
    • ✅ Target membership includes "Vocal Prism"
    • ✅ "Create folder references" is selected (not groups)

2. Verify Bundle Resources

Check that resources are properly included:

  1. Select the project in Xcode
  2. Go to the "Vocal Prism" target
  3. Click "Build Phases"
  4. Expand "Copy Bundle Resources"
  5. Verify the following are listed:
    • whisper-cli
    • ggml-base.en.bin
    • Any other model files you added

3. Configure Build Settings

Code Signing

  1. Select the "Vocal Prism" target
  2. Go to "Signing & Capabilities"
  3. Select your development team
  4. Xcode will automatically handle provisioning

Minimum Deployment Target

Verify in Build Settings:

  • macOS Deployment Target: 26.0 or later

4. Build and Run

  1. Select your Mac as the run destination
  2. Press Cmd + R or click the Run button
  3. The app will launch and verify all resources on first run

🎯 Usage

Basic Workflow

  1. Launch the App

    • The main screen shows a drop zone with animated gradient background
  2. Select Audio File

    • Drag and drop an audio file onto the drop zone, OR
    • Click "Select File" to browse
  3. Configure Settings (Optional)

    • Click the "Settings" button in the top right
    • Adjust:
      • CPU Threads: More threads = faster (but more CPU usage)
      • Model Variant: Larger models = more accurate (but slower)
      • Timestamps: Include time markers in transcription
      • SRT Export: Generate subtitle files
  4. Start Transcription

    • Click "Start Transcription"
    • Watch the live waveform animation
    • See transcription text appear in real-time
    • Monitor progress with the animated progress bar
  5. View & Export Results

    • Transcription automatically saves to .txt next to audio file
    • Click "Copy" to copy text to clipboard
    • Click "Save" to manually save to a custom location
    • Click "Export SRT" for subtitle files

Keyboard Shortcuts

  • Esc - Close settings panel (see the sketch below)
  • Standard macOS text selection works in transcription view
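
The Esc shortcut can be wired with SwiftUI's onExitCommand modifier; a one-line sketch (showSettings is an assumed state name):

// Hypothetical: Esc dismisses the settings panel (macOS-specific modifier).
.onExitCommand { showSettings = false }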

🛠️ Technical Architecture

Project Structure

Vocal Prism/
├── Vocal_PrismApp.swift    # App entry point, window configuration
├── ContentView.swift       # Main UI with state management
├── WhisperEngine.swift     # Transcription engine & process management
├── GlassmorphicViews.swift # Reusable glass UI components
└── Resources/
    ├── whisper-cli          # Precompiled Whisper executable
    └── ggml-*.bin           # Whisper model files

Key Components

WhisperEngine

  • Manages whisper-cli process lifecycle
  • Streams stdout/stderr in real-time
  • Publishes progress and transcription updates
  • Handles CoreML acceleration flags
  • Auto-saves transcriptions
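
Given these responsibilities, the engine's public surface plausibly looks like the sketch below (everything beyond the class name is an assumption):

import Combine
import Foundation

// Hypothetical skeleton of WhisperEngine, inferred from the bullets above.
final class WhisperEngine: ObservableObject {
    @Published var transcript = ""        // streamed text, appended live
    @Published var progress: Double = 0   // 0...1 for the progress bar
    @Published var isRunning = false

    private var process: Process?         // the whisper-cli child process

    func cancel() {
        process?.terminate()
        isRunning = false
    }
}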

GlassmorphicViews

  • GlassBackgroundModifier: Frosted glass effect
  • AnimatedGradientBackground: Dynamic gradient animations
  • GlassButton: Interactive glassmorphic buttons
  • GlassProgressBar: Animated progress indicator
  • WaveformView: Real-time audio visualization
  • DropZoneView: Drag-and-drop file receiver
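
A hypothetical sketch of how a drop target like DropZoneView can receive file URLs (the view body and names are assumptions, not the shipped code):

import SwiftUI
import UniformTypeIdentifiers

// Hypothetical drop handling: accept a file URL and hand it to a callback.
struct DropZone: View {
    @State private var isTargeted = false
    let onFile: (URL) -> Void

    var body: some View {
        RoundedRectangle(cornerRadius: 24)
            .fill(.ultraThinMaterial)
            .overlay(Text(isTargeted ? "Release to transcribe" : "Drop an audio file here"))
            .onDrop(of: [.fileURL], isTargeted: $isTargeted) { providers in
                guard let provider = providers.first else { return false }
                _ = provider.loadObject(ofClass: URL.self) { url, _ in
                    if let url { DispatchQueue.main.async { onFile(url) } }
                }
                return true
            }
    }
}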

ContentView

  • State management for transcription workflow
  • File handling (picker and drag-and-drop; see the picker sketch below)
  • Settings panel
  • Real-time transcription display with auto-scroll
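
For the picker side of file handling, a hedged sketch using SwiftUI's fileImporter (showPicker and startTranscription are assumed names; FLAC has no built-in UTType constant, so it is derived from the file extension):

import SwiftUI
import UniformTypeIdentifiers

// Hypothetical: restrict the picker to the formats listed under Features.
let audioTypes: [UTType] = [.mp3, .wav, .mpeg4Audio,
                            UTType(filenameExtension: "flac") ?? .audio]

// Attached to the main view:
// .fileImporter(isPresented: $showPicker, allowedContentTypes: audioTypes) { result in
//     if case .success(let url) = result { startTranscription(of: url) }
// }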

CoreML Acceleration

The app automatically enables CoreML acceleration by passing the -ml 1 flag to whisper-cli. This leverages:

  • Apple Neural Engine (ANE)
  • GPU Metal acceleration
  • Optimized inference on Apple Silicon

Real-Time Streaming

Transcription output is captured in real-time using:

outputHandle.readabilityHandler = { handle in
    let data = handle.availableData
    if !data.isEmpty, let output = String(data: data, encoding: .utf8) {
        DispatchQueue.main.async {
            // Append output to the published transcription text;
            // published state must be updated on the main thread
        }
    }
}
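
For context, a hedged sketch of how the surrounding Process launch might be wired (modelPath, audioFile, and the argument list are assumptions, not the exact WhisperEngine code):

// Hypothetical wiring around the handler above: launch whisper-cli with a pipe.
let process = Process()
process.executableURL = Bundle.main.url(forResource: "whisper-cli", withExtension: nil)
process.arguments = ["-m", modelPath, "-f", audioFile.path, "-t", "4"]

let pipe = Pipe()
process.standardOutput = pipe
process.standardError = pipe
let outputHandle = pipe.fileHandleForReading
// ... install outputHandle.readabilityHandler as shown above ...

try process.run()
process.terminationHandler = { _ in
    outputHandle.readabilityHandler = nil   // detach the handler to avoid leaks
}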

🎨 UI Customization

Glassmorphic Effect

The glass effect is achieved through layered blur and opacity:

.background(.ultraThinMaterial)
.overlay(RoundedRectangle(cornerRadius: 16)
    .strokeBorder(LinearGradient(colors: [.white.opacity(0.5), .clear],
                                 startPoint: .topLeading, endPoint: .bottomTrailing)))
.shadow(color: .black.opacity(0.2), radius: 10, y: 4)

Animations

All animations use spring physics for natural motion:

// value: ties the animation to a specific state change (isTranscribing is illustrative)
.animation(.spring(response: 0.5, dampingFraction: 0.8), value: isTranscribing)

Color Schemes

The app adapts to system appearance:

  • Light Mode: Soft blue-purple gradients
  • Dark Mode: Deep blue-purple gradients

🔧 Advanced Configuration

Adding Custom Models

  1. Download Whisper models from official source
  2. Place in Resources/ folder
  3. Name following the pattern ggml-{size}.{lang}.bin for English-only models or ggml-{size}.bin for multilingual ones (matching the filenames above)
  4. Update WhisperEngine.ModelVariant enum if needed
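
A hypothetical shape for that enum, mapping variants to the model filenames listed earlier (the real WhisperEngine.ModelVariant may differ):

// Hypothetical shape of WhisperEngine.ModelVariant.
enum ModelVariant: String, CaseIterable {
    case base   = "ggml-base.en.bin"
    case small  = "ggml-small.en.bin"
    case medium = "ggml-medium.en.bin"
    case large  = "ggml-large.bin"
}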

Customizing whisper-cli Flags

Edit WhisperEngine.transcribe() method to add flags:

var arguments = [
    "-m", modelPath,
    "-f", audioFile.path,
    "-t", String(options.threads),
    "--custom-flag"  // Add here
]

Available flags:

  • -l : Language (e.g., "en", "es", "fr")
  • -tr : Translate to English
  • -nt : No timestamps
  • -ml : CoreML acceleration (0 or 1)
  • -otxt, -osrt, -ovtt : Output formats
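
In practice these flags are appended conditionally from the settings panel; a hedged sketch (options and its fields are assumed names):

// Hypothetical: assemble the argument list from the user's settings.
var arguments = ["-m", modelPath, "-f", audioFile.path, "-t", String(options.threads)]
if !options.includeTimestamps { arguments.append("-nt") }
if options.exportSRT { arguments += ["-osrt"] }
if options.useCoreML { arguments += ["-ml", "1"] }   // CoreML flag noted above
if let language = options.language { arguments += ["-l", language] }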

Performance Tuning

For fastest transcription:

  • Use base.en model
  • Set threads to the number of performance cores (4-8; see the sketch after these lists)
  • Enable CoreML acceleration

For best accuracy:

  • Use large model
  • Increase threads
  • Enable timestamps for better context
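
To default the thread setting to the performance-core count, that count can be read via sysctl on Apple Silicon; a hedged sketch that falls back to the total core count if the key is unavailable:

import Darwin
import Foundation

// Hypothetical helper: count performance cores (hw.perflevel0 on Apple Silicon).
func performanceCoreCount() -> Int {
    var count: Int32 = 0
    var size = MemoryLayout<Int32>.size
    if sysctlbyname("hw.perflevel0.logicalcpu", &count, &size, nil, 0) == 0 {
        return Int(count)
    }
    return ProcessInfo.processInfo.activeProcessorCount
}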

🐛 Troubleshooting

"whisper-cli not found" Error

Solution: Ensure Resources are properly added to target

  1. Check Build Phases → Copy Bundle Resources
  2. Verify whisper-cli is listed
  3. Clean build folder (Cmd + Shift + K)
  4. Rebuild

"Permission Denied" Error

Solution: The app handles this automatically, but if the error persists:

chmod +x "Vocal Prism/Resources/whisper-cli"

Slow Transcription

Solutions:

  • Increase thread count in settings
  • Use smaller model (base vs large)
  • Close other CPU-intensive apps
  • Ensure CoreML is enabled

No Real-Time Output

Reason: Some whisper-cli versions buffer their output.
Solution: Update to the latest whisper.cpp build with --no-prints support.

App Won't Launch After Build

Check:

  1. Code signing is configured
  2. All resources are bundled
  3. The debug console for specific errors (View → Debug Area → Activate Console in Xcode)

📦 Distribution

Creating Release Build

  1. Select "Any Mac" as destination
  2. Product → Archive
  3. Distribute App → Direct Distribution
  4. Export and notarize for distribution

App Bundle Structure

Vocal Prism.app/
└── Contents/
    ├── MacOS/
    │   └── Vocal Prism (executable)
    └── Resources/
        ├── whisper-cli
        ├── ggml-base.en.bin
        └── ggml-base.en-encoder.mlmodelc/

Notarization for Distribution

For distribution outside Mac App Store:

ditto -c -k --keepParent "Vocal Prism.app" "Vocal Prism.zip"
xcrun notarytool submit "Vocal Prism.zip" --keychain-profile "AC_PROFILE" --wait
xcrun stapler staple "Vocal Prism.app"

notarytool accepts a zip, dmg, or pkg rather than a bare .app; "AC_PROFILE" names credentials stored beforehand with xcrun notarytool store-credentials.

🔮 Future Enhancements

  • Audio playback within app
  • Multi-file batch processing
  • Live microphone transcription
  • Translation to other languages
  • Speaker diarization
  • Custom vocabulary/prompts
  • Cloud sync for transcriptions
  • Keyboard shortcuts for common actions

📝 License

This project is provided as-is for educational and personal use.

Third-Party Components

  • whisper.cpp: MIT License
  • Whisper Models: MIT License (OpenAI)

🙏 Acknowledgments

  • OpenAI for the Whisper model
  • whisper.cpp community for the excellent C++ implementation
  • Apple for SwiftUI and CoreML frameworks

💬 Support

For issues or questions:

  1. Check the Troubleshooting section
  2. Review whisper.cpp documentation
  3. Verify all resources are properly bundled

Built with ❤️ using SwiftUI and Whisper

Enjoy transcribing with style! 🎙️✨
