Vocal Prism - AI-Powered Audio Transcription for macOS

A stunning, modern macOS application for transcribing audio files using OpenAI's Whisper model with CoreML acceleration. Features a beautiful liquid-glass UI design with real-time transcription streaming.


✨ Features

Core Functionality

  • 🎵 Multi-Format Support: MP3, WAV, M4A, FLAC
  • 🚀 Real-Time Transcription: Stream transcription output as it's being processed
  • 💾 Auto-Save: Automatically saves transcriptions as .txt files
  • 📝 Subtitle Export: Generate .srt subtitle files with timestamps
  • 🖱️ Drag & Drop: Simply drag audio files into the app
  • 📁 File Picker: Browse and select files with native macOS dialog

Performance

  • ⚡️ CoreML Acceleration: Automatic GPU acceleration on Apple Silicon
  • 🔧 Configurable Threads: Adjust CPU threads for optimal performance
  • 🎯 Multiple Models: Support for base, small, medium, and large Whisper variants
  • 🔄 Background Processing: Non-blocking UI during transcription

UI/UX

  • 🎨 Glassmorphic Design: Beautiful semi-transparent panels with blur effects
  • 🌈 Animated Gradients: Smooth, dynamic background animations
  • 📊 Live Waveform: Real-time audio waveform visualization during processing
  • 🌓 Dark Mode: Full support for macOS dark and light modes
  • 📱 Responsive Layout: Adaptive interface that scales beautifully
  • ✨ Smooth Animations: Spring-based transitions and hover effects

📋 Requirements

  • macOS 26.0 (Tahoe) or later
  • Apple Silicon (M1/M2/M3)
  • Xcode 26.0 or later (the macOS 26 SDK ships with Xcode 26)

🚀 Installation & Setup

1. Bundle Resources

The app requires two main resources in the Resources folder:

whisper-cli executable

Your precompiled whisper-cli binary should already be in:

Vocal Prism/Resources/whisper-cli

The app will automatically make it executable on first launch using chmod +x.
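
A minimal sketch of that first-launch step using FileManager (the exact code in the app may differ):

// Equivalent of chmod +x: set POSIX permissions on the bundled binary.
if let cliPath = Bundle.main.path(forResource: "whisper-cli", ofType: nil) {
    try? FileManager.default.setAttributes([.posixPermissions: 0o755],
                                           ofItemAtPath: cliPath)
}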

Whisper Models

Place your Whisper model files in the Resources folder:

Vocal Prism/Resources/ggml-base.en.bin
Vocal Prism/Resources/ggml-small.en.bin (optional)
Vocal Prism/Resources/ggml-medium.en.bin (optional)
Vocal Prism/Resources/ggml-large.bin (optional)

Currently included:

  • ggml-base.en.bin (English-only base model)
  • ggml-base.en-encoder.mlmodelc (CoreML encoder)
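
At runtime the app must resolve these bundled files; a minimal lookup sketch, assuming the filenames above and standard Bundle APIs:

// Resolve the bundled CLI and default model (names match the files above).
guard let cliURL = Bundle.main.url(forResource: "whisper-cli", withExtension: nil),
      let modelURL = Bundle.main.url(forResource: "ggml-base.en", withExtension: "bin") else {
    fatalError("whisper-cli or ggml-base.en.bin missing from the app bundle")
}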

Adding Models to Xcode

  1. Open Vocal Prism.xcodeproj in Xcode
  2. Right-click on the Resources folder in the Project Navigator
  3. Select "Add Files to 'Vocal Prism'..."
  4. Select your model files
  5. Important: In the dialog, ensure:
    • ✅ "Copy items if needed" is checked
    • ✅ Target membership includes "Vocal Prism"
    • ✅ "Create folder references" is selected (not groups)

2. Verify Bundle Resources

Check that resources are properly included:

  1. Select the project in Xcode
  2. Go to the "Vocal Prism" target
  3. Click "Build Phases"
  4. Expand "Copy Bundle Resources"
  5. Verify the following are listed:
    • whisper-cli
    • ggml-base.en.bin
    • Any other model files you added

3. Configure Build Settings

Code Signing

  1. Select the "Vocal Prism" target
  2. Go to "Signing & Capabilities"
  3. Select your development team
  4. Xcode will automatically handle provisioning

Minimum Deployment Target

Verify in Build Settings:

  • macOS Deployment Target: 26.0 or later

4. Build and Run

  1. Select your Mac as the run destination
  2. Press Cmd + R or click the Run button
  3. The app will launch and verify all resources on first run

🎯 Usage

Basic Workflow

  1. Launch the App

    • The main screen shows a drop zone with animated gradient background
  2. Select Audio File

    • Drag and drop an audio file onto the drop zone, OR
    • Click "Select File" to browse
  3. Configure Settings (Optional)

    • Click the "Settings" button in the top right
    • Adjust:
      • CPU Threads: More threads = faster (but more CPU usage)
      • Model Variant: Larger models = more accurate (but slower)
      • Timestamps: Include time markers in transcription
      • SRT Export: Generate subtitle files
  4. Start Transcription

    • Click "Start Transcription"
    • Watch the live waveform animation
    • See transcription text appear in real-time
    • Monitor progress with the animated progress bar
  5. View & Export Results

    • Transcription automatically saves to .txt next to audio file
    • Click "Copy" to copy text to clipboard
    • Click "Save" to manually save to a custom location
    • Click "Export SRT" for subtitle files

Keyboard Shortcuts

  • Esc - Close settings panel (see the sketch below)
  • Standard macOS text selection works in transcription view
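
The Esc shortcut can be wired with SwiftUI's onExitCommand modifier; a one-line sketch (showSettings is an assumed state name):

// Hypothetical: Esc dismisses the settings panel (macOS-specific modifier).
.onExitCommand { showSettings = false }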

🛠️ Technical Architecture

Project Structure

Vocal Prism/
├── Vocal_PrismApp.swift    # App entry point, window configuration
├── ContentView.swift       # Main UI with state management
├── WhisperEngine.swift     # Transcription engine & process management
├── GlassmorphicViews.swift # Reusable glass UI components
└── Resources/
    ├── whisper-cli          # Precompiled Whisper executable
    └── ggml-*.bin           # Whisper model files

Key Components

WhisperEngine

  • Manages whisper-cli process lifecycle
  • Streams stdout/stderr in real-time
  • Publishes progress and transcription updates
  • Handles CoreML acceleration flags
  • Auto-saves transcriptions
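
Given these responsibilities, the engine's public surface plausibly looks like the sketch below (everything beyond the class name is an assumption):

import Combine
import Foundation

// Hypothetical skeleton of WhisperEngine, inferred from the bullets above.
final class WhisperEngine: ObservableObject {
    @Published var transcript = ""        // streamed text, appended live
    @Published var progress: Double = 0   // 0...1 for the progress bar
    @Published var isRunning = false

    private var process: Process?         // the whisper-cli child process

    func cancel() {
        process?.terminate()
        isRunning = false
    }
}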

GlassmorphicViews

  • GlassBackgroundModifier: Frosted glass effect
  • AnimatedGradientBackground: Dynamic gradient animations
  • GlassButton: Interactive glassmorphic buttons
  • GlassProgressBar: Animated progress indicator
  • WaveformView: Real-time audio visualization
  • DropZoneView: Drag-and-drop file receiver
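
A hypothetical sketch of how a drop target like DropZoneView can receive file URLs (the view body and names are assumptions, not the shipped code):

import SwiftUI
import UniformTypeIdentifiers

// Hypothetical drop handling: accept a file URL and hand it to a callback.
struct DropZone: View {
    @State private var isTargeted = false
    let onFile: (URL) -> Void

    var body: some View {
        RoundedRectangle(cornerRadius: 24)
            .fill(.ultraThinMaterial)
            .overlay(Text(isTargeted ? "Release to transcribe" : "Drop an audio file here"))
            .onDrop(of: [.fileURL], isTargeted: $isTargeted) { providers in
                guard let provider = providers.first else { return false }
                _ = provider.loadObject(ofClass: URL.self) { url, _ in
                    if let url { DispatchQueue.main.async { onFile(url) } }
                }
                return true
            }
    }
}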

ContentView

  • State management for transcription workflow
  • File handling (picker and drag-and-drop; see the picker sketch below)
  • Settings panel
  • Real-time transcription display with auto-scroll
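
For the picker side of file handling, a hedged sketch using SwiftUI's fileImporter (showPicker and startTranscription are assumed names; FLAC has no built-in UTType constant, so it is derived from the file extension):

import SwiftUI
import UniformTypeIdentifiers

// Hypothetical: restrict the picker to the formats listed under Features.
let audioTypes: [UTType] = [.mp3, .wav, .mpeg4Audio,
                            UTType(filenameExtension: "flac") ?? .audio]

// Attached to the main view:
// .fileImporter(isPresented: $showPicker, allowedContentTypes: audioTypes) { result in
//     if case .success(let url) = result { startTranscription(of: url) }
// }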

CoreML Acceleration

The app automatically enables CoreML acceleration by passing the -ml 1 flag to whisper-cli. This leverages:

  • Apple Neural Engine (ANE)
  • GPU Metal acceleration
  • Optimized inference on Apple Silicon

Real-Time Streaming

Transcription output is captured in real-time using:

outputHandle.readabilityHandler = { handle in
    let data = handle.availableData
    if !data.isEmpty, let output = String(data: data, encoding: .utf8) {
        DispatchQueue.main.async {
            // Append output to the published transcription text;
            // published state must be updated on the main thread
        }
    }
}
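
For context, a hedged sketch of how the surrounding Process launch might be wired (modelPath, audioFile, and the argument list are assumptions, not the exact WhisperEngine code):

// Hypothetical wiring around the handler above: launch whisper-cli with a pipe.
let process = Process()
process.executableURL = Bundle.main.url(forResource: "whisper-cli", withExtension: nil)
process.arguments = ["-m", modelPath, "-f", audioFile.path, "-t", "4"]

let pipe = Pipe()
process.standardOutput = pipe
process.standardError = pipe
let outputHandle = pipe.fileHandleForReading
// ... install outputHandle.readabilityHandler as shown above ...

try process.run()
process.terminationHandler = { _ in
    outputHandle.readabilityHandler = nil   // detach the handler to avoid leaks
}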

🎨 UI Customization

Glassmorphic Effect

The glass effect is achieved through layered blur and opacity:

.background(.ultraThinMaterial)
.overlay(RoundedRectangle(cornerRadius: 16)
    .strokeBorder(LinearGradient(colors: [.white.opacity(0.5), .clear],
                                 startPoint: .topLeading, endPoint: .bottomTrailing)))
.shadow(color: .black.opacity(0.2), radius: 10, y: 4)

Animations

All animations use spring physics for natural motion:

// value: ties the animation to a specific state change (isTranscribing is illustrative)
.animation(.spring(response: 0.5, dampingFraction: 0.8), value: isTranscribing)

Color Schemes

The app adapts to system appearance:

  • Light Mode: Soft blue-purple gradients
  • Dark Mode: Deep blue-purple gradients

🔧 Advanced Configuration

Adding Custom Models

  1. Download Whisper models from official source
  2. Place in Resources/ folder
  3. Name following the pattern ggml-{size}.{lang}.bin for English-only models or ggml-{size}.bin for multilingual ones (matching the filenames above)
  4. Update WhisperEngine.ModelVariant enum if needed
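
A hypothetical shape for that enum, mapping variants to the model filenames listed earlier (the real WhisperEngine.ModelVariant may differ):

// Hypothetical shape of WhisperEngine.ModelVariant.
enum ModelVariant: String, CaseIterable {
    case base   = "ggml-base.en.bin"
    case small  = "ggml-small.en.bin"
    case medium = "ggml-medium.en.bin"
    case large  = "ggml-large.bin"
}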

Customizing whisper-cli Flags

Edit WhisperEngine.transcribe() method to add flags:

var arguments = [
    "-m", modelPath,
    "-f", audioFile.path,
    "-t", String(options.threads),
    "--custom-flag"  // Add here
]

Available flags:

  • -l : Language (e.g., "en", "es", "fr")
  • -tr : Translate to English
  • -nt : No timestamps
  • -ml : CoreML acceleration (0 or 1)
  • -otxt, -osrt, -ovtt : Output formats
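
In practice these flags are appended conditionally from the settings panel; a hedged sketch (options and its fields are assumed names):

// Hypothetical: assemble the argument list from the user's settings.
var arguments = ["-m", modelPath, "-f", audioFile.path, "-t", String(options.threads)]
if !options.includeTimestamps { arguments.append("-nt") }
if options.exportSRT { arguments += ["-osrt"] }
if options.useCoreML { arguments += ["-ml", "1"] }   // CoreML flag noted above
if let language = options.language { arguments += ["-l", language] }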

Performance Tuning

For fastest transcription:

  • Use base.en model
  • Set threads to the number of performance cores (4-8; see the sketch after these lists)
  • Enable CoreML acceleration

For best accuracy:

  • Use large model
  • Increase threads
  • Enable timestamps for better context
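
To default the thread setting to the performance-core count, that count can be read via sysctl on Apple Silicon; a hedged sketch that falls back to the total core count if the key is unavailable:

import Darwin
import Foundation

// Hypothetical helper: count performance cores (hw.perflevel0 on Apple Silicon).
func performanceCoreCount() -> Int {
    var count: Int32 = 0
    var size = MemoryLayout<Int32>.size
    if sysctlbyname("hw.perflevel0.logicalcpu", &count, &size, nil, 0) == 0 {
        return Int(count)
    }
    return ProcessInfo.processInfo.activeProcessorCount
}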

🐛 Troubleshooting

"whisper-cli not found" Error

Solution: Ensure Resources are properly added to target

  1. Check Build Phases → Copy Bundle Resources
  2. Verify whisper-cli is listed
  3. Clean build folder (Cmd + Shift + K)
  4. Rebuild

"Permission Denied" Error

Solution: The app handles this automatically, but if the error persists:

chmod +x "Vocal Prism/Resources/whisper-cli"

Slow Transcription

Solutions:

  • Increase thread count in settings
  • Use smaller model (base vs large)
  • Close other CPU-intensive apps
  • Ensure CoreML is enabled

No Real-Time Output

Reason: Some whisper-cli versions buffer their output.
Solution: Update to the latest whisper.cpp build with --no-prints support.

App Won't Launch After Build

Check:

  1. Code signing is configured
  2. All resources are bundled
  3. The debug console for specific errors (View → Debug Area → Activate Console in Xcode)

📦 Distribution

Creating Release Build

  1. Select "Any Mac" as destination
  2. Product → Archive
  3. Distribute App → Direct Distribution
  4. Export and notarize for distribution

App Bundle Structure

Vocal Prism.app/
└── Contents/
    ├── MacOS/
    │   └── Vocal Prism (executable)
    └── Resources/
        ├── whisper-cli
        ├── ggml-base.en.bin
        └── ggml-base.en-encoder.mlmodelc/

Notarization for Distribution

For distribution outside Mac App Store:

ditto -c -k --keepParent "Vocal Prism.app" "Vocal Prism.zip"
xcrun notarytool submit "Vocal Prism.zip" --keychain-profile "AC_PROFILE" --wait
xcrun stapler staple "Vocal Prism.app"

notarytool accepts a zip, dmg, or pkg rather than a bare .app; "AC_PROFILE" names credentials stored beforehand with xcrun notarytool store-credentials.

🔮 Future Enhancements

  • Audio playback within app
  • Multi-file batch processing
  • Live microphone transcription
  • Translation to other languages
  • Speaker diarization
  • Custom vocabulary/prompts
  • Cloud sync for transcriptions
  • Keyboard shortcuts for common actions

📝 License

This project is provided as-is for educational and personal use.

Third-Party Components

  • whisper.cpp: MIT License
  • Whisper Models: MIT License (OpenAI)

🙏 Acknowledgments

  • OpenAI for the Whisper model
  • whisper.cpp community for the excellent C++ implementation
  • Apple for SwiftUI and CoreML frameworks

💬 Support

For issues or questions:

  1. Check the Troubleshooting section
  2. Review whisper.cpp documentation
  3. Verify all resources are properly bundled

Built with ❤️ using SwiftUI and Whisper

Enjoy transcribing with style! 🎙️✨
