jvosk

Modern desktop application for speech-to-text transcription using Vosk offline speech recognition.

Built with Java Swing and FlatLaf for a polished, cross-platform UI with multi-model support.

Download 0.1.0-SNAPSHOT

Made for macOS 🍎

Looks decent on Windows too :) 🪟

Features

Model Management

Multi-Model Support: Download and manage multiple Vosk models
Automatic Updates: Check for model updates at startup
150+ Models Available: All models from alphacephei.com/vosk/models
Easy Switching: Switch between models on the fly
Smart Downloads:
- Small models (< 500MB) for quick downloads
- Big models (> 500MB) with download confirmation
- Progress tracking for all downloads
40+ Languages: English, Chinese, Russian, French, German, Spanish, and many more
Model Manager UI:
- View all available models with details (size, language, accuracy)
- Download new models with progress bar
- Delete unused models
- Check for updates
- Filter by installed/available status
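
The size-based behaviour listed under Smart Downloads can be illustrated with a small sketch. The class name `DownloadPolicy`, its method names, and the use of decimal megabytes are assumptions made for illustration, not the app's actual code; only the 500MB threshold comes from the list above.

```java
/** Hypothetical helper: size-based download policy (names are illustrative). */
final class DownloadPolicy {
    // 500 MB threshold from the feature list; decimal megabytes are an assumption.
    static final long BIG_MODEL_BYTES = 500L * 1000 * 1000;

    /** Returns true when a model is big enough to ask the user before downloading. */
    static boolean needsConfirmation(long sizeBytes) {
        return sizeBytes > BIG_MODEL_BYTES;
    }

    /** Converts bytes transferred into a 0..100 value suitable for a progress bar. */
    static int progressPercent(long bytesRead, long totalBytes) {
        if (totalBytes <= 0) {
            return 0; // unknown size: a caller could show an indeterminate bar instead
        }
        long pct = bytesRead * 100 / totalBytes;
        return (int) Math.max(0, Math.min(100, pct));
    }
}
```

With these assumptions, a 40MB model downloads straight away while a 1.8GB model triggers the confirmation step first.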

Audio Support

Multiple Formats: WAV, MP3, M4A, FLAC, OGG, AAC, WMA, OPUS
Automatic Conversion: Built-in audio conversion (no separate ffmpeg install required; JAVE2 bundles its own binaries)
Drag & Drop: Simply drag audio files into the app
File Browser: Standard file picker with format filtering
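
As a rough sketch of the format filtering mentioned above, a file picker or drop handler could check a file's extension against the supported list. The class `AudioFormats` and its methods are hypothetical; only the list of extensions is taken from this README.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: checks file names against the formats listed above. */
final class AudioFormats {
    static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
            "wav", "mp3", "m4a", "flac", "ogg", "aac", "wma", "opus"));

    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns true when the extension is one of the supported audio formats. */
    static boolean isSupported(String fileName) {
        return SUPPORTED.contains(extensionOf(fileName));
    }
}
```

The check is case-insensitive, so `talk.MP3` and `talk.mp3` are treated the same way.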

Transcription

Offline Processing: No internet required, privacy-first
Real-time Progress: Visual feedback during transcription
Accurate Results: Powered by Vosk speech recognition
Optional Timestamps: Add [HH:MM:SS] timestamps to each segment
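
The [HH:MM:SS] timestamp format can be produced from a segment's start time in seconds. This is a minimal sketch; the class `Timestamps` is hypothetical, and truncating fractional seconds is an assumption about how the app rounds.

```java
import java.util.Locale;

/** Hypothetical helper: renders [HH:MM:SS] prefixes for transcript segments. */
final class Timestamps {
    /** Formats a non-negative offset in seconds as [HH:MM:SS], dropping fractions. */
    static String format(double seconds) {
        long total = (long) Math.max(0.0, seconds);
        long h = total / 3600;
        long m = (total % 3600) / 60;
        long s = total % 60;
        return String.format(Locale.ROOT, "[%02d:%02d:%02d]", h, m, s);
    }

    /** Prepends the timestamp to a segment's text. */
    static String prefix(double startSeconds, String text) {
        return format(startSeconds) + " " + text;
    }
}
```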

Export Options

Plain Text (.txt)
Subtitle Formats (SRT, VTT)
Structured Data (JSON)
Markdown (.md)
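
For the subtitle formats, the main difference between SRT and VTT cues is the numbering line and the fraction separator in timestamps (comma for SRT, dot for VTT). The sketch below shows one way to build a cue; `SubtitleFormat` and its methods are hypothetical names, and the app's exporter may be structured differently.

```java
import java.util.Locale;

/** Hypothetical helper: formats single SRT and WebVTT cues. */
final class SubtitleFormat {
    /** Formats milliseconds as HH:MM:SS<sep>mmm. */
    static String timestamp(long millis, char fractionSeparator) {
        long h = millis / 3_600_000;
        long m = (millis % 3_600_000) / 60_000;
        long s = (millis % 60_000) / 1000;
        long ms = millis % 1000;
        return String.format(Locale.ROOT, "%02d:%02d:%02d%c%03d",
                h, m, s, fractionSeparator, ms);
    }

    /** Builds one numbered SRT cue, terminated by a blank line. */
    static String srtCue(int index, long startMs, long endMs, String text) {
        return index + "\n"
                + timestamp(startMs, ',') + " --> " + timestamp(endMs, ',') + "\n"
                + text + "\n\n";
    }

    /** Builds one WebVTT cue; the file itself must start with a "WEBVTT" line. */
    static String vttCue(long startMs, long endMs, String text) {
        return timestamp(startMs, '.') + " --> " + timestamp(endMs, '.') + "\n"
                + text + "\n\n";
    }
}
```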

User Interface

Modern Design: Clean, professional interface with FlatLaf
Dark Mode: System-aware dark/light theme toggle
Keyboard Shortcuts: Streamlined workflow
- Cmd/Ctrl+O - Open file
- Cmd/Ctrl+S - Save transcript
- Cmd/Ctrl+N - Clear/New
- Cmd/Ctrl+Shift+C - Copy to clipboard
- Cmd/Ctrl+Shift+M - Manage models
- Cmd/Ctrl+Plus / Cmd/Ctrl+Minus - Increase / decrease font size
Statistics: Word count, character count, WPM
Recent Files: Quick access to previously transcribed files
Audio Info: Display duration, format, sample rate
Progress Tracking: Real-time transcription progress
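
The statistics shown in the UI can be derived from the transcript text and the audio duration. The following sketch assumes WPM means words spoken per minute of audio; the README does not define it, and `TranscriptStats` is a hypothetical name.

```java
/** Hypothetical helper: word count, character count, and words per minute. */
final class TranscriptStats {
    /** Counts whitespace-separated words; empty or blank text has zero words. */
    static int wordCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Counts all characters, including whitespace. */
    static int charCount(String text) {
        return text.length();
    }

    /** Words per minute of audio (assumed definition); 0 for unknown duration. */
    static double wordsPerMinute(String text, double audioSeconds) {
        if (audioSeconds <= 0) {
            return 0.0;
        }
        return wordCount(text) * 60.0 / audioSeconds;
    }
}
```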

Quality of Life

Copy to Clipboard: One-click copy of transcription
Cancel Anytime: Stop long transcriptions mid-process
Unsaved Changes Warning: Never lose work accidentally
Persistent Preferences: Remembers your settings and selected model
Adjustable Font: Customize text size for comfort

Quick Start

Requirements

Java 17 or higher
Maven 3.6+
No additional dependencies needed!

Build & Run

# Clone the repository
git clone https://github.com/palaashatri/jvosk.git
cd jvosk

# Build the project
mvn clean package

# Run the application
mvn exec:java -Dexec.mainClass=atri.palaash.jvosk.App

First Use

Launch the app
Open Models → Manage Models... (Cmd/Ctrl+Shift+M)
Download a model for your language:
- For English: vosk-model-small-en-us-0.15 (40MB) or vosk-model-en-us-0.22 (1.8GB)
- For other languages, browse the available models
Click "Download Model" and wait for completion
Select the downloaded model and click "Use This Model"
Click "Browse Files..." or drag & drop an audio file
Wait for transcription to complete
Save or copy your transcript!

Model Management

Accessing Model Manager

Menu: Models → Manage Models...
Keyboard: Cmd/Ctrl+Shift+M

Available Models

The app provides access to 150+ models from the official Vosk repository:

Popular Languages:

English: 10+ models (US, Indian accents)
Chinese: 3 models
Russian: 4 models
French: 3 models
German: 4 models
Spanish: 2 models
Japanese: 2 models
And many more (40+ languages in total)

Model Types:

Small Models (< 100MB): Fast, good for mobile/desktop, lightweight
Big Models (> 500MB): Higher accuracy, server-grade
Punctuation Models: Add punctuation and capitalization
Speaker ID Models: Identify different speakers

Downloading Models

Open Model Manager
Browse available models in the table
Select a model
Click "Download Model"
Wait for download and extraction (progress shown)
Model is automatically installed and ready to use
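
The extraction step above can be sketched with the JDK's zip support, assuming the downloaded model is a zip archive. `ModelArchive` is a hypothetical class, not the app's `ModelManager`; the check that rejects entries resolving outside the target directory is a common precaution rather than something this README describes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: unpacks a downloaded model archive. */
final class ModelArchive {
    /** Extracts a zip stream into targetDir, rejecting entries that escape it. */
    static void extract(InputStream zipData, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zip = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: "
                            + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry; the stream ends at the entry boundary.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

A caller would pass the download stream (or a file stream) together with the `models/` directory as the target.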

Note: Large models will show a confirmation dialog before downloading.

Switching Models

Quick Switch:

Menu: Models → Switch Model...
Select from installed models
Confirm selection

Or via Model Manager:

Open Model Manager
Select an installed model
Click "Use This Model"

Automatic Updates

On startup, the app checks for model updates
If updates are available, you'll see a notification
Updates can be downloaded through the Model Manager
Models are never auto-updated without your confirmation
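
One plausible way to detect an update is to compare the version at the end of a model name, as in vosk-model-small-en-us-0.15. The sketch below is an assumption about how such a check could work; `ModelVersions` is a hypothetical class, and names that carry a suffix after the version are deliberately reported as having no parseable version.

```java
/** Hypothetical helper: extracts and compares dotted model versions. */
final class ModelVersions {
    /** Returns the trailing dotted version of a model name, or "" if there is none. */
    static String versionOf(String modelName) {
        int dash = modelName.lastIndexOf('-');
        String tail = dash < 0 ? "" : modelName.substring(dash + 1);
        return tail.matches("\\d+(\\.\\d+)*") ? tail : "";
    }

    /** Compares two dotted numeric versions segment by segment; both must be numeric. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Returns true when candidate is strictly newer than installed. */
    static boolean isNewer(String candidate, String installed) {
        return compare(candidate, installed) > 0;
    }
}
```

Segments are compared as integers, so 0.22 counts as newer than 0.4 under this scheme; whether that matches the upstream numbering is not confirmed here.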

Deleting Models

Open Model Manager
Select an installed model
Click "Delete Model"
Confirm deletion
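
Since each model occupies its own directory, deleting a model amounts to removing that directory tree. A minimal sketch using only the JDK follows; `ModelRemoval` is a hypothetical name and the app may implement deletion differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: removes an installed model's directory tree. */
final class ModelRemoval {
    /** Deletes modelDir and everything below it; does nothing if it is missing. */
    static void deleteRecursively(Path modelDir) throws IOException {
        if (!Files.exists(modelDir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(modelDir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```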

Technical Details

Architecture

Components:

VoskModel: Data class for model metadata
ModelRegistry: Parses the Vosk models page and fetches model information
ModelManager: Handles downloading, installing, version checking, and loading models
ModelManagerDialog: UI for managing models
VoskTranscriber: Runs transcription and supports switching between models
App: Application entry point; checks for model updates on startup

Model Storage:

All models are stored in the models/ directory
Each model lives in its own subdirectory
Models use the standard Vosk format (they can be used with other Vosk tools)
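
Given this layout, the set of installed models can be discovered by listing the subdirectories of models/. The sketch below is illustrative only; `InstalledModels` is a hypothetical class, and the app's `ModelManager` may track installed models in another way.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: lists installed models from the storage directory. */
final class InstalledModels {
    /** Returns the sorted names of all subdirectories of modelsDir. */
    static List<String> list(Path modelsDir) throws IOException {
        List<String> names = new ArrayList<>();
        if (!Files.isDirectory(modelsDir)) {
            return names; // nothing installed yet
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(modelsDir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    names.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }
}
```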

Dependencies

<dependencies>
    <!-- Core speech recognition -->
    <dependency>
        <groupId>com.alphacephei</groupId>
        <artifactId>vosk</artifactId>
        <version>0.3.38</version>
    </dependency>
    
    <!-- UI framework -->
    <dependency>
        <groupId>com.formdev</groupId>
        <artifactId>flatlaf</artifactId>
        <version>3.4.1</version>
    </dependency>
    
    <!-- Audio conversion -->
    <dependency>
        <groupId>ws.schild</groupId>
        <artifactId>jave-all-deps</artifactId>
        <version>3.5.0</version>
    </dependency>
    
    <!-- Web scraping for model registry -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.17.2</version>
    </dependency>
    
    <!-- JSON processing -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.17.1</version>
    </dependency>
</dependencies>


Technology Stack

Speech Recognition: Vosk
Audio Processing: JAVE2 (FFmpeg wrapper)
UI Framework: Java Swing with FlatLaf
Build Tool: Maven

License

MIT

Contributing

Contributions welcome! Please open an issue or submit a pull request.
