Modern desktop application for speech-to-text transcription using Vosk offline speech recognition.
Built with Java Swing and FlatLaf for a polished, cross-platform UI with multi-model support.
- Multi-Model Support: Download and manage multiple Vosk models
- Automatic Updates: Check for model updates at startup
- 150+ Models Available: All models from alphacephei.com/vosk/models
- Easy Switching: Switch between models on the fly
- Smart Downloads:
  - Small models (< 500MB) for quick downloads
  - Big models (> 500MB) with download confirmation
  - Progress tracking for all downloads
- 40+ Languages: English, Chinese, Russian, French, German, Spanish, and many more
- Model Manager UI:
  - View all available models with details (size, language, accuracy)
  - Download new models with progress bar
  - Delete unused models
  - Check for updates
  - Filter by installed/available status
- Multiple Formats: WAV, MP3, M4A, FLAC, OGG, AAC, WMA, OPUS
- Automatic Conversion: Built-in audio conversion (no separate ffmpeg installation required)
- Drag & Drop: Simply drag audio files into the app
- File Browser: Standard file picker with format filtering
- Offline Processing: No internet required, privacy-first
- Real-time Progress: Visual feedback during transcription
- Accurate Results: Powered by Vosk speech recognition
- Optional Timestamps: Add [HH:MM:SS] timestamps to each segment
- Plain Text (.txt)
- Subtitle Formats (SRT, VTT)
- Structured Data (JSON)
- Markdown (.md)
- Modern Design: Clean, professional interface with FlatLaf
- Dark Mode: System-aware dark/light theme toggle
- Keyboard Shortcuts: Streamlined workflow
  - Cmd/Ctrl+O - Open file
  - Cmd/Ctrl+S - Save transcript
  - Cmd/Ctrl+N - Clear/New
  - Cmd/Ctrl+Shift+C - Copy to clipboard
  - Cmd/Ctrl+Shift+M - Manage models
  - Cmd/Ctrl+± - Adjust font size
- Statistics: Word count, character count, WPM
- Recent Files: Quick access to previously transcribed files
- Audio Info: Display duration, format, sample rate
- Progress Tracking: Real-time transcription progress
- Copy to Clipboard: One-click copy of transcription
- Cancel Anytime: Stop long transcriptions mid-process
- Unsaved Changes Warning: Never lose work accidentally
- Persistent Preferences: Remembers your settings and selected model
- Adjustable Font: Customize text size for comfort
- Java 17 or higher
- Maven 3.6+
- No additional dependencies needed!
# Clone the repository
git clone https://github.com/palaashatri/jvosk.git
cd jvosk
# Build the project
mvn clean package
# Run the application
mvn exec:java -Dexec.mainClass=atri.palaash.jvosk.App

- Launch the app
- Open Models → Manage Models... (Cmd/Ctrl+Shift+M)
- Download a model for your language:
  - For English: vosk-model-small-en-us-0.15 (40MB) or vosk-model-en-us-0.22 (1.8GB)
  - For other languages, browse the available models
- Click "Download Model" and wait for completion
- Select the downloaded model and click "Use This Model"
- Click "Browse Files..." or drag & drop an audio file
- Wait for transcription to complete
- Save or copy your transcript!
- Menu: Models → Manage Models...
- Keyboard: Cmd/Ctrl+Shift+M
The app provides access to 150+ models from the official Vosk repository:
Popular Languages:
- English: 10+ models (US, Indian accents)
- Chinese: 3 models
- Russian: 4 models
- French: 3 models
- German: 4 models
- Spanish: 2 models
- Japanese: 2 models
- And 40+ more languages!
Model Types:
- Small Models (< 100MB): Fast, good for mobile/desktop, lightweight
- Big Models (> 500MB): Higher accuracy, server-grade
- Punctuation Models: Add punctuation and capitalization
- Speaker ID Models: Identify different speakers
- Open Model Manager
- Browse available models in the table
- Select a model
- Click "Download Model"
- Wait for download and extraction (progress shown)
- Model is automatically installed and ready to use
Note: Large models will show a confirmation dialog before downloading.
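The confirmation rule for large downloads can be sketched as a simple size check. This is a minimal illustration, not the app's actual code: the class name `DownloadPolicy`, its methods, and the choice of binary megabytes are assumptions; only the 500MB threshold comes from the feature list.

```java
import java.util.Locale;

/** Hypothetical sketch of the small/big split that decides whether to ask before downloading. */
final class DownloadPolicy {
    // Assumption: 500MB threshold from the feature list, counted in binary megabytes.
    static final long BIG_MODEL_THRESHOLD_BYTES = 500L * 1024 * 1024;

    /** Returns true when the model is larger than the threshold and should be confirmed first. */
    static boolean needsConfirmation(long sizeBytes) {
        // A model of exactly 500MB is treated as small here; the README leaves that case open.
        return sizeBytes > BIG_MODEL_THRESHOLD_BYTES;
    }

    /** Formats a byte count for display, for example in a confirmation message. */
    static String humanSize(long sizeBytes) {
        double mb = sizeBytes / (1024.0 * 1024.0);
        if (mb >= 1024.0) {
            return String.format(Locale.ROOT, "%.1fGB", mb / 1024.0);
        }
        return String.format(Locale.ROOT, "%.0fMB", mb);
    }

    public static void main(String[] args) {
        long small = 40L * 1024 * 1024;   // roughly the size of the small English model
        long big = 1843L * 1024 * 1024;   // roughly the size of the big English model
        System.out.println(humanSize(small) + " needs confirmation: " + needsConfirmation(small));
        System.out.println(humanSize(big) + " needs confirmation: " + needsConfirmation(big));
    }
}
```

In a Swing app, a `true` result would typically lead to a `JOptionPane.showConfirmDialog` call before the download starts.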
Quick Switch:
- Menu: Models → Switch Model...
- Select from installed models
- Confirm selection
Or via Model Manager:
- Open Model Manager
- Select an installed model
- Click "Use This Model"
- On startup, the app checks for model updates
- If updates are available, you'll see a notification
- Updates can be downloaded through the Model Manager
- Models are never auto-updated without your confirmation
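One way such an update check could work is to compare the version suffix of installed model names against the names in the registry. The sketch below is an assumption about the approach, not the app's real logic: `UpdateCheck`, its methods, and the `example-model-*` names are hypothetical, and it assumes names end in `-<version>` with dot-separated numeric parts. It only reports candidates, in line with the rule that nothing is updated without confirmation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: finds newer versions of installed models by comparing name suffixes. */
final class UpdateCheck {

    /** Splits "name-1.2" into base "name" and version "1.2" at the last dash. */
    static String[] splitName(String modelName) {
        int dash = modelName.lastIndexOf('-');
        if (dash < 0) {
            return new String[] {modelName, ""};
        }
        return new String[] {modelName.substring(0, dash), modelName.substring(dash + 1)};
    }

    /** Compares versions part by part as integers; missing or non-numeric parts count as 0. */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? parsePart(pa[i]) : 0;
            int y = i < pb.length ? parsePart(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    private static int parsePart(String part) {
        try {
            return Integer.parseInt(part);
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    /** Returns available model names that share a base with an installed model but are newer. */
    static List<String> findUpdates(List<String> installed, List<String> available) {
        List<String> updates = new ArrayList<>();
        for (String have : installed) {
            String[] h = splitName(have);
            for (String candidate : available) {
                String[] c = splitName(candidate);
                if (h[0].equals(c[0]) && compareVersions(c[1], h[1]) > 0) {
                    updates.add(candidate);
                }
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        List<String> installed = List.of("example-model-1.0");
        List<String> available = List.of("example-model-1.0", "example-model-1.2");
        System.out.println("Updates available: " + findUpdates(installed, available));
    }
}
```

Model names that carry an extra suffix after the version would need additional parsing that this sketch does not attempt.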
- Open Model Manager
- Select an installed model
- Click "Delete Model"
- Confirm deletion
New Components:
- VoskModel: Data class for model metadata
- ModelRegistry: Parses Vosk models page and fetches model information
- ModelManager: Handles downloading, installing, version checking, and loading models
- ModelManagerDialog: UI for managing models
- Enhanced VoskTranscriber: Supports switching between models
- Updated App: Checks for model updates on startup
Model Storage:
- All models stored in models/ directory
- Each model in its own subdirectory
- Models are standard Vosk format (can be used with other Vosk tools)
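Given the storage layout above, installed models can be discovered by listing the subdirectories of `models/`. The following is a minimal sketch under that assumption; the class `InstalledModels` is hypothetical and is not the project's `ModelManager`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: treats every subdirectory of the models directory as one installed model. */
final class InstalledModels {

    /** Returns the sorted names of all subdirectories, or an empty list if the directory is missing. */
    static List<String> list(Path modelsDir) throws IOException {
        if (!Files.isDirectory(modelsDir)) {
            return List.of();
        }
        try (Stream<Path> entries = Files.list(modelsDir)) {
            return entries
                    .filter(Files::isDirectory)              // skip stray files such as partial downloads
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the app is started from the project root, so "models" resolves there.
        list(Path.of("models")).forEach(System.out::println);
    }
}
```

Because each directory is a standard Vosk model, the same path can be handed to other Vosk tools unchanged.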
<dependencies>
    <!-- Core speech recognition -->
    <dependency>
        <groupId>com.alphacephei</groupId>
        <artifactId>vosk</artifactId>
        <version>0.3.38</version>
    </dependency>
    <!-- UI framework -->
    <dependency>
        <groupId>com.formdev</groupId>
        <artifactId>flatlaf</artifactId>
        <version>3.4.1</version>
    </dependency>
    <!-- Audio conversion -->
    <dependency>
        <groupId>ws.schild</groupId>
        <artifactId>jave-all-deps</artifactId>
        <version>3.5.0</version>
    </dependency>
    <!-- Web scraping for model registry -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.17.2</version>
    </dependency>
    <!-- JSON processing -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.17.1</version>
    </dependency>
</dependencies>

- Wait for transcription to complete
- Copy, save, or export your transcript
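For the timestamped and subtitle exports, segment times in seconds have to be turned into text. The sketch below shows one way to produce the `[HH:MM:SS]` prefix and an SRT cue; the class `Timestamps` and its method names are hypothetical and do not reflect the app's actual exporter.

```java
import java.util.Locale;

/** Hypothetical sketch: formats segment times for plain-text timestamps and SRT cues. */
final class Timestamps {

    /** Formats seconds as [HH:MM:SS], dropping the fractional part. */
    static String bracketed(double seconds) {
        long total = (long) seconds;
        return String.format(Locale.ROOT, "[%02d:%02d:%02d]",
                total / 3600, (total % 3600) / 60, total % 60);
    }

    /** Formats seconds as an SRT timecode, HH:MM:SS,mmm, rounded to the nearest millisecond. */
    static String srt(double seconds) {
        long ms = Math.round(seconds * 1000);
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d",
                ms / 3_600_000, (ms % 3_600_000) / 60_000, (ms % 60_000) / 1000, ms % 1000);
    }

    /** Builds one SRT cue: index line, time range line, text line. */
    static String srtCue(int index, double start, double end, String text) {
        return index + "\n" + srt(start) + " --> " + srt(end) + "\n" + text + "\n";
    }

    public static void main(String[] args) {
        System.out.println(bracketed(3725.9) + " hello world");
        System.out.print(srtCue(1, 0.0, 2.5, "hello world"));
    }
}
```

SRT cues are separated by a blank line in the final file, so a writer would emit an extra newline after each cue.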
- Speech Recognition: Vosk
- Audio Processing: JAVE2 (FFmpeg wrapper)
- UI Framework: Java Swing with FlatLaf
- Build Tool: Maven
MIT
Contributions welcome! Please open an issue or submit a pull request.

