sttapp

System-wide Speech-to-Text Desktop Application

A lightweight desktop app that transcribes speech to text through any OpenAI-compatible API. It provides a global hotkey, copies each transcript to the clipboard, and pastes it into the active window.

Installation

Pre-built Binaries

Download the latest release for your platform from GitHub Releases:

Windows: .msi or .exe installer
macOS: .dmg or .app
Linux: .AppImage, .deb, or .rpm

Build from Source

Prerequisites

Rust (latest stable)
Deno runtime

Build Steps

git clone https://github.com/lmtr0/sttapp.git
cd sttapp
deno task build
deno task tauri build

Binaries will be available in src-tauri/target/release/bundle/.

Configuration

First-Time Setup

On first launch, the settings window opens automatically. Configure your API credentials to get started.

API Configuration

The app requires an OpenAI-compatible API:

API Key: Your API key (e.g., sk-... for OpenAI)
Base URL: API endpoint (default: https://api.openai.com/v1)
Model: Speech recognition model

Preset Models

whisper-1 - OpenAI Whisper
whisper-large-v3 - Groq Whisper v3
whisper-large-v3-turbo - Groq Whisper v3 Turbo

Custom models can be configured for other OpenAI-compatible APIs.

Environment Variables

Alternatively, configure via environment variables:

export OPENAI_API_KEY="sk-your-key-here"
export OPENAI_BASE_URL="https://api.openai.com/v1"  # optional
export OPENAI_MODEL="whisper-1"  # optional
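As a rough illustration of how these variables might be resolved, the sketch below applies the documented default base URL and falls back to whisper-1 for the model. The `SttConfig` type and `resolveConfig` function are hypothetical names, not taken from the sttapp source, and the whisper-1 fallback is an assumption based on the provider list in this README.

```typescript
// Hypothetical shape for the three settings described above.
interface SttConfig {
  apiKey: string;
  baseUrl: string;
  model: string;
}

// Default base URL as documented; default model is an assumption.
const DEFAULT_BASE_URL = "https://api.openai.com/v1";
const DEFAULT_MODEL = "whisper-1";

// Resolve settings from an environment-like map.
// OPENAI_API_KEY is required; the other two fall back to defaults.
function resolveConfig(env: Record<string, string | undefined>): SttConfig {
  const apiKey = env["OPENAI_API_KEY"]?.trim();
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is not set");
  }
  const baseUrl = env["OPENAI_BASE_URL"]?.trim() || DEFAULT_BASE_URL;
  return {
    apiKey,
    // Drop trailing slashes so paths can be appended safely.
    baseUrl: baseUrl.replace(/\/+$/, ""),
    model: env["OPENAI_MODEL"]?.trim() || DEFAULT_MODEL,
  };
}
```

Passing the environment in as a plain map keeps the function independent of any particular runtime.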

Supported Providers

OpenAI - Default, uses whisper-1
Groq - Fast inference, uses whisper-large-v3 variants
Any OpenAI-compatible API (custom base URL)
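The provider list can be pictured as a small preset table. This is only a sketch: `ProviderPreset`, `PRESETS`, and `presetFor` are hypothetical names, and the Groq base URL shown is Groq's publicly documented OpenAI-compatible endpoint rather than a value taken from the sttapp source.

```typescript
// Hypothetical preset table; names are illustrative, not from sttapp.
interface ProviderPreset {
  baseUrl: string;
  models: string[];
}

const PRESETS: Record<string, ProviderPreset> = {
  openai: {
    baseUrl: "https://api.openai.com/v1",
    models: ["whisper-1"],
  },
  groq: {
    // Assumption: Groq's OpenAI-compatible endpoint.
    baseUrl: "https://api.groq.com/openai/v1",
    models: ["whisper-large-v3", "whisper-large-v3-turbo"],
  },
};

// Look up a preset, or fall back to caller-supplied custom settings.
function presetFor(provider: string, custom?: ProviderPreset): ProviderPreset {
  const preset = PRESETS[provider.toLowerCase()];
  if (preset) return preset;
  if (custom) return custom;
  throw new Error(`Unknown provider "${provider}" and no custom settings given`);
}
```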

Test Connection

Use the "Test Connection" button in settings to verify your API credentials before saving.
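How the button checks credentials is not described here. One plausible approach, sketched below, is an authenticated GET against the provider's model listing and a translation of the HTTP status into a message. The helper names `connectionCheckRequest` and `describeStatus` are hypothetical, and the use of the `/models` path assumes the provider mirrors OpenAI's model-listing endpoint.

```typescript
// Hypothetical helpers, not taken from the sttapp source.

// Build a cheap authenticated request. Assumption: the provider
// exposes GET {baseUrl}/models, as OpenAI does.
function connectionCheckRequest(baseUrl: string, apiKey: string) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/models`,
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Turn an HTTP status code into a short message for the settings window.
function describeStatus(status: number): string {
  if (status >= 200 && status < 300) return "Connection OK";
  if (status === 401 || status === 403) return "API key rejected";
  if (status === 404) return "Endpoint not found: check the base URL";
  return `Unexpected response (HTTP ${status})`;
}
```

A caller would pass `url` and `headers` to `fetch` and feed `response.status` to `describeStatus`.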

Usage

Quick Start

Launch sttapp
Configure API settings
Press F8 to start recording
Press F8 again to stop and transcribe
The transcript is copied to the clipboard and pasted into the active window

Keyboard Shortcuts

Shortcut	Action
F8	Start/stop recording, normal paste
Shift+F8	Start/stop recording, paste as plain text
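The toggle behavior in the table can be modeled as a tiny state machine. The sketch below is illustrative only: the type and function names are hypothetical, and it assumes that the shortcut used to stop a recording decides the paste mode, which this README does not state.

```typescript
type PasteMode = "normal" | "plain";

interface RecorderState {
  recording: boolean;
}

type ShortcutResult =
  | { kind: "started" }
  | { kind: "stopped"; pasteMode: PasteMode }
  | { kind: "ignored" };

// Map the two documented shortcuts to a paste mode.
function pasteModeFor(shortcut: string): PasteMode | undefined {
  if (shortcut === "F8") return "normal";
  if (shortcut === "Shift+F8") return "plain";
  return undefined;
}

// Toggle recording and return the next state plus what happened.
// Assumption: the shortcut that stops the recording picks the paste mode.
function handleShortcut(
  state: RecorderState,
  shortcut: string,
): { state: RecorderState; result: ShortcutResult } {
  const pasteMode = pasteModeFor(shortcut);
  if (pasteMode === undefined) {
    return { state, result: { kind: "ignored" } };
  }
  if (!state.recording) {
    return { state: { recording: true }, result: { kind: "started" } };
  }
  return { state: { recording: false }, result: { kind: "stopped", pasteMode } };
}
```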

System Tray

Right-click the tray icon to access:

Start Recording - Begin audio capture
Stop Recording - End capture and transcribe
Settings - Open configuration window
Quit - Exit application

The tray icon changes state during recording.

Workflow

Press F8 (or use tray menu) to start recording
Speak into your microphone
Press F8 again to stop
Audio is encoded to FLAC and sent to the API
Transcript is copied to clipboard
Transcript is automatically pasted into the active window
Main window auto-hides (if not focused)
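The upload step can be sketched as a multipart request to an OpenAI-style transcription endpoint. This is a minimal illustration, not the app's actual code: `buildTranscriptionRequest` and the `recording.flac` file name are hypothetical, while the `/audio/transcriptions` path and the `file` and `model` form fields follow OpenAI's public API.

```typescript
// Hypothetical helper: describe the HTTP request for one recording.
interface TranscriptionRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: FormData };
}

function buildTranscriptionRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  audio: Blob,
): TranscriptionRequest {
  const form = new FormData();
  // "file" and "model" are the field names used by OpenAI's
  // /audio/transcriptions endpoint.
  form.append("file", audio, "recording.flac");
  form.append("model", model);
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/audio/transcriptions`,
    init: {
      method: "POST",
      // No Content-Type header: fetch adds the multipart boundary itself.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}
```

Calling `fetch(req.url, req.init)` would send the recording. With OpenAI's default JSON response format, the transcript is returned in a `text` field; other providers may differ.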

Window Behavior

Positioned at bottom-center of screen
Auto-hides after successful transcription
Can be closed at any time without affecting recording

Development

Prerequisites

Rust (latest stable)
Deno runtime
Platform-specific build tools:
- Windows: Microsoft Visual Studio C++ Build Tools
- macOS: Xcode Command Line Tools (xcode-select --install)
- Linux: build-essential, libgtk-3-dev, libwebkit2gtk-4.0-dev (Tauri 2 builds need libwebkit2gtk-4.1-dev instead)

Development Commands

# Start frontend dev server
deno task dev

# Run Tauri in dev mode with hot reload
deno task tauri dev

# Type check
deno task check

# Build frontend
deno task build

# Build production binaries
deno task tauri build

Project Structure

sttapp/
├── src/                    # SvelteKit frontend
│   ├── routes/            # SvelteKit routes
│   │   ├── +page.svelte   # Main recording window
│   │   └── settings/      # Settings page
│   └── lib/               # Shared utilities
│       └── config.ts      # Configuration management
├── src-tauri/             # Rust backend
│   ├── src/
│   │   ├── main.rs        # Entry point
│   │   └── lib.rs         # Core app logic
│   └── tauri.conf.json    # Tauri configuration
└── static/                # Static assets
    └── audio-processor.worklet.js  # Audio capture worklet

Key Components

Audio Capture: Uses AudioWorklet for efficient 16 kHz mono capture
FLAC Encoding: Client-side encoding via libflac.js
API Integration: OpenAI-compatible transcription endpoint
Global Shortcuts: F8 and Shift+F8 via Tauri global-shortcut plugin
Auto-paste: Uses enigo for cross-platform keyboard simulation
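Between capture and FLAC encoding, the Float32 samples that an AudioWorklet produces typically have to be reduced to one channel and converted to integers. The sketch below shows one common way to do that. Both function names are hypothetical, and nothing here is taken from the app's worklet or from libflac.js; the 16-bit sample depth is also an assumption.

```typescript
// Average all channels into a single mono channel.
function downmixToMono(channels: Float32Array[]): Float32Array {
  const length = channels[0]?.length ?? 0;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] = (mono[i] ?? 0) + (channel[i] ?? 0) / channels.length;
    }
  }
  return mono;
}

// Convert samples in [-1, 1] to signed 16-bit PCM, clamping overshoot.
function floatToInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // Negative values scale by 32768, positive values by 32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}
```

Clamping before scaling keeps samples that overshoot the nominal range from wrapping around when stored as 16-bit integers.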

Platform Notes

Linux/Wayland: Microphone permission handled programmatically for WebKitGTK
Linux/X11: Requires DISPLAY environment variable
Windows: No console window in release builds

Contributing

Contributions are welcome! Here's how to get started:

Development Workflow

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Test thoroughly using deno task tauri dev
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Code Style

Frontend: TypeScript with Svelte 5 runes ($state, $derived)
Backend: Rust with standard Rust formatting
Commits: Clear, descriptive commit messages

Reporting Issues

Report bugs or request features via GitHub Issues.

Please include:

Operating system and version
Steps to reproduce
Expected vs actual behavior
Screenshots (if applicable)

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
