A free and robust backend package for transcribing audio files to text using the Web Speech API.
- Convert audio files to text
- Supports multiple languages
- Uses Web Speech API inside a headless browser (via Puppeteer)
- Streams audio using a virtual microphone
- Supports all audio file formats supported by ffmpeg (e.g., .mp3, .wav, .ogg, .m4a, etc.)
- Automatically sets up required audio routing using `pactl` and `paplay`
- Works in Linux environments with PipeWire or PulseAudio

Before installing and using this package, please ensure the following dependencies are installed and properly configured on your system:
- ffmpeg: for audio format conversion and processing
- ffprobe: for audio validation (comes with ffmpeg)
- PipeWire: recommended modern audio server
- PulseAudio: alternative audio server (older systems)
- pactl: audio control tool
- paplay: audio playback utility
- Microsoft Edge, Google Chrome, or Chromium: the browser that runs the Web Speech API
- Node.js: version 18 or higher is recommended
- bun: optional, recommended for development and build tasks
- Internet connection (required for browser-based speech recognition)
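
The command-line prerequisites above can be checked from Node.js before the first transcription. The following is a minimal sketch, not part of the package: `findOnPath` and `missingTools` are hypothetical helper names, and the tool list simply mirrors the binaries named in the list above.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// Hypothetical helper: returns the full path of `command` if an executable
// file with that name exists in one of the PATH directories, otherwise null.
function findOnPath(command: string, pathEnv: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathEnv.split(delimiter)) {
    if (!dir) continue; // skip empty PATH entries
    const candidate = join(dir, command);
    try {
      accessSync(candidate, constants.X_OK);
      if (statSync(candidate).isFile()) return candidate;
    } catch {
      // Not present or not executable in this directory; keep looking.
    }
  }
  return null;
}

// Command-line tools named in the prerequisites list.
const REQUIRED_TOOLS = ["ffmpeg", "ffprobe", "pactl", "paplay"];

// Returns the names of required tools that could not be found on PATH.
function missingTools(pathEnv?: string): string[] {
  return REQUIRED_TOOLS.filter((tool) => findOnPath(tool, pathEnv) === null);
}
```

Calling `missingTools()` with no argument inspects the current `PATH`; an empty result means all four tools were found.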

With PipeWire (recommended):

    sudo apt update
    sudo apt install ffmpeg pipewire pipewire-pulse wireplumber

With PulseAudio (alternative):

    sudo apt update
    sudo apt install ffmpeg pulseaudio-utils pulseaudio

- Make sure Node.js has permission to run `pactl` and `paplay`
- Puppeteer will launch a headless browser and use your virtual audio devices

To install with Bun:

    bun add audio-to-text-node

Or with npm:

    npm install audio-to-text-node

The package creates temporary folders in `/tmp/audio-to-text` and cleans them up automatically after use.

    import { transcribeFromFile } from "audio-to-text-node";

    async function main() {
      // All options are optional; the values shown for language and the
      // two device names are the documented defaults.
      const transcript = await transcribeFromFile("/path/to/audio.wav", {
        language: "en-US",
        executablePath: "/usr/bin/microsoft-edge",
        speakerDevice: "virtual_speaker",
        microphoneDevice: "virtual_microphone",
      });
      console.log(transcript);
    }

    // Report failures instead of leaving the rejection unhandled.
    main().catch(console.error);

| Distribution | Version | Status |
|---|---|---|
| Ubuntu | 24.10 | Fully Tested |
| macOS | - | Not Supported |
| Windows | - | Not Supported |

Note: This package is designed for Linux environments.

`transcribeFromFile(filePath: string, options?: { language?: string; executablePath?: string; speakerDevice?: string; microphoneDevice?: string }): Promise<string>`

| Parameter | Type | Description | Default |
|---|---|---|---|
| `filePath` | `string` | Path to the audio file (.wav, .mp3, .ogg, etc.) | - |
| `options.language` | `string` | Language code for transcription | `'en-US'` |
| `options.executablePath` | `string` | Path to browser executable | Auto-detected |
| `options.speakerDevice` | `string` | Virtual speaker device name (PipeWire/PulseAudio) | `'virtual_speaker'` |
| `options.microphoneDevice` | `string` | Virtual microphone device name (PipeWire/PulseAudio) | `'virtual_microphone'` |

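
To make the defaults in the table concrete, here is a small sketch of how omitted options could be filled in. It is an illustration only: `TranscribeOptions`, `DEFAULTS`, and `withDefaults` are hypothetical names, and the package's internal handling may differ.

```typescript
// Shape of the options object, taken from the documented signature.
interface TranscribeOptions {
  language?: string;
  executablePath?: string;
  speakerDevice?: string;
  microphoneDevice?: string;
}

// Default values copied from the parameter table. executablePath has no
// fixed default because the table lists it as auto-detected.
const DEFAULTS = {
  language: "en-US",
  speakerDevice: "virtual_speaker",
  microphoneDevice: "virtual_microphone",
};

// Hypothetical helper: returns the options with documented defaults applied.
function withDefaults(options: TranscribeOptions = {}) {
  return {
    language: options.language ?? DEFAULTS.language,
    speakerDevice: options.speakerDevice ?? DEFAULTS.speakerDevice,
    microphoneDevice: options.microphoneDevice ?? DEFAULTS.microphoneDevice,
    // undefined here stands for "let the package auto-detect a browser".
    executablePath: options.executablePath,
  };
}
```

For example, `withDefaults({ language: "fa-IR" })` keeps the Persian language code and fills in both device names.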
- Microsoft Edge: `/usr/bin/microsoft-edge`
- Google Chrome: `/usr/bin/google-chrome`
- Chromium: `/usr/bin/chromium-browser`
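
If you prefer to resolve the browser yourself and pass it as `executablePath`, a first-match lookup over the paths above is enough. This is a sketch under assumptions: `detectBrowser` is a hypothetical helper, and the Edge, Chrome, Chromium order is assumed from the order of the list, not confirmed as the package's own detection order.

```typescript
import { existsSync } from "node:fs";

// Candidate locations, in the order they appear in the list above.
const BROWSER_CANDIDATES = [
  "/usr/bin/microsoft-edge",
  "/usr/bin/google-chrome",
  "/usr/bin/chromium-browser",
];

// Hypothetical helper: returns the first candidate path that exists, or
// undefined when none is installed. `exists` is injectable so the lookup
// can be tested without touching the file system.
function detectBrowser(
  candidates: string[] = BROWSER_CANDIDATES,
  exists: (path: string) => boolean = existsSync,
): string | undefined {
  return candidates.find((path) => exists(path));
}
```

The result can be passed straight through, for example `{ executablePath: detectBrowser() }`; an `undefined` value leaves detection to the package.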

Returns: `Promise<string>`, the transcribed text.
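
Because the function returns a single promise per file, several files can be processed with a plain loop. The sketch below runs them one at a time, on the assumption (not stated by the package) that concurrent runs would compete for the same virtual audio devices. `transcribeAll` and the `Transcribe` type are hypothetical; the transcriber is passed in so the helper does not depend on the package directly.

```typescript
// Any function that turns a file path into a transcript.
type Transcribe = (filePath: string) => Promise<string>;

// Hypothetical helper: transcribes files strictly one after another and
// returns the transcripts keyed by file path, in input order.
async function transcribeAll(
  files: string[],
  transcribe: Transcribe,
): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  for (const file of files) {
    // Awaiting inside the loop keeps only one transcription active at a time.
    results.set(file, await transcribe(file));
  }
  return results;
}
```

With the package installed, the call would look like `transcribeAll(files, (f) => transcribeFromFile(f, { language: "en-US" }))`.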
- Validates and splits the audio file into 5-second chunks
- Sets up virtual audio devices for routing (PipeWire/PulseAudio)
- Launches a headless browser and uses Web Speech API for transcription
- Cleans up temporary files and restores audio routing
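
The chunking and routing steps above can be sketched as plain command builders. This is an illustration under assumptions, not the package's actual implementation: the helper names are hypothetical, and the use of `module-null-sink` plus `module-remap-source` is one common way to build a virtual speaker and microphone with `pactl`, chosen here for the example.

```typescript
// Split a duration into consecutive [start, end] ranges of at most
// `chunkSeconds` seconds each; the last range may be shorter.
function planChunks(durationSeconds: number, chunkSeconds = 5): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < durationSeconds; start += chunkSeconds) {
    ranges.push([start, Math.min(start + chunkSeconds, durationSeconds)]);
  }
  return ranges;
}

// ffmpeg arguments that cut one range out of the input file.
// -ss seeks to the start, -t limits the duration, -y overwrites the output.
function chunkArgs(input: string, range: [number, number], output: string): string[] {
  const [start, end] = range;
  return ["-y", "-ss", String(start), "-t", String(end - start), "-i", input, output];
}

// pactl commands that create a virtual speaker (a null sink) and expose its
// monitor as a virtual microphone. Assumed approach, see the note above.
function routingCommands(
  speaker = "virtual_speaker",
  microphone = "virtual_microphone",
): string[][] {
  return [
    ["pactl", "load-module", "module-null-sink", `sink_name=${speaker}`],
    ["pactl", "load-module", "module-remap-source", `master=${speaker}.monitor`, `source_name=${microphone}`],
  ];
}

// paplay command that plays one chunk into the virtual speaker.
function playCommand(speaker: string, file: string): string[] {
  return ["paplay", `--device=${speaker}`, file];
}
```

Each returned array is a command name followed by its arguments, suitable for `execFile` from `node:child_process`. A 12-second file yields three chunks, the last one 2 seconds long.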

This package supports all audio formats supported by ffmpeg. For a full list, see the ffmpeg documentation.
Common formats include: .wav, .mp3, .ogg, .flac, .aac, .m4a, and more.
You can use any language supported by the Web Speech API and Google Speech-to-Text. For a full list, see the Google Speech-to-Text documentation on supported languages.
Specify the language code (e.g., en-US, fa-IR, fr-FR, etc.) in the language option.
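
Language codes follow the `language-REGION` pattern shown above, so user input such as `en-us` or `fa_IR` can be tidied before it is passed on. The helper below is a hypothetical convenience, not something the package provides, and it deliberately handles only the two-part form used in the examples.

```typescript
// Hypothetical helper: normalizes a two-part language code to the
// "xx-YY" form (lowercase language, uppercase region). Accepts "-" or "_"
// as the separator and throws on anything that does not match.
function normalizeLanguage(code: string): string {
  const match = /^([a-zA-Z]{2,3})[-_]([a-zA-Z]{2})$/.exec(code.trim());
  if (!match) {
    throw new Error(`Unrecognized language code format: ${code}`);
  }
  return `${match[1].toLowerCase()}-${match[2].toUpperCase()}`;
}
```

The normalized value can then be used as `{ language: normalizeLanguage(input) }`. Whether a given code is actually supported is still decided by the speech service.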

- Ensure all prerequisites are installed and available in your PATH (`which ffmpeg`, `which ffprobe`, `which pactl`, `which paplay`)
- For best audio performance: Use PipeWire (modern) over PulseAudio (legacy)
- For long audio files, ensure enough disk space in `/tmp`
- If you get permission errors, run with appropriate user rights
- For best results, use high-quality audio files (16kHz mono recommended)
- Make sure your connection is stable and not interrupted during transcription
- Only Linux with PipeWire or PulseAudio is supported
- If browser detection fails, explicitly set `executablePath` to your browser location
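
The 16kHz mono recommendation above can be met by converting a file with ffmpeg before transcription. The sketch below is an optional pre-processing step, not something the package requires; `toMono16kArgs` and `convertToMono16k` are hypothetical helper names, and `-ac` and `-ar` are the standard ffmpeg options for channel count and sample rate.

```typescript
import { execFileSync } from "node:child_process";

// Builds ffmpeg arguments that resample the input to 16 kHz, one channel.
// -y overwrites the output file if it already exists.
function toMono16kArgs(input: string, output: string): string[] {
  return ["-y", "-i", input, "-ac", "1", "-ar", "16000", output];
}

// Runs the conversion synchronously; throws if ffmpeg is missing or fails.
function convertToMono16k(input: string, output: string): void {
  execFileSync("ffmpeg", toMono16kArgs(input, output), { stdio: "ignore" });
}
```

After `convertToMono16k("input.mp3", "/tmp/input-16k.wav")`, pass the new file path to `transcribeFromFile`.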

Diagnostic commands:

    # Check if browsers are installed
    which microsoft-edge
    which google-chrome
    which chromium-browser

    # Check if PipeWire is running (recommended)
    systemctl --user status pipewire pipewire-pulse

    # Check if PulseAudio is running (alternative)
    systemctl --user status pulseaudio

    # Test audio commands
    which pactl paplay # Should work with both PipeWire and PulseAudio

Changelog:

- BREAKING: Switched from `puppeteer` to `puppeteer-core` for better control
- Added multi-browser support with automatic detection (Edge, Chrome, Chromium)
- Added `executablePath` option to specify custom browser location
- Added PipeWire support (recommended audio server)

Pull requests and issues are welcome!
Please open issues for any bugs or feature requests.
When contributing, please:
- Use clear commit messages
- Follow TypeScript best practices

MIT © 2025 ErfanBahramali