🎧 audio-to-text

A free and robust backend package for transcribing audio files to text using the Web Speech API.

Features

✅ Convert audio files to text
🎤 Supports multiple languages
🧠 Uses Web Speech API inside a headless browser (via Puppeteer)
🔊 Streams audio using a virtual microphone
💾 Supports all audio file formats supported by ffmpeg (e.g., .mp3, .wav, .ogg, .m4a, etc.)
🪄 Automatically sets up required audio routing using pactl and paplay
⚙️ Works in Linux environments with PipeWire or PulseAudio

🛠 Requirements

Before installing and using this package, please ensure the following dependencies are installed and properly configured on your system:

ffmpeg — for audio format conversion and processing
ffprobe — for audio validation (comes with ffmpeg)
PipeWire — RECOMMENDED modern audio server
PulseAudio — alternative audio server (older systems)
pactl — Audio control tool
paplay — Audio playback utility
Microsoft Edge, Google Chrome, or Chromium (at least one Chromium-based browser, used to run the Web Speech API)
Node.js — version 18 or higher is recommended
bun — optional, recommended for development and build tasks
Internet connection (required for browser-based speech recognition)
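
A quick way to confirm the command-line requirements before calling the package is to look each tool up on PATH. The sketch below is not part of audio-to-text-node; `findMissingTools` and `REQUIRED_TOOLS` are hypothetical names, and the tool list is simply the one given above.

```typescript
import { accessSync, constants } from "node:fs";
import { delimiter, join } from "node:path";

// Command-line tools named in the requirements list.
const REQUIRED_TOOLS = ["ffmpeg", "ffprobe", "pactl", "paplay"];

// True when `file` exists and the current user may execute it.
function isExecutable(file: string): boolean {
  try {
    accessSync(file, constants.X_OK);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: returns the tools that are not found in any PATH directory.
// `canRun` is injectable so the lookup can be exercised without touching the disk.
function findMissingTools(
  tools: string[],
  pathEnv: string = process.env.PATH ?? "",
  canRun: (file: string) => boolean = isExecutable,
): string[] {
  const dirs = pathEnv.split(delimiter).filter((dir) => dir.length > 0);
  return tools.filter((tool) => !dirs.some((dir) => canRun(join(dir, tool))));
}
```

Calling `findMissingTools(REQUIRED_TOOLS)` at startup and failing early when the result is non-empty should give a clearer error than a failed child process later on.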

Install on Ubuntu/Debian:

PipeWire (Recommended - Modern Audio Server):

sudo apt update
sudo apt install ffmpeg pipewire pipewire-pulse wireplumber

PulseAudio (Alternative - Older Systems):

sudo apt update
sudo apt install ffmpeg pulseaudio-utils pulseaudio

🔐 Permissions

Make sure the user running Node.js has permission to execute pactl and paplay.
Puppeteer will launch a headless browser and use your virtual audio devices.

📦 Installation

To install with Bun:

bun add audio-to-text-node

Or with npm:

npm install audio-to-text-node

🧼 Cleanup

The package creates temporary folders in /tmp/audio-to-text and cleans them up automatically after use.
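
The create-then-always-remove pattern described here can be sketched with Node's standard library. This is an illustration only, not the package's code: `withTempDir` is a hypothetical helper, it is synchronous for brevity, and it uses a unique directory under `os.tmpdir()` rather than the fixed /tmp/audio-to-text path.

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: creates a unique working directory, hands it to `work`,
// and removes it afterwards even if `work` throws.
function withTempDir<T>(prefix: string, work: (dir: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), prefix));
  try {
    return work(dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Example: the directory exists inside the callback and is gone afterwards.
const leftover = withTempDir("audio-to-text-", (dir) => {
  writeFileSync(join(dir, "chunk-0.wav"), "");
  return dir;
});
console.log(existsSync(leftover)); // false
```

An asynchronous variant would await the callback inside the same try/finally so that cleanup still runs when transcription fails.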

✨ Usage

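import { transcribeFromFile } from "audio-to-text-node";

async function main() {
  // Every option is optional. The values below match the documented defaults,
  // except executablePath, which is auto-detected when omitted.
  const transcript = await transcribeFromFile("/path/to/audio.wav", {
    language: "en-US",
    executablePath: "/usr/bin/microsoft-edge",
    speakerDevice: "virtual_speaker",
    microphoneDevice: "virtual_microphone",
  });

  console.log(transcript);
}

// Report failures instead of leaving an unhandled promise rejection.
main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});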

Tested Distributions

| Distribution | Version | Status |
| --- | --- | --- |
| Ubuntu | 24.10 | ✅ Fully Tested |
| macOS | - | ❌ Not Supported |
| Windows | - | ❌ Not Supported |

Note: This package is designed for Linux environments.

📚 API Reference

🧠 `transcribeFromFile(filePath: string, options?: { language?: string; executablePath?: string; speakerDevice?: string; microphoneDevice?: string }): Promise<string>`

| 🧩 Parameter | 📝 Type | 📖 Description | 🧵 Default |
| --- | --- | --- | --- |
| `filePath` | `string` | Path to the audio file (`.wav`, `.mp3`, `.ogg`, etc.) | (required) |
| `options.language` | `string` | Language code for transcription | `'en-US'` |
| `options.executablePath` | `string` | Path to browser executable | Auto-detected |
| `options.speakerDevice` | `string` | Virtual speaker device name (PipeWire/PulseAudio) | `'virtual_speaker'` |
| `options.microphoneDevice` | `string` | Virtual microphone device name (PipeWire/PulseAudio) | `'virtual_microphone'` |
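
To make the `speakerDevice` and `microphoneDevice` options more concrete, here is one way such a pair of virtual devices could be created with pactl. This is an assumption about the general technique, not a description of what the package runs internally: it uses the PulseAudio modules `module-null-sink` and `module-remap-source` (also provided by pipewire-pulse), and all function names are hypothetical.

```typescript
import { execFileSync } from "node:child_process";

// Arguments for `pactl` that create a silent output device (a null sink).
function nullSinkArgs(speakerDevice: string): string[] {
  return ["load-module", "module-null-sink", `sink_name=${speakerDevice}`];
}

// Arguments for `pactl` that expose the sink's monitor as a microphone-like source.
function remapSourceArgs(speakerDevice: string, microphoneDevice: string): string[] {
  return [
    "load-module",
    "module-remap-source",
    `master=${speakerDevice}.monitor`,
    `source_name=${microphoneDevice}`,
  ];
}

// Arguments for `paplay` that play a file into the virtual speaker.
function playArgs(speakerDevice: string, wavFile: string): string[] {
  return [`--device=${speakerDevice}`, wavFile];
}

// Loads both modules and returns the module indexes printed by pactl.
function setUpRouting(speakerDevice: string, microphoneDevice: string): string[] {
  const load = (args: string[]) => execFileSync("pactl", args, { encoding: "utf8" }).trim();
  return [load(nullSinkArgs(speakerDevice)), load(remapSourceArgs(speakerDevice, microphoneDevice))];
}

// Unloads the modules in reverse order to restore the previous routing.
function tearDownRouting(moduleIds: string[]): void {
  for (const id of [...moduleIds].reverse()) {
    execFileSync("pactl", ["unload-module", id]);
  }
}
```

Audio played with `paplay` into the null sink then appears on the remapped source. A browser usually records from the default source, so a real setup may also need `pactl set-default-source` and a matching restore step.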

Browser Detection Priority:

Microsoft Edge - /usr/bin/microsoft-edge
Google Chrome - /usr/bin/google-chrome
Chromium - /usr/bin/chromium-browser
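
The priority order above amounts to "first existing path wins, unless `executablePath` is given". A minimal sketch of that rule follows; `detectBrowser` is a hypothetical name and the package's real detection logic may differ.

```typescript
import { existsSync } from "node:fs";

// Candidate paths in the documented priority order.
const BROWSER_CANDIDATES = [
  "/usr/bin/microsoft-edge",
  "/usr/bin/google-chrome",
  "/usr/bin/chromium-browser",
];

// Returns the explicit path when one is given, otherwise the first candidate
// that exists, or undefined when no browser is found.
// `exists` is injectable so the rule can be tested without real browsers.
function detectBrowser(
  explicitPath?: string,
  exists: (path: string) => boolean = existsSync,
): string | undefined {
  if (explicitPath) return explicitPath;
  return BROWSER_CANDIDATES.find((candidate) => exists(candidate));
}
```

When the result is `undefined`, passing `executablePath` explicitly is the documented fallback.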

🔁 Returns: Promise<string> — The transcribed text.

⚙️ How it works:

✅ Validates and splits the audio file into 5-second chunks
🎛 Sets up virtual audio devices for routing (PipeWire/PulseAudio)
🧭 Launches a headless browser and uses Web Speech API for transcription
🧹 Cleans up temporary files and restores audio routing
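
The first step, splitting into 5-second chunks, can be sketched as a small planning function plus the ffmpeg and ffprobe arguments for each piece. The helper names, output file naming, and the 16 kHz mono conversion are assumptions made for illustration; only the 5-second chunk length comes from the list above.

```typescript
interface Chunk {
  index: number;
  start: number; // seconds from the beginning of the file
  duration: number; // seconds; the last chunk may be shorter
}

// Splits a total duration into consecutive chunks of at most `chunkSeconds`.
function planChunks(totalSeconds: number, chunkSeconds = 5): Chunk[] {
  if (!(totalSeconds > 0) || !(chunkSeconds > 0)) return [];
  const chunks: Chunk[] = [];
  for (let start = 0, index = 0; start < totalSeconds; start += chunkSeconds, index++) {
    chunks.push({ index, start, duration: Math.min(chunkSeconds, totalSeconds - start) });
  }
  return chunks;
}

// Arguments for `ffprobe` that print only the container duration in seconds.
function ffprobeDurationArgs(input: string): string[] {
  return [
    "-v", "error",
    "-show_entries", "format=duration",
    "-of", "default=noprint_wrappers=1:nokey=1",
    input,
  ];
}

// Arguments for `ffmpeg` that cut one chunk and convert it to 16 kHz mono WAV.
function ffmpegChunkArgs(input: string, chunk: Chunk, outDir: string): string[] {
  return [
    "-y",
    "-ss", String(chunk.start),
    "-t", String(chunk.duration),
    "-i", input,
    "-ac", "1",
    "-ar", "16000",
    `${outDir}/chunk-${chunk.index}.wav`,
  ];
}

console.log(planChunks(12).map((c) => c.duration)); // [ 5, 5, 2 ]
```

Short chunks keep each recognition session brief, at the cost of possibly cutting a word at a boundary.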

🎵 Supported Audio Formats

This package supports all audio formats supported by ffmpeg. For a full list, see:

FFmpeg Supported File Formats

Common formats include: .wav, .mp3, .ogg, .flac, .aac, .m4a, and more.

🌐 Supported Languages

You can use any language supported by the Web Speech API and Google Speech-to-Text. For a full list, see:

Google Speech-to-Text Supported Languages

Specify the language code (e.g., en-US, fa-IR, fr-FR, etc.) in the language option.
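
Language codes are easy to mistype (`en_us`, `EN-us`), so it can help to normalize them before passing them in. The helper below is a loose sketch, not part of the package: `normalizeLanguageTag` is a hypothetical name, it only handles language, optional script, and region subtags, and it does not check whether the speech service actually supports the result.

```typescript
// Normalizes tags such as "en_us" to "en-US". Returns null for anything that
// does not look like language[-Script][-REGION].
function normalizeLanguageTag(input: string): string | null {
  const [language = "", ...rest] = input.trim().split(/[-_]/);
  if (!/^[A-Za-z]{2,3}$/.test(language)) return null;

  const parts = [language.toLowerCase()];
  for (const part of rest) {
    if (/^[A-Za-z]{4}$/.test(part)) {
      // Script subtag, e.g. "Hans": title case.
      parts.push(part.charAt(0).toUpperCase() + part.slice(1).toLowerCase());
    } else if (/^[A-Za-z]{2}$/.test(part) || /^\d{3}$/.test(part)) {
      // Region subtag, e.g. "US" or "419": upper case.
      parts.push(part.toUpperCase());
    } else {
      return null;
    }
  }
  return parts.join("-");
}

console.log(normalizeLanguageTag("fa_ir")); // "fa-IR"
```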

🛠️ Troubleshooting

Ensure all prerequisites are installed and available in your PATH (which ffmpeg, which ffprobe, which pactl, which paplay)
For best audio performance: Use PipeWire (modern) over PulseAudio (legacy)
For long audio files, ensure enough disk space in /tmp
If you get permission errors, run the process as a user that is allowed to execute pactl and paplay and to write to /tmp
For best results, use high-quality audio files (16 kHz mono recommended)
Make sure your connection is stable and not interrupted during transcription
Only Linux with PipeWire or PulseAudio is supported
If browser detection fails, explicitly set executablePath to your browser location

Common Browser Paths:

# Check if browsers are installed
which microsoft-edge
which google-chrome
which chromium-browser

Audio System Check:

# Check if PipeWire is running (recommended)
systemctl --user status pipewire pipewire-pulse

# Check if PulseAudio is running (alternative)
systemctl --user status pulseaudio

# Test audio commands
which pactl paplay # Should work with both PipeWire and PulseAudio
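
To check from Node whether a virtual device is currently present, the output of `pactl list short sinks` (or `pactl list short sources`) can be parsed. The sketch assumes the usual tab-separated layout of that output (index, name, driver, sample spec, state); `parsePactlShortList` and `hasDevice` are hypothetical helpers, not package APIs.

```typescript
interface PactlDevice {
  index: number;
  name: string;
  state: string;
}

// Parses the tab-separated output of `pactl list short sinks|sources`.
function parsePactlShortList(output: string): PactlDevice[] {
  return output
    .split("\n")
    .map((line) => line.split("\t"))
    .filter((cols) => cols.length >= 2 && cols[1] !== "")
    .map((cols) => ({
      index: Number(cols[0]),
      name: cols[1],
      state: cols[cols.length - 1],
    }));
}

// True when a device with the given name appears in the listing.
function hasDevice(output: string, name: string): boolean {
  return parsePactlShortList(output).some((device) => device.name === name);
}
```

The listing itself can be obtained with `execFileSync("pactl", ["list", "short", "sinks"], { encoding: "utf8" })` from `node:child_process`.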

📝 Changelog

Version 0.2.0 (Latest)

🚀 BREAKING: Switched from puppeteer to puppeteer-core for better control
✨ Added multi-browser support with automatic detection (Edge, Chrome, Chromium)
⚙️ Added executablePath option to specify custom browser location
🎵 Added PipeWire support (recommended audio server)

💬 Contributing

Pull requests and issues are welcome!
Please open issues for any bugs or feature requests. When contributing, please:

Use clear commit messages
Follow TypeScript best practices
