🎧 audio-to-text

A free and robust backend package for transcribing audio files to text using the Web Speech API.


Features

  • ✅ Convert audio files to text
  • 🎀 Supports multiple languages
  • 🧠 Uses the Web Speech API inside a headless browser (via Puppeteer)
  • 🔊 Streams audio using a virtual microphone
  • 💾 Supports all audio file formats supported by ffmpeg (e.g., .mp3, .wav, .ogg, .m4a)
  • 🪄 Automatically sets up the required audio routing using pactl and paplay
  • ⚙️ Works in Linux environments with PipeWire or PulseAudio

🛠 Requirements

Before installing and using this package, please ensure the following dependencies are installed and properly configured on your system:

  • ffmpeg: audio format conversion and processing
  • ffprobe: audio validation (ships with ffmpeg)
  • PipeWire: recommended, modern audio server
  • PulseAudio: alternative audio server (older systems)
  • pactl: audio control tool
  • paplay: audio playback utility
  • Microsoft Edge: browser checked first during auto-detection
  • Google Chrome or Chromium: alternative browsers
  • Node.js: version 18 or higher is recommended
  • bun: optional, recommended for development and build tasks
  • Internet connection (required for browser-based speech recognition)

Install on Ubuntu/Debian:

PipeWire (Recommended - Modern Audio Server):

sudo apt update
sudo apt install ffmpeg pipewire pipewire-pulse wireplumber pulseaudio-utils

PulseAudio (Alternative - Older Systems):

sudo apt update
sudo apt install ffmpeg pulseaudio-utils pulseaudio

πŸ” Permissions

  • Make sure Node.js has permission to run pactl and paplay
  • Puppeteer will launch a headless browser and use your virtual audio devices
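
For orientation, virtual devices of this kind are commonly created with PulseAudio-compatible module commands. The sketch below only builds the argument lists and does not run anything. The device names are the defaults listed in the API reference; the choice of modules (module-null-sink and module-remap-source) and the helper names are assumptions about a typical setup, not a description of what the package actually executes.

```typescript
// Hypothetical helpers: build pactl/paplay argument lists for a virtual
// speaker and microphone pair. The modules chosen here are an assumption.
type Command = { bin: string; args: string[] };

function buildRoutingCommands(
  speaker = "virtual_speaker",
  microphone = "virtual_microphone",
): Command[] {
  return [
    // A null sink behaves like a speaker and exposes a ".monitor" source.
    { bin: "pactl", args: ["load-module", "module-null-sink", `sink_name=${speaker}`] },
    // Remap that monitor into a source the browser can select as a microphone.
    {
      bin: "pactl",
      args: [
        "load-module",
        "module-remap-source",
        `master=${speaker}.monitor`,
        `source_name=${microphone}`,
      ],
    },
  ];
}

function buildPlaybackCommand(file: string, speaker = "virtual_speaker"): Command {
  // paplay plays the file into the sink selected with --device.
  return { bin: "paplay", args: [`--device=${speaker}`, file] };
}

for (const { bin, args } of buildRoutingCommands()) {
  console.log(bin, args.join(" "));
}
```

If these commands fail when run by hand as the same user that runs Node.js, the package is likely to hit the same permission problem.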

📦 Installation

To install with Bun:

bun add audio-to-text-node

Or with npm:

npm install audio-to-text-node

🧼 Cleanup

The package creates temporary folders in /tmp/audio-to-text and cleans them up automatically after use.


✨ Usage

import { transcribeFromFile } from "audio-to-text-node";

async function main() {
  const transcript = await transcribeFromFile("/path/to/audio.wav", {
    language: "en-US",
    executablePath: "/usr/bin/microsoft-edge",
    speakerDevice: "virtual_speaker",
    microphoneDevice: "virtual_microphone",
  });

  console.log(transcript);
}

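// Report failures (missing browser, audio setup errors) instead of leaving the rejection unhandled
main().catch((error) => {
  console.error("Transcription failed:", error);
  process.exitCode = 1;
});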
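
When several files need transcribing, processing them one after another is a cautious default, because the virtual audio devices are shared. That is an assumption: this README does not say whether concurrent calls are safe. The sketch below takes the transcribe function as a parameter so it can run without any audio setup; transcribeAll and the result shape are hypothetical names, not part of the package.

```typescript
// Hypothetical batch helper: runs transcriptions sequentially and records
// a per-file error instead of aborting the whole batch.
type Transcribe = (filePath: string) => Promise<string>;
type BatchResult = { file: string; text?: string; error?: string };

async function transcribeAll(
  files: string[],
  transcribe: Transcribe,
): Promise<BatchResult[]> {
  const results: BatchResult[] = [];
  for (const file of files) {
    try {
      // Await inside the loop so only one transcription runs at a time.
      results.push({ file, text: await transcribe(file) });
    } catch (err) {
      results.push({ file, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return results;
}
```

With the package installed, the second argument would be something like (file) => transcribeFromFile(file, { language: "en-US" }).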

Tested Distributions

Distribution | Version | Status
Ubuntu | 24.10 | ✅ Fully Tested
macOS | - | ❌ Not Supported
Windows | - | ❌ Not Supported

Note: This package is designed for Linux environments.


📚 API Reference

🧠 transcribeFromFile(filePath: string, options?: { language?: string; executablePath?: string; speakerDevice?: string; microphoneDevice?: string }): Promise<string>

🧩 Parameter | 📝 Type | 📖 Description | 🧡 Default
filePath | string | Path to the audio file (.wav, .mp3, .ogg, etc.) | - (required)
options.language | string | Language code for transcription | 'en-US'
options.executablePath | string | Path to browser executable | Auto-detected
options.speakerDevice | string | Virtual speaker device name (PipeWire/PulseAudio) | 'virtual_speaker'
options.microphoneDevice | string | Virtual microphone device name (PipeWire/PulseAudio) | 'virtual_microphone'
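
To make the defaults above concrete, here is a minimal sketch of how omitted options could be filled in. The interface mirrors the documented signature; withDefaults and DEFAULTS are hypothetical names used only for illustration, and the package's internal handling may differ.

```typescript
// Mirrors the documented options object of transcribeFromFile.
interface TranscribeOptions {
  language?: string;
  executablePath?: string;
  speakerDevice?: string;
  microphoneDevice?: string;
}

// Default values as listed in the parameter table.
const DEFAULTS = {
  language: "en-US",
  speakerDevice: "virtual_speaker",
  microphoneDevice: "virtual_microphone",
};

// Hypothetical helper: fills in every option the caller left out.
function withDefaults(options: TranscribeOptions = {}) {
  return {
    language: options.language ?? DEFAULTS.language,
    // Left undefined when omitted, which stands for "auto-detect the browser".
    executablePath: options.executablePath,
    speakerDevice: options.speakerDevice ?? DEFAULTS.speakerDevice,
    microphoneDevice: options.microphoneDevice ?? DEFAULTS.microphoneDevice,
  };
}

console.log(withDefaults({ language: "fa-IR" }));
```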

Browser Detection Priority:

  1. Microsoft Edge - /usr/bin/microsoft-edge
  2. Google Chrome - /usr/bin/google-chrome
  3. Chromium - /usr/bin/chromium-browser

πŸ” Returns: Promise<string> β€” The transcribed text.


βš™οΈ How it works:

  1. βœ… Validates and splits the audio file into 5-second chunks
  2. πŸŽ› Sets up virtual audio devices for routing (PipeWire/PulseAudio)
  3. 🧭 Launches a headless browser and uses Web Speech API for transcription
  4. 🧹 Cleans up temporary files and restores audio routing
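
Step 1 can be pictured as computing chunk boundaries and then cutting each chunk with ffmpeg. The following is a sketch under stated assumptions: planChunks and buildChunkArgs are hypothetical helpers, and while -ss, -t, -i, and -y are standard ffmpeg options, the exact command line the package uses is not documented here.

```typescript
interface Chunk {
  index: number;
  startSeconds: number;
  durationSeconds: number;
}

// Hypothetical helper: splits a total duration into fixed-size chunks,
// with a shorter final chunk when the duration is not an exact multiple.
function planChunks(totalSeconds: number, chunkSeconds = 5): Chunk[] {
  if (!(totalSeconds > 0) || !(chunkSeconds > 0)) return [];
  const chunks: Chunk[] = [];
  for (let start = 0, index = 0; start < totalSeconds; start += chunkSeconds, index++) {
    chunks.push({
      index,
      startSeconds: start,
      durationSeconds: Math.min(chunkSeconds, totalSeconds - start),
    });
  }
  return chunks;
}

// Builds ffmpeg arguments that cut one chunk out of the input file:
// -ss sets the start offset, -t limits the duration, -y overwrites the output.
function buildChunkArgs(input: string, chunk: Chunk, output: string): string[] {
  return [
    "-y",
    "-i", input,
    "-ss", String(chunk.startSeconds),
    "-t", String(chunk.durationSeconds),
    output,
  ];
}

// A 12-second file yields chunks of 5, 5, and 2 seconds.
console.log(planChunks(12).map((chunk) => chunk.durationSeconds)); // 5, 5, 2
```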

🎡 Supported Audio Formats

This package supports all audio formats supported by ffmpeg. For a full list, see the ffmpeg documentation or run ffmpeg -formats.

Common formats include: .wav, .mp3, .ogg, .flac, .aac, .m4a, and more.


🌐 Supported Languages

You can use any language supported by the Web Speech API and Google Speech-to-Text. For a full list, see the language support page in the Google Speech-to-Text documentation.

Specify the language code (e.g., en-US, fa-IR, fr-FR, etc.) in the language option.
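
Language codes follow the language-REGION pattern shown above. A small, hypothetical normalizer such as the one below can catch typos like en_us before a transcription starts. It only checks the shape of simple two-part tags and does not confirm that the speech service actually supports the language.

```typescript
// Hypothetical helper: accepts simple "language-REGION" tags (hyphen or
// underscore separated) and returns them in the conventional casing.
function normalizeLanguageTag(tag: string): string {
  const match = /^([A-Za-z]{2,3})[-_]([A-Za-z]{2})$/.exec(tag.trim());
  if (!match) {
    throw new Error(`Unsupported language tag format: ${tag}`);
  }
  return `${match[1].toLowerCase()}-${match[2].toUpperCase()}`;
}

console.log(normalizeLanguageTag("en_us")); // prints "en-US"
console.log(normalizeLanguageTag("FA-ir")); // prints "fa-IR"
```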


πŸ› οΈ Troubleshooting

  • Ensure all prerequisites are installed and available in your PATH (which ffmpeg, which ffprobe, which pactl, which paplay)
  • For best audio performance: Use PipeWire (modern) over PulseAudio (legacy)
  • For long audio files, ensure enough disk space in /tmp
  • If you get permission errors, run with appropriate user rights
  • For best results, use high-quality audio files (16 kHz mono recommended)
  • Make sure your connection is stable and not interrupted during transcription
  • Only Linux with PipeWire or PulseAudio is supported
  • If browser detection fails, explicitly set executablePath to your browser location
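
For the 16 kHz mono recommendation above, a file can be converted with ffmpeg before transcription. The sketch below only builds the argument list; buildResampleArgs is a hypothetical helper, while -ar (sample rate), -ac (channel count), and -y (overwrite) are standard ffmpeg options.

```typescript
// Hypothetical helper: ffmpeg arguments that convert any input file to
// mono audio at the given sample rate (16 kHz by default).
function buildResampleArgs(input: string, output: string, sampleRate = 16000): string[] {
  return [
    "-y",                       // overwrite the output file if it exists
    "-i", input,                // source file in any ffmpeg-supported format
    "-ar", String(sampleRate),  // output sample rate in Hz
    "-ac", "1",                 // one audio channel (mono)
    output,
  ];
}

console.log(["ffmpeg", ...buildResampleArgs("talk.mp3", "talk-16k.wav")].join(" "));
```

The resulting array can be passed to a process runner such as spawn from node:child_process, or the printed command can be run directly in a shell.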

Common Browser Paths:

# Check if browsers are installed
which microsoft-edge
which google-chrome
which chromium-browser

Audio System Check:

# Check if PipeWire is running (recommended)
systemctl --user status pipewire pipewire-pulse

# Check if PulseAudio is running (alternative)
systemctl --user status pulseaudio

# Test audio commands
which pactl paplay # Should work with both PipeWire and PulseAudio
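
The same prerequisite check can be done from a script before calling the package. This is a minimal sketch with hypothetical names (REQUIRED_TOOLS, findMissingTools); the PATH value and the file-existence check are passed in so the example runs against a fake filesystem.

```typescript
// Command-line tools this README lists as prerequisites.
const REQUIRED_TOOLS = ["ffmpeg", "ffprobe", "pactl", "paplay"];

// Hypothetical helper: returns the tools that are not found in any
// directory of the given PATH-style string.
function findMissingTools(
  pathVariable: string,
  exists: (file: string) => boolean,
  tools: string[] = REQUIRED_TOOLS,
): string[] {
  const dirs = pathVariable.split(":").filter((dir) => dir.length > 0);
  return tools.filter((tool) => !dirs.some((dir) => exists(`${dir}/${tool}`)));
}

// Demo: a fake system where paplay is not installed.
const fakeFiles = new Set(["/usr/bin/ffmpeg", "/usr/bin/ffprobe", "/usr/bin/pactl"]);
console.log(findMissingTools("/usr/local/bin:/usr/bin", (file) => fakeFiles.has(file)));
// paplay is the only tool reported as missing
```

In a real script, process.env.PATH and existsSync from node:fs would take the place of the fake values.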

πŸ“ Changelog

Version 0.2.0 (Latest)

  • 🚀 BREAKING: Switched from puppeteer to puppeteer-core for better control
  • ✨ Added multi-browser support with automatic detection (Edge, Chrome, Chromium)
  • ⚙️ Added executablePath option to specify custom browser location
  • 🎵 Added PipeWire support (recommended audio server)

💬 Contributing

Pull requests and issues are welcome!
Please open issues for any bugs or feature requests. When contributing, please:

  • Use clear commit messages
  • Follow TypeScript best practices

📋 License

MIT © 2025 ErfanBahramali

