AI Audio Lab

Everything Audio: From real-time AI transcription to additive synthesis and music analysis.

This repository is a comprehensive playground for audio experiments, exploring the intersection of modern AI and classic sound synthesis.

🧪 Experiments Overview

🎹 Additive Synthesis

Located in /experiments/additive_synth

A series of educational scripts exploring how to build complex waveforms from scratch using mathematics and pure sine waves.

Wave Types & Aliasing: Demonstrates the difference between mathematically "perfect" waves, whose harmonics above the Nyquist frequency fold back as harsh aliasing once sampled, and band-limited waves built via additive synthesis.
Envelopes: Implementation of ADSR (Attack, Decay, Sustain, Release) and other amplitude envelopes to shape sound.
Noise Generation: Custom implementations for generating White, Pink, and Brown noise.
Intervals: Exploration of musical intervals and tuning.
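
As a rough illustration of the first two ideas above, here is a minimal sketch using only the standard library. It is not taken from the scripts in /experiments/additive_synth; the function names, the 44.1 kHz sample rate, and the linear envelope shape are all assumptions made for the example. The square wave is built by summing odd sine harmonics at amplitude 1/k and stopping below the Nyquist frequency, which is what keeps it band-limited.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (assumed for this sketch)


def band_limited_square(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Approximate a square wave by summing odd sine harmonics below Nyquist."""
    nyquist = sample_rate / 2
    harmonics = []
    k = 1
    while k * freq_hz < nyquist:  # anything at or above Nyquist would alias
        harmonics.append(k)
        k += 2  # a square wave contains only odd harmonics
    n_samples = round(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        partial_sum = sum(math.sin(2 * math.pi * h * freq_hz * t) / h for h in harmonics)
        samples.append(4 / math.pi * partial_sum)
    return samples


def adsr_envelope(n_samples, attack, decay, sustain_level, release,
                  sample_rate=SAMPLE_RATE):
    """Piecewise-linear ADSR gain curve; attack, decay, release are in seconds."""
    a = round(attack * sample_rate)
    d = round(decay * sample_rate)
    r = round(release * sample_rate)
    s = max(n_samples - a - d - r, 0)  # whatever is left is the sustain stage
    env = [i / a for i in range(a)]                              # 0 -> 1
    env += [1 - (1 - sustain_level) * i / d for i in range(d)]   # 1 -> sustain
    env += [sustain_level] * s                                   # hold
    env += [sustain_level * (1 - i / r) for i in range(r)]       # sustain -> 0
    return env[:n_samples]


# Shape a 50 ms, 440 Hz tone with the envelope.
tone = band_limited_square(440.0, 0.05)
envelope = adsr_envelope(len(tone), attack=0.01, decay=0.02,
                         sustain_level=0.6, release=0.01)
shaped = [sample * gain for sample, gain in zip(tone, envelope)]
```

The partial sum overshoots slightly near each edge (the Gibbs phenomenon), so samples can exceed 1.0 in magnitude; scale the result down before writing it to a fixed-range audio format.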

🗣️ Text-to-Speech (TTS)

Located in /experiments/text_to_speech

Experiments with modern, open-source TTS models for generating high-quality speech.

Coqui TTS: Scripts for generating speech using VITS and XTTS v2 models.
Kokoro 82M: Experiments with this lightweight, high-quality model, including voice blending and weighted blending techniques.
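
Weighted blending can be pictured as a weighted average of voice style vectors. The sketch below assumes each voice is a fixed-length list of floats; that representation, the `blend_voices` name, and the toy values are illustrative assumptions, not Kokoro's actual API or file format.

```python
def blend_voices(voices, weights):
    """Return the weighted average of equal-length style vectors.

    Weights are normalized to sum to 1, so (1, 3) means 25% / 75%.
    """
    if not voices or len(voices) != len(weights):
        raise ValueError("need one weight per voice")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    dim = len(voices[0])
    if any(len(v) != dim for v in voices):
        raise ValueError("all voices must have the same length")
    return [
        sum(w / total * v[i] for v, w in zip(voices, weights))
        for i in range(dim)
    ]


# Toy two-dimensional "voices"; real style vectors would be much longer.
voice_a = [0.0, 2.0]
voice_b = [4.0, 6.0]
blended = blend_voices([voice_a, voice_b], weights=[1, 3])
```

An unweighted blend is the special case where every weight is equal.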

🎼 Music Analysis (Basic Pitch)

Located in /experiments/basic-pitch

Audio-to-MIDI conversion experiments using Spotify's Basic Pitch.

Inference: Converting raw audio files into MIDI sequences.
Playback: Scripts to play back the analyzed results.
Comparison: Includes references to original audio and the resulting MIDI analysis for quality comparison.
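
When comparing the original audio against the MIDI output, it helps to map a measured frequency to the MIDI note it should produce. The helper below uses the standard equal-temperament relation (A4 = 440 Hz = MIDI note 69); it is a standalone sketch with hypothetical function names and does not call Basic Pitch itself.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(freq_hz):
    """Nearest MIDI note number for a frequency, assuming A4 = 440 Hz."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def midi_to_name(note):
    """Note name with octave, using the convention that MIDI 60 is C4."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


a4 = hz_to_midi(440.0)
middle_c = hz_to_midi(261.63)
```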

🎤 Real-time Transcription

Located in /server and /client

The project's original core functionality.

Browser-to-Server Streaming: WebSocket-based audio capture from the browser.
gRPC Backend: Python server handling the audio stream and performing transcription using OpenAI's Whisper model.
Bidirectional Streaming: Real-time feedback loop sending transcription results back to the client as they happen.

🎙️ Featured: gRPC Streaming Audio Transcription

The original core of this project demonstrates real-time audio streaming from a browser to a gRPC server for transcription.

Recorded Demo: Watch on YouTube

Architecture

The system bridges browser audio (WebSocket) to a backend gRPC stream.

graph LR
    A[Browser] -- "WebSocket (Raw PCM)" --> B[Client Container - FastAPI]
    B -- "BiDi: AudioChunk" --> C[Server Container - gRPC]
    C -- "BiDi: TranscriptionResult" --> B
    B -- "WebSocket (JSON)" --> A
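
The data flow in the diagram can be simulated without any network code. In the sketch below, `AudioChunk` and `TranscriptionResult` reuse the message names from the diagram, but their fields, the `fake_transcriber` stand-in for the gRPC server, and the `bridge` function are assumptions made for illustration; the real services use WebSockets, gRPC, and Whisper rather than in-process generators.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AudioChunk:
    pcm: bytes  # hypothetical field: raw PCM bytes from one WebSocket frame


@dataclass
class TranscriptionResult:
    text: str  # hypothetical field: transcript text for the client


async def fake_transcriber(chunks):
    """Stand-in for the gRPC server: yields one result per incoming chunk."""
    async for chunk in chunks:
        yield TranscriptionResult(text=f"received {len(chunk.pcm)} bytes")


async def bridge(websocket_frames):
    """Wrap frames as AudioChunk messages and collect JSON-ready results."""

    async def to_chunks():
        for frame in websocket_frames:
            yield AudioChunk(pcm=frame)
            await asyncio.sleep(0)  # let other tasks run between frames

    return [{"text": result.text}
            async for result in fake_transcriber(to_chunks())]


# Two fake frames of 16-bit PCM: 160 samples, then 80 samples.
messages = asyncio.run(bridge([b"\x00\x01" * 160, b"\x00\x01" * 80]))
```

Because both directions are async generators, results flow back while later frames are still being sent, which mirrors the bidirectional stream in the diagram.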

🛠️ Audio Tools & Troubleshooting

Helpful resources for debugging audio input/output issues on Linux.

Quick Commands

# Check input devices
arecord -l

# Update and install audio utilities
sudo apt update
sudo apt install pavucontrol alsa-utils

# Check PipeWire status (including its PulseAudio compatibility service)
systemctl --user status pipewire pipewire-pulse wireplumber

📚 Resources & Libraries

Python Audio & Signal Processing

Pydub - Simple and easy high-level interface for manipulating audio.
Librosa - The gold standard for audio and music analysis in Python.
Spotify Pedalboard - High-quality, studio-grade audio effects in Python.

AI Models & Frameworks

OpenAI Whisper - General-purpose speech recognition model.
Coqui TTS - Deep learning toolkit for Text-to-Speech, offering high-performance models.
Kokoro ONNX & HuggingFace Reference - Lightweight and fast TTS model implementations.
Magenta - Research project exploring the role of machine learning in the process of creating art and music.

Music Information Retrieval

Basic Pitch - A lightweight yet robust audio-to-MIDI converter.
Signal MIDI - A capable web-based MIDI editor for visualizing and editing MIDI files.

Learning

The Sound of AI (Valerio Velardo) - Excellent channel for learning AI audio, DSP, and Deep Learning for audio.
Mike Russell - Professional audio production tips and tutorials.
Audacity - Audio editing and analysis.
20 - 20,000 Hz Audio Sweep - Test frequency response for web audio, using a spectrogram.
Online Tone Generator - Generate reference tones.
Musicca Virtual Instruments - Test MIDI and synthesis output.

