
Commit 277e1b3

Add Speech Recognition with DDP Example
Signed-off-by: zren11 <[email protected]>
1 parent d13a831 commit 277e1b3

File tree: 9 files changed (+2386 -0 lines changed)

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
FROM pytorch/pytorch:2.8.0-cuda12.8-cudnn9-devel

RUN apt-get update && apt-get install -y ffmpeg
COPY requirements.txt .
RUN pip install -r requirements.txt

Lines changed: 243 additions & 0 deletions
@@ -0,0 +1,243 @@

# Speech Recognition with PyTorch and Kubeflow

A complete example demonstrating **PyTorch Distributed Data Parallel (DDP)** training for speech recognition using Google's [Speech Commands](https://huggingface.co/datasets/google/speech_commands) dataset. This project showcases both local development and distributed training on Kubernetes using **Kubeflow Trainer**.

## 🎯 Overview

This repository implements a **Transformer-based neural network** for classifying single-word spoken commands (35 classes) from the Speech Commands v0.02 dataset. The main focus is the comprehensive **`example.ipynb`** notebook, which walks you through:

- Local training and development
- Container setup with Docker
- Distributed training on Kubernetes using Kubeflow
- Prediction with the trained model

## 📋 Quick Start

### 1. Local Environment Setup

**Note**: If you encounter torch installation issues, install PyTorch first:

```bash
pip install torch==2.8
pip install -r requirements.txt
```

### 2. Run the Complete Example

File: `example.ipynb`

This notebook contains everything you need, including:

- Data download and preparation
- Local training examples
- Docker container setup
- Kubernetes cluster creation with Kind
- Kubeflow distributed training
- Prediction with the trained model

## 🏗️ Architecture

### Model Architecture

- **Input**: Mel spectrograms (128 mel bins, 81 time frames)
- **Model**: Transformer encoder (4 layers, 4 attention heads, d_model 128); see the sketch below
- **Output**: 35-class classification (Speech Commands)
- **Training**: PyTorch DDP with automatic mixed precision

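The actual model definition lives in `train_model.py`; as a rough illustration of the sizes listed above, a minimal sketch of such a classifier could look like the following (the class name, positional encoding, and pooling choice are assumptions, not the repository's exact code):

```python
import torch
import torch.nn as nn


class SpeechCommandsTransformer(nn.Module):
    """Illustrative Transformer encoder over mel-spectrogram frames (not the repository's exact code)."""

    def __init__(self, n_mels=128, n_frames=81, d_model=128, n_heads=4, n_layers=4, n_classes=35):
        super().__init__()
        self.input_proj = nn.Linear(n_mels, d_model)  # project each time frame to d_model
        self.pos_embed = nn.Parameter(torch.zeros(1, n_frames, d_model))  # learned positional encoding
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_classes)  # 35 Speech Commands classes

    def forward(self, mel):  # mel: (batch, n_mels, n_frames)
        x = self.input_proj(mel.transpose(1, 2)) + self.pos_embed  # -> (batch, n_frames, d_model)
        x = self.encoder(x)
        return self.classifier(x.mean(dim=1))  # pool over time, then classify


model = SpeechCommandsTransformer()
logits = model(torch.randn(8, 128, 81))  # -> (8, 35)
```

Mean-pooling over the 81 time frames before the final linear layer is just one reasonable way to turn per-frame encodings into a single 35-way prediction.
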
### Dataset

- **Source**: Google Speech Commands Dataset v0.02
- **Size**: 105,829 audio files (2.3 GB)
- **Classes**: 35 words including "yes", "no", digits 0-9, directions, etc.
- **Format**: 1-second WAV files at 16 kHz (see the loading sketch below)

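For reference, the dataset and the mel-spectrogram features described above can be produced with `torchaudio`; this is a minimal sketch rather than the repository's `prepare-data.py`, and the `hop_length` shown is only one choice that yields 81 frames for a 1-second, 16 kHz clip:

```python
import torchaudio

# Downloads Speech Commands v0.02 (~2.3 GB) under /data/SpeechCommands/ on first use
dataset = torchaudio.datasets.SPEECHCOMMANDS(root="/data", download=True)

# 128 mel bins; with the default n_fft=400, a hop of 200 samples gives 81 frames per 1-second clip
mel_transform = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=128, hop_length=200)

waveform, sample_rate, label, speaker_id, utterance_number = dataset[0]
mel = mel_transform(waveform)  # shape: (1, 128, 81) for a full 1-second clip
print(label, mel.shape)
```
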
## 📁 Project Structure

### Core Files

- **`example.ipynb`** - 📓 **Main notebook with complete workflow**
- **`train_model.py`** - 🚂 Standalone training script
- **`predict.py`** - 🔮 Random audio prediction script
- **`prepare-data.py`** - 📥 Dataset download utility

### Infrastructure Files

- **`Dockerfile`** - 🐳 Container setup (PyTorch 2.8.0 + CUDA 12.8)
- **`kind-config.yaml`** - ☸️ Local Kubernetes cluster configuration
- **`kubeflow-runtime-example.yaml`** - 🎛️ Kubeflow runtime definition
- **`requirements.txt`** - 📦 Python dependencies (219 packages)

## 🚀 Usage Examples

### Data Preparation

```bash
python prepare-data.py
```

### Local Training (Single or Multi-GPU)

```bash
# Run with a single GPU
torchrun --nproc-per-node 1 train_model.py

# Run with multiple GPUs
torchrun --nproc-per-node 2 train_model.py
```

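`torchrun` launches one process per requested worker and sets `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` in each process's environment. The skeleton below sketches, under standard DDP conventions, the setup a script launched this way typically performs; random tensors stand in for the real dataset, and the actual `train_model.py` also adds mixed precision and checkpointing:

```python
import os

import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun for each worker process
    use_cuda = torch.cuda.is_available()
    dist.init_process_group(backend="nccl" if use_cuda else "gloo")
    device = torch.device(f"cuda:{local_rank}" if use_cuda else "cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    # Random tensors shaped like batches of mel spectrograms stand in for the real dataset
    dataset = TensorDataset(torch.randn(1024, 128, 81), torch.randint(0, 35, (1024,)))
    sampler = DistributedSampler(dataset)  # shards the data across ranks
    loader = DataLoader(dataset, batch_size=256, sampler=sampler)

    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(128 * 81, 35)).to(device)
    model = DDP(model, device_ids=[local_rank] if use_cuda else None)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x.to(device)), y.to(device))
            loss.backward()  # DDP averages gradients across processes here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```
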
### Random Audio Prediction

```bash
python predict.py
```

Sample output:

```
[ 1/10] ✓ File: /data/SpeechCommands/speech_commands_v0.02/left/ae71797c_nohash_0.wav
True: 'left' | Predicted: 'left' | Confidence: 95.23%

[ 2/10] ✗ File: /data/SpeechCommands/speech_commands_v0.02/yes/ab123cd4_nohash_1.wav
True: 'yes' | Predicted: 'no' | Confidence: 78.45%
```

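Under the hood, prediction amounts to loading the saved `.pth` weights, recomputing mel-spectrogram features for a randomly sampled clip, and reporting the softmax confidence. The sketch below assumes the illustrative `SpeechCommandsTransformer` class from the architecture sketch above, a hypothetical `best_model.pth` checkpoint, and labels derived from the dataset's folder names (the real label ordering in `predict.py` may differ):

```python
import random
from pathlib import Path

import torch
import torch.nn.functional as F
import torchaudio

data_root = Path("/data/SpeechCommands/speech_commands_v0.02")
labels = sorted(p.name for p in data_root.iterdir() if p.is_dir() and p.name != "_background_noise_")

mel_transform = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=128, hop_length=200)

# Hypothetical checkpoint path; real runs save under /data/speech-recognition/runs/exp-{timestamp}/
model = SpeechCommandsTransformer()  # illustrative class from the architecture sketch above
model.load_state_dict(torch.load("best_model.pth", map_location="cpu"))
model.eval()

dataset = torchaudio.datasets.SPEECHCOMMANDS(root="/data", download=False)
waveform, sample_rate, true_label, *_ = dataset[random.randrange(len(dataset))]
waveform = F.pad(waveform, (0, 16000 - waveform.shape[-1]))  # pad clips shorter than 1 second

with torch.no_grad():
    probs = torch.softmax(model(mel_transform(waveform)), dim=-1)
    confidence, idx = probs.max(dim=-1)

print(f"True: '{true_label}' | Predicted: '{labels[idx.item()]}' | Confidence: {confidence.item():.2%}")
```
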
## 🐳 Docker & Kubernetes Setup

### Build Docker Image

```bash
docker build -t speech-recognition-image:0.1 .
```

### Create Local Kubernetes Cluster

```bash
# Create Kind cluster with data volume mounting
kind create cluster --name ml --config kind-config.yaml

# Load the Docker image into the cluster
kind load docker-image speech-recognition-image:0.1 --name ml
```

### Deploy Kubeflow Runtime

```bash
# Install the Kubeflow Trainer operator
export VERSION=v2.0.0
kubectl apply --server-side -k "https://github.com/kubeflow/trainer.git/manifests/overlays/manager?ref=${VERSION}"

# Apply the custom runtime
kubectl apply -f kubeflow-runtime-example.yaml
```

## 📊 Distributed Training with Kubeflow

The **`example.ipynb`** notebook demonstrates distributed training:

```python
from kubeflow.trainer import CustomTrainer, TrainerClient

client = TrainerClient()

# Start distributed training job
job_name = client.train(
    trainer=CustomTrainer(
        func=train_model,
        num_nodes=2,  # Multi-node training
        resources_per_node={
            "cpu": 5,
            "memory": "50Gi",
            # "nvidia.com/gpu": 1,  # Uncomment for GPU
        },
    ),
    runtime=torch_runtime,
)
```

## 🔧 Configuration

### Key Parameters

- **Batch Size**: 256 (64 in debug mode)
- **Learning Rate**: 0.001, linearly scaled for distributed training (see the sketch after this list)
- **Epochs**: 30 (10 in debug mode)
- **Data Split**: 95% train, 3% validation, 2% test
- **Debug Mode**: Set `debug = True` in scripts for faster iteration

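The linear-scaling rule multiplies the base learning rate by the number of DDP processes so the step size keeps pace with the larger effective batch. A small sketch of that rule as it is commonly applied; the helper name is illustrative, and treating the batch size above as per-process is an assumption:

```python
import torch.distributed as dist

BASE_LR = 0.001  # base value tuned for a single process


def scaled_lr(base_lr: float = BASE_LR) -> float:
    """Linear scaling: multiply the base learning rate by the number of DDP processes."""
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    return base_lr * world_size


# e.g. 2 nodes x 4 GPUs -> world_size 8 -> lr 0.008 alongside an 8x larger global batch
print(scaled_lr())
```
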
### Data Paths

- **Dataset**: `/data/SpeechCommands/speech_commands_v0.02/`
- **Experiments**: `/data/speech-recognition/runs/exp-{timestamp}/`
- **Models**: Saved as `.pth` files with the best validation accuracy (see the sketch below)

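Keeping only the best checkpoint usually comes down to a comparison inside the epoch loop. A sketch under the paths above, with placeholders standing in for the real model and validation pass; the exact timestamp format is an assumption:

```python
import random
import time
from pathlib import Path

import torch

# Placeholders standing in for the real model and validation pass
model = torch.nn.Linear(128 * 81, 35)


def validate(m):
    return random.random()


exp_dir = Path(f"/data/speech-recognition/runs/exp-{int(time.time())}")
exp_dir.mkdir(parents=True, exist_ok=True)

best_acc = 0.0
for epoch in range(30):
    val_acc = validate(model)
    if val_acc > best_acc:  # keep only the weights with the best validation accuracy so far
        best_acc = val_acc
        torch.save(model.state_dict(), exp_dir / "best_model.pth")
```
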
## 📈 Monitoring

### TensorBoard

```bash
tensorboard --logdir=/data/speech-recognition/runs
```

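On the training side, the curves come from a `SummaryWriter` pointed at the same `runs` directory; a minimal sketch, where the tag names, timestamp format, and metric values are placeholders rather than the repository's exact logging code:

```python
import time

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir=f"/data/speech-recognition/runs/exp-{int(time.time())}")

for epoch in range(30):
    train_loss, val_acc = 0.5, 0.8  # placeholders for the real per-epoch metrics
    writer.add_scalar("loss/train", train_loss, epoch)
    writer.add_scalar("accuracy/val", val_acc, epoch)

writer.close()
```
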
### Kubernetes Logs

```bash
# Get pods
kubectl get pods

# View training logs
kubectl logs <pod-name> -f
```

## 🛠️ Development Workflow

1. **Start with `example.ipynb`** - Complete guided walkthrough
2. **Local development** - Use `train_model.py` for quick iterations
3. **Test predictions** - Run `predict.py` to validate model performance
4. **Scale up** - Deploy to Kubernetes for distributed training

## 🧪 Tested Environments

### Software Requirements

- **Python**: 3.12
- **PyTorch**: 2.8
- **Operating System**: Linux x86

### Hardware Tested

**Kubernetes Environment:**

- **Kind**: v0.30.0 with Kubernetes Server v1.34.0
- Local development cluster for testing

**Production Environments:**

- **AWS**: 2x g4dn.12xlarge instances (4x Tesla T4 GPUs each) with Driver Version 570.172.08, CUDA Version 12.8
- **NVIDIA A6000**: Single card with Driver 535.230.02, CUDA 12.2

### Performance Expectations

- **Accuracy**: ~80% on the validation set
- **Loss**: <0.6 after training completion
- **Training Time**: Varies by hardware (use `debug=True` for faster testing on CPU)

### Testing & Validation

- Play WAV files in `example.ipynb` for quick audio verification
- Or use `predict.py` to test random audio samples

## 📝 Notes

- **Data Volume**: The setup uses the `/data` directory mounted across all containers
- **GPU Support**: Works with both CPU and GPU training (set `debug=True` for CPU-only testing)
- **Reproducibility**: Fixed random seed (41) for consistent results (see the sketch below)
- **Production Ready**: Includes model checkpointing, logging, and monitoring
- **Recommended**: Always use `torchrun` to launch `train_model.py`

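For the reproducibility note above, seeding typically covers Python's `random`, NumPy, and PyTorch in one place; a small sketch using the seed value mentioned (41), with the helper name being illustrative:

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 41) -> None:
    """Seed every RNG the training path touches so runs are repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines


set_seed()
```
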
## 🤝 Contributing

This is a complete example project demonstrating PyTorch DDP and Kubeflow integration. Feel free to adapt the patterns for your own speech recognition or distributed training projects.

---

**💡 Tip**: Start with the `example.ipynb` notebook - it contains the complete workflow and explains each step in detail!
