Commit a226abe

add gpu implementation (#2)
* add wav2aug gpu
* make freq drop safer
* fix pol inv
* add no grads
* fix speed perturb
* add torchaudio
* removed cpu implementation
* remove cpu tests
* add general utils and tests
* add linter
* clean docstrings
* resamp test aud, cpu when no gpu avail
* fix dev install
* touch up readme, formatting
* remove cuda hard requirement on gpu ops
* remove cuda requirement from tests
1 parent 9f2875d commit a226abe

43 files changed: +1183 -2306 lines changed

.github/workflows/black.yml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+name: black
+
+on:
+  push:
+    branches: [ main, master, add-gpu ]
+  pull_request:
+    branches: [ main, master ]
+
+jobs:
+  format-check:
+    name: Check Python formatting with Black
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+      - name: Install dev dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install .[dev]
+      - name: Run Black (check only)
+        run: |
+          black --check .

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -16,3 +16,4 @@ htmlcov/
 # OS
 .DS_Store
 
+test.py

pyproject.toml

Lines changed: 17 additions & 2 deletions
@@ -27,7 +27,8 @@ classifiers = [
     "Topic :: Software Development :: Libraries :: Python Modules",
 ]
 dependencies = [
-    "torch>=2.8.0",
+    "torch>=2.0.0",
+    "torchaudio>=2.8.0",
     "torchcodec>=0.7.0",
 ]
 
@@ -36,6 +37,9 @@ test = [
     "pytest>=8",
     "pytest-cov",
 ]
+dev = [
+    "black>=23.1.0,<25",
+]
 
 [project.urls]
 Homepage = "https://github.com/gfdb/wav2aug"
@@ -49,4 +53,15 @@ include-package-data = true
 packages = { find = { where = ["."], include = ["wav2aug*"], exclude = ["tests*"], namespaces = true } }
 
 [tool.setuptools.package-data]
-wav2aug = ["py.typed", "**/*.yaml", "assets/**"]
+wav2aug = ["py.typed", "**/*.yaml", "assets/**"]
+
+
+[tool.black]
+line-length = 88
+target-version = ["py310"]
+# Include only python source files
+include = "\\.pyi?$"
+# Exclude common build, virtualenv and cache directories
+exclude = '''
+/(\.|build|dist|__pycache__|\.venv|venv|\.eggs|\.git|\.hg|\.mypy_cache|\.pytest_cache)/
+'''
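
The `include` and `exclude` values added under `[tool.black]` are regular expressions, so their effect can be checked with Python's `re` module. The sketch below is only an approximation of how Black filters files: it assumes the multi-line `exclude` pattern is compiled in verbose mode and searched against a root-relative path with a leading slash. `would_format` and the sample paths are hypothetical names for illustration, not part of this commit.

```python
import re

# Patterns copied from the [tool.black] table in this commit.
INCLUDE = r"\.pyi?$"
EXCLUDE = r"""
/(\.|build|dist|__pycache__|\.venv|venv|\.eggs|\.git|\.hg|\.mypy_cache|\.pytest_cache)/
"""

# Assumption: the multi-line pattern is compiled in verbose mode, so the
# newlines around the expression are ignored.
include_re = re.compile(INCLUDE)
exclude_re = re.compile(EXCLUDE, re.VERBOSE)


def would_format(path: str) -> bool:
    """Hypothetical helper: True if a root-relative path passes both filters."""
    normalized = "/" + path.lstrip("/")
    if exclude_re.search(normalized):
        return False
    return include_re.search(normalized) is not None


# Sample paths are made up for illustration.
for sample in ["wav2aug/gpu/ops.py", "build/lib/mod.py", ".venv/lib/site.py", "readme.md"]:
    print(sample, would_format(sample))
```

Under these assumptions only the first sample path is selected. Note that the leading `\.` alternative matches just a literal `/./` segment, so other hidden directories are excluded only when they are listed by name.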

readme.md

Lines changed: 81 additions & 28 deletions
@@ -1,64 +1,117 @@
-# Wav2Aug: Toward Universal Time-Domain Speech Augmentation
+# 🎛️ Wav2Aug: Toward Universal Time-Domain Speech Augmentation
 
-A minimalistic PyTorch-based audio augmentation library for speech and audio processing. The goal of this library is to provide a general purpose speech augmentation policy that can be used on any task and perform well without having to tune augmentation hyperparameters. Just install, and start augmenting. Applies two random augmentations per call.
+A minimalistic PyTorch-based audio augmentation library for speech and audio augmentation. The goal of this library is to provide a general purpose speech augmentation policy that can be used on any task and perform well without having to tune augmentation hyperparameters. Just install, and start augmenting. Applies two random augmentations per call.
 
 ![Diagram](https://raw.githubusercontent.com/gfdb/wav2aug/main/wav2aug.png)
 
-## Features
+## ⚙️ Features
 
-- **Minimal dependencies**: We only rely on PyTorch and torchcodec.
-- **9 core augmentations**: amplitude scaling/clipping, noise addition, frequency dropout, polarity inversion, chunk swapping, speed perturbation, time dropout, and babble noise.
-- **In-place operations**: All cpu augmentations are done in place.
+* **Minimal dependencies**: we only rely on PyTorch, torchcodec, and torchaudio.
+* **9 core augmentations**: amplitude scaling/clipping, noise addition, frequency dropout, polarity inversion, chunk swapping, speed perturbation, time dropout, and babble noise.
+* **Simplicity**: just install and start augmenting!
+* **Randomness**: all stochastic ops use PyTorch RNGs. Set a single seed and be done, e.g. torch.manual_seed(0); torch.cuda.manual_seed_all(0)
 
-## Installation
+## 📦 Installation
 
 ### pip
+
 ```bash
 pip install wav2aug
 ```
 
 ### uv
+
 ```bash
 uv add wav2aug
 ```
 
-## Quick Start
+## 🚀 Quick Start
 
 ```python
 import torch
-from wav2aug import Wav2Aug
+from wav2aug.gpu import Wav2Aug
 
-# Initialize augmenter
-aug = Wav2Aug(sample_rate=16000)
+# Initialize the augmenter once
+augmenter = Wav2Aug(sample_rate=16000)
 
-# Process audio (supports [T] mono or [C, T] multi-channel)
-waveform = torch.randn(8000) # 0.5s at 16kHz
-augmented = aug(waveform)
+# in the forward pass
+wavs = torch.randn(3, 50000)
+lens = torch.ones((wavs.size(0)))
+
+aug_wavs, aug_lens = augmenter(wavs, lens)
 ```
 
-## Augmentation Types
+That's it!
+
+## 🧪 Augmentation Types
 
-- **Amplitude Scaling/Clipping**: Random gain and peak limiting
-- **Noise Addition**: Environmental noise with SNR control
-- **Frequency Dropout**: Spectral masking with random notch filters
-- **Polarity Inversion**: Random phase flip
-- **Chunk Swapping**: Temporal segment reordering
-- **Speed Perturbation**: Time-scale modification
-- **Time Dropout**: Random silence insertion
-- **Babble Noise**: Multi-speaker background (auto-enabled with sufficient buffer)
+* 🔊 **Amplitude Scaling/Clipping**: Random gain and peak limiting
+* 🌫️ **Noise Addition**: Environmental noise with SNR control
+* 📶 **Frequency Dropout**: Spectral masking with random notch filters
+* 🔄 **Polarity Inversion**: Random phase flip
+* 🧩 **Chunk Swapping**: Temporal segment reordering
+* ⏱️ **Speed Perturbation**: Time-scale modification
+* 🕳️ **Time Dropout**: Random silence insertion
+* 👥 **Babble Noise**: Multi-speaker background (auto-enabled with sufficient buffer)
 
-## Development Installation
+## 🛠️ Development Installation
+
+### uv
 
 ```bash
 git clone https://github.com/gfdb/wav2aug
 cd wav2aug
-uv python pin 3.10 # or greater
+
+# create venv and pin Python
+uv venv
+source .venv/bin/activate
+uv python pin 3.10 # or 3.11/3.12
+
+# runtime only
 uv sync
-uv sync --extra test # for test deps
+
+# extras
+uv sync --extra dev
+uv sync --extra test
 ```
 
-## Tests
+### pip
 
 ```bash
-uv run pytest tests/
+git clone https://github.com/gfdb/wav2aug
+cd wav2aug
+
+# create venv
+python -m venv .venv
+source .venv/bin/activate
+
+# runtime only
+python -m pip install .
+
+# editable + extras for development
+python -m pip install -e '.[dev,test]'
 ```
+
+## ✅ Tests
+
+### uv
+
+```bash
+uv run pytest -q tests/
+```
+
+### pip
+
+```bash
+pytest -q tests/
+```
+
+## 🤝 Contributing
+
+* Issues and PRs are welcome and encouraged!
+
+* Bug reports: please open an issue with a minimal repro (env, torch/torchaudio/torchcodec versions, code snippet, expected vs. actual, traceback).
+
+* Feature requests: please open an issue with use-case and proposed feature.
+
+* PRs: keep them focused. Add tests when behavior changes. Don't forget to run formatters and tests before submitting!
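
The new Quick Start passes `lens = torch.ones(batch)` alongside the padded `wavs` batch. The README does not define `lens`; the sketch below assumes it holds each waveform's valid length as a fraction of the padded length, which the all-ones tensor hints at but the commit does not confirm. `relative_lengths` is a hypothetical helper written in plain Python, not part of wav2aug.

```python
from typing import List, Sequence


def relative_lengths(num_samples: Sequence[int]) -> List[float]:
    """Hypothetical helper: each clip's length as a fraction of the longest clip.

    Assumption: `lens` in the Quick Start is a per-item fraction in (0, 1].
    """
    if not num_samples:
        return []
    longest = max(num_samples)
    if longest <= 0:
        raise ValueError("at least one clip must contain samples")
    return [n / longest for n in num_samples]


# Three clips that would be zero-padded to 50000 samples, matching the
# (3, 50000) batch shape used in the Quick Start.
print(relative_lengths([50000, 25000, 40000]))  # [1.0, 0.5, 0.8]
```

If that assumption holds, the list could be wrapped in `torch.tensor(...)` and passed as `lens`; a batch of equal-length clips reduces to the all-ones tensor shown in the README.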

tests/test.mp3

-132 KB
Binary file not shown.

tests/test.wav

348 KB
Binary file not shown.

tests/test_amplitude_clipping.py

Lines changed: 0 additions & 135 deletions
This file was deleted.
