Commit b85b825

fix: inpainting noise, composer initial span, readme
1 parent: a7091a5

File tree: 3 files changed, +33 −22 lines

README.md

Lines changed: 27 additions & 18 deletions
````diff
@@ -45,7 +45,7 @@ t = torch.tensor([40, 10, 20])
 y = unet(x, t) # [3, 1, 32768], 3 audio tracks of ~1.6s sampled at 20050 Hz
 ```
 
-### Elucidated Diffusion
+### Diffusion
 
 ```python
 from audio_diffusion_pytorch.diffusion.elucidated import Diffusion, DiffusionSampler, LogNormalSampler, KerrasSchedule
@@ -72,42 +72,51 @@ sampler = DiffusionSampler(
     s_churn=40,
     s_noise=1.003
 )
+# Generate a sample starting from the provided noise
 y = sampler(x = torch.randn(1,1,2 ** 15))
 
 ```
 
 
-### Gaussian Diffusion (Old)
-Note that this requires `use_learned_time_embedding=False` on the `UNet1d`.
+### Diffusion Inpainting and Infinite Generation
+
 ```py
-from audio_diffusion_pytorch.diffusion.ddpm import Diffusion, DiffusionSampler
-# Build diffusion to train denoise function
-diffusion = Diffusion(
-    denoise_fn=unet,
-    num_timesteps=50,
-    loss_fn='l1',
-    loss_weight_gamma=0.5,
-    loss_weight_k=1
+from audio_diffusion_pytorch.diffusion.elucidated import DiffusionInpainter, KerrasSchedule, SpanBySpanComposer
+
+inpainter = DiffusionInpainter(
+    diffusion,
+    num_steps=2,
+    num_resamples=5,
+    sigma_schedule=KerrasSchedule(
+        sigma_min=0.002,
+        sigma_max=1
+    ),
+    s_tmin=0,
+    s_tmax=10,
+    s_churn=40,
+    s_noise=1.003
 )
 
-x = torch.randn(3, 1, 2 ** 15)
-loss = diffusion(x)
-loss.backwards() # Do this many times
+inpaint = torch.randn(1,1,2 ** 15) # This should not be random but your start track, e.g. one sampled with DiffusionSampler
+inpaint_mask = torch.randint(0,2, (1,1,2 ** 15), dtype=torch.bool) # Set to `True` the parts you want to keep
+y = inpainter(inpaint = inpaint, inpaint_mask = inpaint_mask) # [1, 1, 32768]
 
 
-# Sample from diffusion model by converting normal tensor to audio
-sampler = DiffusionSampler(diffusion)
-y = sampler(x = torch.randn(1, 1, 2 ** 15)) # [1, 1, 32768]
+# Infinite generation using SpanBySpanComposer
+composer = SpanBySpanComposer(inpainter, num_spans=4) # Generates 4 additional spans
+y_long = composer(y, keep_start=True) # [1, 1, 98304]
+
 ```
 
+
 ## Experiments
 
 
 | Report | Snapshot | Description |
 | --- | --- | --- |
 | [Alpha](https://wandb.ai/schneider/audio/reports/Audio-Diffusion-UNet-Alpha---VmlldzoyMjk3MzIz?accessToken=y0l3igdvnm4ogn4d3ph3b0i8twwcf7meufbviwt15f0qtasyn1i14hg340bkk1te) | [6bd9279f19](https://github.com/archinetai/audio-diffusion-pytorch/tree/6bd9279f192fc0c11eb8a21cd919d9c41181bf35) | Initial tests on LJSpeech dataset with new architecture and basic DDPM diffusion model. |
 | [Bravo](https://wandb.ai/schneider/audio/reports/Audio-Diffusion-Bravo---VmlldzoyMzE4NjIx?accessToken=qt2w1jeqch9l5v3ffjns99p69jsmexk849dszyiennfbivgg396378u6ken2fm2d) | [a05f30aa94](https://github.com/archinetai/audio-diffusion-pytorch/tree/a05f30aa94e07600038d36cfb96f8492ef735a99) | Elucidated diffusion, improved architecture with patching, longer duration, initial good (unsupervised) results on LJSpeech. |
-| Charlie | (current) | . |
+| [Charlie](https://wandb.ai/schneider/audio/reports/Audio-Diffusion-Charlie---VmlldzoyMzYyNDA1?accessToken=71gmurcwndv5e2abqrjnlh3n74j5555j3tycpd7h40tnv8fvb17k5pjkb57j9xxa) | (current) | Train on music with YoutubeDataset, larger patch tests for longer tracks, inpainting tests, initial test with infinite generation using SpanBySpanComposer. |
 
 
 ## Appreciation
````
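The new README inpainting example uses a random mask for brevity. As a concrete illustration of the mask semantics (`True` = keep), here is a hedged sketch, not part of this commit, that keeps the first half of a track and regenerates the second half; it assumes an `inpainter` built exactly as in the README snippet above:

```python
import torch

# Hypothetical usage sketch (not from this commit): keep the first half
# of a track and let the inpainter regenerate the second half.
length = 2 ** 15
track = torch.randn(1, 1, length)  # stand-in for a real track, e.g. one from DiffusionSampler

inpaint_mask = torch.zeros(1, 1, length, dtype=torch.bool)
inpaint_mask[:, :, : length // 2] = True  # True marks the samples to keep

y = inpainter(inpaint=track, inpaint_mask=inpaint_mask)  # [1, 1, 32768]
```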

audio_diffusion_pytorch/diffusion/elucidated.py

Lines changed: 5 additions & 3 deletions
```diff
@@ -252,7 +252,7 @@ def step(
         epsilon = self.s_noise * torch.randn_like(x)
         noise = sqrt(sigma_hat ** 2 - sigma ** 2) * epsilon
         # Add increased noise to mixed value
-        x_hat = (x * ~inpaint_mask + inpaint * inpaint_mask) * noise
+        x_hat = x * ~inpaint_mask + inpaint * inpaint_mask + noise
         # Evaluate ∂x/∂sigma at sigma_hat
         d = (x_hat - self.denoise_fn(x_hat, sigma=sigma_hat, clamp=clamp)) / sigma_hat
         # Take euler step from sigma_hat to sigma_next
```
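This hunk fixes the inpainting noise: the mixed signal was being *multiplied* by the freshly drawn noise, which destroys the signal content, rather than having the noise *added* to it. In elucidated (Karras-style) stochastic sampling, the sample is perturbed additively so its noise level rises from `sigma` to `sigma_hat`. A minimal standalone sketch of that additive churn step, with illustrative names that are not the repo's API:

```python
import torch

def add_churn_noise(x: torch.Tensor, sigma: float, num_steps: int,
                    s_churn: float = 40.0, s_noise: float = 1.003) -> torch.Tensor:
    """Illustrative sketch of elucidated-diffusion noise churn:
    raise the noise level of x from sigma to sigma_hat additively."""
    gamma = min(s_churn / num_steps, 2 ** 0.5 - 1)
    sigma_hat = sigma * (1 + gamma)
    epsilon = s_noise * torch.randn_like(x)
    # Added variance is sigma_hat^2 - sigma^2, hence the sqrt factor;
    # multiplying the signal by the noise instead would erase it.
    return x + (sigma_hat ** 2 - sigma ** 2) ** 0.5 * epsilon
```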
```diff
@@ -321,8 +321,10 @@ def __init__(
     def forward(self, start: Tensor, keep_start: bool = False) -> Tensor:
         half_length = start.shape[2] // 2
 
-        spans = [start[:, :, :half_length]] if keep_start else []
-        inpaint = start
+        spans = list(start.chunk(chunks=2, dim=-1)) if keep_start else []
+        # Inpaint second half from first half
+        inpaint = torch.zeros_like(start)
+        inpaint[:, :, :half_length] = start[:, :, half_length:]
         inpaint_mask = sequential_mask(like=start, start=half_length)
 
         for i in range(self.num_spans):
```
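This hunk fixes the composer's initial span: with `inpaint = start`, the first generated span was conditioned on the *beginning* of the start track, so generation restarted rather than continued. Now the start track's second half is copied into the first half of the inpaint buffer, and each new span extends from where the previous audio ends. A simplified, self-contained sketch of that span-by-span loop, assuming the mask convention `True` = keep (not the repo's exact code):

```python
import torch

def compose_spans(inpainter, start: torch.Tensor, num_spans: int) -> torch.Tensor:
    """Simplified sketch of span-by-span continuation:
    keep the previous half-span, inpaint the next one."""
    half = start.shape[2] // 2
    mask = torch.zeros_like(start, dtype=torch.bool)
    mask[:, :, :half] = True  # keep the first half, generate the second

    spans = list(start.chunk(chunks=2, dim=-1))  # keep_start=True behaviour
    inpaint = torch.zeros_like(start)
    inpaint[:, :, :half] = start[:, :, half:]  # continue from the end of `start`

    for _ in range(num_spans):
        span = inpainter(inpaint=inpaint, inpaint_mask=mask)
        inpaint[:, :, :half] = span[:, :, half:]  # next span continues this one
        spans.append(span[:, :, half:])  # keep only the newly generated half

    return torch.cat(spans, dim=-1)  # e.g. [1, 1, 98304] for num_spans=4
```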

setup.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -3,7 +3,7 @@
 setup(
     name="audio-diffusion-pytorch",
     packages=find_packages(exclude=[]),
-    version="0.0.6",
+    version="0.0.7",
     license="MIT",
     description="Audio Diffusion - PyTorch",
     long_description_content_type="text/markdown",
```
