mixtral-moe example fails to torch.compile #232

@zou3519

Description

Environment:
pytorch-triton           3.4.0+git11ec6354
torch                    2.9.0.dev20250723+cu128
torchaudio               2.8.0.dev20250723+cu128
torchvision              0.24.0.dev20250723+cu128

Repro:

CUDA_VISIBLE_DEVICES=1 python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --prompt "The capital of France is:" --num_samples=1 --compile

This fails with the following error:

RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. Stack trace:
  File "/home/rzou/dev/lab/gpt-fast/mixtral-moe/generate.py", line 68, in decode_one_token
    return sample(logits, **sampling_kwargs)
  File "/home/rzou/dev/lab/gpt-fast/mixtral-moe/generate.py", line 56, in sample
    idx_next = multinomial_sample_one_no_sync(probs)
  File "/home/rzou/dev/lab/gpt-fast/mixtral-moe/generate.py", line 42, in multinomial_sample_one_no_sync
    return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int). To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation.
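
For reference, the error message itself suggests two workarounds. Below is a minimal sketch of applying them to a generation loop; compiled_decode, decode_one_token, model, cur_token, input_pos, and max_new_tokens are illustrative stand-ins, not necessarily the exact names or signatures used in mixtral-moe/generate.py:

import torch

# Hypothetical compiled decode step; gpt-fast compiles its per-token
# decode function with a CUDAGraphs-enabled mode like this one.
compiled_decode = torch.compile(decode_one_token, mode="reduce-overhead")

new_tokens = []
for _ in range(max_new_tokens):
    # Workaround 1: mark the start of each iteration so the CUDAGraphs
    # runtime does not treat reuse of the previous run's output buffers
    # as an illegal access.
    torch.compiler.cudagraph_mark_step_begin()
    next_token = compiled_decode(model, cur_token, input_pos)

    # Workaround 2 (alternative): clone the output outside the compiled
    # region so it survives when the graph is replayed and its output
    # buffers are overwritten.
    new_tokens.append(next_token.clone())

Either change alone should suppress the error: the clone trades a small copy per step for safety, while cudagraph_mark_step_begin() avoids the copy but must run before every compiled invocation.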
