Open
Description
pytorch-triton 3.4.0+git11ec6354
torch 2.9.0.dev20250723+cu128
torchaudio 2.8.0.dev20250723+cu128
torchvision 0.24.0.dev20250723+cu128
Repro:
CUDA_VISIBLE_DEVICES=1 python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --prompt "The capital of France is:" --num_samples=1 --compile
Gives the following:
RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. Stack trace:
File "/home/rzou/dev/lab/gpt-fast/mixtral-moe/generate.py", line 68, in decode_one_token
return sample(logits, **sampling_kwargs)
File "/home/rzou/dev/lab/gpt-fast/mixtral-moe/generate.py", line 56, in sample
idx_next = multinomial_sample_one_no_sync(probs)
File "/home/rzou/dev/lab/gpt-fast/mixtral-moe/generate.py", line 42, in multinomial_sample_one_no_sync
return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int). To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation.