When running longer workloads, my vLLM server always errors out with a HIP error. I think it's because torch.randn (or the sort that follows it in the repro below) is not supported in these images.
```bash
docker run --rm -i \
  --device=/dev/kfd --device=/dev/dri \
  mixa3607/vllm-gfx906:0.11.0-rocm-6.3.3-tomylin890-abbe414 \
  python3 - << 'PY'
import torch
print("device:", torch.cuda.get_device_name(0))
x = torch.randn(4, 32000, device="cuda")
y, idx = x.sort(dim=-1, descending=False)
print("sort ok, shape:", y.shape)
PY
```
```text
device: AMD Instinct MI50/MI60
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
torch.AcceleratorError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
```
```bash
docker run --rm -i \
  --device=/dev/kfd --device=/dev/dri \
  mixa3607/vllm-gfx906:0.11.0-rocm-6.3.3 \
  python3 - << 'PY'
import torch
print("device:", torch.cuda.get_device_name(0))
x = torch.randn(4, 32000, device="cuda")  # simulate logits
y, idx = x.sort(dim=-1, descending=False)
print("sort ok, shape:", y.shape)
PY
```
```text
device: AMD Instinct MI50/MI60
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
torch.AcceleratorError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
```
```bash
docker run --rm -i \
  --device=/dev/kfd --device=/dev/dri \
  nalanzeyu/vllm-gfx906:latest \
  python3 - << 'PY'
import torch
print("device:", torch.cuda.get_device_name(0))
x = torch.randn(4, 32000, device="cuda")
y, idx = x.sort(dim=-1, descending=False)
print("sort ok, shape:", y.shape)
PY
```
```text
device: AMD Instinct MI60 / MI50
sort ok, shape: torch.Size([4, 32000])
```
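In the failing runs the traceback reports line 5 of the script, which is the final print rather than the randn or sort call. Together with the asynchronous-reporting note in the error message, that means the repro alone does not show which kernel is missing. A minimal sketch for narrowing it down is to synchronize after every step so the error surfaces at the step that caused it. `run_steps` and `gpu_steps` are hypothetical helper names, not part of PyTorch or vLLM; `torch.cuda.synchronize()` is the only GPU API used beyond what the repro already calls.

```python
from typing import Any, Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_steps(steps: List[Step], sync: Callable[[], None]) -> Optional[str]:
    """Run steps in order, feeding each result to the next step.

    sync() is called after every step so that an asynchronously reported
    error is raised at the step that triggered it. Returns the name of the
    first failing step, or None if every step succeeded.
    """
    state = None
    for name, fn in steps:
        try:
            state = fn(state)
            sync()
        except Exception as exc:  # torch's HIP errors are ordinary exceptions
            print(f"FAILED at {name}: {type(exc).__name__}: {exc}")
            return name
        print(f"ok: {name}")
    return None


def gpu_steps() -> Tuple[List[Step], Callable[[], None]]:
    """Steps from the repro above; torch is imported lazily (assumed installed)."""
    import torch

    steps: List[Step] = [
        ("randn", lambda _: torch.randn(4, 32000, device="cuda")),
        ("sort", lambda x: x.sort(dim=-1, descending=False)),
    ]
    return steps, torch.cuda.synchronize
```

Inside one of the containers this could be run as `steps, sync = gpu_steps(); run_steps(steps, sync)`. If it prints `ok: randn` followed by `FAILED at sort`, the sort kernel is the one missing for this GPU in that image.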
This sort call is used in a top-k -> top-p sampling step, which causes an error after prolonged runs. It happens for me when sending about 50 requests at once through LangChain: after roughly 40 of them are done, the server errors out.
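For context, here is a rough sketch of why an ascending sort over the vocabulary shows up in top-p filtering. It is plain Python for illustration only, not vLLM's code, and `top_p_filter` is a hypothetical name. I'm assuming the sampler follows the same general pattern of sorting the logits in ascending order and masking the low-probability tail.

```python
import math
from typing import List


def top_p_filter(logits: List[float], p: float) -> List[float]:
    """Return a copy of logits with tokens outside the top-p nucleus set to -inf."""
    # Token indices ordered by logit, ascending (same direction as
    # x.sort(dim=-1, descending=False) in the repro).
    order = sorted(range(len(logits)), key=lambda i: logits[i])

    # Numerically stable softmax over the sorted logits.
    m = max(logits)
    exps = [math.exp(logits[i] - m) for i in order]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk up from the least likely token and drop tokens while the
    # cumulative probability mass is still within the discarded 1 - p tail.
    out = list(logits)
    cum = 0.0
    last = len(order) - 1
    for rank, i in enumerate(order):
        cum += probs[rank]
        # The most likely token (last in ascending order) is always kept.
        if cum <= 1.0 - p and rank < last:
            out[i] = -math.inf
    return out
```

With `p=1.0` nothing is masked, and as `p` gets smaller more of the tail is masked. The point is that the sort runs on every sampling step that uses top-p, so a missing sort kernel would surface as soon as a request takes that path.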