torch.randn not supported on the 0.11.0 fork #6

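When running long instances, my vLLM server always errors out because `torch.randn` is not supported.

A minimal reproduction with the `mixa3607/vllm-gfx906:0.11.0-rocm-6.3.3-tomylin890-abbe414` image: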
```
docker run --rm -i \
  --device=/dev/kfd --device=/dev/dri \
  mixa3607/vllm-gfx906:0.11.0-rocm-6.3.3-tomylin890-abbe414 \
  python3 - << 'PY'
import torch

print("device:", torch.cuda.get_device_name(0))
x = torch.randn(4, 32000, device="cuda")
y, idx = x.sort(dim=-1, descending=False)
print("sort ok, shape:", y.shape)
PY
device: AMD Instinct MI50/MI60
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
torch.AcceleratorError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
```

The `mixa3607/vllm-gfx906:0.11.0-rocm-6.3.3` image fails with the same error:
```
docker run --rm -i \
  --device=/dev/kfd --device=/dev/dri \
  mixa3607/vllm-gfx906:0.11.0-rocm-6.3.3  \
  python3 - << 'PY'
import torch

print("device:", torch.cuda.get_device_name(0))
x = torch.randn(4, 32000, device="cuda")  # simulate logits
y, idx = x.sort(dim=-1, descending=False)
print("sort ok, shape:", y.shape)
PY
device: AMD Instinct MI50/MI60
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
torch.AcceleratorError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

```

The `nalanzeyu/vllm-gfx906:latest` image runs the same script without error:
```
docker run --rm -i \
  --device=/dev/kfd --device=/dev/dri \
  nalanzeyu/vllm-gfx906:latest  \
  python3 - << 'PY'
import torch

print("device:", torch.cuda.get_device_name(0))
x = torch.randn(4, 32000, device="cuda")
y, idx = x.sort(dim=-1, descending=False)
print("sort ok, shape:", y.shape)
PY
device: AMD Instinct MI60 / MI50
sort ok, shape: torch.Size([4, 32000])
```
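In both failing images the traceback points at line 5 of the script, which is the `sort` call. However, the HIP message also warns that kernel errors may be reported asynchronously at a later API call. The sketch below is one way to attribute the failure to a single operation, assuming it is run inside the container as in the reproductions. It runs each step and calls `torch.cuda.synchronize()` right after it, so an error raised by the `randn` kernel cannot surface during `sort`. The helper `first_failing_step` and the step names are hypothetical. Only `torch.randn`, `Tensor.sort`, `torch.cuda.is_available` and `torch.cuda.synchronize` are real PyTorch calls. Setting `AMD_SERIALIZE_KERNEL=3`, as the error message suggests, should have a similar effect without code changes.

```python
from typing import Callable, Optional, Sequence, Tuple

# A named step: (label, zero-argument callable). Hypothetical structure for this sketch.
Step = Tuple[str, Callable[[], object]]


def first_failing_step(
    steps: Sequence[Step], sync: Callable[[], None]
) -> Optional[Tuple[str, str]]:
    """Run each named step, synchronizing after it.

    Returns (step name, error message) for the first step that raises,
    or None if every step and every sync succeeds.
    """
    for name, fn in steps:
        try:
            fn()
            # Block until queued device work finishes, so an asynchronous
            # kernel error is raised here rather than in a later step.
            sync()
        except Exception as exc:  # e.g. the HIP "invalid device function" error
            return name, str(exc)
    return None


def main() -> None:
    try:
        import torch  # imported lazily so the helper above has no hard dependency
    except ImportError:
        print("PyTorch not installed; run this inside the vLLM container")
        return
    if not torch.cuda.is_available():
        print("no ROCm/CUDA device visible")
        return

    tensors = {}

    def make_logits() -> None:
        # Same shape as the reproduction: 4 rows of 32000 "logits".
        tensors["x"] = torch.randn(4, 32000, device="cuda")

    def sort_logits() -> None:
        tensors["x"].sort(dim=-1, descending=False)

    result = first_failing_step(
        [("randn", make_logits), ("sort", sort_logits)], torch.cuda.synchronize
    )
    if result is None:
        print("randn and sort both ok")
    else:
        print(f"first failing step: {result[0]} ({result[1]})")


if __name__ == "__main__":
    main()
```

The script can be pasted in place of the inline script in the `docker run ... python3 - << 'PY'` commands above. Because it stops at the first failure, a broken HIP context after the error does not affect the result.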

This is used in the top-k/top-p sampling step, which causes the error after prolonged runs. In my case it happens when I send about 50 requests at once through LangChain: after roughly 40 have completed, the server errors out.
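For context, top-k/top-p filtering generally works as follows. The logits are sorted, everything outside the k largest is masked, and then the smallest set of remaining tokens whose probabilities sum to at least p is kept. The following is a plain-Python, single-sequence sketch of that technique under those standard definitions. It is not vLLM's implementation, which operates on batched GPU tensors and performs the sort on the device (the `x.sort(dim=-1, descending=False)` call in the reproduction exercises that kind of kernel). The function name `top_k_top_p_filter` is hypothetical.

```python
import math
from typing import List


def top_k_top_p_filter(logits: List[float], k: int, p: float) -> List[float]:
    """Hypothetical helper: mask logits outside the top-k / top-p set with -inf.

    Plain-Python illustration of the technique; not vLLM's sampler code.
    """
    if not logits:
        raise ValueError("logits must be non-empty")
    if k < 1:
        raise ValueError("k must be >= 1")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")

    # Step 1: sort token indices by logit, largest first.
    # On the GPU this is the kind of sort kernel that fails in the reproduction.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)

    # Step 2: top-k keeps only the k highest-scoring tokens.
    candidates = order[:k]

    # Step 3: softmax over the surviving logits (subtract the max for stability).
    top = logits[candidates[0]]
    weights = [math.exp(logits[i] - top) for i in candidates]
    total = sum(weights)

    # Step 4: top-p keeps the shortest prefix whose cumulative probability reaches p.
    kept = []
    cumulative = 0.0
    for i, w in zip(candidates, weights):
        kept.append(i)
        cumulative += w / total
        if cumulative >= p:
            break

    # Everything not kept is masked out so it can never be sampled.
    filtered = [-math.inf] * len(logits)
    for i in kept:
        filtered[i] = logits[i]
    return filtered


print(top_k_top_p_filter([1.0, 3.0, 2.0, 0.5], k=3, p=0.9))
# prints [-inf, 3.0, 2.0, -inf]
```

With `k=3` and `p=0.9`, the third candidate is dropped because the first two already cover about 91% of the probability mass. The point for this issue is that the sort in step 1 runs on every sampling call that uses top-k or top-p.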


