33 changes: 33 additions & 0 deletions README.md
@@ -37,6 +37,39 @@ huggingface-cli login
The `triton` package cannot be installed on Windows. Use `pip install triton-windows` instead.
The realtime demo uses vLLM for inference speed. vLLM does not yet support Windows, but you can try https://github.com/SystemPanic/vllm-windows until official support is added.

### AMD ROCm Setup

Requires: `docker-buildx`

vLLM doesn't publish prebuilt binaries for most ROCm setups, so you'll need to build it yourself.

```bash
git clone -b v0.6.6 git@github.com:vllm-project/vllm.git
# edit Dockerfile.rocm:
# 1. change `BASE_IMAGE` to `ARG BASE_IMAGE="rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0"`
# 2. for `pip install ..` replace pytorch versions with `torch==2.5.1+rocm6.2 torchaudio==2.5.1+rocm6.2 torchvision==0.20.1+rocm6.2`
# 3. end `pip install ..` command with `--extra-index-url https://download.pytorch.org/whl/rocm6.2;;`
DOCKER_BUILDKIT=1 docker build --build-arg BUILD_FA="0" -f Dockerfile.rocm -t vllm-rocm .
```
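The `BASE_IMAGE` edit above can be scripted if you prefer; this hypothetical helper assumes the v0.6.6 `Dockerfile.rocm` has a line starting with `ARG BASE_IMAGE=`, and the two `pip install` edits still need to be made by hand:

```bash
# hypothetical helper: swap BASE_IMAGE in a vLLM Dockerfile.rocm
# (the regex assumes a line beginning with `ARG BASE_IMAGE=`)
patch_base_image() {
  sed -i 's|^ARG BASE_IMAGE=.*|ARG BASE_IMAGE="rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0"|' "$1"
}
```

After cloning, invoke it as `patch_base_image vllm/Dockerfile.rocm`, then review the file before building.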

The build requires about 100 GB of disk space and can take up to an hour.

Then build an app image on top of `vllm-rocm`.

```bash
docker build -f docker/Dockerfile.rocm -t csm-streaming:latest docker/
```

This image builds in under a minute and adds about 1 GB.

Next, check whether `docker/run-csm.sh` matches your setup and edit it if necessary. Then run the test script; it should list your audio devices and AMD cards.

```bash
docker/run-csm.sh /tmp/test-docker-setup.py
```

If that works, run `docker/run-csm.sh` without arguments to start the system.
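If the test script reports missing devices, a quick host-side loop can confirm whether the device nodes that `docker/run-csm.sh` maps into the container actually exist (nothing container-specific assumed):

```bash
# check the host device nodes the container maps in
for dev in /dev/kfd /dev/dri /dev/snd; do
  [ -e "$dev" ] && echo "ok: $dev" || echo "missing: $dev"
done
```

A missing `/dev/kfd` usually means the `amdgpu` kernel driver with ROCm support is not loaded on the host.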

## Quickstart

Generate a sentence with streaming (chunks are processed and output as they're generated):
24 changes: 24 additions & 0 deletions docker/Dockerfile.rocm
@@ -0,0 +1,24 @@
FROM vllm-rocm:v0.6.6

WORKDIR /csm

RUN git clone https://github.com/davidbrowne17/csm-streaming.git /csm
# if building from existing clone instead of above do:
#COPY . .

RUN sed -i -e 's|^vllm.*|#&|' -e 's|index-url.*|index-url=https://download.pytorch.org/whl/rocm6.2|' requirements.txt; \
pip install -r requirements.txt

# keep an eye on https://huggingface.co/docs/bitsandbytes/non_cuda_backends
ARG GH_BITSANDBYTES=https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_multi-backend-refactor/bitsandbytes-0.45.1.dev0-py3-none-manylinux_2_24_x86_64.whl

RUN pip install --no-deps --force-reinstall "$GH_BITSANDBYTES"; \
pip install --force-reinstall pytorch-triton-rocm==3.1.0 --index-url https://download.pytorch.org/whl/nightly/rocm6.2

RUN apt-get update && apt-get install -y libportaudio2 && rm -rf /var/lib/apt/lists/*

COPY test-docker-setup.py /tmp/

EXPOSE 8000

CMD ["python", "main.py"]
29 changes: 29 additions & 0 deletions docker/run-csm.sh
@@ -0,0 +1,29 @@
#!/bin/bash
# Syntax: run-csm.sh [/tmp/test-docker-setup.py]

CSM_MODEL_PATH=$(realpath ~/.cache/huggingface/hub/models--sesame--csm-1b/snapshots/*/model.safetensors)
CSM_CONFIG_PATH=$(realpath ~/.cache/huggingface/hub/models--sesame--csm-1b/snapshots/*/config.json)

## if using pulseaudio instead of pipewire, try:
#-v $XDG_RUNTIME_DIR/pulse/native:/tmp/pulse-native
#-e PULSE_SERVER=unix:/tmp/pulse-native
## or remote socket? (requires load-module module-native-protocol-tcp etc)
#-e PULSE_SERVER=tcp:host.docker.internal

docker run --rm -it -p 8000:8000 --name csm \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --device /dev/snd \
  --group-add audio \
  -v "$XDG_RUNTIME_DIR/pipewire-0:/tmp/pipewire" \
  -e PIPEWIRE_REMOTE=unix:/tmp/pipewire \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  -v "$HOME/.cache/pip:/root/.cache/pip" \
  -v "$CSM_MODEL_PATH:/csm/finetuned_model/model.safetensors" \
  -v "$CSM_CONFIG_PATH:/csm/finetuned_model/config.json" \
  -v ./models:/csm/models \
  -v ./config:/csm/config \
  -v ./companion.db:/csm/companion.db \
  csm-streaming "$@"

26 changes: 26 additions & 0 deletions docker/test-docker-setup.py
@@ -0,0 +1,26 @@
#!/usr/bin/env python
"""Smoke-test audio and GPU visibility inside the container."""
import sys
import runpy

import sounddevice as sd
import torch

sndlst = sd.query_devices()
print("== sound devices\n", sndlst)
if len(sndlst) == 0:
    print("did you pass --device=/dev/snd?")

print("== GPUs")
if not torch.cuda.is_available():
    print("cuda not available!")
    sys.exit(1)

cnt = torch.cuda.device_count()
for i in range(cnt):
    print("dev", i, "=", torch.cuda.get_device_name(i))

if cnt == 0:
    print("did you pass --device=/dev/dri?")
    sys.exit(1)

# run the bitsandbytes self-check (equivalent to `python -m bitsandbytes`)
runpy.run_module("bitsandbytes", run_name="__main__")