Replies: 17 comments 10 replies
-
We're refactoring a few things to more easily support different architectures, though I haven't worked with AMD cards yet. Will try adding them to the ONNX providers, but may need some community testing to get right
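For anyone testing later, a quick way to check which execution providers your onnxruntime build exposes (a ROCm provider only shows up in ROCm-enabled builds):

```bash
# ROCm-enabled builds include "ROCMExecutionProvider" in this list.
python -c "import onnxruntime as ort; print(ort.get_available_providers())"
```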
-
Would ONNX work on the Intel Arc GPUs?
-
I've got it to work with ROCm. The problem is that the Docker image becomes huge: around 22 GB when I use https://github.com/bgs4free/Kokoro-FastAPI/tree/add-rocm-support
-
What version of ROCm did you get it working with? I have a 5700 XT, and that card only supports ROCm 5.2 unofficially. Any chance you could get it working on that? Or maybe get DirectML working with the Dockerfile.
-
I'm using the PyTorch index for 6.2, because I had issues with the 6.3 version (a version mismatch doesn't seem to be an issue, AFAIK). Change it to 5.2, try it, and see if it works. Make sure you have enough disk space.
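For reference, the change amounts to pointing the PyTorch wheel index at the ROCm 5.2 builds. A sketch only, since where the index is pinned (Dockerfile, pyproject, requirements) differs between forks:

```bash
# Swap the PyTorch wheel index from ROCm 6.2 to 5.2 wherever the fork pins it.
# Note: the rocm5.2 index only hosts older torch builds, so the torch version
# pin may need to be relaxed as well.
pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm5.2
```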
-
Okay, I tried that and it gave me an error. According to this thread https://stackoverflow.com/questions/73229163/amd-rocm-with-pytorch-on-navi10-rx-5700-rx-5700-xt you have to set the environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0. I just added the variable to the Dockerfile under env. Any idea what else I could try?
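For reference, that workaround makes the RX 5700 XT (gfx1010) report itself as gfx1030, which the ROCm PyTorch builds actually ship kernels for. It can be set in the image, as you did, or passed at run time (the image name below is a placeholder):

```bash
# Override the reported GFX version so ROCm uses the gfx1030 kernels.
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Or pass it straight to the container:
docker run -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  --device=/dev/kfd --device=/dev/dri \
  <image>
```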
-
Devices get passed through from the host. See: https://github.com/bgs4free/Kokoro-FastAPI/blob/add-rocm-support/docker/rocm/docker-compose.yml#L20 What is your host system? This was tested on Linux, and I can't speak for any other OS.
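For anyone following along, the compose file's device passthrough corresponds to these plain `docker run` flags from AMD's Docker guide (the image name is a placeholder):

```bash
# ROCm containers need the kernel driver node (/dev/kfd) and the render
# nodes (/dev/dri) from the host, plus group access to use them.
docker run -it \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --security-opt seccomp=unconfined \
  <rocm-image>
```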
-
Oh, okay, it's Windows. I thought that wouldn't really matter since Docker runs an Ubuntu image.
-
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html I don't see anything about Docker for Windows there.
-
I checked my Docker config; it is running through WSL, so that is technically Ubuntu, so it should work.
-
Taking a shot at this shortly; will post links if anyone is able to test it out (I don't have an AMD card, unfortunately, so I can just skeleton it up).
-
My fork works on my Linux machine. I didn't make a PR, though, because I personally didn't find it convincing enough to pursue. Please feel free to use my stuff if you see any value.
-
But for Arc I already implemented a version with IPEX (Intel Extension for PyTorch) support that uses nearly the exact same code as the CUDA GPUs. For more on Arc, please use issue #106.
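A quick sanity check that the IPEX/XPU path can see an Arc card, assuming intel_extension_for_pytorch is installed:

```bash
# Importing intel_extension_for_pytorch registers the "xpu" backend with torch.
python -c "import torch, intel_extension_for_pytorch as ipex; print(torch.xpu.is_available())"
```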
-
I'm interested in setting this up and would be happy to help test things out. I'm running Ubuntu 24.04.1 with an RX 7900 XTX. Thanks @bgs4free for your fork and ROCm docker-compose file! Unfortunately, I'm also seeing the issue where neither of the GPU devices is visible inside the container. The devices show up on the host outside the container, and I can at least get the container running by either removing the device flags or setting the privileged flag, but in both cases the devices aren't visible inside. I'm not entirely certain I have the right drivers installed, but ollama and alltalk_tts can both run and use my GPU.
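A few standard checks that usually narrow this kind of problem down (the container name is a placeholder, and rocminfo has to be present in the image):

```bash
# On the host: confirm the device nodes exist and note their group ownership.
ls -l /dev/kfd /dev/dri

# Inside the container: the same nodes should appear if passthrough worked.
docker exec -it <container> ls -l /dev/kfd /dev/dri

# Inside the container: check whether the ROCm runtime actually sees the GPU.
docker exec -it <container> rocminfo | grep -i gfx
```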
-
I have built a Docker image containing the latest (March 23, 2025) version of master. It is built against my fork of kokoro-fastapi; I will be maintaining this fork for a bit. You can run the container via:
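Something along these lines, with a placeholder image name (use the tag from the fork's README):

```bash
# Placeholder image tag; Kokoro-FastAPI serves on port 8880 by default.
docker run -p 8880:8880 \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  <fork-image>:latest
```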
@bgs4free I have updated your base ROCm image to use the upstream rocm/pytorch images, which will get us around the 6.2 whl files.
-
I was able to make it work on my RX 7900 XTX on Arch Linux using uv, without Docker.
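Roughly what that looks like. Both the torch index override and the uvicorn entry point below are assumptions; check the repo's start scripts for the real invocation:

```bash
# Create a venv and pull torch from the ROCm wheel index instead of CUDA.
uv venv
uv pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
uv pip install -e .

# Entry point is an assumption; the repo's start scripts have the real one.
uv run uvicorn api.src.main:app --host 0.0.0.0 --port 8880
```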
-
Hey, it'd be really nice if we could run this model API on AMD cards; it would provide a very nice speedup over CPU. Do you think you could add this when you have some free time? It shouldn't be too hard, since AFAIK ONNX models can run with ROCm just fine.
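For reference, ONNX Runtime exposes ROCm as an execution provider, but only in ROCm-enabled builds (the default PyPI wheel is CPU/CUDA only). A sketch with a placeholder model path:

```bash
python - <<'PY'
import onnxruntime as ort

# "model.onnx" is a placeholder for whichever Kokoro ONNX export is used;
# "ROCMExecutionProvider" is only present in ROCm-enabled onnxruntime builds.
sess = ort.InferenceSession(
    "model.onnx",
    providers=["ROCMExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # confirms which provider was actually picked
PY
```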