38 changes: 35 additions & 3 deletions README.md
@@ -17,6 +17,7 @@ You can read about [kudos](https://github.com/Haidra-Org/haidra-assets/blob/main
- [Option 2: Without Git](#option-2-without-git)
- [Linux](#linux)
- [AMD GPUs](#amd-gpus)
- [Intel Arc / XPU](#intel-arc--xpu)
- [DirectML](#directml)
- [Configuration](#configuration)
- [Basic Settings](#basic-settings)
@@ -88,6 +89,15 @@ AMD support is experimental, and **Linux-only** for now:
- [WSL support](README_advanced.md#advanced-users-amd-rocm-inside-windows-wsl) is highly experimental.
- Join the [AMD discussion on Discord](https://discord.com/channels/781145214752129095/1076124012305993768) if you're interested in trying.

### Intel Arc / XPU

Intel Arc support is available on **Linux** through PyTorch XPU:

- Use `update-runtime-xpu.sh` and `horde-bridge-xpu.sh`.
- Install the Intel GPU driver and Level Zero runtime on the host OS before running the worker.
- If you have multiple Intel GPUs, set `ONEAPI_DEVICE_SELECTOR` before launching the worker.
- Safety checks currently stay on the CPU on XPU, so keep `safety_on_gpu: false`.
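
A quick way to confirm the XPU stack is wired up before starting the worker is a small check inside the worker's Python environment. This helper is hypothetical (not part of the worker); `torch.xpu` only exists on XPU-enabled PyTorch builds:

```python
# Hypothetical sanity check for the Intel XPU stack; not part of the worker.
def xpu_status() -> str:
    try:
        import torch
    except ImportError:
        return "torch not installed"
    # torch.xpu is only present on XPU-enabled PyTorch builds.
    if not hasattr(torch, "xpu") or not torch.xpu.is_available():
        return "xpu not available"
    return f"xpu ok: {torch.xpu.device_count()} device(s)"

if __name__ == "__main__":
    print(xpu_status())
```

If this prints anything other than `xpu ok: ...`, fix the driver/runtime install before troubleshooting the worker itself.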

### DirectML

**Experimental** support for DirectML has been added. See [Running on DirectML](README_advanced.md#advanced-users-running-on-directml) for more information and further instructions. You can now follow this guide using `update-runtime-directml.cmd` and `horde-bridge-directml.cmd` where appropriate. Please note that DirectML is several times slower than *any* other method of running the worker.
@@ -133,6 +143,18 @@ Tailor settings to your GPU, following these pointers:
- max_batch: 4 # Or higher
```

- **Intel Arc A770 (16GB, Linux/XPU)**:

```yaml
- queue_size: 1
- safety_on_gpu: false # XPU keeps the safety stack on CPU for now
- moderate_performance_mode: true
- unload_models_from_vram_often: false
- max_threads: 1
- max_power: 40
- max_batch: 4
```

- **8-10GB VRAM** (e.g. 2080, 3060, 4060, 4060 Ti):

```yaml
@@ -176,6 +198,7 @@ Tailor settings to your GPU, following these pointers:
1. Install the worker as described in the [Installation](#installation) section.
2. Run `horde-bridge.cmd` (Windows) or `horde-bridge.sh` (Linux).
- **AMD**: Use `horde-bridge-rocm` versions.
- **Intel Arc / XPU**: Use `horde-bridge-xpu.sh`.

### Stopping

@@ -208,13 +231,20 @@ CUDA_VISIBLE_DEVICES=0 ./horde-bridge.sh -n "Instance 1"
CUDA_VISIBLE_DEVICES=1 ./horde-bridge.sh -n "Instance 2"
```

For Intel XPU, select the visible GPU with `ONEAPI_DEVICE_SELECTOR`:

```bash
ONEAPI_DEVICE_SELECTOR=level_zero:gpu:0 ./horde-bridge-xpu.sh -n "Arc A770 #1"
ONEAPI_DEVICE_SELECTOR=level_zero:gpu:1 ./horde-bridge-xpu.sh -n "Arc A770 #2"
```

**Warning**: High RAM (32-64GB+) is needed for multiple workers. `queue_size` and `max_threads` greatly impact RAM per worker.

## Updating

The worker is constantly improving. Follow development and get update notifications in our [Discord](https://discord.gg/3DxrhksKzn).

Script names below assume Windows (`.cmd`) and NVIDIA. For Linux use `.sh`, for AMD use `-rocm` versions.
Script names below assume Windows (`.cmd`) and NVIDIA. For Linux use `.sh`, for AMD use `-rocm` versions, and for Intel Arc use `-xpu` versions.

### Updating the Worker

@@ -234,8 +264,9 @@ Script names below assume Windows (`.cmd`) and NVIDIA. For Linux use `.sh`, for
> **Warning**: Some antivirus software (e.g. Avast) may interfere with the update. If you get `CRYPT_E_NO_REVOCATION_CHECK` errors, disable antivirus, retry, then re-enable.

4. Run `update-runtime` for your OS to update dependencies.
- Not all updates require this, but run it if unsure
- **Advanced users**: see [README_advanced.md](README_advanced.md) for manual options
- **Intel Arc / XPU**: Use `update-runtime-xpu.sh`
- Not all updates require this, but run it if unsure
- **Advanced users**: see [README_advanced.md](README_advanced.md) for manual options
> **Review comment on lines +268 to +269** (Copilot AI, Apr 12, 2026): The nested bullets under "Run `update-runtime` for your OS…" are mis-indented, so the Markdown list renders inconsistently (the "Not all updates…" and "Advanced users…" bullets don't align under step 4). Align the indentation so these remain sub-bullets of step 4:
>
> ```
>     - Not all updates require this, but run it if unsure
>     - **Advanced users**: see [README_advanced.md](README_advanced.md) for manual options
> ```
5. [Start the worker](#starting) again

## Custom Models
@@ -293,6 +324,7 @@ Check the [#local-workers Discord channel](https://discord.com/channels/78114521
Common issues and fixes:

- **Download failures**: Check disk space and internet connection.
- **Intel Arc / XPU not detected**: Confirm the Intel GPU driver and Level Zero runtime are installed, then check that `torch.xpu.is_available()` is true inside the worker environment.
- **Job timeouts**:
- Remove large models (Flux, Cascade, SDXL)
- Lower `max_power`
12 changes: 10 additions & 2 deletions README_advanced.md
@@ -139,7 +139,7 @@ HSA Agents

### Prerequisites
* Install [git](https://git-scm.com/) on your system.
* Install CUDA/RoCM if you haven't already.
* Install CUDA/RoCM/Intel XPU drivers if you haven't already.
* Install Python 3.10 or 3.11.
* If using the official Python installer **and** you do not already regularly use Python, be sure to check the box that says `Add python.exe to PATH` on the first screen.
* We **strongly recommend** you configure at least 8GB (preferably 16GB+) of swap space. This recommendation applies to Linux too.
@@ -159,16 +159,24 @@ HSA Agents
- Install the requirements:
- CUDA: `pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128`
- RoCM: `pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/rocm6.2`
- Intel XPU: `pip install -r requirements.txt --index-url https://download.pytorch.org/whl/xpu --extra-index-url https://pypi.org/simple`
- Intel XPU requires the Intel GPU driver and Level Zero runtime on the host OS.
- If you need to pin a specific Intel GPU, set `ONEAPI_DEVICE_SELECTOR` (for example `level_zero:gpu:0`) before running the commands below.

### Run worker
- Set your config now, copying `bridgeData_template.yaml` to `bridgeData.yaml`, being sure to set an API key and worker name at a minimum
- `python download_models.py` (**critical - must be run first every time**)
- `python run_worker.py` (to start working)
- Intel XPU manual invocation:
- `python download_models.py --xpu`
- `python run_worker.py --xpu`
- Keep `safety_on_gpu: false`, because the current safety stack uses CPU on XPU.

Pressing Ctrl+C will stop the worker, but it will first complete any jobs in progress before exiting. Please avoid hard-killing it unless you are seeing many major errors. You can force-kill by pressing Ctrl+C repeatedly or sending a SIGKILL.

### Important note if you manually manage your venvs
- You should be running `python -m pip install -r requirements.txt -U https://download.pytorch.org/whl/cu128` every time you `git pull`. (Use `/whl/rocm6.2` instead if applicable)
- You should be running `python -m pip install -r requirements.txt -U --extra-index-url https://download.pytorch.org/whl/cu128` every time you `git pull`.
- Use `--extra-index-url https://download.pytorch.org/whl/rocm6.2` for RoCM or `--index-url https://download.pytorch.org/whl/xpu --extra-index-url https://pypi.org/simple` for Intel XPU.


## Advanced users, running on directml
24 changes: 23 additions & 1 deletion download_models.py
@@ -1,6 +1,7 @@
import argparse

from horde_worker_regen.download_models import download_all_models
from horde_worker_regen.runtime_backend import HordeRuntimeBackend
from horde_worker_regen.version_meta import do_version_check

if __name__ == "__main__":
@@ -23,13 +24,34 @@
default=None,
help="Enable directml and specify device to use.",
)
parser.add_argument(
"--xpu",
action="store_true",
default=False,
help="Enable Intel XPU support for Arc and other Intel GPUs.",
)
parser.add_argument(
"--oneapi-device-selector",
type=str,
default=None,
help="Restrict Intel XPU visibility using ONEAPI_DEVICE_SELECTOR, e.g. level_zero:gpu:0.",
)

args = parser.parse_args()

try:
backend = HordeRuntimeBackend(
directml=args.directml,
xpu=args.xpu,
oneapi_device_selector=args.oneapi_device_selector,
)
except ValueError as e:
parser.error(str(e))

do_version_check()

download_all_models(
purge_unused_loras=args.purge_unused_loras,
load_config_from_env_vars=args.load_config_from_env_vars,
directml=args.directml,
backend=backend,
)
9 changes: 9 additions & 0 deletions environment.xpu.yaml
@@ -0,0 +1,9 @@
name: ldm
channels:
- conda-forge
- defaults
# Minimal environment for Intel XPU. PyTorch and the rest of the stack are installed with pip.
dependencies:
- git
- pip
- python==3.11
50 changes: 50 additions & 0 deletions horde-bridge-xpu.sh
@@ -0,0 +1,50 @@
#!/bin/bash
# Get the directory of the current script
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

# Build the absolute path to the Conda environment
CONDA_ENV_PATH="$SCRIPT_DIR/conda/envs/linux/lib"

# Add the Conda environment to LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CONDA_ENV_PATH:$LD_LIBRARY_PATH"

# List of directories to check
dirs=(
"/usr/lib"
"/usr/local/lib"
"/lib"
"/lib64"
"/usr/lib/x86_64-linux-gnu"
)

# Check each directory
for dir in "${dirs[@]}"; do
if [ -f "$dir/libjemalloc.so.2" ]; then
export LD_PRELOAD="$dir/libjemalloc.so.2"
printf "Using jemalloc from %s\n" "$dir"
break
fi
done

# If jemalloc was not found, print a warning
if [ -z "$LD_PRELOAD" ]; then
printf "WARNING: jemalloc not found. You may run into memory issues! We recommend running 'sudo apt install libjemalloc2'\n"
read -n 1 -s -r -p "Press q to quit or any other key to continue: " key
if [ "$key" = "q" ]; then
printf "\n"
exit 1
fi
fi

XPU_ARGS=(--xpu)
if [ -n "${ONEAPI_DEVICE_SELECTOR:-}" ]; then
XPU_ARGS+=("--oneapi-device-selector=${ONEAPI_DEVICE_SELECTOR}")
printf "Using ONEAPI_DEVICE_SELECTOR=%s\n" "$ONEAPI_DEVICE_SELECTOR"
fi

if "$SCRIPT_DIR/runtime-xpu.sh" python -s "$SCRIPT_DIR/download_models.py" "${XPU_ARGS[@]}"; then
echo "Model Download OK. Starting worker..."
"$SCRIPT_DIR/runtime-xpu.sh" python -s "$SCRIPT_DIR/run_worker.py" "${XPU_ARGS[@]}" "$@"
else
echo "download_models.py exited with error code. Aborting"
fi
10 changes: 7 additions & 3 deletions horde_worker_regen/download_models.py
@@ -1,15 +1,20 @@
"""Contains the code to download all models specified in the config file. Executable as a standalone script."""

from horde_worker_regen.runtime_backend import HordeRuntimeBackend


def download_all_models(
*,
load_config_from_env_vars: bool = False,
purge_unused_loras: bool = False,
directml: int | None = None,
backend: HordeRuntimeBackend | None = None,
) -> None:
"""Download all models specified in the config file."""
from horde_worker_regen.load_env_vars import load_env_vars_from_config

backend = backend or HordeRuntimeBackend()
backend.apply_environment()

if not load_config_from_env_vars:
load_env_vars_from_config()

@@ -57,8 +62,7 @@ def download_all_models(
del _

extra_comfyui_args = []
if directml is not None:
extra_comfyui_args.append(f"--directml={directml}")
backend.append_comfyui_args(extra_comfyui_args)

hordelib.initialise(extra_comfyui_args=extra_comfyui_args)
from hordelib.shared_model_manager import SharedModelManager
11 changes: 6 additions & 5 deletions horde_worker_regen/process_management/inference_process.py
@@ -40,6 +40,7 @@
HordeProcessState,
ModelLoadState,
)
from horde_worker_regen.runtime_backend import HordeRuntimeBackend, clear_torch_cache

if TYPE_CHECKING:
from hordelib.horde import HordeLib, ProgressReport, ResultingImageReturn
@@ -80,6 +81,7 @@ class HordeInferenceProcess(HordeProcess):
_active_model_name: str | None = None
"""The name of the currently active model. Note that other models may be loaded in RAM or VRAM."""
_aux_model_lock: Lock
_backend: HordeRuntimeBackend

def __init__(
self,
@@ -93,6 +95,7 @@ def __init__(
process_launch_identifier: int,
*,
high_memory_mode: bool = False,
backend: HordeRuntimeBackend | None = None,
) -> None:
"""Initialise the HordeInferenceProcess.

@@ -119,6 +122,7 @@
)

self._aux_model_lock = aux_model_lock
self._backend = backend or HordeRuntimeBackend()

# We import these here to guard against potentially importing them in the main process
# which would create shared objects, potentially causing issues
@@ -548,13 +552,10 @@ def start_inference(self, job_info: ImageGenerateJobPopResponse) -> list[Resulti
self._vae_decode_semaphore.release()
return results

@staticmethod
def clear_gc_and_torch_cache() -> None:
def clear_gc_and_torch_cache(self) -> None:
"""Clear the garbage collector and the PyTorch cache."""
gc.collect()
from torch.cuda import empty_cache

empty_cache()
clear_torch_cache(self._backend)

@logger.catch(reraise=True)
def unload_models_from_vram(self) -> None:
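For reference, a backend-aware cache clear along the lines of the `clear_torch_cache(self._backend)` call above might dispatch on the active device type. This is a sketch under that assumption; the real helper lives in `horde_worker_regen.runtime_backend`:

```python
import gc

# Sketch (assumption) of a device-agnostic replacement for the old
# CUDA-only `torch.cuda.empty_cache()` call.
def clear_torch_cache_sketch(device_type: str = "cuda") -> str:
    """Run the GC, then free cached allocator blocks on the active accelerator."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return "no torch"
    if device_type == "xpu" and hasattr(torch, "xpu") and torch.xpu.is_available():
        torch.xpu.empty_cache()
        return "xpu cache cleared"
    if device_type == "cuda" and torch.cuda.is_available():
        torch.cuda.empty_cache()
        return "cuda cache cleared"
    return "no accelerator available"
```

Keeping the `gc.collect()` first matters: Python-side references must be dropped before the allocator can actually release VRAM back to the device.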
7 changes: 3 additions & 4 deletions horde_worker_regen/process_management/main_entry_point.py
@@ -4,23 +4,22 @@

from horde_worker_regen.bridge_data.data_model import reGenBridgeData
from horde_worker_regen.process_management.process_manager import HordeWorkerProcessManager
from horde_worker_regen.runtime_backend import HordeRuntimeBackend


def start_working(
ctx: BaseContext,
bridge_data: reGenBridgeData,
horde_model_reference_manager: ModelReferenceManager,
*,
amd_gpu: bool = False,
directml: int | None = None,
backend: HordeRuntimeBackend | None = None,
) -> None:
"""Create and start process manager."""
process_manager = HordeWorkerProcessManager(
ctx=ctx,
bridge_data=bridge_data,
horde_model_reference_manager=horde_model_reference_manager,
amd_gpu=amd_gpu,
directml=directml,
backend=backend,
)

process_manager.start()