Is HordeWorker capable of splitting workloads and, if so, what's the proper way to set it up? #1544

tncfdp · 2025-05-18T00:53:25Z

Hello, I was under the impression that running a horde worker let one offload part of a model, just like offloading some, but not all, layers to a local GPU. But the more I read, the more it seems that each worker's resources are self-contained, and the only splitting one can do is of the prompt/context itself, which could be sent to different workers and collated back in the client. Please help me understand.

The modes I see in the docs are remote access through the browser, which is basically the same as running locally except that the address belongs to a different machine running the server, OR running a horde worker after getting a key from the site, where the locally running instance keeps polling the server for requests.

I do see mention of not using the API key for a locally running horde, but the farthest I was able to get was setting up a --nomodel instance on one machine and a --host instance (with a model) on another, then connecting the browser to the first one. It says it has no model and asks to connect to an "ai provider", which I point at the second machine. This works but, like I said above, it seems to use only the second machine's resources, self-contained.

Now granted, this sounds a little backward, as the machine with no model doesn't seem to have a way to tell the other one about its resources, but then again a horde worker only has 2 parameters for that, namely Gen.Length and Max Context. What I'd like is for the koboldcpp acting as client to have available to it, before it loads the model, whatever the horde worker serves, so it can decide how many layers it can offload, just like it does with a local GPU. Is that even possible? Or at least, by trial and error like the docs suggest, to send n layers over the network until finding an appropriate n.

Another possibility is using this https://github.com/db0/KoboldAI-Horde-Bridge
It sounds like overkill if one koboldcpp can already connect to another koboldcpp's API, but maybe it would work if I set the 2 machines up as workers and set up a local horde server?

Any pointer in the right direction would be greatly appreciated. If this is not currently possible, tell me what the first few steps to add it to the code would be and I'll try to help implement it; it sounds like most of the infrastructure is already there.

LostRuins · 2025-05-18T01:05:57Z

LostRuins
May 18, 2025
Maintainer

I think you have a misunderstanding of what Horde is. Let's clarify some terminology.

KoboldCpp is primarily a backend. That is, it functions as a server that clients, such as a Web UI like KoboldAI Lite, can connect to.
KoboldCpp comes bundled with KoboldAI Lite. This is a Web GUI frontend that can connect to many different backends, the main one being the above-mentioned KoboldCpp server. When KoboldCpp is run in --nomodel mode, it instead allows you to connect to other backends, such as Horde, OpenAI, remote Kobold instances, or many others.
Meanwhile, AI Horde itself is an online service that provides community crowdsourced LLM inference. But in simpler terms, think of it as a matchmaker between the GPU providers (Workers) and users (Clients).
You can connect your KoboldCpp to the Horde as a worker. This "shares" your PC and GPU to the horde, allowing other public users (including yourself) to generate text from it online through many different UIs. One such UI is the KoboldAI Lite GUI, but there are many others too.
If you just want to use KoboldCpp remotely on your own, you don't need horde. Instead, just launch koboldcpp with your model file and selected layers, combined with --remotetunnel, and you'll get a URL to access it.
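
To illustrate the last point, here is a minimal sketch of a client talking to a remote KoboldCpp instance. It assumes the instance exposes the KoboldAI-compatible `/api/v1/generate` endpoint; `BASE_URL` and the helper names are placeholders for illustration and do not come from this thread.

```python
import json
import urllib.request

# Placeholder: replace with the URL your own KoboldCpp instance prints at launch.
BASE_URL = "http://localhost:5001"


def build_generate_request(prompt, max_length=80, max_context_length=2048):
    """Build the JSON payload for a text generation request."""
    return {
        "prompt": prompt,
        "max_length": max_length,
        "max_context_length": max_context_length,
    }


def extract_text(response_body):
    """Pull the generated text out of a KoboldAI-style response dict."""
    results = response_body.get("results", [])
    return results[0]["text"] if results else ""


def generate(prompt, base_url=BASE_URL):
    """Send the prompt to the remote instance and return its completion."""
    payload = json.dumps(build_generate_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_text(json.load(resp))
```

Calling `generate("Once upon a time")` sends the whole request to that one server. All of the model's layers stay on the machine that loaded it, and the client only exchanges text.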

3 replies

tncfdp May 18, 2025
Author

I think I understand, thanks. This leads me to believe that each instance is self-contained as far as resources go. When you say it "shares" your PC and GPU to the horde, this includes whatever model you happen to have loaded it with, correct?

Please explain these lines from the wiki and the FAQ (respectively):
o Exclude your --hordekey to continue using your own standalone Horde worker (e.g. Haidra Scribe / KAI Horde Bridge).
o Exclude the last 2 parameters to continue using your own standalone Horde worker (e.g. Haidra Scribe / KAI Horde Bridge).

The wiki seems a bit more up to date, but what does it mean then to run a "standalone Horde worker"? If those 2 parameters [hordeapikey] [hordeworkername] are crucial for the AI Horde to identify a worker, then leaving them out seems to be a way to run a small local "horde", no?

Under the install section of the KAI Bridge, it says: "Edit the clientData.py file and add your KAI worker. If it's a local instance, leave it as it is. If it's a remote Kobold AI instance, fill in the URL and port accordingly."

I don't quite understand yet how the bridge works, but what happens if I set both a local and a remote instance as workers and exclude those 2 parameters as per the FAQ? What if I delete the horde site from clientData.py and try to run a local server?

It might help if I describe my scenario here: one machine has quite a bit more RAM than the other and can load reasonable models, but it's a notebook with no PCIe. The other machine is much older but has a much better GPU. The idea is to offload a few layers there. Is that possible? If it isn't, do you guys have interest in implementing that?

LostRuins May 18, 2025
Maintainer

> This leads me to believe that each instance is self-contained as far as resources go. When you say it "shares" your PC and GPU to the horde, this includes whatever model you happen to have loaded it with, correct?

yes

> Exclude your --hordekey to continue using your own standalone Horde worker (e.g. Haidra Scribe / KAI Horde Bridge). Exclude the last 2 parameters to continue using your own standalone Horde worker (e.g. Haidra Scribe / KAI Horde Bridge).

This is sort of deprecated. In the past, people used external horde worker scripts to run the horde worker as a separate program. These days, everyone uses the built-in KoboldCpp integrated horde worker, so no external scripts are needed anymore.

> 1 machine has quite a bit more RAM than another, it can load reasonable models, but it's a notebook with no PCIe, the other machine is much older but it has a much better GPU, the idea is to offload a few layers there. Is that possible?

No, Horde does not support distributed inference. The model must be loaded in its entirety on the same machine.
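
As a toy illustration of the "matchmaker" idea (not actual Horde code; the class names, worker names, and model name are invented for this sketch), each job is handed as a whole to a single worker whose limits can accommodate it. Real matchmaking takes more factors into account.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    model: str
    max_context: int   # largest prompt/context the worker accepts
    max_length: int    # largest generation length the worker accepts


@dataclass
class Job:
    model: str
    context_tokens: int
    gen_tokens: int


def pick_worker(job, workers):
    """Return the first worker able to run the whole job alone, or None."""
    for w in workers:
        if (w.model == job.model
                and job.context_tokens <= w.max_context
                and job.gen_tokens <= w.max_length):
            return w
    return None  # no single worker fits; the job is never split


# Hypothetical workers and job, for illustration only.
workers = [
    Worker("laptop", "example-7b", max_context=2048, max_length=256),
    Worker("old-desktop", "example-7b", max_context=8192, max_length=512),
]
job = Job("example-7b", context_tokens=4096, gen_tokens=200)
chosen = pick_worker(job, workers)
print(chosen.name if chosen else "no worker available")
```

In this sketch the job goes to `old-desktop` because `laptop` cannot hold the context. No path combines the two machines for one request.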

Answer selected by tncfdp

tncfdp May 18, 2025
Author

okay, thanks.

Regarding "distributed inference":
We could make do with whatever it is that it does when it offloads N layers to the GPU :)
A remote worker will send a list of all its backends to the other machine, where they will appear as "remote: vulkan" or "remote: cuda". It then has all the data it needs to calculate how many layers it can send, as if they were local. All it has to do is wrap the loading/buffers in network buffers and act as if they were local (basically: here are the commands I'd send to my GPU; send them to yours and return me the output).
Just saying.. :)
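
A back-of-the-envelope sketch of the "how many layers can it send" calculation described above. It assumes that all layers are the same size and that the remote side reports its free memory. This is not KoboldCpp's actual estimator; the function name, the reserve value, and the numbers are made up for illustration.

```python
def layers_that_fit(free_bytes, model_bytes, n_layers, reserve_bytes=512 * 1024**2):
    """Estimate how many equally sized layers fit into free_bytes.

    reserve_bytes is an assumed safety margin for context and scratch buffers.
    """
    if n_layers <= 0 or model_bytes <= 0:
        return 0
    per_layer = model_bytes / n_layers
    usable = max(0, free_bytes - reserve_bytes)
    return min(n_layers, int(usable // per_layer))


GIB = 1024**3
# Hypothetical numbers: an 8 GiB model with 32 layers, 6 GiB of free memory.
print(layers_that_fit(6 * GIB, 8 * GIB, 32))  # prints 22
```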
