
Run large language models in a heterogeneous decentralized environment with offloading.
The rapid rise of generative AI has boosted demand for large language model (LLM) inference and fine-tuning services. While proprietary models are still favored, advances in open-source LLMs have made them competitive. However, high costs and limited GPU resources hinder their deployment. This work introduces BloomBee, a decentralized offline serving system that leverages idle GPU resources to provide cost-effective access to LLMs.
BloomBee relies on global GPU sharing, which includes many consumer-grade GPUs. If your GPU can only hold a small portion of a large language model, such as Llama 3.1 (405B), you can connect to a network of servers that each load different parts of the model, and request inference or fine-tuning services from that network.
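Once a swarm is up (see the steps below), a client runs only a small local part of the model and streams hidden states through the workers. Here is a minimal client sketch, assuming BloomBee exposes a Petals-style `AutoDistributedModelForCausalLM` (the import path, class name, and `initial_peers` argument are assumptions carried over from Petals, which BloomBee builds on):

```python
# Hypothetical client sketch: the bloombee import and class name are
# assumed to mirror Petals' API, which BloomBee is built on.
import torch
from transformers import AutoTokenizer
from bloombee import AutoDistributedModelForCausalLM  # assumed Petals-style API

# Multiaddr of your DHT backbone node (see "How to use BloomBee" below).
INITIAL_PEERS = ["/ip4/10.0.4.215/tcp/31340/p2p/QmZtZJwF8G2qspQxEVxXfipV4fR7EgpfnkXdbbzaEooaVf"]

tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
# Embeddings run locally; the transformer blocks run on remote workers.
model = AutoDistributedModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", initial_peers=INITIAL_PEERS, torch_dtype=torch.float32
)

inputs = tokenizer("A decentralized swarm can", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
```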
## Installation

### From PyPI

```bash
pip install bloombee
```

### From Source

```bash
git clone https://github.com/ai-decentralized/BloomBee.git
cd BloomBee
```

Create and activate an environment (either one):

```bash
# Using venv
python3 -m venv bloombee-venv && source bloombee-venv/bin/activate

# OR using conda (recommended)
conda create -n bloombee python=3.10.16 && conda activate bloombee
```

Then install:

```bash
pip install -e .
```

## How to use BloomBee (Try now in Colab)
### 1. Start the DHT main node
```bash
python -m bloombee.cli.run_dht --host_maddrs /ip4/0.0.0.0/tcp/31340 --identity_path bootstrapp1.id
```

After running, you will see output similar to:

```
[INFO] Running a DHT instance. To connect other peers to this one, use:
--initial_peers /ip4/10.0.4.215/tcp/31340/p2p/QmZtZJwF8G2qspQxEVxXfipV4fR7EgpfnkXdbbzaEooaVf
```
Copy your own full address (including the `/p2p/...` part). Each DHT node generates a unique Peer ID, so do not copy the example above. You can provide this address as `--initial_peers` to connect workers or other backbone servers.
💡 Tip: If you want your swarm to be accessible outside of your local network, ensure you have a public IP address or set up port forwarding correctly.
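If you want to verify connectivity from Python, you can join the swarm directly with Hivemind, the DHT library BloomBee builds on (see Acknowledgements). This sketch uses Hivemind's public `DHT` class; replace the example address with your own:

```python
# Sanity-check sketch: join the backbone DHT as a lightweight client using
# hivemind, the decentralized deep learning library BloomBee builds on.
import hivemind

INITIAL_PEERS = [
    "/ip4/10.0.4.215/tcp/31340/p2p/QmZtZJwF8G2qspQxEVxXfipV4fR7EgpfnkXdbbzaEooaVf"
]

# client_mode=True joins the DHT without serving storage to other peers.
dht = hivemind.DHT(initial_peers=INITIAL_PEERS, client_mode=True, start=True)
print("Connected; visible multiaddrs:", dht.get_visible_maddrs())
dht.shutdown()
```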
### 2. Start the workers

Set your main server address (replace with your actual output from step 1):

```bash
export BBSERVER=/ip4/10.0.4.215/tcp/31340/p2p/QmZtZJwF8G2qspQxEVxXfipV4fR7EgpfnkXdbbzaEooaVf
```

Activate the BloomBee environment on each worker (you can reuse the environment created in From Source).
Each worker should be started in a separate terminal (or on a separate node) after activating its environment.
Start the first worker, which will hold 16 blocks (i.e., 16 transformer layers):
```bash
python -m bloombee.cli.run_server huggyllama/llama-7b \
    --initial_peers $BBSERVER --num_blocks 16 --identity_path bootstrap_1.id
```

Start the second worker in another activated terminal:
```bash
python -m bloombee.cli.run_server huggyllama/llama-7b \
    --initial_peers $BBSERVER --num_blocks 16 --identity_path bootstrap_2.id
```

If you encounter network issues (e.g., connection resets), verify the worker IP configurations in the relevant config files.
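Each worker above serves 16 blocks, and `huggyllama/llama-7b` has 32 transformer blocks in total, so the two workers together cover the whole model. You can confirm the block count from the model config using plain `transformers` (no BloomBee-specific call involved):

```python
# Confirm how many transformer blocks the workers must cover in total.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("huggyllama/llama-7b")
print(config.num_hidden_layers)  # 32 -> two workers x 16 blocks cover it
```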
Optional: If bitsandbytes causes a CUDA version error:
```bash
cd ~
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes && python setup.py install
```

Ensure your CUDA library path matches your environment.
## Benchmarks

Inference:

```bash
cd BloomBee/
python benchmarks/benchmark_inference.py --model huggyllama/llama-7b --initial_peers $BBSERVER --torch_dtype float32 --seq_len 128
```
Training:

```bash
cd BloomBee/
python benchmarks/benchmark_training.py --model huggyllama/llama-7b --initial_peers $BBSERVER --torch_dtype float32 --n_steps 20 --batch_size 32 --seq_len 128
```
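The inference benchmark reports end-to-end generation speed over a fixed sequence length. If you want to time your own client calls, a minimal tokens-per-second harness could look like the sketch below (`generate_fn` is a hypothetical stand-in for whatever client call you use; the exact metrics printed by `benchmark_inference.py` may differ):

```python
# Minimal throughput harness: times an arbitrary generation call and
# reports tokens per second. generate_fn is a hypothetical stand-in.
import time

def measure_throughput(generate_fn, n_tokens: int = 128, warmup: int = 1) -> float:
    for _ in range(warmup):          # warm up caches / lazy initialization
        generate_fn(max_new_tokens=8)
    start = time.perf_counter()
    generate_fn(max_new_tokens=n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```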
BloomBee is built upon a few popular libraries:
- Hivemind - A PyTorch library for decentralized deep learning across the Internet.
- FlexLLMGen - An offloading-based system for running large models on weak GPUs.
- Petals - A library for decentralized LLM inference and fine-tuning without offloading.