Launch your own swarm
This tutorial will walk you through the steps of setting up your own private swarm to inference and fine-tune BLOOM. Please make sure you have already installed Petals and are familiar with the "Getting started" tutorial.
Before we begin:
- This tutorial covers BLOOM-176B, which requires ~200 GB of combined GPU memory in 8-bit. If you want to try this on a smaller scale, use the `bigscience/bloom-7b1` model.
- If you want to host a new model architecture, you'd need to add support for it manually, so follow the "Run a custom model" tutorial first.
- If something does not work for you, don't hesitate to reach out to us in the #running-a-server channel of our Discord.
If you plan to work with unreliable GPU machines (e.g. spot instances), it is a good practice to have a few CPU-only machines that are always online. These bootstrap peers can be used as --initial_peers, to connect new GPU servers to the existing ones. They can also serve as libp2p relays for GPU servers that lack open ports (e.g., because they are behind NAT and/or firewalls).
If you have reliable GPU machines, you can skip this step and use these servers as initial peers, given that you provide --host_maddrs and --identity_path arguments (described below) directly to the Petals servers.
To start a bootstrap peer, run this line in a tmux/screen shell:
```
python -m petals.cli.run_dht --host_maddrs /ip4/0.0.0.0/tcp/31337 --identity_path bootstrap1.id
```

Once you run it, look at the outputs and find the following line:

```
Mon 00 01:23:45.678 [INFO] Running a DHT instance. To connect other peers to this one, use --initial_peers /ip4/YOUR_ADDRESS_HERE/tcp/31337/p2p/QmTPAIfThisIsMyAddressGoFindYoursnCfj
```
You can provide this address as `--initial_peers` to GPU servers or other backbone peers. If there is a risk that this peer goes down, you can launch additional bootstrap peers (`petals.cli.run_dht` instances) and provide multiple addresses. New peers will be able to join the swarm as long as at least one of their initial peers is alive.
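For example, a redundant setup with two bootstrap peers could look like the sketch below. The IP addresses, ports, and peer IDs here are placeholders, not real values; `--initial_peers` on the second peer comes from the hivemind DHT runner's CLI:

```shell
# On machine A (placeholder address 10.0.0.1): first bootstrap peer
python -m petals.cli.run_dht --host_maddrs /ip4/0.0.0.0/tcp/31337 --identity_path bootstrap1.id

# On machine B: second bootstrap peer, connected to the first one
# via the multi-address that machine A printed on startup
python -m petals.cli.run_dht --host_maddrs /ip4/0.0.0.0/tcp/31337 --identity_path bootstrap2.id \
    --initial_peers /ip4/10.0.0.1/tcp/31337/p2p/QmBootstrap1PeerIdPlaceholder
```

New GPU servers can then list both bootstrap addresses in `--initial_peers`, so the swarm stays reachable if one bootstrap peer goes down.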
Here are a few tips to help you set up:

- `--host_maddrs` contains libp2p multi-addresses specifying a network protocol, IP address, and port. Learn more about them here.
    - If you want your swarm to be accessible outside of your local network, ensure that you have a public IP address or set up port forwarding correctly, so that your peer is reachable from the outside.
    - If you run your swarm in a local network only, it's fine not to have a public IP and open ports, as long as you use the local network's IP addresses everywhere.
    - You can specify `0.0.0.0` as the IP address, so that the script listens on all of your existing network interfaces.
- `--identity_path` contains a peer's private key and defines the "/p2p/..." part of your peer's address (essentially, its public key).
    - Set the `--identity_path` option to a file to ensure that your peer has the same identity each time you restart it. If the file doesn't exist, the script will generate a new private key and save it to the specified file.
    - Make sure each peer's identity is unique.
    - If you omit this option, Petals will generate a new identity each time the process starts, so you won't be able to get a constant multi-address for your bootstrap peer.
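To make the structure of these addresses concrete, here is a small illustrative parser (not part of Petals) that splits a multi-address into its protocol/value pairs, showing which parts `--host_maddrs` (the `ip4` and `tcp` components) and `--identity_path` (the `p2p` peer ID) control:

```python
def parse_multiaddr(maddr: str) -> dict:
    """Split a libp2p multi-address like /ip4/1.2.3.4/tcp/31337/p2p/Qm... into parts."""
    parts = maddr.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError(f"Malformed multi-address: {maddr}")
    # Pair protocol names with their values: ip4 -> address, tcp -> port, p2p -> peer ID
    return dict(zip(parts[::2], parts[1::2]))

addr = "/ip4/10.1.2.3/tcp/31234/p2p/QmcXhze98AcgGQDDYna23s4Jho96n8wkwLJv78vxtFNq44"
info = parse_multiaddr(addr)
print(info["ip4"], info["tcp"], info["p2p"])
```

This is only a sketch of the address format; real libp2p multi-addresses support more protocols, and Petals parses them via its own dependencies.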
Now, you can run Petals servers as usual with an extra `--initial_peers` argument pointing to your bootstrap peers. If you have reliable GPU servers and no bootstrap peers, you can instead add the `--new_swarm` argument to the first server, then use its multi-address as `--initial_peers` for the rest of the servers.
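As a sketch of both options (the model name, addresses, and peer IDs below are placeholders; substitute your own values):

```shell
# Option 1: point each GPU server at your bootstrap peer(s)
python -m petals.cli.run_server bigscience/bloom-7b1 \
    --initial_peers /ip4/10.0.0.1/tcp/31337/p2p/QmBootstrapPeerIdPlaceholder

# Option 2: no bootstrap peers -- start a new swarm on the first reliable server...
python -m petals.cli.run_server bigscience/bloom-7b1 --new_swarm

# ...then join the remaining servers using the first server's printed multi-address
python -m petals.cli.run_server bigscience/bloom-7b1 \
    --initial_peers /ip4/10.0.0.5/tcp/31330/p2p/QmFirstServerPeerIdPlaceholder
```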
To use the model, you can create it as usual with an extra `initial_peers` argument:

```python
INITIAL_PEERS = [
    "/ip4/10.1.2.3/tcp/31234/p2p/QmcXhze98AcgGQDDYna23s4Jho96n8wkwLJv78vxtFNq44",
    "/ip4/10.1.2.4/tcp/31245/p2p/12D3KooWNPaCDFTKMKBkQazoznq2dkdD3jWkXnYCTJH8PFpggNM6",
]
model = AutoDistributedModelForCausalLM.from_pretrained(model_name, initial_peers=INITIAL_PEERS)
```

Next, you can test that inference and fine-tuning work using the code from the "Getting started" and other tutorials.
You can launch your own instances of the health monitor and/or chatbot interfaces following instructions in their repositories:
- Chatbot web app (including an HTTP inference endpoint): repository
- Health monitor: repository
Don't forget to specify your `INITIAL_PEERS` in their `config.py` files, so the instances connect to your private swarm instead of the public one.

If you encounter any issues or want to share feedback, please join the #running-a-server channel of our Discord.