
Launch your own swarm

Alexander Borzunov edited this page Jul 17, 2023 · 45 revisions

This tutorial will walk you through the steps of setting up your own private swarm to run inference and fine-tune BLOOM. Please make sure you have already installed Petals and are familiar with the "Getting started" tutorial.

Before we begin:

  • This tutorial covers BLOOM-176B. It requires ~200 GB of combined GPU memory in 8-bit precision. If you want to try this on a smaller scale, use the bigscience/bloom-7b1 model. If you want to host a new model architecture, you'll need to add support for it manually — follow the "Run a custom model" tutorial first.
  • If something does not work for you, don't hesitate to reach out to us in the #running-a-server channel of our Discord.

Step 1: Set up backbone peers

If you plan to work with unreliable GPU machines (e.g. spot instances), it is a good practice to have a few CPU-only machines that are always online. These bootstrap peers can be used as --initial_peers, to connect new GPU servers to the existing ones. They can also serve as libp2p relays for GPU servers that lack open ports (e.g., because they are behind NAT and/or firewalls).

If you have reliable GPU machines, you can skip this step and use these servers as initial peers, provided that you pass the --host_maddrs and --identity_path arguments (described below) directly to the Petals servers.

To start a bootstrap peer, run this line in a tmux/screen shell:

python -m petals.cli.run_dht --host_maddrs /ip4/0.0.0.0/tcp/31337 --identity_path bootstrap1.id 

Once you run it, look at the output and find a line like this:

Mon 00 01:23:45.678 [INFO] Running a DHT instance. To connect other peers to this one, use --initial_peers /ip4/YOUR_ADDRESS_HERE/tcp/31337/p2p/QmTPAIfThisIsMyAddressGoFindYoursnCfj

You can provide this address as --initial_peers to GPU servers or other backbone peers. If there is a risk that this peer goes down, you can launch additional bootstrap peers and provide multiple addresses. New peers will be able to join the swarm as long as at least one of their initial peers is alive.
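Before wiring several bootstrap addresses into your servers and clients, it can help to sanity-check them. Below is a minimal sketch (assuming the standard libp2p multiaddress layout `/ip4/<addr>/tcp/<port>/p2p/<peer_id>`; the addresses and `peer_id` helper are illustrative, not part of Petals):

```python
def peer_id(maddr: str) -> str:
    """Extract the /p2p/... peer ID from a libp2p multiaddress."""
    parts = maddr.strip("/").split("/")
    assert "p2p" in parts, f"no /p2p/ component in {maddr!r}"
    return parts[parts.index("p2p") + 1]

# Example list with two bootstrap peers (replace with your own addresses):
INITIAL_PEERS = [
    "/ip4/10.1.2.3/tcp/31337/p2p/QmTPAIfThisIsMyAddressGoFindYoursnCfj",
    "/ip4/10.1.2.4/tcp/31337/p2p/12D3KooWNPaCDFTKMKBkQazoznq2dkdD3jWkXnYCTJH8PFpggNM6",
]

# Each address should point to a distinct bootstrap peer:
assert len({peer_id(m) for m in INITIAL_PEERS}) == len(INITIAL_PEERS)
```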

Here are a few tips to help you set up:

  • The --host_maddrs argument contains libp2p multiaddresses specifying a network protocol, IP address, and port. Learn more about them here.

    • If you want your swarm to be accessible outside of your local network, ensure that you have a public IP address or set up port forwarding correctly, so that your peer is reachable from the outside.
    • If you run your swarm in a local network only, it's fine not to have a public IP or open ports, as long as you use the local network's IP addresses everywhere.
    • You can specify 0.0.0.0 as the IP address, so that the script listens on all of your existing network interfaces.
  • The --identity_path file contains a peer's private key and defines the "/p2p/..." part of the peer's address (essentially, its public key).

    • Set the --identity_path option to a file path to ensure that your peer has the same identity each time you restart it. If the file doesn't exist, the script will generate a new private key and save it to the specified file.
    • Make sure each peer's identity is unique.
    • If you omit this option, Petals will generate a new identity each time a process is started, so you won't be able to get a stable multiaddress for your bootstrap peer.
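The uniqueness requirement above can be checked mechanically. Here is a small sketch (assuming identity files are opaque private-key blobs; the `identities_unique` helper is illustrative, not part of Petals) that verifies no two peers share the same --identity_path contents:

```python
import hashlib
from pathlib import Path

def identities_unique(paths) -> bool:
    """Return True if all identity files have distinct contents."""
    digests = [hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths]
    return len(set(digests)) == len(digests)
```

Running this over, e.g., `["bootstrap1.id", "bootstrap2.id"]` before launching your peers catches accidental copy-paste of the same identity file.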

Step 2: Start Petals servers

Now, you can run Petals servers as usual with an extra --initial_peers argument pointing to your bootstrap peers. If you have reliable GPU servers and no bootstrap peers, you can instead add the --new_swarm argument to the first server, then use its multiaddress as --initial_peers for the rest of the servers.
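As a sketch, a typical launch might look like this (the bootstrap multiaddress is the one printed in Step 1; adjust the model name and flags to your setup):

```shell
# Address printed by your bootstrap peer in Step 1:
export INITIAL_PEERS=/ip4/10.1.2.3/tcp/31337/p2p/QmTPAIfThisIsMyAddressGoFindYoursnCfj

# On each GPU machine, connect a server to your private swarm:
python -m petals.cli.run_server bigscience/bloom --initial_peers $INITIAL_PEERS

# Alternatively, with no bootstrap peers: start the first server with --new_swarm,
# then use its printed multiaddress as --initial_peers for the remaining servers.
python -m petals.cli.run_server bigscience/bloom --new_swarm
```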

Step 3: Use the model

To use the model, create it as usual with an extra initial_peers argument:

from petals import AutoDistributedModelForCausalLM

INITIAL_PEERS = [
    "/ip4/10.1.2.3/tcp/31234/p2p/QmcXhze98AcgGQDDYna23s4Jho96n8wkwLJv78vxtFNq44",
    "/ip4/10.1.2.4/tcp/31245/p2p/12D3KooWNPaCDFTKMKBkQazoznq2dkdD3jWkXnYCTJH8PFpggNM6",
]
model_name = "bigscience/bloom"  # or bigscience/bloom-7b1 for the smaller model
model = AutoDistributedModelForCausalLM.from_pretrained(model_name, initial_peers=INITIAL_PEERS)

Next, you can test that inference and fine-tuning work using the code from the "Getting started" and other tutorials.

Step 4 (optional): Launch the health monitor and/or the chatbot interface

You can launch your own instances of the health monitor and/or the chatbot interface by following the instructions in their repositories:

Don't forget to specify your INITIAL_PEERS in their config.py files, so the instances connect to your private swarm instead of the public one.
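For illustration, the relevant change is a one-line edit like the following (a hypothetical excerpt; the exact variable name and file layout may differ between the health-monitor and chatbot repositories):

```python
# config.py — point the instance at your private swarm instead of the public one:
INITIAL_PEERS = [
    "/ip4/10.1.2.3/tcp/31337/p2p/QmTPAIfThisIsMyAddressGoFindYoursnCfj",
]
```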


If you encounter any issues or want to share feedback, please join the #running-a-server channel of our Discord.
