Launch your own swarm
This tutorial will walk you through the steps of setting up your own private swarm to inference and fine-tune BLOOM. Please make sure you have already installed Petals and are familiar with the "Getting started" tutorial.
Before we begin:
- This tutorial covers BLOOM-176B, which requires ~200 GB of combined GPU memory in 8-bit. If you want to try this on a smaller scale, use the `bigscience/bloom-7b1` model.
- If you want to host a new model architecture, you'd need to add support for it manually, so follow the "Run a custom model" tutorial first.
- If something does not work for you, don't hesitate to reach out to us in the #running-a-server channel of our Discord.
If you plan to work with unreliable GPU machines (e.g. spot instances), it is a good practice to have a few CPU-only machines that are always online. These bootstrap peers can be used as --initial_peers, to connect new GPU servers to the existing ones. They can also serve as libp2p relays for GPU servers that lack open ports (e.g., because they are behind NAT and/or firewalls).
If you have reliable GPU machines, you can skip this step and use these servers as initial peers, given that you provide --host_maddrs and --identity_path arguments (described below) directly to the Petals servers.
To start a bootstrap peer, run this line in a tmux/screen shell:
```
python -m petals.cli.run_dht --host_maddrs /ip4/0.0.0.0/tcp/31337 --identity_path bootstrap1.id
```

Once you run it, look at the outputs and find the following line:

```
Mon 00 01:23:45.678 [INFO] Running a DHT instance. To connect other peers to this one, use --initial_peers /ip4/YOUR_ADDRESS_HERE/tcp/31337/p2p/QmTPAIfThisIsMyAddressGoFindYoursnCfj
```
You can provide this address as `--initial_peers` to GPU servers or other backbone peers. If there is a risk that this peer goes down, you can launch additional bootstrap peers (`petals.cli.run_dht` instances) and provide multiple addresses. New peers will be able to join the swarm as long as at least one of their initial peers is alive.
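For example, a redundant setup with two bootstrap peers could look like the sketch below. The IP addresses, ports, and peer IDs here are placeholders, not real values; `--initial_peers` on the second peer comes from the hivemind DHT runner's CLI:

```shell
# On machine A (placeholder address 10.0.0.1): first bootstrap peer
python -m petals.cli.run_dht --host_maddrs /ip4/0.0.0.0/tcp/31337 --identity_path bootstrap1.id

# On machine B: second bootstrap peer, connected to the first one
# via the multi-address that machine A printed on startup
python -m petals.cli.run_dht --host_maddrs /ip4/0.0.0.0/tcp/31337 --identity_path bootstrap2.id \
    --initial_peers /ip4/10.0.0.1/tcp/31337/p2p/QmBootstrap1PeerIdPlaceholder
```

New GPU servers can then list both bootstrap addresses in `--initial_peers`, so the swarm stays reachable if one bootstrap peer goes down.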
Here are a few tips to help you set up:

- `--host_maddrs` contains libp2p multi-addresses specifying a network protocol, IP address, and port. Learn more about them here.
    - If you want your swarm to be accessible outside of your local network, ensure that you have a public IP address or set up port forwarding correctly, so that your peer is reachable from the outside.
    - If you run your swarm in a local network only, it's fine not to have a public IP and open ports, as long as you use the local network's IP addresses everywhere.
    - You can specify `0.0.0.0` as the IP address, so that the script listens on all of your existing network interfaces.
- `--identity_path` contains a peer's private key and defines the "/p2p/..." part of your peer's address (essentially, its public key).
    - Set the `--identity_path` option to a file to ensure that your peer has the same identity each time you restart it. If the file doesn't exist, the script will generate a new private key and save it to the specified file.
    - Make sure each peer's identity is unique.
    - If you omit this option, Petals will generate a new identity each time the process starts, so you won't be able to get a constant multi-address for your bootstrap peer.
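To make the structure of these addresses concrete, here is a small illustrative parser (not part of Petals) that splits a multi-address into its protocol/value pairs, showing which parts `--host_maddrs` (the `ip4` and `tcp` components) and `--identity_path` (the `p2p` peer ID) control:

```python
def parse_multiaddr(maddr: str) -> dict:
    """Split a libp2p multi-address like /ip4/1.2.3.4/tcp/31337/p2p/Qm... into parts."""
    parts = maddr.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError(f"Malformed multi-address: {maddr}")
    # Pair protocol names with their values: ip4 -> address, tcp -> port, p2p -> peer ID
    return dict(zip(parts[::2], parts[1::2]))

addr = "/ip4/10.1.2.3/tcp/31234/p2p/QmcXhze98AcgGQDDYna23s4Jho96n8wkwLJv78vxtFNq44"
info = parse_multiaddr(addr)
print(info["ip4"], info["tcp"], info["p2p"])
```

This is only a sketch of the address format; real libp2p multi-addresses support more protocols, and Petals parses them via its own dependencies.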
Now, you can run Petals servers as usual with an extra `--initial_peers` argument pointing to your bootstrap peers. If you have reliable GPU servers and no bootstrap peers, you can instead add the `--new_swarm` argument to the first server, then use its multi-address as `--initial_peers` for the rest of the servers.
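As a sketch of both options (the model name, addresses, and peer IDs below are placeholders; substitute your own values):

```shell
# Option 1: point each GPU server at your bootstrap peer(s)
python -m petals.cli.run_server bigscience/bloom-7b1 \
    --initial_peers /ip4/10.0.0.1/tcp/31337/p2p/QmBootstrapPeerIdPlaceholder

# Option 2: no bootstrap peers -- start a new swarm on the first reliable server...
python -m petals.cli.run_server bigscience/bloom-7b1 --new_swarm

# ...then join the remaining servers using the first server's printed multi-address
python -m petals.cli.run_server bigscience/bloom-7b1 \
    --initial_peers /ip4/10.0.0.5/tcp/31330/p2p/QmFirstServerPeerIdPlaceholder
```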
To use the model, you can create it as usual with an extra `initial_peers` argument:

```python
INITIAL_PEERS = [
    "/ip4/10.1.2.3/tcp/31234/p2p/QmcXhze98AcgGQDDYna23s4Jho96n8wkwLJv78vxtFNq44",
    "/ip4/10.1.2.4/tcp/31245/p2p/12D3KooWNPaCDFTKMKBkQazoznq2dkdD3jWkXnYCTJH8PFpggNM6",
]
model = AutoDistributedModelForCausalLM.from_pretrained(model_name, initial_peers=INITIAL_PEERS)
```

Next, you can test that inference and fine-tuning work using the code from the "Getting started" and other tutorials.
You can launch your own instances of the health monitor and/or chatbot interfaces following instructions in their repositories:
- Chatbot web app (including an HTTP inference endpoint): repository
- Health monitor: repository
Don't forget to specify your `INITIAL_PEERS` in their `config.py` files, so the instances connect to your private swarm instead of the public one.

If you encounter any issues or want to share feedback, please join the #running-a-server channel of our Discord.