Run a custom model with Petals

Starting with Petals 1.2.0, you don't have to convert a new model to a special Petals-compatible format: you can serve it directly from a Hugging Face Hub repository.
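
For example, a server can be started directly on such a repo with `python -m petals.cli.run_server <repo_name>`, and a client can load it without any conversion step. Here is a minimal client-side sketch (the repo name below is just an example of a model with a supported architecture):

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # any Hub repo with a supported architecture

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Blocks are executed by remote servers; only embeddings and the LM head run locally
inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=3)
print(tokenizer.decode(outputs[0]))
```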

Still, Petals supports only the model architectures defined in the petals.models package. If you'd like to support a new architecture, copy the src/petals/models/bloom or src/petals/models/llama directory and update all of its files to work with your new model.

We recommend doing this in the following order:

  1. Edit config.py and __init__.py:
     • Make sure that the config is correctly loaded from a Hugging Face Hub repo when using AutoDistributedConfig.from_pretrained(...) (see the config check sketch after this list).
  2. Edit block.py:
     • Make sure that you can run a Petals server with your model's blocks.
     • Make sure the server returns correct results for forward and backward passes (the outputs are close to the ones of a locally hosted block; see the output comparison sketch after this list).
     • Pay attention to the dimension order in attention caches (both keys and values), since many implementations use different dimension orders (e.g., see the dimension reordering code in src/petals/models/llama/block.py).
     • Run the server with --throughput eval to test the inference code and check that you get no shape errors.
  3. Edit model.py:
     • Create distributed model wrappers using code from the 🤗 Transformers implementation.
     • Check that you can run a Petals client and get correct results for inference, forward, and backward passes with all model types (the outputs are close to those of a locally hosted model; see the output comparison sketch after this list).
     • Check that AutoDistributedModel.from_pretrained(...), AutoDistributedModelForCausalLM.from_pretrained(...), and similar functions correctly load the model from the Hugging Face Hub.
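
For step 1, a quick sanity check is to load the config through the auto class. This is only a sketch: the repo name is a placeholder, and it assumes your new config class is registered and exported the same way as the existing bloom and llama ones:

```python
from petals import AutoDistributedConfig

# Should resolve to your new distributed config class and load without errors
# ("your-org/your-model" is a hypothetical repo name).
config = AutoDistributedConfig.from_pretrained("your-org/your-model")
print(type(config), config)
```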
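
For steps 2 and 3, one way to verify correctness is to compare the distributed model's forward outputs against the same model loaded locally with 🤗 Transformers. This sketch assumes the model is small enough to run locally and uses a placeholder repo name:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "your-org/your-model"  # hypothetical repo name

tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("A quick test", return_tensors="pt")["input_ids"]

# Local reference model (runs entirely on this machine)
ref_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
ref_logits = ref_model(inputs).logits

# Distributed model (blocks are executed by your Petals server)
dist_model = AutoDistributedModelForCausalLM.from_pretrained(model_name)
dist_logits = dist_model(inputs).logits

# The outputs won't match bit-for-bit; the tolerance depends on the dtype
# and quantization your server uses.
print(torch.allclose(ref_logits, dist_logits, atol=1e-3, rtol=0))
```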

If you encounter any issues, don't hesitate to ask in the #running-a-server channel of our Discord.
