# Run a custom model with Petals
Starting with Petals 1.2.0, you don't have to convert a new model to a special Petals-compatible format: you can serve it directly from a Hugging Face Hub repository.
Still, Petals supports only a predefined set of model architectures, defined in the `petals.models` package. If you'd like to support a new architecture, you need to copy the `src/petals/models/bloom` or `src/petals/models/llama` directory and update all of its files to work with your new model.
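For concreteness, here is a sketch of the layout you end up with after copying one of these directories; `mymodel` is a placeholder name for your architecture:

```
src/petals/models/mymodel/    # copied from src/petals/models/llama
├── __init__.py   # registers the new model classes
├── config.py     # the model config, loaded from the Hub
├── server.py     # server-side code for the model's blocks
└── client.py     # client-side model wrappers
```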
We recommend doing this in the following order:
1. Edit `config.py` and `__init__.py`; make sure that the config is correctly loaded from a Hugging Face Hub repo (see the first sketch after this list).
2. Edit `server.py`; make sure that you can run a Petals server with your model's blocks and that it returns correct results for forward and backward passes (compared to a locally hosted block). Pay close attention to the dimension order in attention caches (both keys and values), since many implementations use different dimension orders (e.g., see the dimension reordering code in `src/petals/models/llama/block.py` and the second sketch below).
3. Edit `client.py`; copy the code of the model wrappers (e.g., from the 🤗 Transformers implementation) and check that you can run a Petals client and that it gives correct results for inference, forward, and backward passes (the third sketch below shows one way to check this).
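A minimal sketch for the first step: loading the config from the Hub. `mymodel` and `DistributedMyModelConfig` are hypothetical names for your new architecture, chosen by analogy with the existing `DistributedBloomConfig`; `your-org/your-model` is a placeholder repo:

```python
# Hypothetical: `petals.models.mymodel` is your copied directory, and
# DistributedMyModelConfig is your new config class (named by analogy
# with DistributedBloomConfig).
from petals.models.mymodel import DistributedMyModelConfig

config = DistributedMyModelConfig.from_pretrained("your-org/your-model")
print(config)  # should show your model's hyperparameters, not defaults
```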
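To illustrate the attention-cache pitfall from the second step, here is a toy PyTorch example of reordering cached keys between two layouts; the shapes are hypothetical, chosen for illustration rather than the ones Petals actually uses:

```python
import torch

# Toy shapes for illustration only.
batch, num_heads, seq_len, head_dim = 2, 4, 8, 16

# Suppose the reference implementation caches keys as
# [batch, num_heads, seq_len, head_dim] ...
keys = torch.randn(batch, num_heads, seq_len, head_dim)

# ... while your block code expects [batch * num_heads, head_dim, seq_len].
# Use permute + reshape; calling .view() with the target shape on the
# original tensor would silently scramble the values instead of
# reordering the dimensions.
reordered = keys.permute(0, 1, 3, 2).reshape(batch * num_heads, head_dim, seq_len)
assert reordered.shape == (batch * num_heads, head_dim, seq_len)
```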
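For the last step, one way to validate the client is to compare logits from the distributed model against a locally loaded copy. Below is a minimal sketch, assuming a Petals server for your model is already running (e.g., started with `python -m petals.cli.run_server your-org/your-model`) and that your wrappers mimic the 🤗 Transformers forward interface, as the BLOOM and LLaMA wrappers do; `your-org/your-model` is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from petals import AutoDistributedModelForCausalLM

MODEL_NAME = "your-org/your-model"  # placeholder Hub repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
inputs = tokenizer("A quick consistency check", return_tensors="pt")["input_ids"]

local_model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
petals_model = AutoDistributedModelForCausalLM.from_pretrained(MODEL_NAME)

with torch.no_grad():
    local_logits = local_model(inputs).logits
    petals_logits = petals_model(inputs).logits

# The two should agree up to numerical noise (servers may run the blocks
# in a different precision, so allow some tolerance).
print("max abs diff:", (local_logits - petals_logits).abs().max().item())
```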
If you encounter any issues, don't hesitate to ask in the #running-a-server channel of our Discord.