# Mixtral 8x7B Instruct Truss

This is a [Truss](https://truss.baseten.co/) for Mixtral 8x7B Instruct, a mixture-of-experts (MoE) language model released by [Mistral AI](https://mistral.ai/). This README will walk you through how to deploy this Truss on Baseten to get your own instance of it.

## Deployment

First, clone this repository:

```sh
git clone https://github.com/basetenlabs/truss-examples/
cd truss-examples/mixtral-8x7b-instruct-vllm
```

Before deployment:

1. Make sure you have a [Baseten account](https://app.baseten.co/signup) and [API key](https://app.baseten.co/settings/account/api_keys).
2. Install the latest version of Truss: `pip install --upgrade truss`

With `mixtral-8x7b-instruct-vllm` as your working directory, you can deploy the model with:

```sh
truss push --publish
```

Paste your Baseten API key if prompted.

For more information, see the [Truss documentation](https://truss.baseten.co).

### Hardware notes

You need two A100s to run Mixtral at `fp16`. If you need access to A100s, please [contact us](mailto:[email protected]).
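
In a Truss, the accelerator is pinned in the `resources` block of `config.yaml`. A sketch of what that section typically looks like for this model (field values here are assumptions — check the `config.yaml` in this directory for the actual settings):

```yaml
resources:
  accelerator: A100:2  # two A100s for fp16 weights
  use_gpu: true
```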

## Mixtral 8x7B Instruct API documentation

This section provides an overview of the Mixtral 8x7B Instruct API, its parameters, and how to use it. The API consists of a single route named `predict`, which you can invoke to generate text based on the provided prompt.

### API route: `predict`

The `predict` route is the primary method for generating text completions based on a given prompt. It takes several parameters:

- __prompt__: The input text that you want the model to generate a response for.
- __stream__ (optional, default=False): A boolean determining whether the model should stream a response back. When `True`, the API returns generated text as it becomes available.
- __max_tokens__ (optional): The maximum number of tokens to generate, as used in the curl example below.
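
For example, a request body combining these parameters might look like this (values are illustrative):

```json
{
  "prompt": "What is the Mistral wind?",
  "stream": true
}
```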

## Example usage

```sh
truss predict -d '{"prompt": "What is the Mistral wind?"}'
```

You can also invoke your model via a REST API:

```sh
curl -X POST "https://app.baseten.co/model_versions/YOUR_MODEL_VERSION_ID/predict" \
     -H "Content-Type: application/json" \
     -H 'Authorization: Api-Key {YOUR_API_KEY}' \
     -d '{
           "prompt": "What is the meaning of life? Answer in substantial detail with multiple examples from famous philosophies, religions, and schools of thought.",
           "stream": true,
           "max_tokens": 4096
         }' --no-buffer
```
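
The same streaming request can be made from Python. The sketch below uses only the standard library; `MODEL_VERSION_ID`, `API_KEY`, and the helper function names are placeholders for illustration, not part of the Truss API:

```python
import json
import urllib.request

# Hypothetical placeholders -- substitute your deployed model version ID and API key.
MODEL_VERSION_ID = "YOUR_MODEL_VERSION_ID"
API_KEY = "YOUR_API_KEY"


def build_request(prompt, stream=True, max_tokens=4096):
    """Assemble the POST request for the model's predict endpoint."""
    body = json.dumps(
        {"prompt": prompt, "stream": stream, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        f"https://app.baseten.co/model_versions/{MODEL_VERSION_ID}/predict",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Api-Key {API_KEY}",
        },
        method="POST",
    )


def stream_completion(prompt):
    """Print generated text chunk by chunk as the server streams it back."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        for chunk in iter(lambda: resp.read(1024), b""):
            print(chunk.decode("utf-8", errors="replace"), end="", flush=True)


# Example (requires a deployed model and valid credentials):
# stream_completion("What is the Mistral wind?")
```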