Scale hosts the following models in the LLM Engine Model Zoo:
| Model Name               | Inference APIs Available | Fine-tuning APIs Available | Inference Frameworks Available             | Inference max total tokens (prompt + response) |
| ------------------------ | ------------------------ | -------------------------- | ------------------------------------------ | ---------------------------------------------- |
| `llama-7b`               | ✅                       | ✅                         | deepspeed, text-generation-inference       | 2048                                           |
| `llama-2-7b`             | ✅                       | ✅                         | text-generation-inference, vllm            | 4096                                           |
| `llama-2-7b-chat`        | ✅                       |                            | text-generation-inference, vllm            | 4096                                           |
| `llama-2-13b`            | ✅                       |                            | text-generation-inference, vllm            | 4096                                           |
| `llama-2-13b-chat`       | ✅                       |                            | text-generation-inference, vllm            | 4096                                           |
| `llama-2-70b`            | ✅                       | ✅                         | text-generation-inference, vllm            | 4096                                           |
| `llama-2-70b-chat`       | ✅                       |                            | text-generation-inference, vllm            | 4096                                           |
| `falcon-7b`              | ✅                       |                            | text-generation-inference, vllm            | 2048                                           |
| `falcon-7b-instruct`     | ✅                       |                            | text-generation-inference, vllm            | 2048                                           |
| `falcon-40b`             | ✅                       |                            | text-generation-inference, vllm            | 2048                                           |
| `falcon-40b-instruct`    | ✅                       |                            | text-generation-inference, vllm            | 2048                                           |
| `mpt-7b`                 | ✅                       |                            | deepspeed, text-generation-inference, vllm | 2048                                           |
| `mpt-7b-instruct`        | ✅                       | ✅                         | deepspeed, text-generation-inference, vllm | 2048                                           |
| `flan-t5-xxl`            | ✅                       |                            | deepspeed, text-generation-inference       | 2048                                           |
| `mistral-7b`             | ✅                       | ✅                         | vllm                                       | 8000                                           |
| `mistral-7b-instruct`    | ✅                       | ✅                         | vllm                                       | 8000                                           |
| `mixtral-8x7b`           | ✅                       |                            | vllm                                       | 32768                                          |
| `mixtral-8x7b-instruct`  | ✅                       |                            | vllm                                       | 32768                                          |
| `codellama-7b`           | ✅                       | ✅                         | text-generation-inference, vllm            | 16384                                          |
| `codellama-7b-instruct`  | ✅                       | ✅                         | text-generation-inference, vllm            | 16384                                          |
| `codellama-13b`          | ✅                       | ✅                         | text-generation-inference, vllm            | 16384                                          |
| `codellama-13b-instruct` | ✅                       | ✅                         | text-generation-inference, vllm            | 16384                                          |
| `codellama-34b`          | ✅                       | ✅                         | text-generation-inference, vllm            | 16384                                          |
| `codellama-34b-instruct` | ✅                       | ✅                         | text-generation-inference, vllm            | 16384                                          |
| `zephyr-7b-alpha`        | ✅                       |                            | text-generation-inference, vllm            | 32768                                          |
| `zephyr-7b-beta`         | ✅                       |                            | text-generation-inference, vllm            | 32768                                          |

## Usage
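
Each model in the table can be called through LLM Engine's Completion APIs, and the rows with a ✅ in the fine-tuning column can additionally be customized. As a minimal sketch of an inference call with the `llmengine` Python client (assuming the client is installed and authenticated, e.g. via the `SCALE_API_KEY` environment variable; exact response fields may vary by client version):

```python
from llmengine import Completion

# Request a completion from llama-2-7b. Note that the prompt tokens plus
# max_new_tokens must fit within the model's max total tokens limit
# (4096 for llama-2-7b, per the table above).
response = Completion.create(
    model="llama-2-7b",
    prompt="Hello, my name is",
    max_new_tokens=100,
    temperature=0.2,
)
print(response.json())
```

For models with fine-tuning support, a job can be launched in a similar way. A sketch, assuming a CSV of `prompt`/`response` pairs hosted at a hypothetical location the service can read:

```python
from llmengine import FineTune

# Launch a fine-tuning job for llama-2-7b. The training_file URL below is a
# hypothetical placeholder; it should point to a CSV with "prompt" and
# "response" columns.
response = FineTune.create(
    model="llama-2-7b",
    training_file="https://example.com/path/to/training_dataset.csv",
)
print(response.json())
```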