|
1 | 1 | --- |
2 | 2 | meta: |
3 | | - title: Supported Models in Managed Inference |
4 | | - description: Supported Models in Managed Inference |
| 3 | + title: Supported models in Managed Inference |
| 4 | + description: Explore all AI models supported by Managed Inference |
5 | 5 | content: |
6 | | - h1: Supported Models in Managed Inference |
7 | | - paragraph: Supported Models in Managed Inference |
8 | | -tags: |
| 6 | + h1: Supported models in Managed Inference |
| 7 | + paragraph: Discover which AI models you can deploy using Managed Inference, either from the Scaleway Catalog or as custom models. |
| 8 | +tags: support models custom catalog |
9 | 9 | dates: |
10 | 10 | validation: 2025-04-08 |
11 | 11 | posted: 2025-04-08 |
12 | 12 | categories: |
13 | 13 | - ai-data |
14 | 14 | --- |
15 | 15 |
|
16 | | -## Models supported on Managed Inference |
| 16 | +Scaleway Managed Inference allows you to deploy various AI models, either from: |
17 | 17 |
|
18 | | -Managed Inference supports multiple AI models either from: |
19 | | -- [Scaleway catalog]((#scaleway-catalog)): A curated model list available in [Scaleway Console](https://console.scaleway.com/inference/deployments/) or through [Managed Inference Models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models) |
20 | | -- [Custom models](#custom-models): Models imported by you as a user from sources such as HuggingFace. |
| 18 | +- [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models) |
| 19 | +- [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face. |
21 | 20 |
|
22 | | -## Scaleway Catalog |
| 21 | +## Scaleway catalog |
23 | 22 |
|
24 | | -### Multimodal models (Chat and Vision) |
| 23 | +### Multimodal models (chat + vision) |
25 | 24 |
|
26 | 25 | ### Chat models |
27 | 26 |
|
28 | | -| Provider | Model string | Documentation | License | |
29 | | -|-----------------|-----------------|-----------------|-----------------| |
30 | | -| Meta | `llama-3.3-70b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 Community](https://www.llama.com/llama3_3/license/) | |
31 | | -| Meta | `llama-3.1-8b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 Community](https://llama.meta.com/llama3_1/license/) | |
| 27 | +| Provider | Model identifier | Documentation | License | |
| 28 | +|----------|------------------|----------------|---------| |
| 29 | +| Meta | `llama-3.3-70b-instruct` | [View Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 License](https://www.llama.com/llama3_3/license/) | |
| 30 | +| Meta | `llama-3.1-8b-instruct` | [View Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 License](https://llama.meta.com/llama3_1/license/) | |
32 | 31 |
|
33 | 32 | ### Vision models |
34 | 33 |
|
| 34 | +_More details to be added._ |
| 35 | + |
35 | 36 | ### Embedding models |
36 | 37 |
|
37 | | -## Custom models |
| 38 | +_More details to be added._ |
| 39 | + |
| 40 | + |
| 41 | +## Custom models |
38 | 42 |
|
39 | 43 | <Message type="note"> |
40 | | - Custom models are still in Beta status. If you identify unsupported models, you can report the issue to us through our [Slack Community Channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or our [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference). |
| 44 | + Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference). |
41 | 45 | </Message> |
42 | 46 |
|
43 | | -### Prerequesites |
| 47 | +### Prerequisites |
44 | 48 |
|
45 | 49 | <Message type="tip"> |
46 | | - To begin with custom models deployment, we recommend you start with existing variation of models supported in the Scaleway Catalog. As an example, you can deploy a [quantized version (4 bits) of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit). If you want to then deploy a fine-tuned version of Llama 3.3, you can ensure the file structure you provide matches this example before creating your deployment. |
| 50 | + We recommend starting with a variation of a supported model from the Scaleway catalog. |
| 51 | + For example, you can deploy a [quantized (4-bit) version of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit). |
| 52 | + If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above. |
47 | 53 | </Message> |
48 | 54 |
|
49 | | -To deploy a model by providing its URL on Hugging Face, you need to: |
50 | | -- Have access to this model with your Hugging Face credentials (if the model is "Gated", you specifically need to ask access from your Hugging Face account). Note that your Hugging Face credentials will not be stored, but we still recommend you to create [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens) for this purpose. |
| 55 | +To deploy a custom model via Hugging Face, ensure the following: |
| 56 | + |
| 57 | +#### Access requirements |
| 58 | + |
| 59 | +- You must have access to the model using your Hugging Face credentials. |
| 60 | +- For gated models, request access through your Hugging Face account. |
| 61 | +- Credentials are not stored, but we recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens). |
| 62 | + |
| 63 | +#### Required files |
| 64 | + |
| 65 | +Your model repository must include: |
51 | 66 |
|
52 | | -The model files need to include: |
53 | | -- a `config.json` file containing: |
54 | | - - `architectures` array. See [supported models architectures](#supported-models-architecture) for exact list of supported values. |
| 67 | +- `config.json` with: |
| 68 | + - An `architectures` array (see [supported architectures](#supported-models-architecture)) |
55 | 69 | - `max_position_embeddings` |
56 | | -- model weigths in [`.safetensors`](https://huggingface.co/docs/safetensors/index) format |
57 | | -- a chat template either in: |
58 | | - - `tokenizer_config.json` file as `chat_template` field |
59 | | - - `chat_template.json` file as `chat_template` field |
| 70 | +- Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format |
| 71 | +- A chat template included in either: |
| 72 | + - `tokenizer_config.json` as a `chat_template` field, or |
| 73 | + - `chat_template.json` as a `chat_template` field |
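The file requirements above can be sanity-checked locally before you create a deployment. Below is a minimal sketch of such a check; the `check_repo` helper and the mock repository layout are illustrative only, not part of Managed Inference:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical local snapshot of a model repository (for illustration only).
repo = Path(tempfile.mkdtemp())
(repo / "config.json").write_text(json.dumps({
    "architectures": ["LlamaForCausalLM"],
    "max_position_embeddings": 131072,
}))
(repo / "model-00001-of-00002.safetensors").touch()
(repo / "tokenizer_config.json").write_text(json.dumps({
    "chat_template": "{% for message in messages %}...{% endfor %}",
}))

def check_repo(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the layout looks deployable."""
    config_file = path / "config.json"
    if not config_file.exists():
        return ["missing config.json"]
    problems = []
    config = json.loads(config_file.read_text())
    if not config.get("architectures"):
        problems.append("config.json lacks an 'architectures' array")
    if "max_position_embeddings" not in config:
        problems.append("config.json lacks 'max_position_embeddings'")
    if not list(path.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")
    has_template = any(
        "chat_template" in json.loads((path / name).read_text())
        for name in ("tokenizer_config.json", "chat_template.json")
        if (path / name).exists()
    )
    if not has_template:
        problems.append("no chat_template in tokenizer_config.json or chat_template.json")
    return problems

print(check_repo(repo))  # → []
```

Running a check like this against your repository before uploading helps catch a missing chat template or weights in an unsupported format early.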
| 74 | + |
| 75 | +#### Supported model types |
| 76 | + |
| 77 | +Your model must be one of the following types: |
60 | 78 |
|
61 | | -The model type need to either be: |
62 | 79 | - `chat` |
63 | 80 | - `vision` |
64 | | -- `multimodal` (`chat` and `vision` currently) |
| 81 | +- `multimodal` (chat + vision) |
65 | 82 | - `embedding` |
66 | 83 |
|
67 | | -For security reasons, models containing arbitrary code execution such as [`pickle`](https://docs.python.org/3/library/pickle.html) format are not supported. |
| 84 | +<Message type="important"> |
| 85 | +  **Security notice**<br /> |
| 86 | + Models using formats that allow arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are **not supported**. |
| 87 | +</Message> |
68 | 88 |
|
69 | | -### Supported API |
| 89 | +## API support |
70 | 90 |
|
71 | | -Depending on the model type, specific endpoints and features will be supported. |
| 91 | +Depending on your model type, the following endpoints will be available: |
72 | 92 |
|
73 | | -#### Chat models |
| 93 | +### Chat models |
74 | 94 |
|
75 | 95 | The Chat API is exposed for this model under the `/v1/chat/completions` endpoint. |
76 | 96 | **Structured outputs** and **function calling** are not yet supported for custom models. |
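As a sketch, a request body for a custom chat model could be built as follows. The endpoint URL and model name are placeholders for your own deployment, and the payload shape assumes the OpenAI-compatible Chat Completions format:

```python
import json

# Placeholder endpoint: substitute the URL of your own deployment.
ENDPOINT = "https://<your-deployment>.example.com/v1/chat/completions"

payload = {
    "model": "my-custom-chat-model",  # hypothetical custom model name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what Managed Inference does."},
    ],
    "max_tokens": 128,
}

# The JSON body you would POST to ENDPOINT:
print(json.dumps(payload, indent=2))
```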
77 | 97 |
|
78 | | -#### Vision models |
| 98 | +### Vision models |
79 | 99 |
|
80 | 100 | The Chat API is exposed for this model under the `/v1/chat/completions` endpoint. |
81 | 101 | **Structured outputs** and **function calling** are not yet supported for custom models. |
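For vision-capable models, the same endpoint accepts messages whose content mixes text and image parts. This sketch assumes the OpenAI-compatible content-part format; the model name and image URL are placeholders:

```python
import json

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}

payload = {"model": "my-custom-vision-model", "messages": [message]}  # hypothetical name
print(json.dumps(payload, indent=2))
```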
82 | 102 |
|
83 | | -#### Multimodal models (vision and chat) |
| 103 | +### Multimodal models |
84 | 104 |
|
85 | 105 | These models are treated as both chat and vision models. |
86 | 106 |
|
87 | | -#### Embedding models |
| 107 | +### Embedding models |
88 | 108 |
|
89 | 109 | The Embeddings API is exposed for this model under the `/v1/embeddings` endpoint. |
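A request body for a custom embedding model could look like the sketch below; the model name is a placeholder and the payload shape assumes the OpenAI-compatible Embeddings format:

```python
import json

payload = {
    "model": "my-custom-embedding-model",  # hypothetical custom model name
    "input": [
        "Managed Inference supports custom embedding models.",
        "Each input string is embedded independently.",
    ],
}

# The JSON body you would POST to the /v1/embeddings endpoint:
print(json.dumps(payload, indent=2))
```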
90 | 110 |
|
91 | 111 |
|
92 | | -### Custom model lifecycle |
| 112 | +## Custom model lifecycle |
93 | 113 |
|
94 | 114 | Currently, custom model deployments are considered valid for the long term, and we will ensure any updates or changes to Managed Inference do not impact existing deployments. |
95 | | -In case of breaking changes, leading to some custom models not being supported anymore, we will notify you at least 3 months beforehand. |
96 | | - |
97 | | -### License |
| 115 | +In case of breaking changes, leading to some custom models not being supported anymore, we will notify you **at least 3 months beforehand**. |
98 | 116 |
|
99 | | -- When deploying custom models, you remain responsible for complying with any License requirements from the model provider, as you would do by running the model on a custom provisioned GPU. |
| 117 | +## Licensing |
100 | 118 |
|
101 | | -### Supported models architecture |
| 119 | +When deploying custom models, **you remain responsible** for complying with any license requirements from the model provider, just as you would when running the model on self-provisioned GPUs. |
102 | 120 |
|
103 | | -Custom Models Deployments currently support the following models architecture: `AquilaModel`, `AquilaForCausalLM`, `ArcticForCausalLM`, `BaiChuanForCausalLM`, `BaichuanForCausalLM`, `BloomForCausalLM`, `CohereForCausalLM`, `Cohere2ForCausalLM`, `DbrxForCausalLM`, `DeciLMForCausalLM`, `DeepseekForCausalLM`, `DeepseekV2ForCausalLM`, `DeepseekV3ForCausalLM`, `ExaoneForCausalLM`, `FalconForCausalLM`, `Fairseq2LlamaForCausalLM`, `GemmaForCausalLM`, `Gemma2ForCausalLM`, `GlmForCausalLM`, `GPT2LMHeadModel`, `GPTBigCodeForCausalLM`, `GPTJForCausalLM`, `GPTNeoXForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM`, `GritLM`, `InternLMForCausalLM`, `InternLM2ForCausalLM`, `InternLM2VEForCausalLM`, `InternLM3ForCausalLM`, `JAISLMHeadModel`, `JambaForCausalLM`, `LlamaForCausalLM`, `LLaMAForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`, `MiniCPMForCausalLM`, `MiniCPM3ForCausalLM`, `MistralForCausalLM`, `MixtralForCausalLM`, `QuantMixtralForCausalLM`, `MptForCausalLM`, `MPTForCausalLM`, `NemotronForCausalLM`, `OlmoForCausalLM`, `Olmo2ForCausalLM`, `OlmoeForCausalLM`, `OPTForCausalLM`, `OrionForCausalLM`, `PersimmonForCausalLM`, `PhiForCausalLM`, `Phi3ForCausalLM`, `Phi3SmallForCausalLM`, `PhiMoEForCausalLM`, `Qwen2ForCausalLM`, `Qwen2MoeForCausalLM`, `RWForCausalLM`, `StableLMEpochForCausalLM`, `StableLmForCausalLM`, `Starcoder2ForCausalLM`, `SolarForCausalLM`, `TeleChat2ForCausalLM`, `XverseForCausalLM`, `BartModel`, `BartForConditionalGeneration`, `Florence2ForConditionalGeneration`, `BertModel`, `RobertaModel`, `RobertaForMaskedLM`, `XLMRobertaModel`, `DeciLMForCausalLM`, `Gemma2Model`, `GlmForCausalLM`, `GritLM`, `InternLM2ForRewardModel`, `JambaForSequenceClassification`, `LlamaModel`, `MistralModel`, `Phi3ForCausalLM`, `Qwen2Model`, `Qwen2ForCausalLM`, `Qwen2ForRewardModel`, `Qwen2ForProcessRewardModel`, `TeleChat2ForCausalLM`, `LlavaNextForConditionalGeneration`, `Phi3VForCausalLM`, `Qwen2VLForConditionalGeneration`, 
`Qwen2ForSequenceClassification`, `BertForSequenceClassification`, `RobertaForSequenceClassification`, `XLMRobertaForSequenceClassification`, `AriaForConditionalGeneration`, `Blip2ForConditionalGeneration`, `ChameleonForConditionalGeneration`, `ChatGLMModel`, `ChatGLMForConditionalGeneration`, `DeepseekVLV2ForCausalLM`, `FuyuForCausalLM`, `H2OVLChatModel`, `InternVLChatModel`, `Idefics3ForConditionalGeneration`, `LlavaForConditionalGeneration`, `LlavaNextForConditionalGeneration`, `LlavaNextVideoForConditionalGeneration`, `LlavaOnevisionForConditionalGeneration`, `MantisForConditionalGeneration`, `MiniCPMO`, `MiniCPMV`, `MolmoForCausalLM`, `NVLM_D`, `PaliGemmaForConditionalGeneration`, `Phi3VForCausalLM`, `PixtralForConditionalGeneration`, `QWenLMHeadModel`, `Qwen2VLForConditionalGeneration`, `Qwen2_5_VLForConditionalGeneration`, `Qwen2AudioForConditionalGeneration`, `UltravoxModel`, `MllamaForConditionalGeneration`, `WhisperForConditionalGeneration`, `EAGLEModel`, `MedusaModel`, `MLPSpeculatorPreTrainedModel` |
| 121 | +## Supported model architectures |
104 | 122 |
|
| 123 | +Custom models must conform to one of the architectures listed below. Click to expand the full list. |
105 | 124 |
|
| 125 | +<Concept> |
| 126 | + ## Supported custom model architectures |
| 127 | +  Custom model deployments currently support the following model architectures: `AquilaModel`, `AquilaForCausalLM`, `ArcticForCausalLM`, `BaiChuanForCausalLM`, `BaichuanForCausalLM`, `BloomForCausalLM`, `CohereForCausalLM`, `Cohere2ForCausalLM`, `DbrxForCausalLM`, `DeciLMForCausalLM`, `DeepseekForCausalLM`, `DeepseekV2ForCausalLM`, `DeepseekV3ForCausalLM`, `ExaoneForCausalLM`, `FalconForCausalLM`, `Fairseq2LlamaForCausalLM`, `GemmaForCausalLM`, `Gemma2ForCausalLM`, `GlmForCausalLM`, `GPT2LMHeadModel`, `GPTBigCodeForCausalLM`, `GPTJForCausalLM`, `GPTNeoXForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM`, `GritLM`, `InternLMForCausalLM`, `InternLM2ForCausalLM`, `InternLM2VEForCausalLM`, `InternLM3ForCausalLM`, `JAISLMHeadModel`, `JambaForCausalLM`, `LlamaForCausalLM`, `LLaMAForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`, `MiniCPMForCausalLM`, `MiniCPM3ForCausalLM`, `MistralForCausalLM`, `MixtralForCausalLM`, `QuantMixtralForCausalLM`, `MptForCausalLM`, `MPTForCausalLM`, `NemotronForCausalLM`, `OlmoForCausalLM`, `Olmo2ForCausalLM`, `OlmoeForCausalLM`, `OPTForCausalLM`, `OrionForCausalLM`, `PersimmonForCausalLM`, `PhiForCausalLM`, `Phi3ForCausalLM`, `Phi3SmallForCausalLM`, `PhiMoEForCausalLM`, `Qwen2ForCausalLM`, `Qwen2MoeForCausalLM`, `RWForCausalLM`, `StableLMEpochForCausalLM`, `StableLmForCausalLM`, `Starcoder2ForCausalLM`, `SolarForCausalLM`, `TeleChat2ForCausalLM`, `XverseForCausalLM`, `BartModel`, `BartForConditionalGeneration`, `Florence2ForConditionalGeneration`, `BertModel`, `RobertaModel`, `RobertaForMaskedLM`, `XLMRobertaModel`, `DeciLMForCausalLM`, `Gemma2Model`, `GlmForCausalLM`, `GritLM`, `InternLM2ForRewardModel`, `JambaForSequenceClassification`, `LlamaModel`, `MistralModel`, `Phi3ForCausalLM`, `Qwen2Model`, `Qwen2ForCausalLM`, `Qwen2ForRewardModel`, `Qwen2ForProcessRewardModel`, `TeleChat2ForCausalLM`, `LlavaNextForConditionalGeneration`, `Phi3VForCausalLM`, `Qwen2VLForConditionalGeneration`, 
`Qwen2ForSequenceClassification`, `BertForSequenceClassification`, `RobertaForSequenceClassification`, `XLMRobertaForSequenceClassification`, `AriaForConditionalGeneration`, `Blip2ForConditionalGeneration`, `ChameleonForConditionalGeneration`, `ChatGLMModel`, `ChatGLMForConditionalGeneration`, `DeepseekVLV2ForCausalLM`, `FuyuForCausalLM`, `H2OVLChatModel`, `InternVLChatModel`, `Idefics3ForConditionalGeneration`, `LlavaForConditionalGeneration`, `LlavaNextForConditionalGeneration`, `LlavaNextVideoForConditionalGeneration`, `LlavaOnevisionForConditionalGeneration`, `MantisForConditionalGeneration`, `MiniCPMO`, `MiniCPMV`, `MolmoForCausalLM`, `NVLM_D`, `PaliGemmaForConditionalGeneration`, `Phi3VForCausalLM`, `PixtralForConditionalGeneration`, `QWenLMHeadModel`, `Qwen2VLForConditionalGeneration`, `Qwen2_5_VLForConditionalGeneration`, `Qwen2AudioForConditionalGeneration`, `UltravoxModel`, `MllamaForConditionalGeneration`, `WhisperForConditionalGeneration`, `EAGLEModel`, `MedusaModel`, `MLPSpeculatorPreTrainedModel` |
| 128 | +</Concept> |