---
meta:
  title: Supported models in Managed Inference
  description: Explore all AI models supported by Managed Inference
content:
  h1: Supported models in Managed Inference
  paragraph: Discover which AI models you can deploy using Managed Inference, either from the Scaleway Catalog or as custom models.
tags: support models custom catalog
dates:
  validation: 2025-04-08
  posted: 2025-04-08
categories:
  - ai-data
---

Scaleway Managed Inference allows you to deploy various AI models, either from:

 * [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models).
 * [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face.
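
You can also retrieve the catalog programmatically. The following is a minimal sketch, assuming the `v1` regional route and a `models` array in the response; check the models API reference linked above for the exact path and schema.

```python
import os

import requests

REGION = "fr-par"  # any region where Managed Inference is available
URL = f"https://api.scaleway.com/inference/v1/regions/{REGION}/models"

# Authenticate with a Scaleway IAM secret key.
response = requests.get(URL, headers={"X-Auth-Token": os.environ["SCW_SECRET_KEY"]})
response.raise_for_status()

for model in response.json().get("models", []):
    print(model["name"])  # e.g. "llama-3.1-8b-instruct"
```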

## Scaleway catalog

### Multimodal models (chat + vision)

_More details to be added._

### Chat models

| Provider | Model identifier | Documentation | License |
|----------|------------------|---------------|---------|
| Allen AI | `molmo-72b-0924` | [View Details](/managed-inference/reference-content/molmo-72b-0924/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Deepseek | `deepseek-r1-distill-llama-70b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-70b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
| Deepseek | `deepseek-r1-distill-llama-8b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-8b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
| Meta | `llama-3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3-70b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) |
| Meta | `llama-3-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3-8b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) |
| Meta | `llama-3.1-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-70b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Meta | `llama-3.1-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Meta | `llama-3.3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 license](https://www.llama.com/llama3_3/license/) |
| Nvidia | `llama-3.1-nemotron-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-nemotron-70b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Mistral | `mixtral-8x7b-instruct-v0.1` | [View Details](/managed-inference/reference-content/mixtral-8x7b-instruct-v0.1/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-7b-instruct-v0.3` | [View Details](/managed-inference/reference-content/mistral-7b-instruct-v0.3/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-nemo-instruct-2407` | [View Details](/managed-inference/reference-content/mistral-nemo-instruct-2407/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-small-24b-instruct-2501` | [View Details](/managed-inference/reference-content/mistral-small-24b-instruct-2501/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `pixtral-12b-2409` | [View Details](/managed-inference/reference-content/pixtral-12b-2409/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Qwen | `qwen2.5-coder-32b-instruct` | [View Details](/managed-inference/reference-content/qwen2.5-coder-32b-instruct/) | [Apache 2.0 license](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) |

### Vision models

_More details to be added._

### Embedding models

| Provider | Model identifier | Documentation | License |
|----------|------------------|---------------|---------|
| BAAI | `bge-multilingual-gemma2` | [View Details](/managed-inference/reference-content/bge-multilingual-gemma2/) | [Gemma Terms of Use](https://ai.google.dev/gemma/terms) |
| Sentence Transformers | `sentence-t5-xxl` | [View Details](/managed-inference/reference-content/sentence-t5-xxl/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |

## Custom models

<Message type="note">
  Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference).
</Message>

### Prerequisites

<Message type="tip">
  We recommend starting with a variation of a supported model from the Scaleway catalog.
  For example, you can deploy a [quantized (4-bit) version of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit).
  If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above.
</Message>

To deploy a custom model via Hugging Face, ensure the following:

#### Access requirements

 * You must have access to the model using your Hugging Face credentials.
 * For gated models, request access through your Hugging Face account.
 * Credentials are not stored, but we recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens).

#### Required files

Your model repository must include the following (a validation sketch follows this list):

 * A `config.json` file containing:
    * An `architectures` array (see [supported architectures](#supported-model-architectures) for the exact list of supported values).
    * `max_position_embeddings`
 * Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format
 * A chat template included in either:
    * `tokenizer_config.json` as a `chat_template` field, or
    * `chat_template.json` as a `chat_template` field
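
Before importing, you can sanity-check a local copy of the repository. This is a hypothetical pre-flight script based on the requirements above, not an official tool; the repository path at the end is a placeholder.

```python
import json
from pathlib import Path


def check_model_repo(repo: Path) -> list[str]:
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    config_path = repo / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        config = json.loads(config_path.read_text())
        if not config.get("architectures"):
            problems.append("config.json lacks an `architectures` array")
        if "max_position_embeddings" not in config:
            problems.append("config.json lacks `max_position_embeddings`")
    if not list(repo.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")
    # The chat template may live in either file, always under `chat_template`.
    template_files = [repo / "tokenizer_config.json", repo / "chat_template.json"]
    if not any(
        "chat_template" in json.loads(p.read_text())
        for p in template_files
        if p.is_file()
    ):
        problems.append("no chat_template found")
    return problems


print(check_model_repo(Path("./my-model")) or "looks deployable")  # placeholder path
```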

#### Supported model types

Your model must be one of the following types:

 * `chat`
 * `vision`
 * `multimodal` (chat + vision)
 * `embedding`

<Message type="important">
  **Security Notice**<br />
  Models using formats that allow arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are **not supported**.
</Message>

## API support

The endpoints and features available for a deployment depend on the model type.

### Chat models

The Chat API is exposed for these models under the `/v1/chat/completions` endpoint.
**Structured outputs** and **function calling** are not yet supported for custom models.
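
Below is a minimal sketch of a chat request using the OpenAI-compatible Chat API. The base URL is a placeholder for your deployment's endpoint (shown in the Scaleway console), and the example assumes a Scaleway IAM secret key is accepted as the bearer token.

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",  # placeholder endpoint
    api_key=os.environ["SCW_SECRET_KEY"],  # IAM secret key
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # the model identifier of your deployment
    messages=[{"role": "user", "content": "What is Managed Inference?"}],
)
print(response.choices[0].message.content)
```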

### Vision models

The Chat API is exposed for these models under the `/v1/chat/completions` endpoint.
**Structured outputs** and **function calling** are not yet supported for custom models.
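
Vision models accept images as `image_url` parts in the message content, following the OpenAI-compatible format. The sketch below makes the same assumptions as the chat example (placeholder endpoint, IAM secret key), and the image URL is illustrative.

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",  # placeholder endpoint
    api_key=os.environ["SCW_SECRET_KEY"],  # IAM secret key
)

response = client.chat.completions.create(
    model="pixtral-12b-2409",  # a vision-capable deployment
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```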

### Multimodal models

These models are handled as both chat and vision models and are served through the same `/v1/chat/completions` endpoint.

### Embedding models

The Embeddings API is exposed for these models under the `/v1/embeddings` endpoint.
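
A minimal embeddings request, under the same assumptions as the examples above (placeholder endpoint, IAM secret key as bearer token):

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",  # placeholder endpoint
    api_key=os.environ["SCW_SECRET_KEY"],  # IAM secret key
)

result = client.embeddings.create(
    model="sentence-t5-xxl",  # an embedding deployment from the catalog
    input="Managed Inference supports custom embedding models.",
)
print(len(result.data[0].embedding))  # dimensionality of the returned vector
```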

## Custom model lifecycle

Custom model deployments are currently intended to remain valid long term, and we will ensure that updates or changes to Managed Inference do not impact existing deployments.
In case of breaking changes that end support for some custom models, we will notify you **at least 3 months beforehand**.

## Licensing

When deploying custom models, **you remain responsible** for complying with any license requirements from the model provider, just as you would when running the model on self-provisioned GPUs.

## Supported model architectures

Custom models must conform to one of the architectures listed below. Click to expand the full list.

<Concept>
  ## Supported custom model architectures
  Custom model deployment currently supports the following model architectures:
  * `AquilaModel`
  * `AquilaForCausalLM`
  * `ArcticForCausalLM`
  * `BaiChuanForCausalLM`
  * `BaichuanForCausalLM`
  * `BloomForCausalLM`
  * `CohereForCausalLM`
  * `Cohere2ForCausalLM`
  * `DbrxForCausalLM`
  * `DeciLMForCausalLM`
  * `DeepseekForCausalLM`
  * `DeepseekV2ForCausalLM`
  * `DeepseekV3ForCausalLM`
  * `ExaoneForCausalLM`
  * `FalconForCausalLM`
  * `Fairseq2LlamaForCausalLM`
  * `GemmaForCausalLM`
  * `Gemma2ForCausalLM`
  * `GlmForCausalLM`
  * `GPT2LMHeadModel`
  * `GPTBigCodeForCausalLM`
  * `GPTJForCausalLM`
  * `GPTNeoXForCausalLM`
  * `GraniteForCausalLM`
  * `GraniteMoeForCausalLM`
  * `GritLM`
  * `InternLMForCausalLM`
  * `InternLM2ForCausalLM`
  * `InternLM2VEForCausalLM`
  * `InternLM3ForCausalLM`
  * `JAISLMHeadModel`
  * `JambaForCausalLM`
  * `LlamaForCausalLM`
  * `LLaMAForCausalLM`
  * `MambaForCausalLM`
  * `FalconMambaForCausalLM`
  * `MiniCPMForCausalLM`
  * `MiniCPM3ForCausalLM`
  * `MistralForCausalLM`
  * `MixtralForCausalLM`
  * `QuantMixtralForCausalLM`
  * `MptForCausalLM`
  * `MPTForCausalLM`
  * `NemotronForCausalLM`
  * `OlmoForCausalLM`
  * `Olmo2ForCausalLM`
  * `OlmoeForCausalLM`
  * `OPTForCausalLM`
  * `OrionForCausalLM`
  * `PersimmonForCausalLM`
  * `PhiForCausalLM`
  * `Phi3ForCausalLM`
  * `Phi3SmallForCausalLM`
  * `PhiMoEForCausalLM`
  * `Qwen2ForCausalLM`
  * `Qwen2MoeForCausalLM`
  * `RWForCausalLM`
  * `StableLMEpochForCausalLM`
  * `StableLmForCausalLM`
  * `Starcoder2ForCausalLM`
  * `SolarForCausalLM`
  * `TeleChat2ForCausalLM`
  * `XverseForCausalLM`
  * `BartModel`
  * `BartForConditionalGeneration`
  * `Florence2ForConditionalGeneration`
  * `BertModel`
  * `RobertaModel`
  * `RobertaForMaskedLM`
  * `XLMRobertaModel`
  * `Gemma2Model`
  * `InternLM2ForRewardModel`
  * `JambaForSequenceClassification`
  * `LlamaModel`
  * `MistralModel`
  * `Qwen2Model`
  * `Qwen2ForRewardModel`
  * `Qwen2ForProcessRewardModel`
  * `LlavaNextForConditionalGeneration`
  * `Phi3VForCausalLM`
  * `Qwen2VLForConditionalGeneration`
  * `Qwen2ForSequenceClassification`
  * `BertForSequenceClassification`
  * `RobertaForSequenceClassification`
  * `XLMRobertaForSequenceClassification`
  * `AriaForConditionalGeneration`
  * `Blip2ForConditionalGeneration`
  * `ChameleonForConditionalGeneration`
  * `ChatGLMModel`
  * `ChatGLMForConditionalGeneration`
  * `DeepseekVLV2ForCausalLM`
  * `FuyuForCausalLM`
  * `H2OVLChatModel`
  * `InternVLChatModel`
  * `Idefics3ForConditionalGeneration`
  * `LlavaForConditionalGeneration`
  * `LlavaNextVideoForConditionalGeneration`
  * `LlavaOnevisionForConditionalGeneration`
  * `MantisForConditionalGeneration`
  * `MiniCPMO`
  * `MiniCPMV`
  * `MolmoForCausalLM`
  * `NVLM_D`
  * `PaliGemmaForConditionalGeneration`
  * `PixtralForConditionalGeneration`
  * `QWenLMHeadModel`
  * `Qwen2_5_VLForConditionalGeneration`
  * `Qwen2AudioForConditionalGeneration`
  * `UltravoxModel`
  * `MllamaForConditionalGeneration`
  * `WhisperForConditionalGeneration`
  * `EAGLEModel`
  * `MedusaModel`
  * `MLPSpeculatorPreTrainedModel`
</Concept>