Commit 3edd4f0

bene2k1, fpagny, and RoRoJ authored

docs(inference): update supported model information (#4817)

* feat(inference): add custom models requirements
* feat(inference): add custom model support
* feat(inference): update custom models
* docs(infr): update docs
* feat(minfr): add chat models
* fix(gen): small typo
* feat(inference): update quickstart
* feat(infr): update
* Apply suggestions from code review
  Co-authored-by: Rowena Jones <[email protected]>

---------

Co-authored-by: fpagny <[email protected]>
Co-authored-by: Rowena Jones <[email protected]>

1 parent de9b23e commit 3edd4f0

File tree

4 files changed: +282 −3 lines changed

menu/navigation.json

Lines changed: 4 additions & 0 deletions

@@ -860,6 +860,10 @@
       "label": "OpenAI API compatibility",
       "slug": "openai-compatibility"
     },
+    {
+      "label": "Supported models in Managed Inference",
+      "slug": "supported-models"
+    },
     {
       "label": "Support for function calling in Scaleway Managed Inference",
       "slug": "function-calling-support"
pages/managed-inference/how-to/create-deployment.mdx

Lines changed: 5 additions & 2 deletions

@@ -7,7 +7,7 @@ content:
   paragraph: This page explains how to deploy a model on Scaleway Managed Inference
 tags: managed-inference ai-data creating dedicated
 dates:
-  validation: 2025-04-01
+  validation: 2025-04-09
   posted: 2024-03-06
 ---

@@ -19,7 +19,10 @@ dates:
 1. Click the **AI & Data** section of the [Scaleway console](https://console.scaleway.com/), and select **Managed Inference** from the side menu to access the Managed Inference dashboard.
 2. Click **Deploy a model** to launch the model deployment wizard.
 3. Provide the necessary information:
-    - Select the desired model and quantization to use for your deployment [from the available options](/managed-inference/reference-content/)
+    - Select the desired model and quantization to use for your deployment [from the available options](/managed-inference/reference-content/).
+      <Message type="important">
+        Scaleway Managed Inference allows you to deploy various AI models, either from the Scaleway catalog or by importing a custom model. For detailed information about supported models, visit our [Supported models in Managed Inference](/managed-inference/reference-content/supported-models/) documentation.
+      </Message>
     <Message type="note">
       Some models may require acceptance of an end-user license agreement. If prompted, review the terms and conditions and accept the license accordingly.
     </Message>
pages/managed-inference/quickstart.mdx

Lines changed: 4 additions & 1 deletion

@@ -38,7 +38,10 @@ Here are some of the key features of Scaleway Managed Inference:
 1. Navigate to the **AI & Data** section of the [Scaleway console](https://console.scaleway.com/), and select **Managed Inference** from the side menu to access the Managed Inference dashboard.
 2. Click **Create deployment** to launch the deployment creation wizard.
 3. Provide the necessary information:
-    - Select the desired model and the quantization to use for your deployment [from the available options](/managed-inference/reference-content/)
+    - Select the desired model and the quantization to use for your deployment [from the available options](/managed-inference/reference-content/).
+      <Message type="important">
+        Scaleway Managed Inference allows you to deploy various AI models, either from the Scaleway catalog or by importing a custom model. For detailed information about supported models, visit our [Supported models in Managed Inference](/managed-inference/reference-content/supported-models/) documentation.
+      </Message>
     <Message type="note">
       Some models may require acceptance of an end-user license agreement. If prompted, review the terms and conditions and accept the license accordingly.
     </Message>
Lines changed: 269 additions & 0 deletions

@@ -0,0 +1,269 @@
---
meta:
  title: Supported models in Managed Inference
  description: Explore all AI models supported by Managed Inference
content:
  h1: Supported models in Managed Inference
  paragraph: Discover which AI models you can deploy using Managed Inference, either from the Scaleway Catalog or as custom models.
tags: support models custom catalog
dates:
  validation: 2025-04-08
  posted: 2025-04-08
categories:
  - ai-data
---
Scaleway Managed Inference allows you to deploy various AI models, either from:

* [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models).
* [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face.

## Scaleway catalog

### Multimodal models (chat + vision)

_More details to be added._

### Chat models

| Provider | Model identifier | Documentation | License |
|----------|------------------|---------------|---------|
| Allen AI | `molmo-72b-0924` | [View Details](/managed-inference/reference-content/molmo-72b-0924/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Deepseek | `deepseek-r1-distill-llama-70b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-70b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
| Deepseek | `deepseek-r1-distill-llama-8b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-8b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
| Meta | `llama-3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3-70b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) |
| Meta | `llama-3-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3-8b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) |
| Meta | `llama-3.1-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-70b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Meta | `llama-3.1-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Meta | `llama-3.3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 license](https://www.llama.com/llama3_3/license/) |
| Nvidia | `llama-3.1-nemotron-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-nemotron-70b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Mistral | `mixtral-8x7b-instruct-v0.1` | [View Details](/managed-inference/reference-content/mixtral-8x7b-instruct-v0.1/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-7b-instruct-v0.3` | [View Details](/managed-inference/reference-content/mistral-7b-instruct-v0.3/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-nemo-instruct-2407` | [View Details](/managed-inference/reference-content/mistral-nemo-instruct-2407/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-small-24b-instruct-2501` | [View Details](/managed-inference/reference-content/mistral-small-24b-instruct-2501/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `pixtral-12b-2409` | [View Details](/managed-inference/reference-content/pixtral-12b-2409/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Qwen | `qwen2.5-coder-32b-instruct` | [View Details](/managed-inference/reference-content/qwen2.5-coder-32b-instruct/) | [Apache 2.0 license](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) |

### Vision models

_More details to be added._

### Embedding models

| Provider | Model identifier | Documentation | License |
|----------|------------------|---------------|---------|
| BAAI | `bge-multilingual-gemma2` | [View Details](/managed-inference/reference-content/bge-multilingual-gemma2/) | [Gemma Terms of Use](https://ai.google.dev/gemma/terms) |
| Sentence Transformers | `sentence-t5-xxl` | [View Details](/managed-inference/reference-content/sentence-t5-xxl/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
## Custom models

<Message type="note">
  Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference).
</Message>

### Prerequisites

<Message type="tip">
  We recommend starting with a variation of a supported model from the Scaleway catalog.
  For example, you can deploy a [quantized (4-bit) version of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit).
  If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above.
</Message>
To deploy a custom model via Hugging Face, ensure the following:

#### Access requirements

* You must have access to the model using your Hugging Face credentials.
* For gated models, request access through your Hugging Face account.
* Credentials are not stored, but we recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens).

#### Required files

Your model repository must include:

* A `config.json` file containing:
  * An `architectures` array (see [supported architectures](#supported-model-architectures) for the exact list of supported values).
  * `max_position_embeddings`
* Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format
* A chat template included in either:
  * `tokenizer_config.json` as a `chat_template` field, or
  * `chat_template.json` as a `chat_template` field
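The required-files checklist above can be sketched as a small local validation script. This is a hypothetical helper for illustration only, not an official Scaleway tool; it simply mirrors the checks listed in this section:

```python
import json
from pathlib import Path

def validate_model_repo(repo_dir: str) -> list[str]:
    """Check a local model repository for the files Managed Inference expects."""
    repo = Path(repo_dir)
    problems = []

    # config.json must exist and declare architectures + max_position_embeddings
    config_path = repo / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        config = json.loads(config_path.read_text())
        if not config.get("architectures"):
            problems.append("config.json has no 'architectures' array")
        if "max_position_embeddings" not in config:
            problems.append("config.json has no 'max_position_embeddings'")

    # Weights must be in the .safetensors format
    if not list(repo.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")

    # A chat template must live in tokenizer_config.json or chat_template.json
    has_template = False
    for name in ("tokenizer_config.json", "chat_template.json"):
        candidate = repo / name
        if candidate.is_file() and "chat_template" in json.loads(candidate.read_text()):
            has_template = True
    if not has_template:
        problems.append("no chat_template found")

    return problems
```

Running such a check before uploading can save a failed deployment attempt when a weight file or template field is missing.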
#### Supported model types

Your model must be one of the following types:

* `chat`
* `vision`
* `multimodal` (chat + vision)
* `embedding`

<Message type="important">
  **Security Notice**<br />
  Models using formats that allow arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are **not supported**.
</Message>
## API support

Depending on the model type, specific endpoints and features will be supported.

### Chat models

The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.
**Structured outputs** and **function calling** are not yet supported for custom models.
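As an illustration of calling that endpoint, an OpenAI-compatible request can be assembled with the standard library alone. The deployment URL and API key below are placeholders (substitute your own deployment's endpoint and a Scaleway IAM key); only the `/v1/chat/completions` path comes from this page:

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str,
                       messages: list[dict]) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for a deployment."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # IAM API key (placeholder)
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder endpoint and key; replace with your deployment's values.
req = build_chat_request(
    "https://<deployment-id>.ifr.fr-par.scaleway.com",
    "SCW_SECRET_KEY",
    "llama-3.1-8b-instruct",
    [{"role": "user", "content": "Hello!"}],
)
# urllib.request.urlopen(req) would send the request to a live deployment.
```

The same request shape works with any OpenAI-compatible client library pointed at the deployment's base URL.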
### Vision models

The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.
**Structured outputs** and **function calling** are not yet supported for custom models.

### Multimodal models

These models will be treated similarly to both chat and vision models.

### Embedding models

The Embeddings API will be exposed for this model under the `/v1/embeddings` endpoint.
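An embeddings call can be sketched the same way. As before, the deployment URL and key are placeholders; only the `/v1/embeddings` path comes from this page:

```python
import json
import urllib.request

def build_embeddings_request(base_url: str, api_key: str, model: str,
                             texts: list[str]) -> urllib.request.Request:
    """Build an OpenAI-compatible embeddings request for a deployment."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # IAM API key (placeholder)
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder endpoint and key; replace with your deployment's values.
req = build_embeddings_request(
    "https://<deployment-id>.ifr.fr-par.scaleway.com",
    "SCW_SECRET_KEY",
    "sentence-t5-xxl",
    ["Managed Inference supports embedding models."],
)
```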
## Custom model lifecycle

Currently, custom model deployments are considered to be valid for the long term, and we will ensure any updates or changes to Managed Inference do not impact existing deployments.
In case of breaking changes leading to some custom models no longer being supported, we will notify you **at least 3 months beforehand**.

## Licensing

When deploying custom models, **you remain responsible** for complying with any license requirements from the model provider, just as you would when running the model on a GPU you provision yourself.

## Supported model architectures

Custom models must conform to one of the architectures listed below. Click to expand the full list.
<Concept>
  ## Supported custom model architectures
  Custom model deployment currently supports the following model architectures:
  * `AquilaModel`
  * `AquilaForCausalLM`
  * `ArcticForCausalLM`
  * `BaiChuanForCausalLM`
  * `BaichuanForCausalLM`
  * `BloomForCausalLM`
  * `CohereForCausalLM`
  * `Cohere2ForCausalLM`
  * `DbrxForCausalLM`
  * `DeciLMForCausalLM`
  * `DeepseekForCausalLM`
  * `DeepseekV2ForCausalLM`
  * `DeepseekV3ForCausalLM`
  * `ExaoneForCausalLM`
  * `FalconForCausalLM`
  * `Fairseq2LlamaForCausalLM`
  * `GemmaForCausalLM`
  * `Gemma2ForCausalLM`
  * `GlmForCausalLM`
  * `GPT2LMHeadModel`
  * `GPTBigCodeForCausalLM`
  * `GPTJForCausalLM`
  * `GPTNeoXForCausalLM`
  * `GraniteForCausalLM`
  * `GraniteMoeForCausalLM`
  * `GritLM`
  * `InternLMForCausalLM`
  * `InternLM2ForCausalLM`
  * `InternLM2VEForCausalLM`
  * `InternLM3ForCausalLM`
  * `JAISLMHeadModel`
  * `JambaForCausalLM`
  * `LlamaForCausalLM`
  * `LLaMAForCausalLM`
  * `MambaForCausalLM`
  * `FalconMambaForCausalLM`
  * `MiniCPMForCausalLM`
  * `MiniCPM3ForCausalLM`
  * `MistralForCausalLM`
  * `MixtralForCausalLM`
  * `QuantMixtralForCausalLM`
  * `MptForCausalLM`
  * `MPTForCausalLM`
  * `NemotronForCausalLM`
  * `OlmoForCausalLM`
  * `Olmo2ForCausalLM`
  * `OlmoeForCausalLM`
  * `OPTForCausalLM`
  * `OrionForCausalLM`
  * `PersimmonForCausalLM`
  * `PhiForCausalLM`
  * `Phi3ForCausalLM`
  * `Phi3SmallForCausalLM`
  * `PhiMoEForCausalLM`
  * `Qwen2ForCausalLM`
  * `Qwen2MoeForCausalLM`
  * `RWForCausalLM`
  * `StableLMEpochForCausalLM`
  * `StableLmForCausalLM`
  * `Starcoder2ForCausalLM`
  * `SolarForCausalLM`
  * `TeleChat2ForCausalLM`
  * `XverseForCausalLM`
  * `BartModel`
  * `BartForConditionalGeneration`
  * `Florence2ForConditionalGeneration`
  * `BertModel`
  * `RobertaModel`
  * `RobertaForMaskedLM`
  * `XLMRobertaModel`
  * `DeciLMForCausalLM`
  * `Gemma2Model`
  * `GlmForCausalLM`
  * `GritLM`
  * `InternLM2ForRewardModel`
  * `JambaForSequenceClassification`
  * `LlamaModel`
  * `MistralModel`
  * `Phi3ForCausalLM`
  * `Qwen2Model`
  * `Qwen2ForCausalLM`
  * `Qwen2ForRewardModel`
  * `Qwen2ForProcessRewardModel`
  * `TeleChat2ForCausalLM`
  * `LlavaNextForConditionalGeneration`
  * `Phi3VForCausalLM`
  * `Qwen2VLForConditionalGeneration`
  * `Qwen2ForSequenceClassification`
  * `BertForSequenceClassification`
  * `RobertaForSequenceClassification`
  * `XLMRobertaForSequenceClassification`
  * `AriaForConditionalGeneration`
  * `Blip2ForConditionalGeneration`
  * `ChameleonForConditionalGeneration`
  * `ChatGLMModel`
  * `ChatGLMForConditionalGeneration`
  * `DeepseekVLV2ForCausalLM`
  * `FuyuForCausalLM`
  * `H2OVLChatModel`
  * `InternVLChatModel`
  * `Idefics3ForConditionalGeneration`
  * `LlavaForConditionalGeneration`
  * `LlavaNextForConditionalGeneration`
  * `LlavaNextVideoForConditionalGeneration`
  * `LlavaOnevisionForConditionalGeneration`
  * `MantisForConditionalGeneration`
  * `MiniCPMO`
  * `MiniCPMV`
  * `MolmoForCausalLM`
  * `NVLM_D`
  * `PaliGemmaForConditionalGeneration`
  * `Phi3VForCausalLM`
  * `PixtralForConditionalGeneration`
  * `QWenLMHeadModel`
  * `Qwen2VLForConditionalGeneration`
  * `Qwen2_5_VLForConditionalGeneration`
  * `Qwen2AudioForConditionalGeneration`
  * `UltravoxModel`
  * `MllamaForConditionalGeneration`
  * `WhisperForConditionalGeneration`
  * `EAGLEModel`
  * `MedusaModel`
  * `MLPSpeculatorPreTrainedModel`
</Concept>
