
Commit 57a0ac2

docs(infr): update docs
1 parent 1386fd2 commit 57a0ac2

File tree

3 files changed: +76 -46 lines changed

menu/navigation.json

Lines changed: 4 additions & 0 deletions

@@ -860,6 +860,10 @@
       "label": "OpenAI API compatibility",
       "slug": "openai-compatibility"
     },
+    {
+      "label": "Supported models in Managed Inference",
+      "slug": "supported-models"
+    },
     {
       "label": "Support for function calling in Scaleway Managed Inference",
       "slug": "function-calling-support"

pages/managed-inference/how-to/create-deployment.mdx

Lines changed: 5 additions & 2 deletions

@@ -7,7 +7,7 @@ content:
   paragraph: This page explains how to deploy a model on Scaleway Managed Inference
 tags: managed-inference ai-data creating dedicated
 dates:
-  validation: 2025-04-01
+  validation: 2025-04-09
   posted: 2024-03-06
 ---

@@ -19,7 +19,10 @@ dates:
 1. Click the **AI & Data** section of the [Scaleway console](https://console.scaleway.com/), and select **Managed Inference** from the side menu to access the Managed Inference dashboard.
 2. Click **Deploy a model** to launch the model deployment wizard.
 3. Provide the necessary information:
-    - Select the desired model and quantization to use for your deployment [from the available options](/managed-inference/reference-content/)
+    - Select the desired model and quantization to use for your deployment [from the available options](/managed-inference/reference-content/).
+      <Message type="important">
+        Scaleway Managed Inference allows you to deploy various AI models, either from the Scaleway catalog or by importing a custom model. For detailed information about supported models, visit our [Supported models in Managed Inference](/managed-inference/reference-content/supported-models/) documentation.
+      </Message>
     <Message type="note">
       Some models may require acceptance of an end-user license agreement. If prompted, review the terms and conditions and accept the license accordingly.
     </Message>
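
Step 3 above selects a model from the catalog; the same catalog can also be listed programmatically through the Managed Inference models API referenced in the supported-models page below. A minimal sketch in Python — the regional route and response shape are assumptions based on Scaleway's usual API layout, so confirm them against the linked API reference:

```python
import os

import requests

# Sketch: list the catalog models available for deployment.
# The route below is an assumption modeled on Scaleway's regional API layout;
# the authoritative path is in the Managed Inference API reference.
API_URL = "https://api.scaleway.com/inference/v1/regions/fr-par/models"

resp = requests.get(API_URL, headers={"X-Auth-Token": os.environ["SCW_SECRET_KEY"]})
resp.raise_for_status()

for model in resp.json().get("models", []):
    print(model.get("name"))
```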
Lines changed: 67 additions & 44 deletions

@@ -1,105 +1,128 @@
 ---
 meta:
-  title: Supported Models in Managed Inference
-  description: Supported Models in Managed Inference
+  title: Supported models in Managed Inference
+  description: Explore all AI models supported by Managed Inference
 content:
-  h1: Supported Models in Managed Inference
-  paragraph: Supported Models in Managed Inference
-tags:
+  h1: Supported models in Managed Inference
+  paragraph: Discover which AI models you can deploy using Managed Inference, either from the Scaleway Catalog or as custom models.
+tags: support models custom catalog
 dates:
   validation: 2025-04-08
   posted: 2025-04-08
 categories:
   - ai-data
 ---

-## Models supported on Managed Inference
+Scaleway Managed Inference allows you to deploy various AI models, either from:

-Managed Inference supports multiple AI models either from:
-- [Scaleway catalog]((#scaleway-catalog)): A curated model list available in [Scaleway Console](https://console.scaleway.com/inference/deployments/) or through [Managed Inference Models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models)
-- [Custom models](#custom-models): Models imported by you as a user from sources such as HuggingFace.
+- [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models)
+- [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face.

-## Scaleway Catalog
+## Scaleway catalog

-### Multimodal models (Chat and Vision)
+### Multimodal models (chat + vision)

 ### Chat models

-| Provider | Model string | Documentation | License |
-|-----------------|-----------------|-----------------|-----------------|
-| Meta | `llama-3.3-70b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 Community](https://www.llama.com/llama3_3/license/) |
-| Meta | `llama-3.1-8b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 Community](https://llama.meta.com/llama3_1/license/) |
+| Provider | Model identifier | Documentation | License |
+|----------|------------------|---------------|---------|
+| Meta | `llama-3.3-70b-instruct` | [View details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 License](https://www.llama.com/llama3_3/license/) |
+| Meta | `llama-3.1-8b-instruct` | [View details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 License](https://llama.meta.com/llama3_1/license/) |

 ### Vision models

+_More details to be added._
+
 ### Embedding models

-## Custom models
+_More details to be added._
+
+
+## Custom models

 <Message type="note">
-  Custom models are still in Beta status. If you identify unsupported models, you can report the issue to us through our [Slack Community Channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or our [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference).
+  Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference).
 </Message>

-### Prerequesites
+### Prerequisites

 <Message type="tip">
-  To begin with custom models deployment, we recommend you start with existing variation of models supported in the Scaleway Catalog. As an example, you can deploy a [quantized version (4 bits) of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit). If you want to then deploy a fine-tuned version of Llama 3.3, you can ensure the file structure you provide matches this example before creating your deployment.
+  We recommend starting with a variation of a supported model from the Scaleway catalog.
+  For example, you can deploy a [quantized (4-bit) version of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit).
+  If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above.
 </Message>

-To deploy a model by providing its URL on Hugging Face, you need to:
-- Have access to this model with your Hugging Face credentials (if the model is "Gated", you specifically need to ask access from your Hugging Face account). Note that your Hugging Face credentials will not be stored, but we still recommend you to create [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens) for this purpose.
+To deploy a custom model via Hugging Face, ensure the following:
+
+#### Access requirements
+
+- You must have access to the model using your Hugging Face credentials.
+- For gated models, request access through your Hugging Face account.
+- Credentials are not stored, but we recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens).
+
+#### Required files
+
+Your model repository must include:

-The model files need to include:
-- a `config.json` file containing:
-  - `architectures` array. See [supported models architectures](#supported-models-architecture) for exact list of supported values.
+- `config.json` with:
+  - An `architectures` array (see [supported architectures](#supported-models-architecture))
   - `max_position_embeddings`
-- model weigths in [`.safetensors`](https://huggingface.co/docs/safetensors/index) format
-- a chat template either in:
-  - `tokenizer_config.json` file as `chat_template` field
-  - `chat_template.json` file as `chat_template` field
+- Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format
+- A chat template included in either:
+  - `tokenizer_config.json` as a `chat_template` field, or
+  - `chat_template.json` as a `chat_template` field
+
+#### Supported model types
+
+Your model must be one of the following types:

-The model type need to either be:
 - `chat`
 - `vision`
-- `multimodal` (`chat` and `vision` currently)
+- `multimodal` (chat + vision)
 - `embedding`

-For security reasons, models containing arbitrary code execution such as [`pickle`](https://docs.python.org/3/library/pickle.html) format are not supported.
+<Message type="important">
+  **Security Notice**<br />
+  Models using formats that allow arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are **not supported**.
+</Message>

-### Supported API
+## API support

-Depending on the model type, specific endpoints and features will be supported.
+Depending on your model type, the following endpoints will be available:

-#### Chat models
+### Chat models

 The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.
 **Structured outputs** or **Function calling** are not yet supported for custom models.

-#### Vision models
+### Vision models

 The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.
 **Structured outputs** or **Function calling** are not yet supported for custom models.

-#### Multimodal models (vision and chat)
+### Multimodal models

 These models will be treated similarly to both Chat and Vision models.

-#### Embedding models
+### Embedding models

 The Embeddings API will be exposed for this model under the `/v1/embeddings` endpoint.


-### Custom model lifecycle
+## Custom model lifecycle

 Currently, custom model deployments are considered to be valid for the long term, and we will ensure any updates or changes to Managed Inference will not impact existing deployments.
-In case of breaking changes, leading to some custom models not being supported anymore, we will notify you at least 3 months beforehand.
-
-### License
+In case of breaking changes, leading to some custom models not being supported anymore, we will notify you **at least 3 months beforehand**.

-- When deploying custom models, you remain responsible for complying with any License requirements from the model provider, as you would do by running the model on a custom provisioned GPU.
+## Licensing

-### Supported models architecture
+When deploying custom models, **you remain responsible** for complying with any license requirements from the model provider, as you would when running the model on a self-provisioned GPU.

-Custom Models Deployments currently support the following models architecture: `AquilaModel`, `AquilaForCausalLM`, `ArcticForCausalLM`, `BaiChuanForCausalLM`, `BaichuanForCausalLM`, `BloomForCausalLM`, `CohereForCausalLM`, `Cohere2ForCausalLM`, `DbrxForCausalLM`, `DeciLMForCausalLM`, `DeepseekForCausalLM`, `DeepseekV2ForCausalLM`, `DeepseekV3ForCausalLM`, `ExaoneForCausalLM`, `FalconForCausalLM`, `Fairseq2LlamaForCausalLM`, `GemmaForCausalLM`, `Gemma2ForCausalLM`, `GlmForCausalLM`, `GPT2LMHeadModel`, `GPTBigCodeForCausalLM`, `GPTJForCausalLM`, `GPTNeoXForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM`, `GritLM`, `InternLMForCausalLM`, `InternLM2ForCausalLM`, `InternLM2VEForCausalLM`, `InternLM3ForCausalLM`, `JAISLMHeadModel`, `JambaForCausalLM`, `LlamaForCausalLM`, `LLaMAForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`, `MiniCPMForCausalLM`, `MiniCPM3ForCausalLM`, `MistralForCausalLM`, `MixtralForCausalLM`, `QuantMixtralForCausalLM`, `MptForCausalLM`, `MPTForCausalLM`, `NemotronForCausalLM`, `OlmoForCausalLM`, `Olmo2ForCausalLM`, `OlmoeForCausalLM`, `OPTForCausalLM`, `OrionForCausalLM`, `PersimmonForCausalLM`, `PhiForCausalLM`, `Phi3ForCausalLM`, `Phi3SmallForCausalLM`, `PhiMoEForCausalLM`, `Qwen2ForCausalLM`, `Qwen2MoeForCausalLM`, `RWForCausalLM`, `StableLMEpochForCausalLM`, `StableLmForCausalLM`, `Starcoder2ForCausalLM`, `SolarForCausalLM`, `TeleChat2ForCausalLM`, `XverseForCausalLM`, `BartModel`, `BartForConditionalGeneration`, `Florence2ForConditionalGeneration`, `BertModel`, `RobertaModel`, `RobertaForMaskedLM`, `XLMRobertaModel`, `DeciLMForCausalLM`, `Gemma2Model`, `GlmForCausalLM`, `GritLM`, `InternLM2ForRewardModel`, `JambaForSequenceClassification`, `LlamaModel`, `MistralModel`, `Phi3ForCausalLM`, `Qwen2Model`, `Qwen2ForCausalLM`, `Qwen2ForRewardModel`, `Qwen2ForProcessRewardModel`, `TeleChat2ForCausalLM`, `LlavaNextForConditionalGeneration`, `Phi3VForCausalLM`, `Qwen2VLForConditionalGeneration`, `Qwen2ForSequenceClassification`, `BertForSequenceClassification`, `RobertaForSequenceClassification`, `XLMRobertaForSequenceClassification`, `AriaForConditionalGeneration`, `Blip2ForConditionalGeneration`, `ChameleonForConditionalGeneration`, `ChatGLMModel`, `ChatGLMForConditionalGeneration`, `DeepseekVLV2ForCausalLM`, `FuyuForCausalLM`, `H2OVLChatModel`, `InternVLChatModel`, `Idefics3ForConditionalGeneration`, `LlavaForConditionalGeneration`, `LlavaNextForConditionalGeneration`, `LlavaNextVideoForConditionalGeneration`, `LlavaOnevisionForConditionalGeneration`, `MantisForConditionalGeneration`, `MiniCPMO`, `MiniCPMV`, `MolmoForCausalLM`, `NVLM_D`, `PaliGemmaForConditionalGeneration`, `Phi3VForCausalLM`, `PixtralForConditionalGeneration`, `QWenLMHeadModel`, `Qwen2VLForConditionalGeneration`, `Qwen2_5_VLForConditionalGeneration`, `Qwen2AudioForConditionalGeneration`, `UltravoxModel`, `MllamaForConditionalGeneration`, `WhisperForConditionalGeneration`, `EAGLEModel`, `MedusaModel`, `MLPSpeculatorPreTrainedModel`
+## Supported model architectures

+Custom models must conform to one of the architectures listed below. Click to expand the full list.

+<Concept>
+## Supported custom model architectures
+Custom model deployments currently support the following architectures: `AquilaModel`, `AquilaForCausalLM`, `ArcticForCausalLM`, `BaiChuanForCausalLM`, `BaichuanForCausalLM`, `BloomForCausalLM`, `CohereForCausalLM`, `Cohere2ForCausalLM`, `DbrxForCausalLM`, `DeciLMForCausalLM`, `DeepseekForCausalLM`, `DeepseekV2ForCausalLM`, `DeepseekV3ForCausalLM`, `ExaoneForCausalLM`, `FalconForCausalLM`, `Fairseq2LlamaForCausalLM`, `GemmaForCausalLM`, `Gemma2ForCausalLM`, `GlmForCausalLM`, `GPT2LMHeadModel`, `GPTBigCodeForCausalLM`, `GPTJForCausalLM`, `GPTNeoXForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM`, `GritLM`, `InternLMForCausalLM`, `InternLM2ForCausalLM`, `InternLM2VEForCausalLM`, `InternLM3ForCausalLM`, `JAISLMHeadModel`, `JambaForCausalLM`, `LlamaForCausalLM`, `LLaMAForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`, `MiniCPMForCausalLM`, `MiniCPM3ForCausalLM`, `MistralForCausalLM`, `MixtralForCausalLM`, `QuantMixtralForCausalLM`, `MptForCausalLM`, `MPTForCausalLM`, `NemotronForCausalLM`, `OlmoForCausalLM`, `Olmo2ForCausalLM`, `OlmoeForCausalLM`, `OPTForCausalLM`, `OrionForCausalLM`, `PersimmonForCausalLM`, `PhiForCausalLM`, `Phi3ForCausalLM`, `Phi3SmallForCausalLM`, `PhiMoEForCausalLM`, `Qwen2ForCausalLM`, `Qwen2MoeForCausalLM`, `RWForCausalLM`, `StableLMEpochForCausalLM`, `StableLmForCausalLM`, `Starcoder2ForCausalLM`, `SolarForCausalLM`, `TeleChat2ForCausalLM`, `XverseForCausalLM`, `BartModel`, `BartForConditionalGeneration`, `Florence2ForConditionalGeneration`, `BertModel`, `RobertaModel`, `RobertaForMaskedLM`, `XLMRobertaModel`, `DeciLMForCausalLM`, `Gemma2Model`, `GlmForCausalLM`, `GritLM`, `InternLM2ForRewardModel`, `JambaForSequenceClassification`, `LlamaModel`, `MistralModel`, `Phi3ForCausalLM`, `Qwen2Model`, `Qwen2ForCausalLM`, `Qwen2ForRewardModel`, `Qwen2ForProcessRewardModel`, `TeleChat2ForCausalLM`, `LlavaNextForConditionalGeneration`, `Phi3VForCausalLM`, `Qwen2VLForConditionalGeneration`, `Qwen2ForSequenceClassification`, `BertForSequenceClassification`, `RobertaForSequenceClassification`, `XLMRobertaForSequenceClassification`, `AriaForConditionalGeneration`, `Blip2ForConditionalGeneration`, `ChameleonForConditionalGeneration`, `ChatGLMModel`, `ChatGLMForConditionalGeneration`, `DeepseekVLV2ForCausalLM`, `FuyuForCausalLM`, `H2OVLChatModel`, `InternVLChatModel`, `Idefics3ForConditionalGeneration`, `LlavaForConditionalGeneration`, `LlavaNextForConditionalGeneration`, `LlavaNextVideoForConditionalGeneration`, `LlavaOnevisionForConditionalGeneration`, `MantisForConditionalGeneration`, `MiniCPMO`, `MiniCPMV`, `MolmoForCausalLM`, `NVLM_D`, `PaliGemmaForConditionalGeneration`, `Phi3VForCausalLM`, `PixtralForConditionalGeneration`, `QWenLMHeadModel`, `Qwen2VLForConditionalGeneration`, `Qwen2_5_VLForConditionalGeneration`, `Qwen2AudioForConditionalGeneration`, `UltravoxModel`, `MllamaForConditionalGeneration`, `WhisperForConditionalGeneration`, `EAGLEModel`, `MedusaModel`, `MLPSpeculatorPreTrainedModel`
+</Concept>