---
meta:
  title: Supported models in Managed Inference
  description: Explore all AI models supported by Managed Inference
content:
  h1: Supported models in Managed Inference
  paragraph: Discover which AI models you can deploy using Managed Inference, either from the Scaleway Catalog or as custom models.
tags: support models custom catalog
dates:
  validation: 2025-04-08
  posted: 2025-04-08
categories:
  - ai-data
---

Scaleway Managed Inference allows you to deploy various AI models, either from:

 * [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models).
 * [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face.
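
You can also retrieve the catalog programmatically. The following is a minimal sketch, assuming the `v1` regional route and a `models` array in the response; check the models API reference linked above for the exact path and schema.

```python
import os

import requests

REGION = "fr-par"  # any region where Managed Inference is available
URL = f"https://api.scaleway.com/inference/v1/regions/{REGION}/models"

# Authenticate with a Scaleway IAM secret key.
response = requests.get(URL, headers={"X-Auth-Token": os.environ["SCW_SECRET_KEY"]})
response.raise_for_status()

for model in response.json().get("models", []):
    print(model["name"])  # e.g. "llama-3.1-8b-instruct"
```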

## Scaleway catalog

### Multimodal models (chat + vision)

_More details to be added._

### Chat models

| Provider | Model identifier | Documentation | License |
|----------|------------------|---------------|---------|
| Allen AI | `molmo-72b-0924` | [View Details](/managed-inference/reference-content/molmo-72b-0924/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Deepseek | `deepseek-r1-distill-llama-70b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-70b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
| Deepseek | `deepseek-r1-distill-llama-8b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-8b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
| Meta | `llama-3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3-70b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) |
| Meta | `llama-3-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3-8b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) |
| Meta | `llama-3.1-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-70b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Meta | `llama-3.1-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Meta | `llama-3.3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 license](https://www.llama.com/llama3_3/license/) |
| Nvidia | `llama-3.1-nemotron-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-nemotron-70b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Mistral | `mixtral-8x7b-instruct-v0.1` | [View Details](/managed-inference/reference-content/mixtral-8x7b-instruct-v0.1/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-7b-instruct-v0.3` | [View Details](/managed-inference/reference-content/mistral-7b-instruct-v0.3/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-nemo-instruct-2407` | [View Details](/managed-inference/reference-content/mistral-nemo-instruct-2407/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-small-24b-instruct-2501` | [View Details](/managed-inference/reference-content/mistral-small-24b-instruct-2501/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `pixtral-12b-2409` | [View Details](/managed-inference/reference-content/pixtral-12b-2409/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Qwen | `qwen2.5-coder-32b-instruct` | [View Details](/managed-inference/reference-content/qwen2.5-coder-32b-instruct/) | [Apache 2.0 license](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) |

### Vision models

_More details to be added._

### Embedding models

| Provider | Model identifier | Documentation | License |
|----------|------------------|---------------|---------|
| BAAI | `bge-multilingual-gemma2` | [View Details](/managed-inference/reference-content/bge-multilingual-gemma2/) | [Gemma Terms of Use](https://ai.google.dev/gemma/terms) |
| Sentence Transformers | `sentence-t5-xxl` | [View Details](/managed-inference/reference-content/sentence-t5-xxl/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |

## Custom models

<Message type="note">
  Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference).
</Message>

### Prerequisites

<Message type="tip">
  We recommend starting with a variation of a supported model from the Scaleway catalog.
  For example, you can deploy a [quantized (4-bit) version of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit).
  If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above.
</Message>

To deploy a custom model via Hugging Face, ensure the following:

#### Access requirements

 * You must have access to the model using your Hugging Face credentials.
 * For gated models, request access through your Hugging Face account.
 * Credentials are not stored, but we recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens).

#### Required files

Your model repository must include the following (a validation sketch follows this list):

 * A `config.json` file containing:
    * An `architectures` array (see [supported architectures](#supported-model-architectures) for the exact list of supported values).
    * `max_position_embeddings`
 * Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format
 * A chat template included in either:
    * `tokenizer_config.json` as a `chat_template` field, or
    * `chat_template.json` as a `chat_template` field
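
Before importing, you can sanity-check a local copy of the repository. This is a hypothetical pre-flight script based on the requirements above, not an official tool; the repository path at the end is a placeholder.

```python
import json
from pathlib import Path


def check_model_repo(repo: Path) -> list[str]:
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    config_path = repo / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        config = json.loads(config_path.read_text())
        if not config.get("architectures"):
            problems.append("config.json lacks an `architectures` array")
        if "max_position_embeddings" not in config:
            problems.append("config.json lacks `max_position_embeddings`")
    if not list(repo.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")
    # The chat template may live in either file, always under `chat_template`.
    template_files = [repo / "tokenizer_config.json", repo / "chat_template.json"]
    if not any(
        "chat_template" in json.loads(p.read_text())
        for p in template_files
        if p.is_file()
    ):
        problems.append("no chat_template found")
    return problems


print(check_model_repo(Path("./my-model")) or "looks deployable")  # placeholder path
```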

#### Supported model types

Your model must be one of the following types:

 * `chat`
 * `vision`
 * `multimodal` (chat + vision)
 * `embedding`

<Message type="important">
  **Security Notice**<br />
  Models using formats that allow arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are **not supported**.
</Message>

## API support

The endpoints and features available for a deployment depend on the model type.

### Chat models

The Chat API is exposed for these models under the `/v1/chat/completions` endpoint.
**Structured outputs** and **function calling** are not yet supported for custom models.
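
Below is a minimal sketch of a chat request using the OpenAI-compatible Chat API. The base URL is a placeholder for your deployment's endpoint (shown in the Scaleway console), and the example assumes a Scaleway IAM secret key is accepted as the bearer token.

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",  # placeholder endpoint
    api_key=os.environ["SCW_SECRET_KEY"],  # IAM secret key
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # the model identifier of your deployment
    messages=[{"role": "user", "content": "What is Managed Inference?"}],
)
print(response.choices[0].message.content)
```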

### Vision models

The Chat API is exposed for these models under the `/v1/chat/completions` endpoint.
**Structured outputs** and **function calling** are not yet supported for custom models.
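
Vision models accept images as `image_url` parts in the message content, following the OpenAI-compatible format. The sketch below makes the same assumptions as the chat example (placeholder endpoint, IAM secret key), and the image URL is illustrative.

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",  # placeholder endpoint
    api_key=os.environ["SCW_SECRET_KEY"],  # IAM secret key
)

response = client.chat.completions.create(
    model="pixtral-12b-2409",  # a vision-capable deployment
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```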

### Multimodal models

These models are handled as both chat and vision models and are served through the same `/v1/chat/completions` endpoint.

### Embedding models

The Embeddings API is exposed for these models under the `/v1/embeddings` endpoint.
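
A minimal embeddings request, under the same assumptions as the examples above (placeholder endpoint, IAM secret key as bearer token):

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",  # placeholder endpoint
    api_key=os.environ["SCW_SECRET_KEY"],  # IAM secret key
)

result = client.embeddings.create(
    model="sentence-t5-xxl",  # an embedding deployment from the catalog
    input="Managed Inference supports custom embedding models.",
)
print(len(result.data[0].embedding))  # dimensionality of the returned vector
```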

## Custom model lifecycle

Custom model deployments are currently intended to remain valid long term, and we will ensure that updates or changes to Managed Inference do not impact existing deployments.
In case of breaking changes that end support for some custom models, we will notify you **at least 3 months beforehand**.

## Licensing

When deploying custom models, **you remain responsible** for complying with any license requirements from the model provider, just as you would when running the model on self-provisioned GPUs.

## Supported model architectures

Custom models must conform to one of the architectures listed below. Click to expand the full list.

<Concept>
  ## Supported custom model architectures
  Custom model deployment currently supports the following model architectures:
  * `AquilaModel`
  * `AquilaForCausalLM`
  * `ArcticForCausalLM`
  * `BaiChuanForCausalLM`
  * `BaichuanForCausalLM`
  * `BloomForCausalLM`
  * `CohereForCausalLM`
  * `Cohere2ForCausalLM`
  * `DbrxForCausalLM`
  * `DeciLMForCausalLM`
  * `DeepseekForCausalLM`
  * `DeepseekV2ForCausalLM`
  * `DeepseekV3ForCausalLM`
  * `ExaoneForCausalLM`
  * `FalconForCausalLM`
  * `Fairseq2LlamaForCausalLM`
  * `GemmaForCausalLM`
  * `Gemma2ForCausalLM`
  * `GlmForCausalLM`
  * `GPT2LMHeadModel`
  * `GPTBigCodeForCausalLM`
  * `GPTJForCausalLM`
  * `GPTNeoXForCausalLM`
  * `GraniteForCausalLM`
  * `GraniteMoeForCausalLM`
  * `GritLM`
  * `InternLMForCausalLM`
  * `InternLM2ForCausalLM`
  * `InternLM2VEForCausalLM`
  * `InternLM3ForCausalLM`
  * `JAISLMHeadModel`
  * `JambaForCausalLM`
  * `LlamaForCausalLM`
  * `LLaMAForCausalLM`
  * `MambaForCausalLM`
  * `FalconMambaForCausalLM`
  * `MiniCPMForCausalLM`
  * `MiniCPM3ForCausalLM`
  * `MistralForCausalLM`
  * `MixtralForCausalLM`
  * `QuantMixtralForCausalLM`
  * `MptForCausalLM`
  * `MPTForCausalLM`
  * `NemotronForCausalLM`
  * `OlmoForCausalLM`
  * `Olmo2ForCausalLM`
  * `OlmoeForCausalLM`
  * `OPTForCausalLM`
  * `OrionForCausalLM`
  * `PersimmonForCausalLM`
  * `PhiForCausalLM`
  * `Phi3ForCausalLM`
  * `Phi3SmallForCausalLM`
  * `PhiMoEForCausalLM`
  * `Qwen2ForCausalLM`
  * `Qwen2MoeForCausalLM`
  * `RWForCausalLM`
  * `StableLMEpochForCausalLM`
  * `StableLmForCausalLM`
  * `Starcoder2ForCausalLM`
  * `SolarForCausalLM`
  * `TeleChat2ForCausalLM`
  * `XverseForCausalLM`
  * `BartModel`
  * `BartForConditionalGeneration`
  * `Florence2ForConditionalGeneration`
  * `BertModel`
  * `RobertaModel`
  * `RobertaForMaskedLM`
  * `XLMRobertaModel`
  * `Gemma2Model`
  * `InternLM2ForRewardModel`
  * `JambaForSequenceClassification`
  * `LlamaModel`
  * `MistralModel`
  * `Qwen2Model`
  * `Qwen2ForRewardModel`
  * `Qwen2ForProcessRewardModel`
  * `LlavaNextForConditionalGeneration`
  * `Phi3VForCausalLM`
  * `Qwen2VLForConditionalGeneration`
  * `Qwen2ForSequenceClassification`
  * `BertForSequenceClassification`
  * `RobertaForSequenceClassification`
  * `XLMRobertaForSequenceClassification`
  * `AriaForConditionalGeneration`
  * `Blip2ForConditionalGeneration`
  * `ChameleonForConditionalGeneration`
  * `ChatGLMModel`
  * `ChatGLMForConditionalGeneration`
  * `DeepseekVLV2ForCausalLM`
  * `FuyuForCausalLM`
  * `H2OVLChatModel`
  * `InternVLChatModel`
  * `Idefics3ForConditionalGeneration`
  * `LlavaForConditionalGeneration`
  * `LlavaNextVideoForConditionalGeneration`
  * `LlavaOnevisionForConditionalGeneration`
  * `MantisForConditionalGeneration`
  * `MiniCPMO`
  * `MiniCPMV`
  * `MolmoForCausalLM`
  * `NVLM_D`
  * `PaliGemmaForConditionalGeneration`
  * `PixtralForConditionalGeneration`
  * `QWenLMHeadModel`
  * `Qwen2_5_VLForConditionalGeneration`
  * `Qwen2AudioForConditionalGeneration`
  * `UltravoxModel`
  * `MllamaForConditionalGeneration`
  * `WhisperForConditionalGeneration`
  * `EAGLEModel`
  * `MedusaModel`
  * `MLPSpeculatorPreTrainedModel`
</Concept>