|
1 | 1 | --- |
2 | 2 | meta: |
3 | | - title: Supported Models in Managed Inference |
4 | | - description: Supported Models in Managed Inference |
| 3 | + title: Supported models in Managed Inference |
| 4 | + description: Explore all AI models supported by Managed Inference |
5 | 5 | content: |
6 | | - h1: Supported Models in Managed Inference |
7 | | - paragraph: Supported Models in Managed Inference |
8 | | -tags: |
| 6 | + h1: Supported models in Managed Inference |
| 7 | + paragraph: Discover which AI models you can deploy using Managed Inference, either from the Scaleway Catalog or as custom models. |
| 8 | +tags: support models custom catalog |
9 | 9 | dates: |
10 | 10 | validation: 2025-04-08 |
11 | 11 | posted: 2025-04-08 |
12 | 12 | categories: |
13 | 13 | - ai-data |
14 | 14 | --- |
15 | 15 |
|
16 | | -## Models supported on Managed Inference |
| 16 | +Scaleway Managed Inference allows you to deploy various AI models, either from: |
17 | 17 |
|
18 | | -Managed Inference supports multiple AI models either from: |
19 | | -- [Scaleway catalog]((#scaleway-catalog)): A curated model list available in [Scaleway Console](https://console.scaleway.com/inference/deployments/) or through [Managed Inference Models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models) |
20 | | -- [Custom models](#custom-models): Models imported by you as a user from sources such as HuggingFace. |
| 18 | +- [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models) |
| 19 | +- [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face. |
21 | 20 |
|
22 | | -## Scaleway Catalog |
| 21 | +## Scaleway catalog |
23 | 22 |
|
24 | | -### Multimodal models (Chat and Vision) |
| 23 | +### Multimodal models (chat + vision) |
25 | 24 |
|
26 | 25 | ### Chat models |
27 | 26 |
|
28 | | -| Provider | Model string | Documentation | License | |
29 | | -|-----------------|-----------------|-----------------|-----------------| |
30 | | -| Meta | `llama-3.3-70b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 Community](https://www.llama.com/llama3_3/license/) | |
31 | | -| Meta | `llama-3.1-8b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 Community](https://llama.meta.com/llama3_1/license/) | |
| 27 | +| Provider | Model identifier | Documentation | License | |
| 28 | +|----------|------------------|----------------|---------| |
| 29 | +| Meta | `llama-3.3-70b-instruct` | [View Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 License](https://www.llama.com/llama3_3/license/) | |
| 30 | +| Meta | `llama-3.1-8b-instruct` | [View Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 License](https://llama.meta.com/llama3_1/license/) | |
32 | 31 |
|
33 | 32 | ### Vision models |
34 | 33 |
|
| 34 | +_More details to be added._ |
| 35 | + |
35 | 36 | ### Embedding models |
36 | 37 |
|
37 | | -## Custom models |
| 38 | +_More details to be added._ |
| 39 | + |
| 40 | + |
| 41 | +## Custom models |
38 | 42 |
|
39 | 43 | <Message type="note"> |
40 | | - Custom models are still in Beta status. If you identify unsupported models, you can report the issue to us through our [Slack Community Channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or our [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference). |
| 44 | + Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference). |
41 | 45 | </Message> |
42 | 46 |
|
43 | | -### Prerequesites |
| 47 | +### Prerequisites |
44 | 48 |
|
45 | 49 | <Message type="tip"> |
46 | | - To begin with custom models deployment, we recommend you start with existing variation of models supported in the Scaleway Catalog. As an example, you can deploy a [quantized version (4 bits) of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit). If you want to then deploy a fine-tuned version of Llama 3.3, you can ensure the file structure you provide matches this example before creating your deployment. |
| 50 | + We recommend starting with a variation of a supported model from the Scaleway catalog. |
| 51 | + For example, you can deploy a [quantized (4-bit) version of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit). |
| 52 | + If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above. |
47 | 53 | </Message> |
48 | 54 |
|
49 | | -To deploy a model by providing its URL on Hugging Face, you need to: |
50 | | -- Have access to this model with your Hugging Face credentials (if the model is "Gated", you specifically need to ask access from your Hugging Face account). Note that your Hugging Face credentials will not be stored, but we still recommend you to create [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens) for this purpose. |
| 55 | +To deploy a custom model via Hugging Face, ensure the following: |
| 56 | + |
| 57 | +#### Access requirements |
| 58 | + |
| 59 | +- You must have access to the model using your Hugging Face credentials. |
| 60 | +- For gated models, request access through your Hugging Face account. |
| 61 | +- Credentials are not stored, but we recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens). |
| 62 | + |
| 63 | +#### Required files |
| 64 | + |
| 65 | +Your model repository must include: |
51 | 66 |
|
52 | | -The model files need to include: |
53 | | -- a `config.json` file containing: |
54 | | - - `architectures` array. See [supported models architectures](#supported-models-architecture) for exact list of supported values. |
| 67 | +- `config.json` with: |
| 68 | + - An `architectures` array (see [supported architectures](#supported-models-architecture)) |
55 | 69 | - `max_position_embeddings` |
56 | | -- model weigths in [`.safetensors`](https://huggingface.co/docs/safetensors/index) format |
57 | | -- a chat template either in: |
58 | | - - `tokenizer_config.json` file as `chat_template` field |
59 | | - - `chat_template.json` file as `chat_template` field |
| 70 | +- Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format |
| 71 | +- A chat template included in either: |
| 72 | + - `tokenizer_config.json` as a `chat_template` field, or |
| 73 | + - `chat_template.json` as a `chat_template` field |
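The file requirements above can be sanity-checked locally before you create a deployment. Below is a minimal sketch of such a check; the `check_repo` helper and the mock repository layout are illustrative only, not part of Managed Inference:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical local snapshot of a model repository (for illustration only).
repo = Path(tempfile.mkdtemp())
(repo / "config.json").write_text(json.dumps({
    "architectures": ["LlamaForCausalLM"],
    "max_position_embeddings": 131072,
}))
(repo / "model-00001-of-00002.safetensors").touch()
(repo / "tokenizer_config.json").write_text(json.dumps({
    "chat_template": "{% for message in messages %}...{% endfor %}",
}))

def check_repo(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the layout looks deployable."""
    config_file = path / "config.json"
    if not config_file.exists():
        return ["missing config.json"]
    problems = []
    config = json.loads(config_file.read_text())
    if not config.get("architectures"):
        problems.append("config.json lacks an 'architectures' array")
    if "max_position_embeddings" not in config:
        problems.append("config.json lacks 'max_position_embeddings'")
    if not list(path.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")
    has_template = any(
        "chat_template" in json.loads((path / name).read_text())
        for name in ("tokenizer_config.json", "chat_template.json")
        if (path / name).exists()
    )
    if not has_template:
        problems.append("no chat_template in tokenizer_config.json or chat_template.json")
    return problems

print(check_repo(repo))  # → []
```

Running a check like this against your repository before uploading helps catch a missing chat template or weights in an unsupported format early.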
| 74 | + |
| 75 | +#### Supported model types |
| 76 | + |
| 77 | +Your model must be one of the following types: |
60 | 78 |
|
61 | | -The model type need to either be: |
62 | 79 | - `chat` |
63 | 80 | - `vision` |
64 | | -- `multimodal` (`chat` and `vision` currently) |
| 81 | +- `multimodal` (chat + vision) |
65 | 82 | - `embedding` |
66 | 83 |
|
67 | | -For security reasons, models containing arbitrary code execution such as [`pickle`](https://docs.python.org/3/library/pickle.html) format are not supported. |
| 84 | +<Message type="important"> |
| 85 | +  **Security notice**<br /> |
| 86 | + Models using formats that allow arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are **not supported**. |
| 87 | +</Message> |
68 | 88 |
|
69 | | -### Supported API |
| 89 | +## API support |
70 | 90 |
|
71 | | -Depending on the model type, specific endpoints and features will be supported. |
| 91 | +Depending on your model type, the following endpoints will be available: |
72 | 92 |
|
73 | | -#### Chat models |
| 93 | +### Chat models |
74 | 94 |
|
75 | 95 | The Chat API is exposed for this model under the `/v1/chat/completions` endpoint. |
76 | 96 | **Structured outputs** and **function calling** are not yet supported for custom models. |
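As a sketch, a request body for a custom chat model could be built as follows. The endpoint URL and model name are placeholders for your own deployment, and the payload shape assumes the OpenAI-compatible Chat Completions format:

```python
import json

# Placeholder endpoint: substitute the URL of your own deployment.
ENDPOINT = "https://<your-deployment>.example.com/v1/chat/completions"

payload = {
    "model": "my-custom-chat-model",  # hypothetical custom model name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what Managed Inference does."},
    ],
    "max_tokens": 128,
}

# The JSON body you would POST to ENDPOINT:
print(json.dumps(payload, indent=2))
```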
77 | 97 |
|
78 | | -#### Vision models |
| 98 | +### Vision models |
79 | 99 |
|
80 | 100 | The Chat API is exposed for this model under the `/v1/chat/completions` endpoint. |
81 | 101 | **Structured outputs** and **function calling** are not yet supported for custom models. |
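For vision-capable models, the same endpoint accepts messages whose content mixes text and image parts. This sketch assumes the OpenAI-compatible content-part format; the model name and image URL are placeholders:

```python
import json

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}

payload = {"model": "my-custom-vision-model", "messages": [message]}  # hypothetical name
print(json.dumps(payload, indent=2))
```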
82 | 102 |
|
83 | | -#### Multimodal models (vision and chat) |
| 103 | +### Multimodal models |
84 | 104 |
|
85 | 105 | These models are treated as both chat and vision models. |
86 | 106 |
|
87 | | -#### Embedding models |
| 107 | +### Embedding models |
88 | 108 |
|
89 | 109 | The Embeddings API is exposed for this model under the `/v1/embeddings` endpoint. |
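A request body for a custom embedding model could look like the sketch below; the model name is a placeholder and the payload shape assumes the OpenAI-compatible Embeddings format:

```python
import json

payload = {
    "model": "my-custom-embedding-model",  # hypothetical custom model name
    "input": [
        "Managed Inference supports custom embedding models.",
        "Each input string is embedded independently.",
    ],
}

# The JSON body you would POST to the /v1/embeddings endpoint:
print(json.dumps(payload, indent=2))
```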
90 | 110 |
|
91 | 111 |
|
92 | | -### Custom model lifecycle |
| 112 | +## Custom model lifecycle |
93 | 113 |
|
94 | 114 | Currently, custom model deployments are considered valid for the long term, and we will ensure any updates or changes to Managed Inference do not impact existing deployments. |
95 | | -In case of breaking changes, leading to some custom models not being supported anymore, we will notify you at least 3 months beforehand. |
96 | | - |
97 | | -### License |
| 115 | +In case of breaking changes, leading to some custom models not being supported anymore, we will notify you **at least 3 months beforehand**. |
98 | 116 |
|
99 | | -- When deploying custom models, you remain responsible for complying with any License requirements from the model provider, as you would do by running the model on a custom provisioned GPU. |
| 117 | +## Licensing |
100 | 118 |
|
101 | | -### Supported models architecture |
| 119 | +When deploying custom models, **you remain responsible** for complying with any license requirements from the model provider, just as you would when running the model on self-provisioned GPUs. |
102 | 120 |
|
103 | | -Custom Models Deployments currently support the following models architecture: `AquilaModel`, `AquilaForCausalLM`, `ArcticForCausalLM`, `BaiChuanForCausalLM`, `BaichuanForCausalLM`, `BloomForCausalLM`, `CohereForCausalLM`, `Cohere2ForCausalLM`, `DbrxForCausalLM`, `DeciLMForCausalLM`, `DeepseekForCausalLM`, `DeepseekV2ForCausalLM`, `DeepseekV3ForCausalLM`, `ExaoneForCausalLM`, `FalconForCausalLM`, `Fairseq2LlamaForCausalLM`, `GemmaForCausalLM`, `Gemma2ForCausalLM`, `GlmForCausalLM`, `GPT2LMHeadModel`, `GPTBigCodeForCausalLM`, `GPTJForCausalLM`, `GPTNeoXForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM`, `GritLM`, `InternLMForCausalLM`, `InternLM2ForCausalLM`, `InternLM2VEForCausalLM`, `InternLM3ForCausalLM`, `JAISLMHeadModel`, `JambaForCausalLM`, `LlamaForCausalLM`, `LLaMAForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`, `MiniCPMForCausalLM`, `MiniCPM3ForCausalLM`, `MistralForCausalLM`, `MixtralForCausalLM`, `QuantMixtralForCausalLM`, `MptForCausalLM`, `MPTForCausalLM`, `NemotronForCausalLM`, `OlmoForCausalLM`, `Olmo2ForCausalLM`, `OlmoeForCausalLM`, `OPTForCausalLM`, `OrionForCausalLM`, `PersimmonForCausalLM`, `PhiForCausalLM`, `Phi3ForCausalLM`, `Phi3SmallForCausalLM`, `PhiMoEForCausalLM`, `Qwen2ForCausalLM`, `Qwen2MoeForCausalLM`, `RWForCausalLM`, `StableLMEpochForCausalLM`, `StableLmForCausalLM`, `Starcoder2ForCausalLM`, `SolarForCausalLM`, `TeleChat2ForCausalLM`, `XverseForCausalLM`, `BartModel`, `BartForConditionalGeneration`, `Florence2ForConditionalGeneration`, `BertModel`, `RobertaModel`, `RobertaForMaskedLM`, `XLMRobertaModel`, `DeciLMForCausalLM`, `Gemma2Model`, `GlmForCausalLM`, `GritLM`, `InternLM2ForRewardModel`, `JambaForSequenceClassification`, `LlamaModel`, `MistralModel`, `Phi3ForCausalLM`, `Qwen2Model`, `Qwen2ForCausalLM`, `Qwen2ForRewardModel`, `Qwen2ForProcessRewardModel`, `TeleChat2ForCausalLM`, `LlavaNextForConditionalGeneration`, `Phi3VForCausalLM`, `Qwen2VLForConditionalGeneration`, 
`Qwen2ForSequenceClassification`, `BertForSequenceClassification`, `RobertaForSequenceClassification`, `XLMRobertaForSequenceClassification`, `AriaForConditionalGeneration`, `Blip2ForConditionalGeneration`, `ChameleonForConditionalGeneration`, `ChatGLMModel`, `ChatGLMForConditionalGeneration`, `DeepseekVLV2ForCausalLM`, `FuyuForCausalLM`, `H2OVLChatModel`, `InternVLChatModel`, `Idefics3ForConditionalGeneration`, `LlavaForConditionalGeneration`, `LlavaNextForConditionalGeneration`, `LlavaNextVideoForConditionalGeneration`, `LlavaOnevisionForConditionalGeneration`, `MantisForConditionalGeneration`, `MiniCPMO`, `MiniCPMV`, `MolmoForCausalLM`, `NVLM_D`, `PaliGemmaForConditionalGeneration`, `Phi3VForCausalLM`, `PixtralForConditionalGeneration`, `QWenLMHeadModel`, `Qwen2VLForConditionalGeneration`, `Qwen2_5_VLForConditionalGeneration`, `Qwen2AudioForConditionalGeneration`, `UltravoxModel`, `MllamaForConditionalGeneration`, `WhisperForConditionalGeneration`, `EAGLEModel`, `MedusaModel`, `MLPSpeculatorPreTrainedModel` |
| 121 | +## Supported model architectures |
104 | 122 |
|
| 123 | +Custom models must conform to one of the architectures listed below. Click to expand the full list. |
105 | 124 |
|
| 125 | +<Concept> |
| 126 | + ## Supported custom model architectures |
| 127 | +  Custom model deployments currently support the following model architectures: `AquilaModel`, `AquilaForCausalLM`, `ArcticForCausalLM`, `BaiChuanForCausalLM`, `BaichuanForCausalLM`, `BloomForCausalLM`, `CohereForCausalLM`, `Cohere2ForCausalLM`, `DbrxForCausalLM`, `DeciLMForCausalLM`, `DeepseekForCausalLM`, `DeepseekV2ForCausalLM`, `DeepseekV3ForCausalLM`, `ExaoneForCausalLM`, `FalconForCausalLM`, `Fairseq2LlamaForCausalLM`, `GemmaForCausalLM`, `Gemma2ForCausalLM`, `GlmForCausalLM`, `GPT2LMHeadModel`, `GPTBigCodeForCausalLM`, `GPTJForCausalLM`, `GPTNeoXForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM`, `GritLM`, `InternLMForCausalLM`, `InternLM2ForCausalLM`, `InternLM2VEForCausalLM`, `InternLM3ForCausalLM`, `JAISLMHeadModel`, `JambaForCausalLM`, `LlamaForCausalLM`, `LLaMAForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`, `MiniCPMForCausalLM`, `MiniCPM3ForCausalLM`, `MistralForCausalLM`, `MixtralForCausalLM`, `QuantMixtralForCausalLM`, `MptForCausalLM`, `MPTForCausalLM`, `NemotronForCausalLM`, `OlmoForCausalLM`, `Olmo2ForCausalLM`, `OlmoeForCausalLM`, `OPTForCausalLM`, `OrionForCausalLM`, `PersimmonForCausalLM`, `PhiForCausalLM`, `Phi3ForCausalLM`, `Phi3SmallForCausalLM`, `PhiMoEForCausalLM`, `Qwen2ForCausalLM`, `Qwen2MoeForCausalLM`, `RWForCausalLM`, `StableLMEpochForCausalLM`, `StableLmForCausalLM`, `Starcoder2ForCausalLM`, `SolarForCausalLM`, `TeleChat2ForCausalLM`, `XverseForCausalLM`, `BartModel`, `BartForConditionalGeneration`, `Florence2ForConditionalGeneration`, `BertModel`, `RobertaModel`, `RobertaForMaskedLM`, `XLMRobertaModel`, `DeciLMForCausalLM`, `Gemma2Model`, `GlmForCausalLM`, `GritLM`, `InternLM2ForRewardModel`, `JambaForSequenceClassification`, `LlamaModel`, `MistralModel`, `Phi3ForCausalLM`, `Qwen2Model`, `Qwen2ForCausalLM`, `Qwen2ForRewardModel`, `Qwen2ForProcessRewardModel`, `TeleChat2ForCausalLM`, `LlavaNextForConditionalGeneration`, `Phi3VForCausalLM`, `Qwen2VLForConditionalGeneration`, 
`Qwen2ForSequenceClassification`, `BertForSequenceClassification`, `RobertaForSequenceClassification`, `XLMRobertaForSequenceClassification`, `AriaForConditionalGeneration`, `Blip2ForConditionalGeneration`, `ChameleonForConditionalGeneration`, `ChatGLMModel`, `ChatGLMForConditionalGeneration`, `DeepseekVLV2ForCausalLM`, `FuyuForCausalLM`, `H2OVLChatModel`, `InternVLChatModel`, `Idefics3ForConditionalGeneration`, `LlavaForConditionalGeneration`, `LlavaNextForConditionalGeneration`, `LlavaNextVideoForConditionalGeneration`, `LlavaOnevisionForConditionalGeneration`, `MantisForConditionalGeneration`, `MiniCPMO`, `MiniCPMV`, `MolmoForCausalLM`, `NVLM_D`, `PaliGemmaForConditionalGeneration`, `Phi3VForCausalLM`, `PixtralForConditionalGeneration`, `QWenLMHeadModel`, `Qwen2VLForConditionalGeneration`, `Qwen2_5_VLForConditionalGeneration`, `Qwen2AudioForConditionalGeneration`, `UltravoxModel`, `MllamaForConditionalGeneration`, `WhisperForConditionalGeneration`, `EAGLEModel`, `MedusaModel`, `MLPSpeculatorPreTrainedModel` |
| 128 | +</Concept> |