Commit c2304f3

feat(minfr): add chat models

1 parent 57a0ac2 · commit c2304f3

1 file changed: pages/managed-inference/reference-content/supported-models.mdx (+167 additions, −28 deletions)
@@ -10,35 +10,51 @@ dates:
  validation: 2025-04-08
  posted: 2025-04-08
categories:
  - ai-data
---

Scaleway Managed Inference allows you to deploy various AI models, either from:

- [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models)
- [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face.

## Scaleway catalog

### Multimodal models (chat + vision)

### Chat models

| Provider | Model identifier | Documentation | License |
|----------|------------------|---------------|---------|
| Allen AI | `molmo-72b-0924` | [View Details](/managed-inference/reference-content/molmo-72b-0924/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Deepseek | `deepseek-r1-distill-llama-70b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-70b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
| Deepseek | `deepseek-r1-distill-llama-8b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-8b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
| Meta | `llama-3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3-70b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) |
| Meta | `llama-3-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3-8b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) |
| Meta | `llama-3.1-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-70b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Meta | `llama-3.1-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 license](https://www.llama.com/llama3_1/license/) |
| Meta | `llama-3.3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 license](https://www.llama.com/llama3_3/license/) |
| Nvidia | `llama-3.1-nemotron-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-nemotron-70b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
| Mistral | `mixtral-8x7b-instruct-v0.1` | [View Details](/managed-inference/reference-content/mixtral-8x7b-instruct-v0.1/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-7b-instruct-v0.3` | [View Details](/managed-inference/reference-content/mistral-7b-instruct-v0.3/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-nemo-instruct-2407` | [View Details](/managed-inference/reference-content/mistral-nemo-instruct-2407/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `mistral-small-24b-instruct-2501` | [View Details](/managed-inference/reference-content/mistral-small-24b-instruct-2501/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Mistral | `pixtral-12b-2409` | [View Details](/managed-inference/reference-content/pixtral-12b-2409/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |
| Qwen | `qwen2.5-coder-32b-instruct` | [View Details](/managed-inference/reference-content/qwen2.5-coder-32b-instruct/) | [Apache 2.0 license](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) |
### Vision models

_More details to be added._

### Embedding models

| Provider | Model identifier | Documentation | License |
|----------|------------------|---------------|---------|
| BAAI | `bge-multilingual-gemma2` | [View Details](/managed-inference/reference-content/bge-multilingual-gemma2/) | [Gemma Terms of Use](https://ai.google.dev/gemma/terms) |
| Sentence Transformers | `sentence-t5-xxl` | [View Details](/managed-inference/reference-content/sentence-t5-xxl/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) |

## Custom models

<Message type="note">
Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference).
@@ -56,30 +72,30 @@ To deploy a custom model via Hugging Face, ensure the following:

#### Access requirements

- You must have access to the model using your Hugging Face credentials.
- For gated models, request access through your Hugging Face account.
- Credentials are not stored, but we recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens).
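One way to confirm access before starting a deployment is to query the Hugging Face Hub with your token. The sketch below uses the official `huggingface_hub` client; the repo ID and token in the usage comment are placeholders, and the `api` parameter is our addition to keep the helper testable:

```python
def can_access(repo_id: str, token: str, api=None) -> bool:
    """Return True if `token` grants read access to `repo_id` on the Hugging Face Hub.

    Gated or private repos you cannot read (or a bad token) make model_info raise.
    `api` is injectable for testing; by default the official huggingface_hub client is used.
    """
    if api is None:
        from huggingface_hub import HfApi  # official Hugging Face Hub client
        api = HfApi(token=token)
    try:
        api.model_info(repo_id)
        return True
    except Exception:  # GatedRepoError, RepositoryNotFoundError, auth errors, ...
        return False

# Example (placeholders): can_access("meta-llama/Meta-Llama-3-8B-Instruct", "hf_xxx")
```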
#### Required files

Your model repository must include:

- A `config.json` file containing:
  - An `architectures` array (see [supported architectures](#supported-models-architecture) for the exact list of supported values).
  - `max_position_embeddings`
- Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format
- A chat template included in either:
  - `tokenizer_config.json` as a `chat_template` field, or
  - `chat_template.json` as a `chat_template` field
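Before starting an import, the file requirements above can be checked on a local copy of the repository with a short script. This is a minimal sketch: the checks mirror the list above, and the function name is ours:

```python
import json
from pathlib import Path

def check_model_repo(repo: Path) -> list[str]:
    """Return a list of problems that would block a Managed Inference import."""
    problems = []

    # config.json must exist and declare architectures + max_position_embeddings
    config_path = repo / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        config = json.loads(config_path.read_text())
        if not config.get("architectures"):
            problems.append("config.json has no architectures array")
        if "max_position_embeddings" not in config:
            problems.append("config.json has no max_position_embeddings")

    # Weights must be in the .safetensors format
    if not list(repo.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")

    # A chat_template field must appear in one of these two files
    has_template = False
    for name in ("tokenizer_config.json", "chat_template.json"):
        path = repo / name
        if path.is_file() and "chat_template" in json.loads(path.read_text()):
            has_template = True
    if not has_template:
        problems.append("no chat_template field found")

    return problems
```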
#### Supported model types

Your model must be one of the following types:

- `chat`
- `vision`
- `multimodal` (chat + vision)
- `embedding`

<Message type="important">
**Security Notice**<br />
@@ -88,16 +104,16 @@ Your model must be one of the following types:

## API support

Depending on the model type, specific endpoints and features will be supported.

### Chat models

The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.
**Structured outputs** or **Function calling** are not yet supported for custom models.
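A minimal sketch of a request to that endpoint is shown below. The deployment base URL and API key are placeholders, and the message format follows the OpenAI-style chat schema commonly used with `/v1/chat/completions`; only the endpoint path itself comes from this page:

```python
import json
from urllib import request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> request.Request:
    """Build a POST request for a deployment's /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Usage (placeholders for the deployment endpoint and secret key):
# req = build_chat_request("https://<your-deployment-endpoint>", "<SCW_SECRET_KEY>",
#                          "llama-3.1-8b-instruct", "Hello!")
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```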

### Vision models

The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.
**Structured outputs** or **Function calling** are not yet supported for custom models.

### Multimodal models
@@ -123,6 +139,129 @@ When deploying custom models, **you remain responsible** for complying with any

Custom models must conform to one of the architectures listed below. Click to expand the full list.

<Concept>
## Supported custom model architectures
Custom model deployment currently supports the following model architectures:
- `AquilaModel`
- `AquilaForCausalLM`
- `ArcticForCausalLM`
- `BaiChuanForCausalLM`
- `BaichuanForCausalLM`
- `BloomForCausalLM`
- `CohereForCausalLM`
- `Cohere2ForCausalLM`
- `DbrxForCausalLM`
- `DeciLMForCausalLM`
- `DeepseekForCausalLM`
- `DeepseekV2ForCausalLM`
- `DeepseekV3ForCausalLM`
- `ExaoneForCausalLM`
- `FalconForCausalLM`
- `Fairseq2LlamaForCausalLM`
- `GemmaForCausalLM`
- `Gemma2ForCausalLM`
- `GlmForCausalLM`
- `GPT2LMHeadModel`
- `GPTBigCodeForCausalLM`
- `GPTJForCausalLM`
- `GPTNeoXForCausalLM`
- `GraniteForCausalLM`
- `GraniteMoeForCausalLM`
- `GritLM`
- `InternLMForCausalLM`
- `InternLM2ForCausalLM`
- `InternLM2VEForCausalLM`
- `InternLM3ForCausalLM`
- `JAISLMHeadModel`
- `JambaForCausalLM`
- `LlamaForCausalLM`
- `LLaMAForCausalLM`
- `MambaForCausalLM`
- `FalconMambaForCausalLM`
- `MiniCPMForCausalLM`
- `MiniCPM3ForCausalLM`
- `MistralForCausalLM`
- `MixtralForCausalLM`
- `QuantMixtralForCausalLM`
- `MptForCausalLM`
- `MPTForCausalLM`
- `NemotronForCausalLM`
- `OlmoForCausalLM`
- `Olmo2ForCausalLM`
- `OlmoeForCausalLM`
- `OPTForCausalLM`
- `OrionForCausalLM`
- `PersimmonForCausalLM`
- `PhiForCausalLM`
- `Phi3ForCausalLM`
- `Phi3SmallForCausalLM`
- `PhiMoEForCausalLM`
- `Qwen2ForCausalLM`
- `Qwen2MoeForCausalLM`
- `RWForCausalLM`
- `StableLMEpochForCausalLM`
- `StableLmForCausalLM`
- `Starcoder2ForCausalLM`
- `SolarForCausalLM`
- `TeleChat2ForCausalLM`
- `XverseForCausalLM`
- `BartModel`
- `BartForConditionalGeneration`
- `Florence2ForConditionalGeneration`
- `BertModel`
- `RobertaModel`
- `RobertaForMaskedLM`
- `XLMRobertaModel`
- `DeciLMForCausalLM`
- `Gemma2Model`
- `GlmForCausalLM`
- `GritLM`
- `InternLM2ForRewardModel`
- `JambaForSequenceClassification`
- `LlamaModel`
- `MistralModel`
- `Phi3ForCausalLM`
- `Qwen2Model`
- `Qwen2ForCausalLM`
- `Qwen2ForRewardModel`
- `Qwen2ForProcessRewardModel`
- `TeleChat2ForCausalLM`
- `LlavaNextForConditionalGeneration`
- `Phi3VForCausalLM`
- `Qwen2VLForConditionalGeneration`
- `Qwen2ForSequenceClassification`
- `BertForSequenceClassification`
- `RobertaForSequenceClassification`
- `XLMRobertaForSequenceClassification`
- `AriaForConditionalGeneration`
- `Blip2ForConditionalGeneration`
- `ChameleonForConditionalGeneration`
- `ChatGLMModel`
- `ChatGLMForConditionalGeneration`
- `DeepseekVLV2ForCausalLM`
- `FuyuForCausalLM`
- `H2OVLChatModel`
- `InternVLChatModel`
- `Idefics3ForConditionalGeneration`
- `LlavaForConditionalGeneration`
- `LlavaNextForConditionalGeneration`
- `LlavaNextVideoForConditionalGeneration`
- `LlavaOnevisionForConditionalGeneration`
- `MantisForConditionalGeneration`
- `MiniCPMO`
- `MiniCPMV`
- `MolmoForCausalLM`
- `NVLM_D`
- `PaliGemmaForConditionalGeneration`
- `Phi3VForCausalLM`
- `PixtralForConditionalGeneration`
- `QWenLMHeadModel`
- `Qwen2VLForConditionalGeneration`
- `Qwen2_5_VLForConditionalGeneration`
- `Qwen2AudioForConditionalGeneration`
- `UltravoxModel`
- `MllamaForConditionalGeneration`
- `WhisperForConditionalGeneration`
- `EAGLEModel`
- `MedusaModel`
- `MLPSpeculatorPreTrainedModel`
</Concept>
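A quick way to use this list is to check a model's `config.json` against it before attempting an import. This is a sketch; `SUPPORTED` is abbreviated here to a few entries, and in practice it would hold the full set from the section above:

```python
import json

# Abbreviated subset of the supported architectures listed above.
SUPPORTED = {
    "LlamaForCausalLM",
    "MistralForCausalLM",
    "MixtralForCausalLM",
    "Qwen2ForCausalLM",
    "Gemma2ForCausalLM",
    # ... remaining entries from the full list above
}

def is_supported(config_text: str) -> bool:
    """True if any architecture declared in config.json is in the supported set."""
    architectures = json.loads(config_text).get("architectures", [])
    return any(a in SUPPORTED for a in architectures)
```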
