diff --git a/menu/navigation.json b/menu/navigation.json
index e4e2247785..e5d1696197 100644
--- a/menu/navigation.json
+++ b/menu/navigation.json
@@ -860,6 +860,10 @@
         {
           "label": "OpenAI API compatibility",
           "slug": "openai-compatibility"
         },
+        {
+          "label": "Supported models in Managed Inference",
+          "slug": "supported-models"
+        },
         {
           "label": "Support for function calling in Scaleway Managed Inference",
           "slug": "function-calling-support"
diff --git a/pages/managed-inference/how-to/create-deployment.mdx b/pages/managed-inference/how-to/create-deployment.mdx
index 12a15a8b57..1b43cd5ee8 100644
--- a/pages/managed-inference/how-to/create-deployment.mdx
+++ b/pages/managed-inference/how-to/create-deployment.mdx
@@ -7,7 +7,7 @@ content:
   paragraph: This page explains how to deploy a model on Scaleway Managed Inference
 tags: managed-inference ai-data creating dedicated
 dates:
-  validation: 2025-04-01
+  validation: 2025-04-09
   posted: 2024-03-06
 ---
@@ -19,7 +19,10 @@
 1. Click the **AI & Data** section of the [Scaleway console](https://console.scaleway.com/), and select **Managed Inference** from the side menu to access the Managed Inference dashboard.
 2. Click **Deploy a model** to launch the model deployment wizard.
 3. Provide the necessary information:
-    - Select the desired model and quantization to use for your deployment [from the available options](/managed-inference/reference-content/)
+    - Select the desired model and quantization to use for your deployment [from the available options](/managed-inference/reference-content/).
+
+      Scaleway Managed Inference allows you to deploy various AI models, either from the Scaleway catalog or by importing a custom model. For detailed information about supported models, visit our [Supported models in Managed Inference](/managed-inference/reference-content/supported-models/) documentation.
+
       Some models may require acceptance of an end-user license agreement. If prompted, review the terms and conditions and accept the license accordingly.
diff --git a/pages/managed-inference/quickstart.mdx b/pages/managed-inference/quickstart.mdx
index 697316fb36..48eb89b0b3 100644
--- a/pages/managed-inference/quickstart.mdx
+++ b/pages/managed-inference/quickstart.mdx
@@ -38,7 +38,10 @@ Here are some of the key features of Scaleway Managed Inference:
 1. Navigate to the **AI & Data** section of the [Scaleway console](https://console.scaleway.com/), and select **Managed Inference** from the side menu to access the Managed Inference dashboard.
 2. Click **Create deployment** to launch the deployment creation wizard.
 3. Provide the necessary information:
-    - Select the desired model and the quantization to use for your deployment [from the available options](/managed-inference/reference-content/)
+    - Select the desired model and the quantization to use for your deployment [from the available options](/managed-inference/reference-content/).
+
+      Scaleway Managed Inference allows you to deploy various AI models, either from the Scaleway catalog or by importing a custom model. For detailed information about supported models, visit our [Supported models in Managed Inference](/managed-inference/reference-content/supported-models/) documentation.
+
       Some models may require acceptance of an end-user license agreement. If prompted, review the terms and conditions and accept the license accordingly.
diff --git a/pages/managed-inference/reference-content/supported-models.mdx b/pages/managed-inference/reference-content/supported-models.mdx
new file mode 100644
index 0000000000..845be58327
--- /dev/null
+++ b/pages/managed-inference/reference-content/supported-models.mdx
@@ -0,0 +1,269 @@
+---
+meta:
+  title: Supported models in Managed Inference
+  description: Explore all AI models supported by Managed Inference
+content:
+  h1: Supported models in Managed Inference
+  paragraph: Discover which AI models you can deploy using Managed Inference, either from the Scaleway catalog or as custom models.
+tags: support models custom catalog
+dates:
+  validation: 2025-04-08
+  posted: 2025-04-08
+categories:
+  - ai-data
+---
+
+Scaleway Managed Inference allows you to deploy various AI models, either from:
+
+ * [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models), as shown in the example below.
+ * [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face.
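+
+For instance, the catalog can be listed programmatically. The following is a minimal sketch: the `v1beta1` API version, the `fr-par` region, and the top-level `models` array in the response are assumptions here, so refer to the [Managed Inference API reference](https://www.scaleway.com/en/developers/api/inference/) for the exact path and schema.
+
+```python
+import os
+
+import requests
+
+# Assumed endpoint: list catalog models in a given region (check the API reference).
+url = "https://api.scaleway.com/inference/v1beta1/regions/fr-par/models"
+
+# Authenticate with the secret key of a Scaleway IAM API key.
+response = requests.get(url, headers={"X-Auth-Token": os.environ["SCW_SECRET_KEY"]})
+response.raise_for_status()
+
+for model in response.json().get("models", []):
+    print(model.get("name"))
+```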
+
+## Scaleway catalog
+
+### Multimodal models (chat + vision)
+
+_More details to be added._
+
+### Chat models
+
+| Provider | Model identifier                  | Documentation                                                                         | License                                                                |
+|----------|-----------------------------------|---------------------------------------------------------------------------------------|------------------------------------------------------------------------|
+| Allen AI | `molmo-72b-0924`                  | [View details](/managed-inference/reference-content/molmo-72b-0924/)                  | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0)      |
+| Deepseek | `deepseek-r1-distill-llama-70b`   | [View details](/managed-inference/reference-content/deepseek-r1-distill-llama-70b/)   | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
+| Deepseek | `deepseek-r1-distill-llama-8b`    | [View details](/managed-inference/reference-content/deepseek-r1-distill-llama-8b/)    | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
+| Meta     | `llama-3-70b-instruct`            | [View details](/managed-inference/reference-content/llama-3-70b-instruct/)            | [Llama 3 license](https://www.llama.com/llama3/license/)               |
+| Meta     | `llama-3-8b-instruct`             | [View details](/managed-inference/reference-content/llama-3-8b-instruct/)             | [Llama 3 license](https://www.llama.com/llama3/license/)               |
+| Meta     | `llama-3.1-70b-instruct`          | [View details](/managed-inference/reference-content/llama-3.1-70b-instruct/)          | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
+| Meta     | `llama-3.1-8b-instruct`           | [View details](/managed-inference/reference-content/llama-3.1-8b-instruct/)           | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
+| Meta     | `llama-3.3-70b-instruct`          | [View details](/managed-inference/reference-content/llama-3.3-70b-instruct/)          | [Llama 3.3 license](https://www.llama.com/llama3_3/license/)           |
+| Nvidia   | `llama-3.1-nemotron-70b-instruct` | [View details](/managed-inference/reference-content/llama-3.1-nemotron-70b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) |
+| Mistral  | `mixtral-8x7b-instruct-v0.1`      | [View details](/managed-inference/reference-content/mixtral-8x7b-instruct-v0.1/)      | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0)      |
+| Mistral  | `mistral-7b-instruct-v0.3`        | [View details](/managed-inference/reference-content/mistral-7b-instruct-v0.3/)        | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0)      |
+| Mistral  | `mistral-nemo-instruct-2407`      | [View details](/managed-inference/reference-content/mistral-nemo-instruct-2407/)      | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0)      |
+| Mistral  | `mistral-small-24b-instruct-2501` | [View details](/managed-inference/reference-content/mistral-small-24b-instruct-2501/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0)      |
+| Mistral  | `pixtral-12b-2409`                | [View details](/managed-inference/reference-content/pixtral-12b-2409/)                | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0)      |
+| Qwen     | `qwen2.5-coder-32b-instruct`      | [View details](/managed-inference/reference-content/qwen2.5-coder-32b-instruct/)      | [Apache 2.0 license](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) |
+
+### Vision models
+
+_More details to be added._
+
+### Embedding models
+
+| Provider              | Model identifier          | Documentation                                                                  | License                                                            |
+|-----------------------|---------------------------|--------------------------------------------------------------------------------|--------------------------------------------------------------------|
+| BAAI                  | `bge-multilingual-gemma2` | [View details](/managed-inference/reference-content/bge-multilingual-gemma2/) | [Gemma Terms of Use](https://ai.google.dev/gemma/terms)            |
+| Sentence Transformers | `sentence-t5-xxl`         | [View details](/managed-inference/reference-content/sentence-t5-xxl/)         | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0)  |
+
+## Custom models
+
+<Message type="note">
+  Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference).
+</Message>
+
+### Prerequisites
+
+<Message type="tip">
+  We recommend starting with a variation of a supported model from the Scaleway catalog.
+  For example, you can deploy a [quantized (4-bit) version of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit).
+  If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above.
+</Message>
+
+To deploy a custom model via Hugging Face, ensure the following:
+
+#### Access requirements
+
+ * You must have access to the model using your Hugging Face credentials.
+ * For gated models, request access through your Hugging Face account.
+ * Credentials are not stored, but we recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens).
+
+#### Required files
+
+Your model repository must include:
+
+ * A `config.json` file containing:
+   * An `architectures` array (see [supported architectures](#supported-model-architectures) for the exact list of supported values).
+   * `max_position_embeddings`
+ * Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format
+ * A chat template, included in either:
+   * `tokenizer_config.json` as a `chat_template` field, or
+   * `chat_template.json` as a `chat_template` field
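+
+For illustration, a minimal `config.json` satisfying these requirements for a Llama-based model could look as follows (real files contain many more keys; the values shown are examples only):
+
+```json
+{
+  "architectures": ["LlamaForCausalLM"],
+  "max_position_embeddings": 131072
+}
+```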
+
+#### Supported model types
+
+Your model must be one of the following types:
+
+ * `chat`
+ * `vision`
+ * `multimodal` (chat + vision)
+ * `embedding`
+
+<Message type="important">
+  **Security notice**<br />
+  Models using formats that allow arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are **not supported**.
+</Message>
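+
+Before deploying, you can sanity-check that a repository meets the requirements above. The following is a minimal sketch using the [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/index) Python library; the repository ID is only an example.
+
+```python
+import json
+
+from huggingface_hub import hf_hub_download, list_repo_files
+
+repo_id = "unsloth/Llama-3.3-70B-Instruct-bnb-4bit"  # example repository
+files = list_repo_files(repo_id)  # pass token="hf_..." for gated models
+
+# Weights must be in .safetensors format (pickle-based .bin files are not supported).
+assert any(f.endswith(".safetensors") for f in files), "no .safetensors weights found"
+
+# config.json must declare the architecture and the context size.
+with open(hf_hub_download(repo_id, "config.json")) as f:
+    config = json.load(f)
+assert config.get("architectures"), "missing architectures array"
+assert "max_position_embeddings" in config, "missing max_position_embeddings"
+
+# A chat template must be present in tokenizer_config.json or chat_template.json.
+has_template = False
+for name in ("tokenizer_config.json", "chat_template.json"):
+    if name in files:
+        with open(hf_hub_download(repo_id, name)) as f:
+            has_template = has_template or "chat_template" in json.load(f)
+assert has_template, "no chat_template field found"
+
+print(f"{repo_id} passes the basic checks")
+```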
+
+## API support
+
+Depending on the model type, specific endpoints and features will be supported.
+
+### Chat models
+
+The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.
+**Structured outputs** and **function calling** are not yet supported for custom models.
+
+### Vision models
+
+The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.
+**Structured outputs** and **function calling** are not yet supported for custom models.
+
+### Multimodal models
+
+These models will be treated similarly to both chat and vision models.
+
+### Embedding models
+
+The Embeddings API will be exposed for this model under the `/v1/embeddings` endpoint.
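+
+Since these endpoints follow the OpenAI API format (see [OpenAI API compatibility](/managed-inference/reference-content/openai-compatibility/)), any OpenAI client can query a deployment. A minimal sketch, with placeholder endpoint, key, and model name:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="https://<your-deployment-endpoint>/v1",  # shown in the console
+    api_key="<your-iam-api-key>",
+)
+
+# Chat, vision, and multimodal deployments: /v1/chat/completions
+chat = client.chat.completions.create(
+    model="<model-name>",
+    messages=[{"role": "user", "content": "Hello!"}],
+)
+print(chat.choices[0].message.content)
+
+# Embedding deployments expose /v1/embeddings instead:
+# client.embeddings.create(model="<model-name>", input="Some text to embed")
+```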
+
+## Custom model lifecycle
+
+Currently, custom model deployments are considered valid for the long term, and we will ensure that updates or changes to Managed Inference do not impact existing deployments.
+In case of breaking changes that would end support for certain custom models, we will notify you **at least 3 months beforehand**.
+
+## Licensing
+
+When deploying custom models, **you remain responsible** for complying with any license requirements set by the model provider, just as you would when running the model on GPUs you provision yourself.
+
+## Supported model architectures
+
+Custom models must conform to one of the following architectures:
+
+ * `AquilaModel`
+ * `AquilaForCausalLM`
+ * `ArcticForCausalLM`
+ * `BaiChuanForCausalLM`
+ * `BaichuanForCausalLM`
+ * `BloomForCausalLM`
+ * `CohereForCausalLM`
+ * `Cohere2ForCausalLM`
+ * `DbrxForCausalLM`
+ * `DeciLMForCausalLM`
+ * `DeepseekForCausalLM`
+ * `DeepseekV2ForCausalLM`
+ * `DeepseekV3ForCausalLM`
+ * `ExaoneForCausalLM`
+ * `FalconForCausalLM`
+ * `Fairseq2LlamaForCausalLM`
+ * `GemmaForCausalLM`
+ * `Gemma2ForCausalLM`
+ * `GlmForCausalLM`
+ * `GPT2LMHeadModel`
+ * `GPTBigCodeForCausalLM`
+ * `GPTJForCausalLM`
+ * `GPTNeoXForCausalLM`
+ * `GraniteForCausalLM`
+ * `GraniteMoeForCausalLM`
+ * `GritLM`
+ * `InternLMForCausalLM`
+ * `InternLM2ForCausalLM`
+ * `InternLM2VEForCausalLM`
+ * `InternLM3ForCausalLM`
+ * `JAISLMHeadModel`
+ * `JambaForCausalLM`
+ * `LlamaForCausalLM`
+ * `LLaMAForCausalLM`
+ * `MambaForCausalLM`
+ * `FalconMambaForCausalLM`
+ * `MiniCPMForCausalLM`
+ * `MiniCPM3ForCausalLM`
+ * `MistralForCausalLM`
+ * `MixtralForCausalLM`
+ * `QuantMixtralForCausalLM`
+ * `MptForCausalLM`
+ * `MPTForCausalLM`
+ * `NemotronForCausalLM`
+ * `OlmoForCausalLM`
+ * `Olmo2ForCausalLM`
+ * `OlmoeForCausalLM`
+ * `OPTForCausalLM`
+ * `OrionForCausalLM`
+ * `PersimmonForCausalLM`
+ * `PhiForCausalLM`
+ * `Phi3ForCausalLM`
+ * `Phi3SmallForCausalLM`
+ * `PhiMoEForCausalLM`
+ * `Qwen2ForCausalLM`
+ * `Qwen2MoeForCausalLM`
+ * `RWForCausalLM`
+ * `StableLMEpochForCausalLM`
+ * `StableLmForCausalLM`
+ * `Starcoder2ForCausalLM`
+ * `SolarForCausalLM`
+ * `TeleChat2ForCausalLM`
+ * `XverseForCausalLM`
+ * `BartModel`
+ * `BartForConditionalGeneration`
+ * `Florence2ForConditionalGeneration`
+ * `BertModel`
+ * `RobertaModel`
+ * `RobertaForMaskedLM`
+ * `XLMRobertaModel`
+ * `Gemma2Model`
+ * `InternLM2ForRewardModel`
+ * `JambaForSequenceClassification`
+ * `LlamaModel`
+ * `MistralModel`
+ * `Qwen2Model`
+ * `Qwen2ForRewardModel`
+ * `Qwen2ForProcessRewardModel`
+ * `Qwen2ForSequenceClassification`
+ * `BertForSequenceClassification`
+ * `RobertaForSequenceClassification`
+ * `XLMRobertaForSequenceClassification`
+ * `AriaForConditionalGeneration`
+ * `Blip2ForConditionalGeneration`
+ * `ChameleonForConditionalGeneration`
+ * `ChatGLMModel`
+ * `ChatGLMForConditionalGeneration`
+ * `DeepseekVLV2ForCausalLM`
+ * `FuyuForCausalLM`
+ * `H2OVLChatModel`
+ * `InternVLChatModel`
+ * `Idefics3ForConditionalGeneration`
+ * `LlavaForConditionalGeneration`
+ * `LlavaNextForConditionalGeneration`
+ * `LlavaNextVideoForConditionalGeneration`
+ * `LlavaOnevisionForConditionalGeneration`
+ * `MantisForConditionalGeneration`
+ * `MiniCPMO`
+ * `MiniCPMV`
+ * `MolmoForCausalLM`
+ * `NVLM_D`
+ * `PaliGemmaForConditionalGeneration`
+ * `Phi3VForCausalLM`
+ * `PixtralForConditionalGeneration`
+ * `QWenLMHeadModel`
+ * `Qwen2VLForConditionalGeneration`
+ * `Qwen2_5_VLForConditionalGeneration`
+ * `Qwen2AudioForConditionalGeneration`
+ * `UltravoxModel`
+ * `MllamaForConditionalGeneration`
+ * `WhisperForConditionalGeneration`
+ * `EAGLEModel`
+ * `MedusaModel`
+ * `MLPSpeculatorPreTrainedModel`
\ No newline at end of file