docs(inference): update supported model information #4817
Merged
Changes from 8 commits
Commits (9)
aab809b feat(inference): add custom models requirements (fpagny)
7dcfe6e feat(inference): add custom model support (fpagny)
1386fd2 feat(inference): update custom models (fpagny)
57a0ac2 docs(infr): update docs (bene2k1)
c2304f3 feat(minfr): add chat models (bene2k1)
23659e8 fix(gen): small typo (bene2k1)
91c7e33 feat(inference): update quickstart (bene2k1)
c6d5af7 feat(infr): update (bene2k1)
f7b9f5a Apply suggestions from code review (bene2k1)
269 changes: 269 additions & 0 deletions
pages/managed-inference/reference-content/supported-models.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,269 @@ | ||
| --- | ||
| meta: | ||
| title: Supported models in Managed Inference | ||
| description: Explore all AI models supported by Managed Inference | ||
| content: | ||
| h1: Supported models in Managed Inference | ||
| paragraph: Discover which AI models you can deploy using Managed Inference, either from the Scaleway Catalog or as custom models. | ||
| tags: support models custom catalog | ||
| dates: | ||
| validation: 2025-04-08 | ||
| posted: 2025-04-08 | ||
| categories: | ||
| - ai-data | ||
| --- | ||
|
|
||
| Scaleway Managed Inference allows you to deploy various AI models, either from: | ||
|
|
||
| * [Scaleway catalog](#scaleway-catalog): A curated set of ready-to-deploy models available through the [Scaleway console](https://console.scaleway.com/inference/deployments/) or the [Managed Inference models API](https://www.scaleway.com/en/developers/api/inference/#path-models-list-models) | ||
| * [Custom models](#custom-models): Models that you import, typically from sources like Hugging Face. | ||
|
|
||
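For illustration, here is a minimal sketch of listing the catalog programmatically with Python's `requests`. The exact path and version prefix are assumptions to verify against the linked Models API reference, and `SCW_SECRET_KEY` is assumed to hold a Scaleway API secret key:

```python
import os

import requests

REGION = "fr-par"
# Assumed endpoint shape; confirm against the Managed Inference models API reference.
url = f"https://api.scaleway.com/inference/v1beta1/regions/{REGION}/models"

resp = requests.get(
    url,
    headers={"X-Auth-Token": os.environ["SCW_SECRET_KEY"]},  # Scaleway secret key
    timeout=30,
)
resp.raise_for_status()

# Print the identifier of every model available in the catalog.
for model in resp.json().get("models", []):
    print(model.get("name"))
```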
| ## Scaleway catalog | ||
|
|
||
| ### Multimodal models (chat + vision) | ||
|
|
||
| _More details to be added._ | ||
|
|
||
| ### Chat models | ||
|
|
||
| | Provider | Model identifier | Documentation | License | | ||
| |------------|-----------------------------------|--------------------------------------------------------------------------|-------------------------------------------------------| | ||
| | Allen AI | `molmo-72b-0924` | [View Details](/managed-inference/reference-content/molmo-72b-0924/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | | ||
| | Deepseek | `deepseek-r1-distill-llama-70b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-70b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | | ||
| | Deepseek | `deepseek-r1-distill-llama-8b` | [View Details](/managed-inference/reference-content/deepseek-r1-distill-llama-8b/) | [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | | ||
| | Meta | `llama-3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3-70b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) | | ||
| | Meta | `llama-3-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3-8b-instruct/) | [Llama 3 license](https://www.llama.com/llama3/license/) | | ||
| | Meta | `llama-3.1-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-70b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) | | ||
| | Meta | `llama-3.1-8b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) | | ||
| | Meta | `llama-3.3-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 license](https://www.llama.com/llama3_3/license/) | | ||
| | Nvidia | `llama-3.1-nemotron-70b-instruct` | [View Details](/managed-inference/reference-content/llama-3.1-nemotron-70b-instruct/)| [Llama 3.1 community license](https://www.llama.com/llama3_1/license/) | | ||
| | Mistral | `mixtral-8x7b-instruct-v0.1` | [View Details](/managed-inference/reference-content/mixtral-8x7b-instruct-v0.1/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | | ||
| | Mistral | `mistral-7b-instruct-v0.3` | [View Details](/managed-inference/reference-content/mistral-7b-instruct-v0.3/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | | ||
| | Mistral | `mistral-nemo-instruct-2407` | [View Details](/managed-inference/reference-content/mistral-nemo-instruct-2407/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | | ||
| | Mistral | `mistral-small-24b-instruct-2501` | [View Details](/managed-inference/reference-content/mistral-small-24b-instruct-2501/)| [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | | ||
| | Mistral | `pixtral-12b-2409` | [View Details](/managed-inference/reference-content/pixtral-12b-2409/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | | ||
| | Qwen | `qwen2.5-coder-32b-instruct` | [View Details](/managed-inference/reference-content/qwen2.5-coder-32b-instruct/) | [Apache 2.0 license](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) | | ||
|
|
||
| ### Vision models | ||
|
|
||
| _More details to be added._ | ||
|
|
||
| ### Embedding models | ||
|
|
||
| | Provider | Model identifier | Documentation | License | | ||
| |----------|------------------|----------------|---------| | ||
| | BAAI | `bge-multilingual-gemma2` | [View Details](/managed-inference/reference-content/bge-multilingual-gemma2/) | [Gemma Terms of Use](https://ai.google.dev/gemma/terms) | | ||
| | Sentence Transformers | `sentence-t5-xxl` | [View Details](/managed-inference/reference-content/sentence-t5-xxl/) | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) | | ||
|
|
||
|
|
||
| ## Custom models | ||
|
|
||
| <Message type="note"> | ||
| Custom model support is currently in **beta**. If you encounter issues or limitations, please report them via our [Slack community channel](https://scaleway-community.slack.com/archives/C01SGLGRLEA) or [customer support](https://console.scaleway.com/support/tickets/create?for=product&productName=inference). | ||
| </Message> | ||
|
|
||
| ### Prerequisites | ||
|
|
||
| <Message type="tip"> | ||
| We recommend starting with a variation of a supported model from the Scaleway catalog. | ||
| For example, you can deploy a [quantized (4-bit) version of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit). | ||
| If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above. | ||
| </Message> | ||
|
|
||
| To deploy a custom model via Hugging Face, ensure the following: | ||
|
|
||
| #### Access requirements | ||
|
|
||
| * You must have access to the model using your Hugging Face credentials. | ||
| * For gated models, request access through your Hugging Face account. | ||
| * Your credentials are not stored; we nevertheless recommend using [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens). | ||
|
|
||
| #### Required files | ||
|
|
||
| Your model repository must include the following (see the validation sketch after this list): | ||
|
|
||
| * A `config.json` file containing: | ||
| * An `architectures` array (see [supported architectures](#supported-models-architecture) for the exact list of supported values). | ||
| * `max_position_embeddings` | ||
| * Model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format | ||
| * A chat template included in either: | ||
| * `tokenizer_config.json` as a `chat_template` field, or | ||
| * `chat_template.json` as a `chat_template` field | ||
|
|
||
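As a sketch of how these requirements could be checked before deploying, the following uses `huggingface_hub`. The repository name is the quantized Llama 3.3 example from the tip above, and the token variable is a placeholder for a read or fine-grained access token when the repository is gated:

```python
import json

from huggingface_hub import hf_hub_download, list_repo_files

REPO_ID = "unsloth/Llama-3.3-70B-Instruct-bnb-4bit"  # example repository
TOKEN = None  # set to a read/fine-grained token for gated repositories

files = list_repo_files(REPO_ID, token=TOKEN)

# Weights must be provided in the .safetensors format.
assert any(f.endswith(".safetensors") for f in files), "no .safetensors weights"

# config.json must declare an architectures array and max_position_embeddings.
config_path = hf_hub_download(REPO_ID, "config.json", token=TOKEN)
with open(config_path) as fh:
    config = json.load(fh)
assert "architectures" in config and "max_position_embeddings" in config

# A chat template must be present in tokenizer_config.json or chat_template.json.
has_template = False
for name in ("tokenizer_config.json", "chat_template.json"):
    if name in files:
        with open(hf_hub_download(REPO_ID, name, token=TOKEN)) as fh:
            has_template = has_template or "chat_template" in json.load(fh)

print("architectures:", config["architectures"])
print("chat template present:", has_template)
```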
| #### Supported model types | ||
|
|
||
| Your model must be one of the following types: | ||
|
|
||
| * `chat` | ||
| * `vision` | ||
| * `multimodal` (chat + vision) | ||
| * `embedding` | ||
|
|
||
| <Message type="important"> | ||
| **Security Notice**<br /> | ||
| Models using formats that allow arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are **not supported**. | ||
| </Message> | ||
|
|
||
| ## API support | ||
|
|
||
| The endpoints and features exposed for a deployment depend on the model type. | ||
|
|
||
| ### Chat models | ||
|
|
||
| The Chat API is exposed for this model type under the `/v1/chat/completions` endpoint. | ||
| **Structured outputs** and **function calling** are not yet supported for custom models. | ||
|
|
||
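For illustration, a minimal chat completion call using the OpenAI-compatible Python client; the base URL, API key, and model name are placeholders to replace with your own deployment values:

```python
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="<your-deployment-endpoint>/v1",  # placeholder: your deployment URL
    api_key="<your-iam-api-key>",              # placeholder: your IAM API key
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # the model identifier of your deployment
    messages=[{"role": "user", "content": "Summarize what Managed Inference does."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```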
| ### Vision models | ||
|
|
||
| The Chat API is exposed for this model type under the `/v1/chat/completions` endpoint. | ||
| **Structured outputs** and **function calling** are not yet supported for custom models. | ||
|
|
||
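A vision request is the same chat completion call with an image part in the message content. A minimal sketch, again with placeholder deployment values:

```python
from openai import OpenAI

client = OpenAI(
    base_url="<your-deployment-endpoint>/v1",  # placeholder: your deployment URL
    api_key="<your-iam-api-key>",
)

response = client.chat.completions.create(
    model="pixtral-12b-2409",  # a vision-capable deployment from the catalog
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                # Publicly reachable image URL; base64 data URLs follow the same shape.
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```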
| ### Multimodal models | ||
|
|
||
| These models are treated as both chat and vision models, and expose the endpoints described above. | ||
|
|
||
| ### Embedding models | ||
|
|
||
| The Embeddings API is exposed for this model type under the `/v1/embeddings` endpoint. | ||
|
|
||
|
|
||
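A minimal sketch of an embeddings call with the same OpenAI-compatible client and placeholder deployment values:

```python
from openai import OpenAI

client = OpenAI(
    base_url="<your-deployment-endpoint>/v1",  # placeholder: your deployment URL
    api_key="<your-iam-api-key>",
)

result = client.embeddings.create(
    model="bge-multilingual-gemma2",  # an embedding deployment from the catalog
    input=["Managed Inference supports embedding models."],
)
# One vector is returned per input string.
print(len(result.data[0].embedding))
```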
| ## Custom model lifecycle | ||
|
|
||
| Currently, custom model deployments are considered valid for the long term, and we will ensure that any updates or changes to Managed Inference do not impact existing deployments. | ||
|
||
| In the event of breaking changes that would leave some custom models unsupported, we will notify you **at least 3 months beforehand**. | ||
|
|
||
| ## Licensing | ||
|
|
||
| When deploying custom models, **you remain responsible** for complying with any license requirements from the model provider, just as you would when running the model on your own provisioned GPU. | ||
|
|
||
| ## Supported model architectures | ||
|
|
||
| Custom models must conform to one of the architectures listed below. Click to expand the full list. | ||
|
|
||
| <Concept> | ||
| ## Supported custom model architectures | ||
| Custom model deployment currently supports the following model architectures: | ||
|
||
| * `AquilaModel` | ||
| * `AquilaForCausalLM` | ||
| * `ArcticForCausalLM` | ||
| * `BaiChuanForCausalLM` | ||
| * `BaichuanForCausalLM` | ||
| * `BloomForCausalLM` | ||
| * `CohereForCausalLM` | ||
| * `Cohere2ForCausalLM` | ||
| * `DbrxForCausalLM` | ||
| * `DeciLMForCausalLM` | ||
| * `DeepseekForCausalLM` | ||
| * `DeepseekV2ForCausalLM` | ||
| * `DeepseekV3ForCausalLM` | ||
| * `ExaoneForCausalLM` | ||
| * `FalconForCausalLM` | ||
| * `Fairseq2LlamaForCausalLM` | ||
| * `GemmaForCausalLM` | ||
| * `Gemma2ForCausalLM` | ||
| * `GlmForCausalLM` | ||
| * `GPT2LMHeadModel` | ||
| * `GPTBigCodeForCausalLM` | ||
| * `GPTJForCausalLM` | ||
| * `GPTNeoXForCausalLM` | ||
| * `GraniteForCausalLM` | ||
| * `GraniteMoeForCausalLM` | ||
| * `GritLM` | ||
| * `InternLMForCausalLM` | ||
| * `InternLM2ForCausalLM` | ||
| * `InternLM2VEForCausalLM` | ||
| * `InternLM3ForCausalLM` | ||
| * `JAISLMHeadModel` | ||
| * `JambaForCausalLM` | ||
| * `LlamaForCausalLM` | ||
| * `LLaMAForCausalLM` | ||
| * `MambaForCausalLM` | ||
| * `FalconMambaForCausalLM` | ||
| * `MiniCPMForCausalLM` | ||
| * `MiniCPM3ForCausalLM` | ||
| * `MistralForCausalLM` | ||
| * `MixtralForCausalLM` | ||
| * `QuantMixtralForCausalLM` | ||
| * `MptForCausalLM` | ||
| * `MPTForCausalLM` | ||
| * `NemotronForCausalLM` | ||
| * `OlmoForCausalLM` | ||
| * `Olmo2ForCausalLM` | ||
| * `OlmoeForCausalLM` | ||
| * `OPTForCausalLM` | ||
| * `OrionForCausalLM` | ||
| * `PersimmonForCausalLM` | ||
| * `PhiForCausalLM` | ||
| * `Phi3ForCausalLM` | ||
| * `Phi3SmallForCausalLM` | ||
| * `PhiMoEForCausalLM` | ||
| * `Qwen2ForCausalLM` | ||
| * `Qwen2MoeForCausalLM` | ||
| * `RWForCausalLM` | ||
| * `StableLMEpochForCausalLM` | ||
| * `StableLmForCausalLM` | ||
| * `Starcoder2ForCausalLM` | ||
| * `SolarForCausalLM` | ||
| * `TeleChat2ForCausalLM` | ||
| * `XverseForCausalLM` | ||
| * `BartModel` | ||
| * `BartForConditionalGeneration` | ||
| * `Florence2ForConditionalGeneration` | ||
| * `BertModel` | ||
| * `RobertaModel` | ||
| * `RobertaForMaskedLM` | ||
| * `XLMRobertaModel` | ||
| * `DeciLMForCausalLM` | ||
| * `Gemma2Model` | ||
| * `GlmForCausalLM` | ||
| * `GritLM` | ||
| * `InternLM2ForRewardModel` | ||
| * `JambaForSequenceClassification` | ||
| * `LlamaModel` | ||
| * `MistralModel` | ||
| * `Phi3ForCausalLM` | ||
| * `Qwen2Model` | ||
| * `Qwen2ForCausalLM` | ||
| * `Qwen2ForRewardModel` | ||
| * `Qwen2ForProcessRewardModel` | ||
| * `TeleChat2ForCausalLM` | ||
| * `LlavaNextForConditionalGeneration` | ||
| * `Phi3VForCausalLM` | ||
| * `Qwen2VLForConditionalGeneration` | ||
| * `Qwen2ForSequenceClassification` | ||
| * `BertForSequenceClassification` | ||
| * `RobertaForSequenceClassification` | ||
| * `XLMRobertaForSequenceClassification` | ||
| * `AriaForConditionalGeneration` | ||
| * `Blip2ForConditionalGeneration` | ||
| * `ChameleonForConditionalGeneration` | ||
| * `ChatGLMModel` | ||
| * `ChatGLMForConditionalGeneration` | ||
| * `DeepseekVLV2ForCausalLM` | ||
| * `FuyuForCausalLM` | ||
| * `H2OVLChatModel` | ||
| * `InternVLChatModel` | ||
| * `Idefics3ForConditionalGeneration` | ||
| * `LlavaForConditionalGeneration` | ||
| * `LlavaNextForConditionalGeneration` | ||
| * `LlavaNextVideoForConditionalGeneration` | ||
| * `LlavaOnevisionForConditionalGeneration` | ||
| * `MantisForConditionalGeneration` | ||
| * `MiniCPMO` | ||
| * `MiniCPMV` | ||
| * `MolmoForCausalLM` | ||
| * `NVLM_D` | ||
| * `PaliGemmaForConditionalGeneration` | ||
| * `Phi3VForCausalLM` | ||
| * `PixtralForConditionalGeneration` | ||
| * `QWenLMHeadModel` | ||
| * `Qwen2VLForConditionalGeneration` | ||
| * `Qwen2_5_VLForConditionalGeneration` | ||
| * `Qwen2AudioForConditionalGeneration` | ||
| * `UltravoxModel` | ||
| * `MllamaForConditionalGeneration` | ||
| * `WhisperForConditionalGeneration` | ||
| * `EAGLEModel` | ||
| * `MedusaModel` | ||
| * `MLPSpeculatorPreTrainedModel` | ||
| </Concept> | ||