
Commit 7dcfe6e

feat(inference): add custom model support
1 parent aab809b commit 7dcfe6e

1 file changed

pages/managed-inference/reference-content/supported-models.mdx

Lines changed: 28 additions & 4 deletions
@@ -28,7 +28,7 @@ Managed Inference supports multiple AI models either from:
 | Provider | Model string | Documentation | License |
 |-----------------|-----------------|-----------------|-----------------|
 | Meta | `llama-3.3-70b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.3-70b-instruct/) | [Llama 3.3 Community](https://www.llama.com/llama3_3/license/) |
-| Meta | `llama-3.1-8b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [HF](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) |
+| Meta | `llama-3.1-8b-instruct` | [Details](https://www.scaleway.com/en/docs/managed-inference/reference-content/llama-3.1-8b-instruct/) | [Llama 3.1 Community](https://llama.meta.com/llama3_1/license/) |

 ### Vision models

@@ -42,11 +42,35 @@ Managed Inference supports multiple AI models either from:

 ### Prerequisites

+<Message type="tip">
+To get started with custom model deployment, we recommend you begin with an existing variation of a model supported in the Scaleway catalog. For example, you can deploy a [quantized (4-bit) version of Llama 3.3](https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit). If you then want to deploy a fine-tuned version of Llama 3.3, make sure the file structure you provide matches this example before creating your deployment.
+</Message>
+
 To deploy a model by providing its URL on Hugging Face, you need to:
 - Have access to this model with your Hugging Face credentials (if the model is gated, you specifically need to request access from your Hugging Face account). Note that your Hugging Face credentials will not be stored, but we still recommend creating [read or fine-grained access tokens](https://huggingface.co/docs/hub/security-tokens) for this purpose.

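For illustration only, the sketch below shows how such a deployment request might be sent, passing the Hugging Face token alongside the model URL. The endpoint path and payload field names (`model_source`, `huggingface_token`) are assumptions made for this sketch, not the documented Managed Inference API schema; refer to the Managed Inference API reference for the actual request format.

```python
# Hypothetical sketch of creating a custom-model deployment from a
# Hugging Face URL. The endpoint path and payload field names are
# assumptions for illustration only; consult the Managed Inference
# API reference for the real schema.
import os

import requests

API_URL = "https://api.scaleway.com/inference/v1/regions/fr-par/deployments"  # assumed path

payload = {
    "name": "my-custom-llama",
    # Assumed field: URL of the model on Hugging Face.
    "model_source": "https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-bnb-4bit",
    # Assumed field: read or fine-grained token, sent with the request, not stored.
    "huggingface_token": os.environ["HF_READ_TOKEN"],
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"X-Auth-Token": os.environ["SCW_SECRET_KEY"]},  # Scaleway API authentication header
)
response.raise_for_status()
print(response.json())
```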
-### Additional consideration
+The model files need to include:
+- a `config.json` file containing:
+  - an `architectures` array (see [supported model architectures](#supported-model-architectures) for the exact list of supported values)
+  - `max_position_embeddings`
+- model weights in the [`.safetensors`](https://huggingface.co/docs/safetensors/index) format
+- a chat template, provided either in:
+  - the `tokenizer_config.json` file, as a `chat_template` field
+  - a `chat_template.json` file, as a `chat_template` field
+
+For security reasons, model formats allowing arbitrary code execution, such as Python [`pickle`](https://docs.python.org/3/library/pickle.html), are not supported.
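These file requirements can be checked locally before creating a deployment. The sketch below is an illustrative helper (not an official Scaleway tool) that verifies a model directory contains a `config.json` with the required fields, `.safetensors` weights, and a chat template; the directory path is an example.

```python
# Sketch: pre-flight check that a local model folder matches the file
# structure described above. Illustrative helper, not an official tool.
import json
from pathlib import Path


def check_model_dir(model_dir: str) -> list[str]:
    root = Path(model_dir)
    problems = []

    # config.json must declare "architectures" and "max_position_embeddings".
    config_path = root / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        config = json.loads(config_path.read_text())
        if not config.get("architectures"):
            problems.append("config.json has no 'architectures' array")
        if "max_position_embeddings" not in config:
            problems.append("config.json has no 'max_position_embeddings'")

    # Weights must be in .safetensors format (pickle-based formats are rejected).
    if not list(root.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")

    # The chat template may live in tokenizer_config.json or chat_template.json.
    has_template = False
    for name in ("tokenizer_config.json", "chat_template.json"):
        path = root / name
        if path.is_file() and "chat_template" in json.loads(path.read_text()):
            has_template = True
    if not has_template:
        problems.append("no chat_template found")

    return problems


if __name__ == "__main__":
    for problem in check_model_dir("./my-finetuned-llama"):  # example path
        print("WARNING:", problem)
```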
+
+### Custom model lifecycle
+
+Currently, custom model deployments are considered valid for the long term, and we will ensure that any updates or changes to Managed Inference do not impact existing deployments.
+In case of breaking changes that would cause some custom models to no longer be supported, we will notify you at least 3 months in advance.
+
+### License
+
+- When deploying custom models, you remain responsible for complying with any license requirements from the model provider, as you would when running the model on a GPU you provision yourself.
+
+### Supported model architectures
+
+Custom model deployments currently support the following model architectures: `AquilaModel`, `AquilaForCausalLM`, `ArcticForCausalLM`, `BaiChuanForCausalLM`, `BaichuanForCausalLM`, `BloomForCausalLM`, `CohereForCausalLM`, `Cohere2ForCausalLM`, `DbrxForCausalLM`, `DeciLMForCausalLM`, `DeepseekForCausalLM`, `DeepseekV2ForCausalLM`, `DeepseekV3ForCausalLM`, `ExaoneForCausalLM`, `FalconForCausalLM`, `Fairseq2LlamaForCausalLM`, `GemmaForCausalLM`, `Gemma2ForCausalLM`, `GlmForCausalLM`, `GPT2LMHeadModel`, `GPTBigCodeForCausalLM`, `GPTJForCausalLM`, `GPTNeoXForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM`, `GritLM`, `InternLMForCausalLM`, `InternLM2ForCausalLM`, `InternLM2VEForCausalLM`, `InternLM3ForCausalLM`, `JAISLMHeadModel`, `JambaForCausalLM`, `LlamaForCausalLM`, `LLaMAForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`, `MiniCPMForCausalLM`, `MiniCPM3ForCausalLM`, `MistralForCausalLM`, `MixtralForCausalLM`, `QuantMixtralForCausalLM`, `MptForCausalLM`, `MPTForCausalLM`, `NemotronForCausalLM`, `OlmoForCausalLM`, `Olmo2ForCausalLM`, `OlmoeForCausalLM`, `OPTForCausalLM`, `OrionForCausalLM`, `PersimmonForCausalLM`, `PhiForCausalLM`, `Phi3ForCausalLM`, `Phi3SmallForCausalLM`, `PhiMoEForCausalLM`, `Qwen2ForCausalLM`, `Qwen2MoeForCausalLM`, `RWForCausalLM`, `StableLMEpochForCausalLM`, `StableLmForCausalLM`, `Starcoder2ForCausalLM`, `SolarForCausalLM`, `TeleChat2ForCausalLM`, `XverseForCausalLM`, `BartModel`, `BartForConditionalGeneration`, `Florence2ForConditionalGeneration`, `BertModel`, `RobertaModel`, `RobertaForMaskedLM`, `XLMRobertaModel`, `Gemma2Model`, `InternLM2ForRewardModel`, `JambaForSequenceClassification`, `LlamaModel`, `MistralModel`, `Qwen2Model`, `Qwen2ForRewardModel`, `Qwen2ForProcessRewardModel`, `LlavaNextForConditionalGeneration`, `Phi3VForCausalLM`, `Qwen2VLForConditionalGeneration`, `Qwen2ForSequenceClassification`, `BertForSequenceClassification`, `RobertaForSequenceClassification`, `XLMRobertaForSequenceClassification`, `AriaForConditionalGeneration`, `Blip2ForConditionalGeneration`, `ChameleonForConditionalGeneration`, `ChatGLMModel`, `ChatGLMForConditionalGeneration`, `DeepseekVLV2ForCausalLM`, `FuyuForCausalLM`, `H2OVLChatModel`, `InternVLChatModel`, `Idefics3ForConditionalGeneration`, `LlavaForConditionalGeneration`, `LlavaNextVideoForConditionalGeneration`, `LlavaOnevisionForConditionalGeneration`, `MantisForConditionalGeneration`, `MiniCPMO`, `MiniCPMV`, `MolmoForCausalLM`, `NVLM_D`, `PaliGemmaForConditionalGeneration`, `PixtralForConditionalGeneration`, `QWenLMHeadModel`, `Qwen2_5_VLForConditionalGeneration`, `Qwen2AudioForConditionalGeneration`, `UltravoxModel`, `MllamaForConditionalGeneration`, `WhisperForConditionalGeneration`, `EAGLEModel`, `MedusaModel`, `MLPSpeculatorPreTrainedModel`
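As a quick way to verify a candidate model against this list, the sketch below downloads a model's `config.json` with the real `huggingface_hub` client and compares its `architectures` value against an excerpt of the list above. The repository ID and the excerpted set are illustrative.

```python
# Sketch: check whether a Hugging Face model's declared architecture
# appears in the supported list above. Repo ID and the excerpted set
# are examples; use the full list from this page in practice.
import json

from huggingface_hub import hf_hub_download

SUPPORTED = {"LlamaForCausalLM", "MistralForCausalLM", "Qwen2ForCausalLM"}  # excerpt only

# Download just the config.json of the candidate model.
config_path = hf_hub_download(
    repo_id="unsloth/Llama-3.3-70B-Instruct-bnb-4bit",  # example repository
    filename="config.json",
)
with open(config_path) as f:
    architectures = json.load(f).get("architectures", [])

if any(arch in SUPPORTED for arch in architectures):
    print("architecture supported:", architectures)
else:
    print("architecture not in the supported list:", architectures)
```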

-- When deploying custom models, you remain responsible for complying with any License requirements from the model provider.
-- We currently
