
Commit 155a05f

docs(infr): add model catalog page (#4854)
* feat(infr): add catalog page
* docs(infr): add model catalog page
* docs(infr): update
* docs(infr): add table
* docs(infr): test
* feat(infr): add model catalog
* Apply suggestions from code review

  Co-authored-by: Jessica <[email protected]>

* fix(infr): fix typos
* fix(inference): supported languages
* fix(inference): update licenses
* fix(inference): models supported features
* fix(inference): update context length and tasks
* fix(inference): update task descriptions
* feat(inference): add gemma and mistral small characteristics
* feat(inference): restructure model catalog
* fix(inference): fix anchors

---------

Co-authored-by: Jessica <[email protected]>
Co-authored-by: fpagny <[email protected]>
1 parent 30889d2 commit 155a05f

File tree

2 files changed (+311 lines, -0 lines)

menu/navigation.json

Lines changed: 4 additions & 0 deletions
@@ -880,6 +880,10 @@
        "label": "Support for function calling in Scaleway Managed Inference",
        "slug": "function-calling-support"
      },
+     {
+       "label": "Managed Inference model catalog",
+       "slug": "model-catalog"
+     },
      {
        "label": "BGE-Multilingual-Gemma2 model",
        "slug": "bge-multilingual-gemma2"
Lines changed: 307 additions & 0 deletions
@@ -0,0 +1,307 @@
---
meta:
  title: Managed Inference model catalog
  description: Deploy your own model with Scaleway Managed Inference. Privacy-focused, fully managed.
content:
  h1: Managed Inference model catalog
  paragraph: This page provides information on the Scaleway Managed Inference product catalog
tags:
dates:
  validation: 2025-04-18
  posted: 2024-04-18
categories:
  - ai-data
---
A quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities.
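
As an illustration, once a model from this catalog is deployed, it can typically be queried through its OpenAI-compatible API. The sketch below uses Python and the `openai` client library rather than curl; the deployment endpoint URL, the IAM API key, and the choice of model are placeholders to replace with your own values.

```python
# Minimal sketch: querying a deployed Managed Inference model via its chat endpoint.
# Assumptions: the `openai` Python package is installed, the deployment exposes an
# OpenAI-compatible /v1 API, and the URL / key below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",  # replace with your deployment endpoint
    api_key="<your-iam-api-key>",                      # replace with your IAM API key
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct:fp8",  # any chat model name from the catalog below
    messages=[{"role": "user", "content": "Summarize what Managed Inference is in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```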

## Models technical summary

| Model name | Provider | Maximum context length (tokens) | Modalities | Instances | License |
|------------|----------|--------------|------------|-----------|---------|
| [`gemma-3-27b-it`](#gemma-3-27b-it) | Google | 40k | Text, Vision | H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) |
| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | 128k | Text | H100, H100-2 | [Llama 3.3 community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) |
| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | 128k | Text | L4, L40S, H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) |
| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k | Text | H100, H100-2 | [Llama 3 community](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE) |
| [`llama-3.1-nemotron-70b-instruct`](#llama-31-nemotron-70b-instruct) | Nvidia | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) |
| [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | 128k | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) |
| [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | 128k | Text | L4, L40S, H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) |
| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k | Text | L4, L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| [`mistral-small-3.1-24b-instruct-2503`](#mistral-small-31-24b-instruct-2503) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-instruct-2501) | Mistral | 32k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4k | Audio to Audio | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) |
| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4k | Audio to Audio | L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) |
| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4k | Text | H100, H100-2 | [Llama 2 community](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt) |
| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k | Text, Vision | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Text, Vision | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) and [Tongyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE) |
| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | 32k | Code | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) | BAAI | 4k | Embeddings | L4, L40S, H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) |
| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 | Embeddings | L4 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |

## Models feature summary

| Model name | Structured output supported | Function calling | Supported languages |
| --- | --- | --- | --- |
| `gemma-3-27b-it` | Yes | Partial | English, Chinese, Japanese, Korean, and 31 additional languages |
| `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| `llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| `llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| `llama-3-70b-instruct` | Yes | Yes | English |
| `llama-3.1-nemotron-70b-instruct` | Yes | Yes | English |
| `deepseek-r1-distill-llama-70b` | Yes | Yes | English, Chinese |
| `deepseek-r1-distill-llama-8b` | Yes | Yes | English, Chinese |
| `mistral-7b-instruct-v0.3` | Yes | Yes | English |
| `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
| `mistral-small-24b-instruct-2501` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean |
| `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese |
| `mixtral-8x7b-instruct-v0.1` | Yes | Yes | English, French, German, Italian, Spanish |
| `moshiko-0.1-8b` | No | No | English |
| `moshika-0.1-8b` | No | No | English |
| `wizardlm-70b-v1.0` | Yes | No | English |
| `pixtral-12b-2409` | Yes | Yes | English |
| `molmo-72b-0924` | Yes | No | English |
| `qwen2.5-coder-32b-instruct` | Yes | Yes | English, French, Spanish, Portuguese, German, Italian, Russian, Chinese, Japanese, Korean, Vietnamese, Thai, Arabic, and 16 additional languages |
| `bge-multilingual-gemma2` | No | No | English, French, Chinese, Japanese, Korean |
| `sentence-t5-xxl` | No | No | English |
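
For models listed above as supporting structured output, responses can typically be constrained to JSON. The snippet below is a sketch using the OpenAI-compatible `response_format` parameter; the endpoint URL and API key are placeholders, and exact structured-output support can vary by model and deployment.

```python
# Sketch: requesting JSON-formatted output from a model that supports structured output.
# Assumes the same OpenAI-compatible client setup as the earlier example;
# endpoint URL and API key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://<your-deployment-endpoint>/v1", api_key="<your-iam-api-key>")

response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct:fp8",
    messages=[
        {"role": "system", "content": "Reply only with a JSON object containing 'city' and 'country'."},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ],
    response_format={"type": "json_object"},  # structured output; support varies by model
)
print(response.choices[0].message.content)
```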

## Model details
<Message type="note">
  Despite our efforts to ensure accuracy, generated text may contain inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations). Always verify generated content independently.
</Message>

## Multimodal models (Text and Vision)

<Message type="note">
  Vision models can understand and analyze images, not generate them. Use them through the `/v1/chat/completions` endpoint.
</Message>
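
As an illustration, a vision request passes the image alongside the text in the message content. The following is a minimal sketch, assuming an OpenAI-compatible endpoint; the endpoint URL, API key, and image URL are placeholders.

```python
# Sketch: sending an image plus a text prompt to a vision-capable model.
# The endpoint URL, API key, and image URL are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://<your-deployment-endpoint>/v1", api_key="<your-iam-api-key>")

response = client.chat.completions.create(
    model="google/gemma-3-27b-it:bf16",  # or any other vision-capable model below
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is in this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```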

### Gemma-3-27b-it
Gemma-3-27b-it is a model developed by Google to perform text processing and image analysis in many languages.
The model was not specifically trained to output function or tool call tokens. Function calling is therefore supported, but its reliability remains limited.

#### Model name
```
google/gemma-3-27b-it:bf16
```
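
As a rough sketch of what such a function-calling request can look like (keeping in mind the limited reliability noted above), the example below declares a single hypothetical `get_weather` tool through the OpenAI-compatible `tools` parameter; the endpoint URL and API key are placeholders.

```python
# Sketch: declaring a tool and inspecting the model's tool calls.
# `get_weather` is a hypothetical function; endpoint URL and API key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://<your-deployment-endpoint>/v1", api_key="<your-iam-api-key>")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="google/gemma-3-27b-it:bf16",
    messages=[{"role": "user", "content": "What is the weather in Paris right now?"}],
    tools=tools,
)
# The model may answer directly or emit a tool call; check both.
message = response.choices[0].message
print(message.tool_calls or message.content)
```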

### Mistral-small-3.1-24b-instruct-2503
Mistral-small-3.1-24b-instruct-2503 is a model developed by Mistral to perform text processing and image analysis in many languages.
The model is optimized for dense knowledge and faster token throughput relative to its size.

#### Model name
```
mistral/mistral-small-3.1-24b-instruct-2503:bf16
```

### Pixtral-12b-2409
Pixtral is a vision language model introducing a novel architecture: a 12B-parameter multimodal decoder paired with a 400M-parameter vision encoder.
It can analyze images and offer insights from visual content alongside text.
This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
Pixtral is open-weight and distributed under the Apache 2.0 license.

#### Model name
```
mistral/pixtral-12b-2409:bf16
```

### Molmo-72b-0924
Molmo 72B is the powerhouse of the Molmo family of multimodal models, developed by the renowned research lab Allen Institute for AI.
Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.

#### Model name
```
allenai/molmo-72b-0924:fp8
```

## Text models

### Llama-3.3-70b-instruct
Released December 6, 2024, Meta’s Llama 3.3 70B is a fine-tune of the [Llama 3.1 70B](/managed-inference/reference-content/llama-3.1-70b-instruct/) model.
This model is still text-only (text in, text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications.

#### Model names
```
meta/llama-3.3-70b-instruct:fp8
meta/llama-3.3-70b-instruct:bf16
```

### Llama-3.1-70b-instruct
Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
Llama 3.1 was designed to match the best proprietary models and outperform many of the available open-source models on common industry benchmarks.

#### Model names
```
meta/llama-3.1-70b-instruct:fp8
meta/llama-3.1-70b-instruct:bf16
```

### Llama-3.1-8b-instruct
Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
Llama 3.1 was designed to match the best proprietary models and outperform many of the available open-source models on common industry benchmarks.

#### Model names
```
meta/llama-3.1-8b-instruct:fp8
meta/llama-3.1-8b-instruct:bf16
```

### Llama-3-70b-instruct
Meta’s Llama 3 is an iteration of the open-access Llama family.
Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility, while responsibly spearheading the deployment of LLMs.
With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries of reasoning and coding capabilities.

#### Model name
```
meta/llama-3-70b-instruct:fp8
```

### Llama-3.1-Nemotron-70b-instruct
Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions.
NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses.

#### Model name
```
nvidia/llama-3.1-nemotron-70b-instruct:fp8
```

### DeepSeek-R1-Distill-Llama-70B
Released January 21, 2025, DeepSeek’s R1 Distill Llama 70B is a distilled version of the Llama model family, based on DeepSeek R1.
DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.

#### Model names
```
deepseek/deepseek-r1-distill-llama-70b:fp8
deepseek/deepseek-r1-distill-llama-70b:bf16
```

### DeepSeek-R1-Distill-Llama-8B
Released January 21, 2025, DeepSeek’s R1 Distill Llama 8B is a distilled version of the Llama model family, based on DeepSeek R1.
DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.

#### Model names
```
deepseek/deepseek-r1-distill-llama-8b:fp8
deepseek/deepseek-r1-distill-llama-8b:bf16
```

### Mixtral-8x7b-instruct-v0.1
Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.

#### Model names
```
mistral/mixtral-8x7b-instruct-v0.1:fp8
mistral/mixtral-8x7b-instruct-v0.1:bf16
```

### Mistral-7b-instruct-v0.3
The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of its release, it matched the capabilities of models up to 30B parameters.
This model is open-weight and distributed under the Apache 2.0 license.

#### Model name
```
mistral/mistral-7b-instruct-v0.3:bf16
```

### Mistral-small-24b-instruct-2501
Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral.
This model is open-weight and distributed under the Apache 2.0 license.

#### Model names
```
mistral/mistral-small-24b-instruct-2501:fp8
mistral/mistral-small-24b-instruct-2501:bf16
```

### Mistral-nemo-instruct-2407
Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA.
This model is open-weight and distributed under the Apache 2.0 license.
It was trained on a large proportion of multilingual and code data.

#### Model name
```
mistral/mistral-nemo-instruct-2407:fp8
```

### Moshiko-0.1-8b
Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
Moshiko is the variant of Moshi with a male voice in English.

#### Model names
```
kyutai/moshiko-0.1-8b:bf16
kyutai/moshiko-0.1-8b:fp8
```

### Moshika-0.1-8b
Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
Moshika is the variant of Moshi with a female voice in English.

#### Model names
```
kyutai/moshika-0.1-8b:bf16
kyutai/moshika-0.1-8b:fp8
```

### WizardLM-70B-V1.0
WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants.
With its extensive training on diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors.

#### Model names
```
wizardlm/wizardlm-70b-v1.0:fp8
wizardlm/wizardlm-70b-v1.0:fp16
```

## Code models

### Qwen2.5-coder-32b-instruct
Qwen2.5-coder is your intelligent programming assistant, familiar with more than 40 programming languages.
With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning.

#### Model name
```
qwen/qwen2.5-coder-32b-instruct:int8
```

## Embeddings models

### Bge-multilingual-gemma2
BGE-Multilingual-Gemma2 tops the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard), scoring the number one spot in French and Polish, and number seven in English (as of Q4 2024).
As its name suggests, the model’s training data spans a broad range of languages, including English, Chinese, Polish, French, and more.

| Attribute | Value |
|-----------|-------|
| Embedding dimensions | 3584 |
| Matryoshka embedding | No |

<Message type="note">
  [Matryoshka embeddings](https://huggingface.co/blog/matryoshka) refer to embeddings trained at multiple dimensionalities, so that the resulting vector's dimensions are ordered from most to least meaningful. For example, a 3584-dimension vector can be truncated to its first 768 dimensions and used directly.
</Message>

#### Model name
```
baai/bge-multilingual-gemma2:fp32
```
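
To illustrate, embedding models are queried through an embeddings endpoint rather than chat completions. A minimal sketch with the OpenAI-compatible `/v1/embeddings` route, using a placeholder endpoint URL and API key:

```python
# Sketch: generating embeddings via an OpenAI-compatible /v1/embeddings endpoint.
# Endpoint URL and API key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://<your-deployment-endpoint>/v1", api_key="<your-iam-api-key>")

response = client.embeddings.create(
    model="baai/bge-multilingual-gemma2:fp32",
    input=["Managed Inference lets you deploy models on dedicated GPUs."],
)
vector = response.data[0].embedding
print(len(vector))  # expected to match the 3584 embedding dimensions listed above
```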

### Sentence-t5-xxl
The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture.
Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information.
This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal.

| Attribute | Value |
|-----------|-------|
| Embedding dimensions | 768 |
| Matryoshka embedding | No |

#### Model name
```
sentence-transformers/sentence-t5-xxl:fp32
```
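
Since this model is geared toward sentence similarity, a typical pattern is to embed two sentences and compare them with cosine similarity. The sketch below assumes the same placeholder endpoint and API key as the previous examples:

```python
# Sketch: comparing two sentences via cosine similarity of their embeddings.
# Endpoint URL and API key are placeholders.
import math
from openai import OpenAI

client = OpenAI(base_url="https://<your-deployment-endpoint>/v1", api_key="<your-iam-api-key>")

response = client.embeddings.create(
    model="sentence-transformers/sentence-t5-xxl:fp32",
    input=["The cat sits on the mat.", "A cat is resting on a rug."],
)
a, b = (item.embedding for item in response.data)

# Cosine similarity: dot product divided by the product of vector norms.
dot = sum(x * y for x, y in zip(a, b))
cosine = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
print(f"cosine similarity: {cosine:.3f}")  # closer to 1.0 means more similar
```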
