
Commit 9d9b019

feat(inference): restructure model catalog
1 parent bd78dfa commit 9d9b019

File tree

1 file changed: +63 −145 lines changed

pages/managed-inference/reference-content/model-catalog.mdx

Lines changed: 63 additions & 145 deletions
@@ -23,7 +23,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) |
 | [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | 128k | Text | L4, L40S, H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) |
 | [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k | Text | H100, H100-2 | [Llama 3 community](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE) |
-| [`llama-3-nemotron-70b`](#llama-31-nemotron-70b-instruct) | Nvidia | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) |
+| [`llama-3.1-nemotron-70b-instruct`](#llama-31-nemotron-70b-instruct) | Nvidia | 128k | Text | H100, H100-2 | [Llama 3.1 community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) |
 | [`deepseek-r1-distill-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | 128k | Text | H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) |
 | [`deepseek-r1-distill-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | 128k | Text | L4, L40S, H100, H100-2 | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) |
 | [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k | Text | L4, L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
@@ -34,7 +34,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4k | Audio to Audio| L4, H100 | [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md) |
 | [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4k | Text | H100, H100-2 | [Llama 2 community](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/LICENSE.txt) |
 | [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k | Text, Vision | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
-| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Text, Vision | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
+| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Text, Vision | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) and [Tongyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE) |
 | [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | 32k | Code | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) | BAAI | 4k | Embeddings | L4, L40S, H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) |
 | [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 | Embeddings | L4 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
@@ -71,6 +71,10 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 
 ## Multimodal models (Text and Vision)
 
+<Message type="note">
+Vision models can understand and analyze images, not generate them. You can use them through the /v1/chat/completions endpoint.
+</Message>
+
 ### Gemma-3-27b-it
 Gemma-3-27b-it is a model developed by Google to perform text processing and image analysis on many languages.
 The model was not trained specifically to output function / tool call tokens. Hence function calling is currently supported, but reliability remains limited.
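The note above names the /v1/chat/completions endpoint but does not show a request shape. As a minimal sketch, assuming the common OpenAI-style chat schema for mixed text-and-image messages (the image URL below is a placeholder, not a real asset), such a request body could be built like this:

```python
# Sketch: build a /v1/chat/completions request body for a vision model.
# Assumptions: OpenAI-style chat schema; the image URL is a placeholder.
import json

def build_vision_request(model: str, prompt: str, image_url: str) -> dict:
    """Return a chat payload mixing a text part and an image_url part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_request(
    "mistral/pixtral-12b-2409:bf16",        # model name from the catalog
    "What is in this picture?",
    "https://example.com/cat.png",          # placeholder image URL
)
print(json.dumps(payload, indent=2))
```

The same payload shape would apply to any of the vision models listed in this section; only the `model` string changes.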
@@ -89,28 +93,42 @@ This model was optimized to have a dense knowledge and faster tokens throughput 
 mistral/mistral-small-3.1-24b-instruct-2503:bf16
 ```
 
+### Pixtral-12b-2409
+Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder.
+It can analyze images and offer insights from visual content alongside text.
+This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
+Pixtral is open-weight and distributed under the Apache 2.0 license.
+
+#### Model name
+```
+mistral/pixtral-12b-2409:bf16
+```
+
+### Molmo-72b-0924
+Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI.
+Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
+
+#### Model name
+```
+allenai/molmo-72b-0924:fp8
+```
+
 ## Text models
 
-### Mixtral-8x7b-instruct-v0.1
-Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
-Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
+### Llama-3.3-70b-instruct
+Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model.
+This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications.
 
-#### Model names
+#### Model name
 ```
-mistral/mixtral-8x7b-instruct-v0.1:fp8
-mistral/mixtral-8x7b-instruct-v0.1:bf16
+meta/llama-3.3-70b-instruct:fp8
+meta/llama-3.3-70b-instruct:bf16
 ```
 
 ### Llama-3.1-70b-instruct
 Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
 Llama 3.1 was designed to match the best proprietary models and outperform many of the available open source on common industry benchmarks.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-
 #### Model names
 ```
 meta/llama-3.1-70b-instruct:fp8
@@ -121,12 +139,6 @@ meta/llama-3.1-70b-instruct:bf16
 Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
 Llama 3.1 was designed to match the best proprietary models and outperform many of the available open source on common industry benchmarks.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-
 #### Model names
 ```
 meta/llama-3.1-8b-instruct:fp8
@@ -138,87 +150,54 @@ Meta’s Llama 3 is an iteration of the open-access Llama family.
 Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility and responsibly spearheading the deployment of LLMs.
 With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-
 #### Model name
 ```
 meta/llama-3-70b-instruct:fp8
 ```
 
-### Llama-3.3-70b-instruct
-Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model.
-This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications.
-
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-
-#### Model name
-```
-meta/llama-3.3-70b-instruct:bf16
-```
-
 ### Llama-3.1-Nemotron-70b-instruct
 Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions.
 NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to verify) |
-
 #### Model name
 ```
-meta/llama-3.1-nemotron-70b-instruct:fp8
+nvidia/llama-3.1-nemotron-70b-instruct:fp8
 ```
 
 ### DeepSeek-R1-Distill-Llama-70B
 Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1.
 DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, Simplified Chinese |
-
 #### Model name
 ```
+deepseek/deepseek-r1-distill-llama-70b:fp8
 deepseek/deepseek-r1-distill-llama-70b:bf16
 ```
 
 ### DeepSeek-R1-Distill-Llama-8B
 Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1.
 DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, Simplified Chinese |
-
 #### Model names
 ```
+deepseek/deepseek-r1-distill-llama-8b:fp8
 deepseek/deepseek-r1-distill-llama-8b:bf16
 ```
 
+### Mixtral-8x7b-instruct-v0.1
+Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
+Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
+
+#### Model names
+```
+mistral/mixtral-8x7b-instruct-v0.1:fp8
+mistral/mixtral-8x7b-instruct-v0.1:bf16
+```
+
 ### Mistral-7b-instruct-v0.3
 The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters.
 This model is open-weight and distributed under the Apache 2.0 license.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English |
-
 #### Model name
 ```
 mistral/mistral-7b-instruct-v0.3:bf16
@@ -228,28 +207,17 @@ mistral/mistral-7b-instruct-v0.3:bf16
 Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral.
 This model is open-weight and distributed under the Apache 2.0 license.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish |
-
 #### Model name
 ```
 mistral/mistral-small-24b-instruct-2501:fp8
+mistral/mistral-small-24b-instruct-2501:bf16
 ```
 
 ### Mistral-nemo-instruct-2407
 Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA.
 This model is open-weight and distributed under the Apache 2.0 license.
 It was trained on a large proportion of multilingual and code data.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi |
-
 #### Model name
 ```
 mistral/mistral-nemo-instruct-2407:fp8
@@ -261,12 +229,6 @@ Moshi is an experimental next-generation conversational model, designed to under
 While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
 Moshiko is the variant of Moshi with a male voice in English.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | No |
-| Function calling | No |
-| Supported languages | English |
-
 #### Model names
 ```
 kyutai/moshiko-0.1-8b:bf16
@@ -279,12 +241,6 @@ Moshi is an experimental next-generation conversational model, designed to under
 While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
 Moshika is the variant of Moshi with a female voice in English.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | No |
-| Function calling | No |
-| Supported languages | English |
-
 #### Model names
 ```
 kyutai/moshika-0.1-8b:bf16
@@ -295,91 +251,53 @@ kyutai/moshika-0.1-8b:fp8
 WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants.
 With its extensive training in diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors.
 
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | No |
-| Supported languages | English (to be verified) |
-
 #### Model names
 ```
 wizardlm/wizardlm-70b-v1.0:fp8
 wizardlm/wizardlm-70b-v1.0:fp16
 ```
 
-## Multimodal models
-
-### Pixtral-12b-2409
-Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder.
-It can analyze images and offer insights from visual content alongside text.
-This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
-Pixtral is open-weight and distributed under the Apache 2.0 license.
-
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | No |
-| Supported languages | English, French, German, Spanish (to be verified) |
+## Code models
 
-<Message type="note">
-Pixtral 12B can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint.
-</Message>
+### Qwen2.5-coder-32b-instruct
+Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages.
+With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning.
 
 #### Model name
 ```
-mistral/pixtral-12b-2409:bf16
+qwen/qwen2.5-coder-32b-instruct:int8
 ```
 
-### Molmo-72b-0924
-Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI.
-Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
-Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source.
-Its base model is Qwen2-72B ([Tongyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)).
+## Embeddings models
+
+### Bge-multilingual-gemma2
+BGE-Multilingual-Gemma2 tops the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard), scoring the number one spot in French and Polish, and number seven in English (as of Q4 2024).
+As its name suggests, the model’s training data spans a broad range of languages, including English, Chinese, Polish, French, and more.
 
 | Attribute | Value |
 |-----------|-------|
-| Structured output supported | Yes |
-| Function calling | No |
-| Supported languages | English, French, German, Spanish (to be verified) |
+| Embedding dimensions | 3584 |
+| Matryoshka embedding | No |
 
 <Message type="note">
-Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint.
+[Matryoshka embeddings](https://huggingface.co/blog/matryoshka) are embeddings trained at multiple dimensionalities, so the resulting vector's dimensions are ordered from most to least meaningful. For example, a 3584-dimension vector can be truncated to its first 768 dimensions and used directly.
 </Message>
 
 #### Model name
 ```
-allenai/molmo-72b-0924:fp8
+baai/bge-multilingual-gemma2:fp32
 ```
 
-## Code models
-
-### Qwen2.5-coder-32b-instruct
-Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages.
-With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning.
-
-| Attribute | Value |
-|-----------|-------|
-| Structured output supported | Yes |
-| Function calling | Yes |
-| Supported languages | over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic |
-
-#### Model name
-```
-qwen/qwen2.5-coder-32b-instruct:int8
-```
-
-## Embeddings models
-
 ### Sentence-t5-xxl
 The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture.
 Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information.
 This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal.
 
+
 | Attribute | Value |
 |-----------|-------|
-| Structured output supported | No |
-| Function calling | No |
-| Supported languages | English (to be verified) |
+| Embedding dimensions | 768 |
+| Matryoshka embedding | No |
 
 #### Model name
 ```

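The Matryoshka note added in the last hunk can be illustrated with a short sketch. This is plain Python with made-up vector values, and note that per the tables above neither catalog embedding model actually supports Matryoshka truncation; the sketch only shows what the technique means:

```python
# Sketch: Matryoshka-style truncation keeps the first k dimensions of an
# embedding (the most meaningful ones) and re-normalizes the result.
# Vector values below are invented for illustration.
import math

def truncate_embedding(vec: list[float], k: int) -> list[float]:
    """Keep the first k dimensions and L2-normalize the shortened vector."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

full = [0.6, 0.8, 0.0, 0.1]            # toy 4-dimension "embedding"
short = truncate_embedding(full, 2)    # use only the 2 leading dimensions
print(short)
```

A store that indexed 3584-dimension vectors could, with a Matryoshka-trained model, search against such truncated vectors at a fraction of the memory cost.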