---
meta:
  title: Managed Inference model catalog
  description: Deploy your own model with Scaleway Managed Inference. Privacy-focused, fully managed.
content:
  h1: Managed Inference model catalog
  paragraph: This page provides information on the Scaleway Managed Inference product catalog
tags:
dates:
  validation: 2025-03-19
  posted: 2024-05-28
categories:
  - ai-data
---

A quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples, curl commands, and detailed capabilities.

## Summary table
| Model Name | Provider | Context Length | Modalities | Instances | License |
|------------|----------|----------------|------------|-----------|---------|
| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k tokens | Text | H100 | Apache 2.0 |
| [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 32k tokens | Text | H100, H100-2 | Llama 3.1 community |
| [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | Meta | up to 128k tokens | Text | L4, L40S, H100, H100-2 | Llama 3.1 community |
| [`llama-3-70b-instruct`](#llama-3-70b-instruct) | Meta | 8k tokens | Text | H100 | Llama 3 community |
| [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | up to 131k tokens | Text | H100, H100-2 | Llama 3.3 community |
| [`llama-3.1-nemotron-70b-instruct`](#llama-31-nemotron-70b-instruct) | Nvidia | up to 128k tokens | Text | H100, H100-2 | Llama 3.1 community |
| [`deepseek-r1-distill-llama-70b`](#deepseek-r1-distill-llama-70b) | Deepseek | up to 131k tokens | Text | H100, H100-2 | MIT |
| [`deepseek-r1-distill-llama-8b`](#deepseek-r1-distill-llama-8b) | Deepseek | up to 131k tokens | Text | L4, L40S, H100 | Apache 2.0 |
| [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | Mistral | 32k tokens | Text | L4, L40S, H100, H100-1 | Apache 2.0 |
| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-instruct-2501) | Mistral | 32k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k tokens | Text | L40S, H100, H100-2 | Apache 2.0 |
| [`moshiko-0.1-8b`](#moshiko-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
| [`moshika-0.1-8b`](#moshika-01-8b) | Kyutai | 4,096 tokens | Text | L4, H100 | Apache 2.0 |
| [`wizardlm-70b-v1.0`](#wizardlm-70b-v10) | WizardLM | 4,096 tokens | Text | H100, H100-2 | Llama 2 community |
| [`pixtral-12b-2409`](#pixtral-12b-2409) | Mistral | 128k tokens | Multimodal | L40S, H100, H100-2 | Apache 2.0 |
| [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k tokens | Multimodal | H100-2 | Apache 2.0 |
| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | up to 32k tokens | Code | H100, H100-2 | Qianwen License |
| [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 tokens | Embeddings | L4 | Apache 2.0 |

| Model Name | Structured output supported | Function calling | Supported languages |
| --- | --- | --- | --- |
| `Mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Spanish |
| `Llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| `Llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| `Llama-3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| `Llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| `Llama-3.1-Nemotron-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to be verified) |
| `DeepSeek-R1-Distill-Llama-70B` | Yes | Yes | English, Simplified Chinese |
| `DeepSeek-R1-Distill-Llama-8B` | Yes | Yes | English, Simplified Chinese |
| `Mistral-7b-instruct-v0.3` | Yes | Yes | English |
| `Mistral-small-24b-instruct-2501` | Yes | Yes | English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish |
| `Mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi |
| `Moshiko-0.1-8b` | No | No | English |
| `Moshika-0.1-8b` | No | No | English |
| `WizardLM-70B-v1.0` | Yes | No | English (to be verified) |
| `Pixtral-12b-2409` | Yes | No | English, French, German, Spanish (to be verified) |
| `Molmo-72b-0924` | Yes | No | English, French, German, Spanish (to be verified) |
| `Qwen2.5-coder-32b-instruct` | Yes | Yes | Over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic |
| `Sentence-t5-xxl` | No | No | English (to be verified) |

## Model details
<Message type="note">
Despite our efforts to ensure accuracy, generated text may contain inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations). Always verify generated content independently.
</Message>

## Text models

### Mixtral-8x7b-instruct-v0.1
Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants.
Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | No |
| Supported languages | English, French, German, Spanish |

#### Model names
```
mistral/mixtral-8x7b-instruct-v0.1:fp8
mistral/mixtral-8x7b-instruct-v0.1:bf16
```

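The following curl request is a minimal usage sketch for a Mixtral deployment, sent to the OpenAI-compatible `/v1/chat/completions` route. The endpoint URL, the `$SCW_SECRET_KEY` IAM key variable, and the exact `model` value are placeholders to adapt to your own deployment.

```
# Minimal sketch of a chat completion request; the endpoint, key, and model
# name are placeholders to replace with your own deployment values.
curl https://<your-deployment-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mixtral-8x7b-instruct-v0.1",
    "messages": [
      {"role": "system", "content": "You are a concise technical assistant."},
      {"role": "user", "content": "Summarize what Managed Inference does in two sentences."}
    ],
    "max_tokens": 200
  }'
```
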
### Llama-3.1-70b-instruct
Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
Llama 3.1 was designed to match the best proprietary models and to outperform many available open-source models on common industry benchmarks.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | Yes |
| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |

#### Model names
```
meta/llama-3.1-70b-instruct:fp8
meta/llama-3.1-70b-instruct:bf16
```

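Llama 3.1 70B supports function calling (see the table above). The sketch below passes an OpenAI-style `tools` definition through the chat completions route; it assumes the deployment accepts the OpenAI-compatible `tools` parameter, and the `get_weather` function, endpoint, and key are illustrative placeholders.

```
# Hedged sketch: function calling with an OpenAI-style "tools" array.
# The get_weather tool, endpoint, and key are illustrative placeholders.
curl https://<your-deployment-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

In the OpenAI-compatible convention, the response then carries a `tool_calls` entry rather than plain text; your application runs the function and sends the result back in a follow-up message.
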
### Llama-3.1-8b-instruct
Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
Llama 3.1 was designed to match the best proprietary models and to outperform many available open-source models on common industry benchmarks.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | Yes |
| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |

#### Model names
```
meta/llama-3.1-8b-instruct:fp8
meta/llama-3.1-8b-instruct:bf16
```

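Models flagged as supporting structured output can be constrained to return JSON. Below is a minimal sketch, assuming the deployment exposes the OpenAI-compatible `response_format` parameter; check the Managed Inference documentation for the exact options supported.

```
# Hedged sketch: JSON-mode output via the OpenAI-compatible response_format
# parameter (an assumption to verify against your deployment).
curl https://<your-deployment-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instruct",
    "messages": [
      {"role": "user", "content": "Return a JSON object with the fields name and capital for France."}
    ],
    "response_format": {"type": "json_object"}
  }'
```
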
### Llama-3-70b-instruct
Meta’s Llama 3 is an iteration of the open-access Llama family.
Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility, while spearheading the responsible deployment of LLMs.
With a commitment to open-source principles, this release marks the beginning of a multilingual, multimodal future for Llama 3, pushing the boundaries in reasoning and coding capabilities.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | Yes |
| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |

#### Model name
```
meta/llama-3-70b-instruct:fp8
```

### Llama-3.3-70b-instruct
Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/llama-3.1-70b-instruct/) model.
This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | Yes |
| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |

#### Model name
```
meta/llama-3.3-70b-instruct:bf16
```

### Llama-3.1-Nemotron-70b-instruct
Introduced October 14, 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions.
NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | Yes |
| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (to be verified) |

#### Model name
```
meta/llama-3.1-nemotron-70b-instruct:fp8
```

### DeepSeek-R1-Distill-Llama-70B
Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1.
DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | Yes |
| Supported languages | English, Simplified Chinese |

#### Model name
```
deepseek/deepseek-r1-distill-llama-70b:bf16
```

### DeepSeek-R1-Distill-Llama-8B
Released January 21, 2025, Deepseek’s R1 Distilled Llama 8B is a distilled version of the Llama model family based on Deepseek R1.
DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | Yes |
| Supported languages | English, Simplified Chinese |

#### Model name
```
deepseek/deepseek-r1-distill-llama-8b:bf16
```

### Mistral-7b-instruct-v0.3
The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters.
This model is open-weight and distributed under the Apache 2.0 license.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | Yes |
| Supported languages | English |

#### Model name
```
mistral/mistral-7b-instruct-v0.3:bf16
```

### Mistral-small-24b-instruct-2501
Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral.
This model is open-weight and distributed under the Apache 2.0 license.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | Yes |
| Supported languages | Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish |

#### Model name
```
mistral/mistral-small-24b-instruct-2501:fp8
```

### Mistral-nemo-instruct-2407
Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA.
This model is open-weight and distributed under the Apache 2.0 license.
It was trained on a large proportion of multilingual and code data.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | Yes |
| Supported languages | English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi |

#### Model name
```
mistral/mistral-nemo-instruct-2407:fp8
```

### Moshiko-0.1-8b
Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
Moshiko is the variant of Moshi with a male voice in English.

| Attribute | Value |
|-----------|-------|
| Structured output supported | No |
| Function calling | No |
| Supported languages | English |

#### Model names
```
kyutai/moshiko-0.1-8b:bf16
kyutai/moshiko-0.1-8b:fp8
```

### Moshika-0.1-8b
Kyutai's Moshi is a speech-text foundation model for real-time dialogue.
Moshi is an experimental next-generation conversational model, designed to understand and respond fluidly and naturally to complex conversations, while providing unprecedented expressiveness and spontaneity.
While current systems for spoken dialogue rely on a pipeline of separate components, Moshi is the first real-time full-duplex spoken large language model.
Moshika is the variant of Moshi with a female voice in English.

| Attribute | Value |
|-----------|-------|
| Structured output supported | No |
| Function calling | No |
| Supported languages | English |

#### Model names
```
kyutai/moshika-0.1-8b:bf16
kyutai/moshika-0.1-8b:fp8
```

### WizardLM-70B-V1.0
WizardLM-70B-V1.0, developed by WizardLM, is specifically designed for content creation platforms and writing assistants.
With its extensive training on diverse textual data, WizardLM-70B-V1.0 generates high-quality content and assists writers in various creative and professional endeavors.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | No |
| Supported languages | English (to be verified) |

#### Model names
```
wizardlm/wizardlm-70b-v1.0:fp8
wizardlm/wizardlm-70b-v1.0:fp16
```

## Multimodal models

### Pixtral-12b-2409
Pixtral is a vision language model introducing a novel architecture: a 12B-parameter multimodal decoder paired with a 400M-parameter vision encoder.
It can analyze images and offer insights from visual content alongside text.
This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
Pixtral is open-weight and distributed under the Apache 2.0 license.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | No |
| Supported languages | English, French, German, Spanish (to be verified) |

<Message type="note">
Pixtral 12B can understand and analyze images, but cannot generate them. Use it through the /v1/chat/completions endpoint.
</Message>

#### Model name
```
mistral/pixtral-12b-2409:bf16
```

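As the note above indicates, Pixtral is queried through the `/v1/chat/completions` endpoint. Below is a hedged sketch of an image-understanding request using the OpenAI-style `image_url` content part; the endpoint, key, and image URL are placeholders.

```
# Hedged sketch: send an image plus a text question to a Pixtral deployment.
# The endpoint, key, and image URL are placeholders.
curl https://<your-deployment-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image in one sentence."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
      ]
    }]
  }'
```
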
### Molmo-72b-0924
Molmo 72B is the flagship of the Molmo family of multimodal models, developed by the renowned Allen Institute for AI research lab.
Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source.
Its base model is Qwen2-72B ([Tongyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)).

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | No |
| Supported languages | English, French, German, Spanish (to be verified) |

<Message type="note">
Molmo-72b can understand and analyze images, but cannot generate them. Use it through the /v1/chat/completions endpoint.
</Message>

#### Model name
```
allenai/molmo-72b-0924:fp8
```

## Code models

### Qwen2.5-coder-32b-instruct
Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages.
With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning.

| Attribute | Value |
|-----------|-------|
| Structured output supported | Yes |
| Function calling | Yes |
| Supported languages | Over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic |

#### Model name
```
qwen/qwen2.5-coder-32b-instruct:int8
```

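Below is a minimal sketch of a code-generation request to a Qwen2.5-coder deployment, again through the chat completions route, with the same placeholder endpoint and key as in the earlier examples.

```
# Hedged sketch: ask the coder model to generate a function.
# The endpoint and key are placeholders for your own deployment values.
curl https://<your-deployment-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-32b-instruct",
    "messages": [
      {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
    ],
    "max_tokens": 300
  }'
```
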
## Embeddings models

### Sentence-t5-xxl
The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture.
Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information.
This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the RAG (Retrieval-Augmented Generation) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal.

| Attribute | Value |
|-----------|-------|
| Structured output supported | No |
| Function calling | No |
| Supported languages | English (to be verified) |

#### Model name
```
sentence-transformers/sentence-t5-xxl:fp32
```
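
Below is a hedged sketch of an embeddings request, assuming the deployment exposes an OpenAI-compatible `/v1/embeddings` route; the endpoint and key are placeholders, and the route itself should be verified against your deployment.

```
# Hedged sketch: embed a sentence with sentence-t5-xxl.
# The /v1/embeddings route is an assumption; the endpoint and key are placeholders.
curl https://<your-deployment-endpoint>/v1/embeddings \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sentence-t5-xxl",
    "input": "Managed Inference lets you deploy open-weight models on dedicated infrastructure."
  }'
```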
