
Commit df0a872

fpagny, bene2k1, and RoRoJ authored
fix(genapi): update deepseek-r1-distill-llama-70b.mdx (#4545)
* Update deepseek-r1-distill-llama-70b.mdx: Update supported models in Managed Inference.
* Update deepseek-r1-distill-llama-8b.mdx
* Update llama-3.3-70b-instruct.mdx
* Create mistral-small-24b-instruct-2501.mdx
* Update pages/managed-inference/reference-content/mistral-small-24b-instruct-2501.mdx

Co-authored-by: Rowena Jones <[email protected]>

---------

Co-authored-by: Benedikt Rollik <[email protected]>
Co-authored-by: Rowena Jones <[email protected]>

1 parent 71c9667 commit df0a872

4 files changed (+90 / -11 lines)


pages/managed-inference/reference-content/deepseek-r1-distill-llama-70b.mdx

Lines changed: 4 additions & 3 deletions
```diff
@@ -19,8 +19,8 @@ categories:
 |-----------------|------------------------------------|
 | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) |
 | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
-| Compatible Instances | H100-2 (BF16) |
-| Context Length | up to 56k tokens |
+| Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) |
+| Context Length | up to 131k tokens |
 
 ## Model names
 
@@ -32,7 +32,8 @@ deepseek/deepseek-r1-distill-llama-70b:bf16
 
 | Instance type | Max context length |
 | ------------- |-------------|
-| H100-2 | 56k (BF16) |
+| H100 | 15k (FP8) |
+| H100-2 | 131k (FP8), 56k (BF16) |
 
 ## Model introduction
```
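
The bf16 model name in the hunk header above is the value passed as `model` when querying a deployment. As a minimal sketch, assuming the same OpenAI-compatible `/v1/chat/completions` endpoint shape used in the Mistral page added later in this commit (the placeholders `<IAM API key>` and `<Deployment UUID>` are yours to substitute):

```bash
# Hedged sketch, not from the commit: querying a deployment by the bf16
# model name shown in the hunk above. <IAM API key> and <Deployment UUID>
# are placeholders; the endpoint shape mirrors the curl example in the
# new Mistral page below.
curl -s \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model": "deepseek/deepseek-r1-distill-llama-70b:bf16", "messages": [{"role": "user", "content": "Summarize the DeepSeek R1 distillation approach."}], "stream": false}'
```

The FP8 rows added in this commit presumably change only which Instances can serve the model and the maximum context length, not the request format.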

pages/managed-inference/reference-content/deepseek-r1-distill-llama-8b.mdx

Lines changed: 4 additions & 4 deletions
```diff
@@ -19,7 +19,7 @@ categories:
 |-----------------|------------------------------------|
 | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
 | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
-| Compatible Instances | L4, L40S, H100 (BF16) |
+| Compatible Instances | L4, L40S, H100 (FP8, BF16) |
 | Context Length | up to 131k tokens |
 
 ## Model names
@@ -32,9 +32,9 @@ deepseek/deepseek-r1-distill-llama-8b:bf16
 
 | Instance type | Max context length |
 | ------------- |-------------|
-| L4 | 39k (BF16) |
-| L40S | 131k (BF16) |
-| H100 | 131k (BF16) |
+| L4 | 90k (FP8), 39k (BF16) |
+| L40S | 131k (FP8, BF16) |
+| H100 | 131k (FP8, BF16) |
 
 ## Model introduction
```

pages/managed-inference/reference-content/llama-3.3-70b-instruct.mdx

Lines changed: 5 additions & 4 deletions
```diff
@@ -19,8 +19,8 @@ categories:
 |-----------------|------------------------------------|
 | Provider | [Meta](https://www.llama.com/) |
 | License | [Llama 3.3 community](https://www.llama.com/llama3_3/license/) |
-| Compatible Instances | H100-2 (BF16) |
-| Context length | Up to 70k tokens |
+| Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) |
+| Context length | Up to 131k tokens |
 
 ## Model names
 
@@ -32,7 +32,8 @@ meta/llama-3.3-70b-instruct:bf16
 
 | Instance type | Max context length |
 | ------------- |-------------|
-| H100-2 | 62k (BF16) |
+| H100 | 15k (FP8) |
+| H100-2 | 131k (FP8), 62k (BF16) |
 
 ## Model introduction
 
@@ -76,4 +77,4 @@ Process the output data according to your application's needs. The response will
 
 <Message type="note">
   Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
-</Message>
+</Message>
```
pages/managed-inference/reference-content/mistral-small-24b-instruct-2501.mdx

Lines changed: 77 additions & 0 deletions
````diff
@@ -0,0 +1,77 @@
+---
+meta:
+  title: Understanding the Mistral-small-24b-base-2501 model
+  description: Deploy your own secure Mistral-small-24b-base-2501 model with Scaleway Managed Inference. Privacy-focused, fully managed.
+content:
+  h1: Understanding the Mistral-small-24b-base-2501 model
+  paragraph: This page provides information on the Mistral-small-24b-base-2501 model
+tags:
+dates:
+  validation: 2025-03-04
+  posted: 2025-03-04
+categories:
+  - ai-data
+---
+
+## Model overview
+
+| Attribute | Details |
+|-----------------|------------------------------------|
+| Provider | [Mistral](https://mistral.ai/technology/#models) |
+| Compatible Instances | L40S, H100, H100-2 (FP8) |
+| Context size | 32K tokens |
+
+## Model name
+
+```bash
+mistral/mistral-small-24b-instruct-2501:fp8
+```
+
+## Compatible Instances
+
+| Instance type | Max context length |
+| ------------- |-------------|
+| L40S | 20k (FP8) |
+| H100 | 32k (FP8) |
+| H100-2 | 32k (FP8) |
+
+## Model introduction
+
+Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral.
+This model is open-weight and distributed under the Apache 2.0 license.
+
+## Why is it useful?
+
+- Mistral Small 24B offers a large context window of up to 32k tokens and provides both conversational and reasoning capabilities.
+- This model supports multiple languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
+- It supersedes Mistral Nemo Instruct, although its token throughput is slightly lower.
+
+## How to use it
+
+### Sending Inference requests
+
+To perform inference tasks with your Mistral model deployed at Scaleway, use the following command:
+
+```bash
+curl -s \
+-H "Authorization: Bearer <IAM API key>" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
+--data '{"model":"mistral/mistral-small-24b-instruct-2501:fp8", "messages":[{"role": "user","content": "Tell me about Scaleway."}], "top_p": 1, "temperature": 0.7, "stream": false}'
+```
+
+Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.
+
+<Message type="note">
+  Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content.
+</Message>
+
+### Receiving Managed Inference responses
+
+Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the Managed Inference server.
+Process the output data according to your application's needs. The response will contain the output generated by the LLM model based on the input provided in the request.
+
+<Message type="note">
+  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
+</Message>
````
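
The curl example in the new page sets `"stream": false`. Assuming the endpoint follows the OpenAI-compatible streaming convention its request shape suggests (an assumption, not stated in this commit), flipping `stream` to `true` should return incremental server-sent-event chunks instead of a single JSON body; a hedged sketch:

```bash
# Hedged sketch: the page's curl example with streaming enabled.
# Assumes OpenAI-compatible SSE streaming, which the request shape
# suggests but this commit does not confirm.
curl -s -N \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model": "mistral/mistral-small-24b-instruct-2501:fp8", "messages": [{"role": "user", "content": "Tell me about Scaleway."}], "stream": true}'
```

The `-N` flag disables curl's output buffering so tokens print as they arrive.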
