
Commit df0a872

fpagny, bene2k1, and RoRoJ authored
fix(genapi): update deepseek-r1-distill-llama-70b.mdx (#4545)
* Update deepseek-r1-distill-llama-70b.mdx: Update supported models in Managed Inference.
* Update deepseek-r1-distill-llama-8b.mdx
* Update llama-3.3-70b-instruct.mdx
* Create mistral-small-24b-instruct-2501.mdx
* Update pages/managed-inference/reference-content/mistral-small-24b-instruct-2501.mdx

Co-authored-by: Rowena Jones <[email protected]>

---------

Co-authored-by: Benedikt Rollik <[email protected]>
Co-authored-by: Rowena Jones <[email protected]>

1 parent 71c9667 commit df0a872

4 files changed (+90 / -11 lines)


pages/managed-inference/reference-content/deepseek-r1-distill-llama-70b.mdx

Lines changed: 4 additions & 3 deletions
```diff
@@ -19,8 +19,8 @@ categories:
 |-----------------|------------------------------------|
 | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) |
 | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
-| Compatible Instances | H100-2 (BF16) |
-| Context Length | up to 56k tokens |
+| Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) |
+| Context Length | up to 131k tokens |
 
 ## Model names
 
@@ -32,7 +32,8 @@ deepseek/deepseek-r1-distill-llama-70b:bf16
 
 | Instance type | Max context length |
 | ------------- |-------------|
-| H100-2 | 56k (BF16) |
+| H100 | 15k (FP8) |
+| H100-2 | 131k (FP8), 56k (BF16) |
 
 ## Model introduction
```
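
The bf16 model name in the hunk header above is the value passed as `model` when querying a deployment. As a minimal sketch, assuming the same OpenAI-compatible `/v1/chat/completions` endpoint shape used in the Mistral page added later in this commit (the placeholders `<IAM API key>` and `<Deployment UUID>` are yours to substitute):

```bash
# Hedged sketch, not from the commit: querying a deployment by the bf16
# model name shown in the hunk above. <IAM API key> and <Deployment UUID>
# are placeholders; the endpoint shape mirrors the curl example in the
# new Mistral page below.
curl -s \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model": "deepseek/deepseek-r1-distill-llama-70b:bf16", "messages": [{"role": "user", "content": "Summarize the DeepSeek R1 distillation approach."}], "stream": false}'
```

The FP8 rows added in this commit presumably change only which Instances can serve the model and the maximum context length, not the request format.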

pages/managed-inference/reference-content/deepseek-r1-distill-llama-8b.mdx

Lines changed: 4 additions & 4 deletions
```diff
@@ -19,7 +19,7 @@ categories:
 |-----------------|------------------------------------|
 | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
 | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
-| Compatible Instances | L4, L40S, H100 (BF16) |
+| Compatible Instances | L4, L40S, H100 (FP8, BF16) |
 | Context Length | up to 131k tokens |
 
 ## Model names
@@ -32,9 +32,9 @@ deepseek/deepseek-r1-distill-llama-8b:bf16
 
 | Instance type | Max context length |
 | ------------- |-------------|
-| L4 | 39k (BF16) |
-| L40S | 131k (BF16) |
-| H100 | 131k (BF16) |
+| L4 | 90k (FP8), 39k (BF16) |
+| L40S | 131k (FP8, BF16) |
+| H100 | 131k (FP8, BF16) |
 
 ## Model introduction
```

pages/managed-inference/reference-content/llama-3.3-70b-instruct.mdx

Lines changed: 5 additions & 4 deletions
```diff
@@ -19,8 +19,8 @@ categories:
 |-----------------|------------------------------------|
 | Provider | [Meta](https://www.llama.com/) |
 | License | [Llama 3.3 community](https://www.llama.com/llama3_3/license/) |
-| Compatible Instances | H100-2 (BF16) |
-| Context length | Up to 70k tokens |
+| Compatible Instances | H100 (FP8), H100-2 (FP8, BF16) |
+| Context length | Up to 131k tokens |
 
 ## Model names
 
@@ -32,7 +32,8 @@ meta/llama-3.3-70b-instruct:bf16
 
 | Instance type | Max context length |
 | ------------- |-------------|
-| H100-2 | 62k (BF16) |
+| H100 | 15k (FP8) |
+| H100-2 | 131k (FP8), 62k (BF16) |
 
 ## Model introduction
 
@@ -76,4 +77,4 @@ Process the output data according to your application's needs. The response will
 
 <Message type="note">
   Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
-</Message>
+</Message>
```
pages/managed-inference/reference-content/mistral-small-24b-instruct-2501.mdx

Lines changed: 77 additions & 0 deletions
````diff
@@ -0,0 +1,77 @@
+---
+meta:
+  title: Understanding the Mistral-small-24b-base-2501 model
+  description: Deploy your own secure Mistral-small-24b-base-2501 model with Scaleway Managed Inference. Privacy-focused, fully managed.
+content:
+  h1: Understanding the Mistral-small-24b-base-2501 model
+  paragraph: This page provides information on the Mistral-small-24b-base-2501 model
+tags:
+dates:
+  validation: 2025-03-04
+  posted: 2025-03-04
+categories:
+  - ai-data
+---
+
+## Model overview
+
+| Attribute | Details |
+|-----------------|------------------------------------|
+| Provider | [Mistral](https://mistral.ai/technology/#models) |
+| Compatible Instances | L40S, H100, H100-2 (FP8) |
+| Context size | 32K tokens |
+
+## Model name
+
+```bash
+mistral/mistral-small-24b-instruct-2501:fp8
+```
+
+## Compatible Instances
+
+| Instance type | Max context length |
+| ------------- |-------------|
+| L40S | 20k (FP8) |
+| H100 | 32k (FP8) |
+| H100-2 | 32k (FP8) |
+
+## Model introduction
+
+Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral.
+This model is open-weight and distributed under the Apache 2.0 license.
+
+## Why is it useful?
+
+- Mistral Small 24B offers a large context window of up to 32k tokens and provides both conversational and reasoning capabilities.
+- This model supports multiple languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
+- It supersedes Mistral Nemo Instruct, although its token throughput is slightly lower.
+
+## How to use it
+
+### Sending Inference requests
+
+To perform inference tasks with your Mistral model deployed at Scaleway, use the following command:
+
+```bash
+curl -s \
+-H "Authorization: Bearer <IAM API key>" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
+--data '{"model":"mistral/mistral-small-24b-instruct-2501:fp8", "messages":[{"role": "user","content": "Tell me about Scaleway."}], "top_p": 1, "temperature": 0.7, "stream": false}'
+```
+
+Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.
+
+<Message type="note">
+  Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content.
+</Message>
+
+### Receiving Managed Inference responses
+
+Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the Managed Inference server.
+Process the output data according to your application's needs. The response will contain the output generated by the LLM model based on the input provided in the request.
+
+<Message type="note">
+  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
+</Message>
````
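
The curl example in the new page sets `"stream": false`. Assuming the endpoint follows the OpenAI-compatible streaming convention its request shape suggests (an assumption, not stated in this commit), flipping `stream` to `true` should return incremental server-sent-event chunks instead of a single JSON body; a hedged sketch:

```bash
# Hedged sketch: the page's curl example with streaming enabled.
# Assumes OpenAI-compatible SSE streaming, which the request shape
# suggests but this commit does not confirm.
curl -s -N \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model": "mistral/mistral-small-24b-instruct-2501:fp8", "messages": [{"role": "user", "content": "Tell me about Scaleway."}], "stream": true}'
```

The `-N` flag disables curl's output buffering so tokens print as they arrive.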
