
Commit f110780

Add L40S to Managed Inference models (#4378)
* Add L40S to Managed Inference models
* Update llama-3-70b-instruct.mdx
* Update llama-3.1-8b-instruct.mdx
* Update deepseek-r1-distill-llama-8b.mdx
* Update pixtral-12b-2409.mdx
* Update mistral-7b-instruct-v0.3.mdx
* Update mistral-nemo-instruct-2407.mdx
* Update pixtral-12b-2409.mdx
* Update llama-3.1-8b-instruct.mdx
* Update deepseek-r1-distill-llama-8b.mdx
* Update bge-multilingual-gemma2.mdx
1 parent 42312ed commit f110780

8 files changed: +35 -23 lines changed

pages/managed-inference/reference-content/bge-multilingual-gemma2.mdx

Lines changed: 2 additions & 1 deletion
@@ -18,7 +18,7 @@ dates:
 | Attribute | Details |
 |-----------------|------------------------------------|
 | Provider | [baai](https://huggingface.co/BAAI) |
-| Compatible Instances | L4 (FP32) |
+| Compatible Instances | L4, L40S (FP32) |
 | Context size | 4096 tokens |
 
 ## Model name
@@ -32,6 +32,7 @@ baai/bge-multilingual-gemma2:fp32
 | Instance type | Max context length |
 | ------------- |-------------|
 | L4 | 4096 (FP32) |
+| L40S | 4096 (FP32) |
 
 ## Model introduction
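Since bge-multilingual-gemma2 is an embedding model, the quickest smoke test for the new L40S option is an embeddings request against the deployment. A minimal sketch, assuming the deployment exposes an OpenAI-compatible `/v1/embeddings` route; the base URL and API key below are placeholders, not real values:

```python
# Minimal sketch: request embeddings from a Managed Inference deployment.
# <deployment-url> and <iam-api-key> are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<deployment-url>/v1",  # hypothetical endpoint
    api_key="<iam-api-key>",                 # hypothetical credential
)

resp = client.embeddings.create(
    model="baai/bge-multilingual-gemma2:fp32",
    input=["Managed Inference now supports L40S Instances."],
)
print(len(resp.data[0].embedding))  # embedding dimensionality
```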
pages/managed-inference/reference-content/deepseek-r1-distill-llama-8b.mdx

Lines changed: 2 additions & 1 deletion
@@ -19,7 +19,7 @@ categories:
 |-----------------|------------------------------------|
 | Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
 | License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
-| Compatible Instances | L4, H100 (BF16) |
+| Compatible Instances | L4, L40S, H100 (BF16) |
 | Context Length | up to 131k tokens |
 
 ## Model names
@@ -33,6 +33,7 @@ deepseek/deepseek-r1-distill-llama-8b:bf16
 | Instance type | Max context length |
 | ------------- |-------------|
 | L4 | 39k (BF16) |
+| L40S | 131k (BF16) |
 | H100 | 131k (BF16) |
 
 ## Model introduction
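
The chat models in this commit can be exercised the same way through `/v1/chat/completions`. A sketch for the DeepSeek distill, assuming an OpenAI-compatible route and R1-style output that wraps its reasoning trace in `<think>` tags; endpoint and key are again placeholders:

```python
# Minimal sketch: chat completion against the R1 distill, stripping the
# reasoning trace before display. Endpoint and key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://<deployment-url>/v1", api_key="<iam-api-key>")

resp = client.chat.completions.create(
    model="deepseek/deepseek-r1-distill-llama-8b:bf16",
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
    max_tokens=512,  # well inside the 39k (L4) or 131k (L40S, H100) windows
)

text = resp.choices[0].message.content
# R1-style models usually emit <think>...</think> before the final answer;
# split() leaves the text untouched when the tag is absent.
answer = text.split("</think>")[-1].strip()
print(answer)
```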

pages/managed-inference/reference-content/llama-3-70b-instruct.mdx

Lines changed: 3 additions & 2 deletions
@@ -18,7 +18,7 @@ categories:
 | Attribute | Details |
 |-----------------|------------------------------------|
 | Provider | [Meta](https://llama.meta.com/llama3/) |
-| Compatible Instances | H100 (FP8) |
+| Compatible Instances | H100, H100-2 (FP8) |
 | Context size | 8192 tokens |
 
 ## Model names
@@ -30,6 +30,7 @@ meta/llama-3-70b-instruct:fp8
 ## Compatible Instances
 
 - [H100 (FP8)](https://www.scaleway.com/en/h100-pcie-try-it-now/)
+- H100-2 (FP8)
 
 ## Model introduction
 
@@ -82,4 +83,4 @@ Process the output data according to your application's needs. The response will
 
 <Message type="note">
 Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
-</Message>
+</Message>

pages/managed-inference/reference-content/llama-3-8b-instruct.mdx

Lines changed: 4 additions & 2 deletions
@@ -18,7 +18,7 @@ categories:
 | Attribute | Details |
 |-----------------|------------------------------------|
 | Provider | [Meta](https://llama.meta.com/llama3/) |
-| Compatible Instances | L4, H100 (FP8, BF16) |
+| Compatible Instances | L4, L40S, H100, H100-2 (FP8, BF16) |
 | Context size | 8192 tokens |
 
 ## Model names
@@ -33,7 +33,9 @@ meta/llama-3-8b-instruct:fp8
 | Instance type | Max context length |
 | ------------- |-------------|
 | L4 | 8192 (FP8, BF16) |
-| H100 | 8192 (FP8, BF16)
+| L40S | 8192 (FP8, BF16) |
+| H100 | 8192 (FP8, BF16) |
+| H100-2 | 8192 (FP8, BF16) |
 
 ## Model introduction

pages/managed-inference/reference-content/llama-3.1-8b-instruct.mdx

Lines changed: 5 additions & 4 deletions
@@ -19,7 +19,7 @@ categories:
 |-----------------|------------------------------------|
 | Provider | [Meta](https://llama.meta.com/llama3/) |
 | License | [Llama 3.1 community](https://llama.meta.com/llama3_1/license/) |
-| Compatible Instances | L4, H100, H100-2 (FP8, BF16) |
+| Compatible Instances | L4, L40S, H100, H100-2 (FP8, BF16) |
 | Context Length | up to 128k tokens |
 
 ## Model names
@@ -34,8 +34,9 @@ meta/llama-3.1-8b-instruct:bf16
 | Instance type | Max context length |
 | ------------- |-------------|
 | L4 | 96k (FP8), 27k (BF16) |
-| H100 | 128k (FP8, BF16)
-| H100-2 | 128k (FP8, BF16)
+| L40S | 128k (FP8, BF16) |
+| H100 | 128k (FP8, BF16) |
+| H100-2 | 128k (FP8, BF16) |
 
 ## Model introduction
 
@@ -82,4 +83,4 @@ Process the output data according to your application's needs. The response will
 
 <Message type="note">
 Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
-</Message>
+</Message>
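
The `:fp8`/`:bf16` tag in the model name selects the served precision, and on the smaller L4 Instance the usable context depends on that choice (96k vs. 27k above). A minimal sketch of deriving a request budget from the table; the figures mirror the table (reading k as 1,000) and the rest is hypothetical:

```python
# Minimal sketch: map (Instance type, quantization tag) to the documented
# max context length, mirroring the table above (k read as 1,000).
MAX_CONTEXT = {
    ("L4", "fp8"): 96_000, ("L4", "bf16"): 27_000,
    ("L40S", "fp8"): 128_000, ("L40S", "bf16"): 128_000,
    ("H100", "fp8"): 128_000, ("H100", "bf16"): 128_000,
    ("H100-2", "fp8"): 128_000, ("H100-2", "bf16"): 128_000,
}

instance, tag = "L4", "bf16"
model = f"meta/llama-3.1-8b-instruct:{tag}"
print(f"{model} on {instance}: {MAX_CONTEXT[(instance, tag)]:,} tokens of context")
```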

pages/managed-inference/reference-content/mistral-7b-instruct-v0.3.mdx

Lines changed: 7 additions & 4 deletions
@@ -17,8 +17,8 @@ categories:
 
 | Attribute | Details |
 |-----------------|------------------------------------|
-| Provider | [Mistral](https://mistral.ai/technology/#models) |
-| Compatible Instances | L4 (BF16) |
+| Provider | [Mistral](https://mistral.ai/technology/#models) |
+| Compatible Instances | L4, L40S, H100, H100-2 (BF16) |
 | Context size | 32K tokens |
 
 ## Model name
@@ -31,7 +31,10 @@ mistral/mistral-7b-instruct-v0.3:bf16
 
 | Instance type | Max context length |
 | ------------- |-------------|
-| L4 | 32k (BF16)
+| L4 | 32k (BF16) |
+| L40S | 32k (BF16) |
+| H100 | 32k (BF16) |
+| H100-2 | 32k (BF16) |
 
 ## Model introduction
 
@@ -75,4 +78,4 @@ Process the output data according to your application's needs. The response will
 
 <Message type="note">
 Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
-</Message>
+</Message>

pages/managed-inference/reference-content/mistral-nemo-instruct-2407.mdx

Lines changed: 7 additions & 5 deletions
@@ -17,9 +17,9 @@ categories:
 
 | Attribute | Details |
 |-----------------|------------------------------------|
-| Provider | [Mistral](https://mistral.ai/technology/#models) |
-| Compatible Instances | H100 (FP8) |
-| Context size | 128K tokens |
+| Provider | [Mistral](https://mistral.ai/technology/#models) |
+| Compatible Instances | L40S, H100, H100-2 (FP8) |
+| Context size | 128K tokens |
 
 ## Model name
 
@@ -31,7 +31,9 @@ mistral/mistral-nemo-instruct-2407:fp8
 
 | Instance type | Max context length |
 | ------------- |-------------|
-| H100 | 128k (FP8)
+| L40S | 128k (FP8) |
+| H100 | 128k (FP8) |
+| H100-2 | 128k (FP8) |
 
 ## Model introduction
 
@@ -81,4 +83,4 @@ Process the output data according to your application's needs. The response will
 
 <Message type="note">
 Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
-</Message>
+</Message>

pages/managed-inference/reference-content/pixtral-12b-2409.mdx

Lines changed: 5 additions & 4 deletions
@@ -17,9 +17,9 @@ categories:
 
 | Attribute | Details |
 |-----------------|------------------------------------|
-| Provider | [Mistral](https://mistral.ai/technology/#models) |
-| Compatible Instances | H100, H100-2 (bf16) |
-| Context size | 128k tokens |
+| Provider | [Mistral](https://mistral.ai/technology/#models) |
+| Compatible Instances | L40S, H100, H100-2 (bf16) |
+| Context size | 128k tokens |
 
 ## Model name
 
@@ -31,6 +31,7 @@ mistral/pixtral-12b-2409:bf16
 
 | Instance type | Max context length |
 | ------------- |-------------|
+| L40S | 50k (BF16) |
 | H100 | 128k (BF16) |
 | H100-2 | 128k (BF16) |
 
@@ -162,4 +163,4 @@ Only bitmaps can be analyzed by Pixtral, PDFs and videos are not supported.
 The only limitation is in context window (1 token for each 16x16 pixel).
 
 #### What is the maximum amount of images per conversation?
-One conversation can handle up to 12 images (per request). The 13th will return a 413 error.
+One conversation can handle up to 12 images (per request). The 13th will return a 413 error.
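
Given the 1-token-per-16x16-pixel rule above, the image cost of a request is easy to estimate. A worked sketch; rounding partial tiles up is an assumption, not documented behavior:

```python
import math

def image_tokens(width: int, height: int) -> int:
    # One token per 16x16-pixel tile; partial tiles assumed to round up.
    return math.ceil(width / 16) * math.ceil(height / 16)

print(image_tokens(1024, 1024))  # 64 * 64 = 4096 tokens
# At the 12-images-per-request cap, twelve 1024x1024 images already cost
# 12 * 4096 = 49,152 tokens -- close to the full 50k window on L40S.
```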
