
Commit 0386569

feat(ai): add qwen to managed inference
1 parent 038f3df commit 0386569

File tree: 2 files changed (+80, −1 lines)


ai-data/generative-apis/reference-content/rate-limits.mdx

Lines changed: 2 additions & 1 deletion

```diff
@@ -7,7 +7,7 @@ content:
   paragraph: Find our service limits in tokens per minute and queries per minute
 tags: generative-apis ai-data rate-limits
 dates:
-  validation: 2024-10-30
+  validation: 2024-12-09
   posted: 2024-08-27
 ---

@@ -25,6 +25,7 @@ Any model served through Scaleway Generative APIs gets limited by:
 | `llama-3.1-70b-instruct` | 300 | 100K |
 | `mistral-nemo-instruct-2407`| 300 | 100K |
 | `pixtral-12b-2409`| 300 | 100K |
+| `qwen2.5-32b-instruct`| 300 | 100K |

 ### Embedding models
```
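The rate-limits table above caps each model at a fixed number of queries and tokens per minute. A client that exceeds these limits will typically receive HTTP 429 responses; a minimal retry-with-backoff sketch (the 429 handling and backoff values are assumptions, not part of this commit):

```python
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff delay (in seconds) for a 0-based retry attempt."""
    return min(cap, base * (2 ** attempt))

def call_with_retry(send, max_retries: int = 5, sleep=time.sleep):
    """Call `send()` and retry while it reports HTTP 429 (rate limited).

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute (e.g. a `requests.Response`).
    """
    for attempt in range(max_retries):
        resp = send()
        if resp.status_code != 429:
            return resp
        sleep(backoff_delay(attempt))
    return send()  # final attempt, returned to the caller as-is
```

Wrap the actual HTTP call in a closure, e.g. `call_with_retry(lambda: requests.post(url, headers=headers, json=payload))`.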

Lines changed: 78 additions & 0 deletions (new file)

---
meta:
  title: Understanding the Qwen2.5-Coder-32B-Instruct model
  description: Deploy your own secure Qwen2.5-Coder-32B-Instruct model with Scaleway Managed Inference. Privacy-focused, fully managed.
content:
  h1: Understanding the Qwen2.5-Coder-32B-Instruct model
  paragraph: This page provides information on the Qwen2.5-Coder-32B-Instruct model
tags:
dates:
  validation: 2024-12-08
  posted: 2024-12-08
categories:
  - ai-data
---
## Model overview

| Attribute            | Details                            |
|----------------------|------------------------------------|
| Provider             | [Qwen](https://qwenlm.github.io/)  |
| License              | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) |
| Compatible Instances | H100, H100-2 (INT8)                |
| Context Length       | up to 128k tokens                  |

## Model names

```bash
qwen/qwen2.5-coder-32b-instruct:int8
```

## Compatible Instances

| Instance type | Max context length |
| ------------- | ------------------ |
| H100          | 128k (INT8)        |
| H100-2        | 128k (INT8)        |
## Model introduction

Qwen2.5-Coder is an intelligent programming assistant familiar with more than 40 programming languages.
With Qwen2.5-Coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning.

## Why is it useful?

- Qwen2.5-Coder achieved the best performance on multiple popular code generation benchmarks (EvalPlus, LiveCodeBench, BigCodeBench), outranking many open-source models and providing performance competitive with GPT-4o.
- This model is versatile: while demonstrating strong and comprehensive coding abilities, it also possesses good general and mathematical skills.
## How to use it

### Sending Managed Inference requests

To perform inference tasks with your Qwen2.5-Coder deployed at Scaleway, use the following command:

```bash
curl -s \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model":"qwen/qwen2.5-coder-32b-instruct:int8", "messages":[{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful code assistant."},{"role": "user","content": "Write a quick sort algorithm."}], "max_tokens": 1000, "temperature": 0.8, "stream": false}'
```

<Message type="tip">
  The model name allows Scaleway to put your prompts in the expected format.
</Message>

<Message type="note">
  Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content.
</Message>
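The curl call above can also be issued from Python. A minimal sketch using only the standard library (the deployment URL and IAM key are placeholders; the payload mirrors the OpenAI-style `/v1/chat/completions` request shown above):

```python
import json
import urllib.request

def build_chat_request(url: str, api_key: str, messages: list,
                       model: str = "qwen/qwen2.5-coder-32b-instruct:int8",
                       max_tokens: int = 1000, temperature: float = 0.8):
    """Build a POST request for an OpenAI-style chat completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending the request requires a live deployment:
# req = build_chat_request("https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions",
#                          "<IAM API key>",
#                          [{"role": "user", "content": "Write a quick sort algorithm."}])
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```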
### Receiving Inference responses

Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the Managed Inference server.
Process the output data according to your application's needs. The response will contain the output generated by the LLM based on the input provided in the request.

<Message type="note">
  Despite efforts to ensure accuracy, generated text may contain inaccuracies or [hallucinations](/ai-data/managed-inference/concepts/#hallucinations). Always verify the generated content independently.
</Message>
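For non-streaming requests (`"stream": false`), the response body follows the OpenAI-style chat completion schema, with the generated text nested under `choices`. A small helper to pull it out (the exact field names are an assumption based on the request format above):

```python
def extract_reply(response: dict) -> str:
    """Return the assistant's text from an OpenAI-style chat completion payload."""
    choices = response.get("choices", [])
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]

# Example payload shape (illustrative values, not real output):
sample = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "def quicksort(xs): ..."},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 42, "completion_tokens": 120, "total_tokens": 162},
}
print(extract_reply(sample))
```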
