
Commit 0386569

feat(ai): add qwen to managed inference
1 parent 038f3df commit 0386569

File tree: 2 files changed (+80, −1 lines)


ai-data/generative-apis/reference-content/rate-limits.mdx

Lines changed: 2 additions & 1 deletion

```diff
@@ -7,7 +7,7 @@ content:
   paragraph: Find our service limits in tokens per minute and queries per minute
 tags: generative-apis ai-data rate-limits
 dates:
-  validation: 2024-10-30
+  validation: 2024-12-09
   posted: 2024-08-27
 ---

@@ -25,6 +25,7 @@ Any model served through Scaleway Generative APIs gets limited by:
 | `llama-3.1-70b-instruct` | 300 | 100K |
 | `mistral-nemo-instruct-2407`| 300 | 100K |
 | `pixtral-12b-2409`| 300 | 100K |
+| `qwen2.5-32b-instruct`| 300 | 100K |

 ### Embedding models
```
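The rate-limits table above caps each model at a fixed number of queries and tokens per minute. A client that exceeds these limits will typically receive HTTP 429 responses; a minimal retry-with-backoff sketch (the 429 handling and backoff values are assumptions, not part of this commit):

```python
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff delay (in seconds) for a 0-based retry attempt."""
    return min(cap, base * (2 ** attempt))

def call_with_retry(send, max_retries: int = 5, sleep=time.sleep):
    """Call `send()` and retry while it reports HTTP 429 (rate limited).

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute (e.g. a `requests.Response`).
    """
    for attempt in range(max_retries):
        resp = send()
        if resp.status_code != 429:
            return resp
        sleep(backoff_delay(attempt))
    return send()  # final attempt, returned to the caller as-is
```

Wrap the actual HTTP call in a closure, e.g. `call_with_retry(lambda: requests.post(url, headers=headers, json=payload))`.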

Lines changed: 78 additions & 0 deletions (new file)

---
meta:
  title: Understanding the Qwen2.5-Coder-32B-Instruct model
  description: Deploy your own secure Qwen2.5-Coder-32B-Instruct model with Scaleway Managed Inference. Privacy-focused, fully managed.
content:
  h1: Understanding the Qwen2.5-Coder-32B-Instruct model
  paragraph: This page provides information on the Qwen2.5-Coder-32B-Instruct model
tags:
dates:
  validation: 2024-12-08
  posted: 2024-12-08
categories:
  - ai-data
---
## Model overview

| Attribute            | Details                            |
|----------------------|------------------------------------|
| Provider             | [Qwen](https://qwenlm.github.io/)  |
| License              | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) |
| Compatible Instances | H100, H100-2 (INT8)                |
| Context Length       | up to 128k tokens                  |

## Model names

```bash
qwen/qwen2.5-coder-32b-instruct:int8
```

## Compatible Instances

| Instance type | Max context length |
| ------------- | ------------------ |
| H100          | 128k (INT8)        |
| H100-2        | 128k (INT8)        |
## Model introduction

Qwen2.5-Coder is an intelligent programming assistant familiar with more than 40 programming languages.
With Qwen2.5-Coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning.

## Why is it useful?

- Qwen2.5-Coder achieved the best performance on multiple popular code generation benchmarks (EvalPlus, LiveCodeBench, BigCodeBench), outranking many open-source models and providing performance competitive with GPT-4o.
- This model is versatile: while demonstrating strong and comprehensive coding abilities, it also possesses good general and mathematical skills.
## How to use it

### Sending Managed Inference requests

To perform inference tasks with your Qwen2.5-Coder deployed at Scaleway, use the following command:

```bash
curl -s \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model":"qwen/qwen2.5-coder-32b-instruct:int8", "messages":[{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful code assistant."},{"role": "user","content": "Write a quick sort algorithm."}], "max_tokens": 1000, "temperature": 0.8, "stream": false}'
```

<Message type="tip">
  The model name allows Scaleway to put your prompts in the expected format.
</Message>

<Message type="note">
  Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content.
</Message>
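The curl call above can also be issued from Python. A minimal sketch using only the standard library (the deployment URL and IAM key are placeholders; the payload mirrors the OpenAI-style `/v1/chat/completions` request shown above):

```python
import json
import urllib.request

def build_chat_request(url: str, api_key: str, messages: list,
                       model: str = "qwen/qwen2.5-coder-32b-instruct:int8",
                       max_tokens: int = 1000, temperature: float = 0.8):
    """Build a POST request for an OpenAI-style chat completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending the request requires a live deployment:
# req = build_chat_request("https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions",
#                          "<IAM API key>",
#                          [{"role": "user", "content": "Write a quick sort algorithm."}])
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```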
### Receiving Inference responses

Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the Managed Inference server.
Process the output data according to your application's needs. The response will contain the output generated by the LLM based on the input provided in the request.

<Message type="note">
  Despite efforts to ensure accuracy, generated text may contain inaccuracies or [hallucinations](/ai-data/managed-inference/concepts/#hallucinations). Always verify the generated content independently.
</Message>
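For non-streaming requests (`"stream": false`), the response body follows the OpenAI-style chat completion schema, with the generated text nested under `choices`. A small helper to pull it out (the exact field names are an assumption based on the request format above):

```python
def extract_reply(response: dict) -> str:
    """Return the assistant's text from an OpenAI-style chat completion payload."""
    choices = response.get("choices", [])
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]

# Example payload shape (illustrative values, not real output):
sample = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "def quicksort(xs): ..."},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 42, "completion_tokens": 120, "total_tokens": 162},
}
print(extract_reply(sample))
```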
