
Commit ca203c8

feat(ai): Support for Qwen2.5-coder-32b-instruct (#4098)
1 parent 086ab7c commit ca203c8

File tree: 5 files changed, +152 -1 lines changed

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
---
meta:
  title: How to query code models
  description: Learn how to interact with powerful language models specialized in code using Scaleway's Generative APIs service.
content:
  h1: How to query code models
  paragraph: Learn how to interact with powerful language models specialized in code using Scaleway's Generative APIs service.
tags: generative-apis ai-data language-models code-models chat-completions-api
dates:
  validation: 2024-12-09
  posted: 2024-12-09
---

Scaleway's Generative APIs service allows users to interact with powerful code models hosted on the platform.

Code models are language models specialized in **understanding code**, **generating code**, and **fixing code**.

As such, they are available through the same interfaces as language models:
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/ai-data/generative-apis/how-to/query-language-models/#accessing-the-playground) for testing models, adjusting parameters, and observing how these changes affect the output in real time.
- Via the [Chat API](/ai-data/generative-apis/how-to/query-language-models/#querying-language-models-via-api)

For more information on how to query language models, read [our dedicated documentation](/ai-data/generative-apis/how-to/query-language-models/).

Code models are also ideal AI assistants when added to IDEs (integrated development environments).

<Macro id="requirements" />

- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) for API authentication
- An IDE such as VS Code or JetBrains

## Install Continue in your IDE

[Continue](https://www.continue.dev/) is an [open-source code assistant](https://github.com/continuedev/continue) that connects AI models to your IDE.

To get Continue, simply hit `Install`:
- on the [Continue extension page in the Visual Studio Marketplace](https://marketplace.visualstudio.com/items?itemName=Continue.continue)
- or on the [Continue extension page in the JetBrains Marketplace](https://plugins.jetbrains.com/plugin/22707-continue)

## Configure Scaleway as an API provider in Continue

Continue's `config.json` file defines the models and providers used for chat, autocompletion, and other features.
Here is an example configuration with Scaleway's OpenAI-compatible provider:

```json
"models": [
  {
    "model": "qwen2.5-coder-32b-instruct",
    "title": "Qwen2.5-coder",
    "apiBase": "https://api.scaleway.ai/v1/",
    "provider": "openai",
    "apiKey": "###SCW SECRET KEY###",
    "useLegacyCompletionsEndpoint": false
  }
]
```

<Message type="tip">
  The `config.json` file is typically stored as `$HOME/.continue/config.json` on Linux/macOS systems, and `%USERPROFILE%\.continue\config.json` on Windows.
</Message>

Read more about how to set up your `config.json` in the [official Continue documentation](https://docs.continue.dev/reference).
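The same OpenAI-compatible endpoint that Continue talks to can also be queried directly from code. Below is a minimal sketch in Python using only the standard library; the `build_chat_request` helper and the `SCW_SECRET_KEY` environment variable are illustrative names introduced here, not part of Scaleway's or Continue's API.

```python
import json
import os
import urllib.request

# Scaleway's OpenAI-compatible base URL, as used in the Continue config above.
API_BASE = "https://api.scaleway.ai/v1"


def build_chat_request(prompt: str, model: str = "qwen2.5-coder-32b-instruct") -> dict:
    """Build a Chat Completions payload for a code-generation prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful code assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 512,
        # A low temperature keeps generated code more deterministic.
        "temperature": 0.3,
    }


def query_code_model(prompt: str, api_key: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__" and "SCW_SECRET_KEY" in os.environ:
    # Only runs when your IAM secret key is exported in the environment.
    print(query_code_model("Write a Python function that reverses a string.", os.environ["SCW_SECRET_KEY"]))
```

Any OpenAI-compatible client library would work the same way, since only the base URL and API key differ from a standard OpenAI setup.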

ai-data/generative-apis/reference-content/rate-limits.mdx

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,7 @@ content:
 paragraph: Find our service limits in tokens per minute and queries per minute
 tags: generative-apis ai-data rate-limits
 dates:
-  validation: 2024-10-30
+  validation: 2024-12-09
   posted: 2024-08-27
 ---

@@ -25,6 +25,7 @@ Any model served through Scaleway Generative APIs gets limited by:
 | `llama-3.1-70b-instruct` | 300 | 100K |
 | `mistral-nemo-instruct-2407`| 300 | 100K |
 | `pixtral-12b-2409`| 300 | 100K |
+| `qwen2.5-32b-instruct`| 300 | 100K |

 ### Embedding models

ai-data/generative-apis/reference-content/supported-models.mdx

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ Our [Chat API](/ai-data/generative-apis/how-to/query-language-models) has built-
 | Meta | `llama-3.1-70b-instruct` | 128k | [Llama 3.1 Community License Agreement](https://llama.meta.com/llama3_1/license/) | [HF](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) |
 | Mistral | `mistral-nemo-instruct-2407` | 128k | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) |
 | Mistral | `pixtral-12b-2409` | 128k | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/mistralai/Pixtral-12B-2409) |
+| Qwen | `qwen-2.5-coder-32b-instruct` | 128k | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) |

 <Message type="tip">
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
---
meta:
  title: Understanding the Qwen2.5-Coder-32B-Instruct model
  description: Deploy your own secure Qwen2.5-Coder-32B-Instruct model with Scaleway Managed Inference. Privacy-focused, fully managed.
content:
  h1: Understanding the Qwen2.5-Coder-32B-Instruct model
  paragraph: This page provides information on the Qwen2.5-Coder-32B-Instruct model
tags:
dates:
  validation: 2024-12-08
  posted: 2024-12-08
categories:
  - ai-data
---

## Model overview

| Attribute | Details |
|-----------------|------------------------------------|
| Provider | [Qwen](https://qwenlm.github.io/) |
| License | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) |
| Compatible Instances | H100, H100-2 (INT8) |
| Context Length | Up to 128k tokens |

## Model names

```bash
qwen/qwen2.5-coder-32b-instruct:int8
```

## Compatible Instances

| Instance type | Max context length |
| ------------- |--------------------|
| H100 | 128k (INT8) |
| H100-2 | 128k (INT8) |

## Model introduction

Qwen2.5-coder is an intelligent programming assistant familiar with more than 40 programming languages.
With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning.

## Why is it useful?

- Qwen2.5-coder achieved the best performance on multiple popular code generation benchmarks (EvalPlus, LiveCodeBench, BigCodeBench), outranking many open-source models and providing performance competitive with GPT-4o.
- This model is versatile. While demonstrating strong and comprehensive coding abilities, it also possesses good general and mathematical skills.

## How to use it

### Sending Managed Inference requests

To perform inference tasks with your Qwen2.5-coder model deployed at Scaleway, use the following command:

```bash
curl -s \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  --request POST \
  --url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
  --data '{"model":"qwen/qwen2.5-coder-32b-instruct:int8", "messages":[{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful code assistant."},{"role": "user","content": "Write a quick sort algorithm."}], "max_tokens": 1000, "temperature": 0.8, "stream": false}'
```

<Message type="tip">
  The model name allows Scaleway to put your prompts in the expected format.
</Message>

<Message type="note">
  Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content.
</Message>

### Receiving Inference responses

Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the Managed Inference server.
Process the output data according to your application's needs. The response will contain the output generated by the LLM based on the input provided in the request.

<Message type="note">
  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/ai-data/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
</Message>
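The response handling described above can be sketched in Python. The payload below is an illustrative example of the OpenAI-compatible Chat Completions shape returned by the server; the `content` and `usage` values are made up for the example, not real model output.

```python
import json

# Illustrative response body (hypothetical values) following the
# OpenAI-compatible Chat Completions shape.
sample_response = json.loads("""
{
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": "def quick_sort(arr): ..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 42, "completion_tokens": 120, "total_tokens": 162}
}
""")


def extract_reply(response: dict) -> str:
    """Return the generated text from a chat completion response."""
    return response["choices"][0]["message"]["content"]


print(extract_reply(sample_response))  # the generated code lives under choices[0].message.content
```

The `usage` object is also worth reading in production, since token counts are what rate limits and billing are based on.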

menu/navigation.json

Lines changed: 8 additions & 0 deletions
@@ -720,6 +720,10 @@
         {
           "label": "Moshiko-0.1-8b model",
           "slug": "moshiko-0.1-8b"
+        },
+        {
+          "label": "Qwen2.5-coder-32b-instruct model",
+          "slug": "qwen2.5-coder-32b-instruct"
         }
       ],
       "label": "Additional Content",
@@ -757,6 +761,10 @@
           "label": "Query embedding models",
           "slug": "query-embedding-models"
         },
+        {
+          "label": "Query code models",
+          "slug": "query-code-models"
+        },
         {
           "label": "Use structured outputs",
           "slug": "use-structured-outputs"

0 commit comments

Comments
 (0)