diff --git a/menu/navigation.json b/menu/navigation.json
index a1d15dd0b2..f4f4a68a1d 100644
--- a/menu/navigation.json
+++ b/menu/navigation.json
@@ -817,6 +817,14 @@
           "label": "Llama-3.3-70b-instruct model",
           "slug": "llama-3.3-70b-instruct"
         },
+        {
+          "label": "DeepSeek-R1-Distill-Llama-70B model",
+          "slug": "deepseek-r1-distill-llama-70b"
+        },
+        {
+          "label": "DeepSeek-R1-Distill-Llama-8B model",
+          "slug": "deepseek-r1-distill-llama-8b"
+        },
         {
           "label": "Mistral-7b-instruct-v0.3 model",
           "slug": "mistral-7b-instruct-v0.3"
diff --git a/pages/managed-inference/reference-content/deepseek-r1-distill-llama-70b.mdx b/pages/managed-inference/reference-content/deepseek-r1-distill-llama-70b.mdx
new file mode 100644
index 0000000000..f6cf661089
--- /dev/null
+++ b/pages/managed-inference/reference-content/deepseek-r1-distill-llama-70b.mdx
@@ -0,0 +1,81 @@
+---
+meta:
+  title: Understanding the DeepSeek-R1-Distill-Llama-70B model
+  description: Deploy your own secure DeepSeek-R1-Distill-Llama-70B model with Scaleway Managed Inference. Privacy-focused, fully managed.
+content:
+  h1: Understanding the DeepSeek-R1-Distill-Llama-70B model
+  paragraph: This page provides information on the DeepSeek-R1-Distill-Llama-70B model
+tags:
+dates:
+  validation: 2025-02-06
+  posted: 2025-02-06
+categories:
+  - ai-data
+---
+
+## Model overview
+
+| Attribute | Details |
+|-----------------|------------------------------------|
+| Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) |
+| License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
+| Compatible Instances | H100-2 (BF16) |
+| Context Length | up to 56k tokens |
+
+## Model names
+
+```bash
+deepseek/deepseek-r1-distill-llama-70b:bf16
+```
+
+## Compatible Instances
+
+| Instance type | Max context length |
+| ------------- |-------------|
+| H100-2 | 56k (BF16) |
+
+## Model introduction
+
+Released on January 21, 2025, DeepSeek's R1-Distill-Llama-70B is a Llama-family model distilled from DeepSeek R1.
+DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.
+
+## Why is it useful?
+
+It is great to see DeepSeek improving open(weight) models, and we are excited to fully support their mission with integration in the Scaleway ecosystem.
+
+- DeepSeek-R1-Distill-Llama was optimized to reach accuracy close to DeepSeek-R1 on tasks like mathematics and coding, while keeping inference costs low and token generation fast.
+- DeepSeek-R1-Distill-Llama supports a context window of up to 56K tokens and tool calling, enabling interaction with other components.
+
+## How to use it
+
+### Sending Managed Inference requests
+
+To perform inference tasks with your DeepSeek R1 Distill Llama model deployed on Scaleway, use the following command:
+
+```bash
+curl -s \
+-H "Authorization: Bearer <IAM API key>" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
+--data '{"model":"deepseek/deepseek-r1-distill-llama-70b:bf16", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}'
+```
+
+Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.
+
+<Message type="tip">
+  Ensure that the `messages` array is properly formatted with roles (user, assistant) and content.
+</Message>
+
+<Message type="note">
+  This model is best used without a system prompt, as suggested by the model provider.
+</Message>
+
+### Receiving inference responses
+
+Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the Managed Inference server.
+Process the output data according to your application's needs. The response will contain the output generated by the LLM model based on the input provided in the request.
+
+<Message type="note">
+  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
+</Message>
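As an illustration of the request above, the same payload can be assembled in Python with only the standard library. This is a sketch, not an official client: the endpoint URL, deployment UUID, and API key are placeholders you must replace, and only the payload shape is taken from the curl example.

```python
import json
import urllib.request

# Placeholders: substitute your own deployment endpoint and IAM API key.
API_URL = "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions"
API_KEY = "<IAM API key>"

def build_request(model: str, user_message: str,
                  max_tokens: int = 500, temperature: float = 0.7) -> urllib.request.Request:
    """Build the same chat-completions request as the curl example (no system prompt)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("deepseek/deepseek-r1-distill-llama-70b:bf16",
                    "There is a llama in my garden, what should I do?")
# urllib.request.urlopen(req) would send the request; here we just inspect the payload.
body = json.loads(req.data)
print(body["model"], body["messages"][0]["role"])
```

Sending the request with `urllib.request.urlopen(req)` returns the same JSON body as the curl call.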
+
diff --git a/pages/managed-inference/reference-content/deepseek-r1-distill-llama-8b.mdx b/pages/managed-inference/reference-content/deepseek-r1-distill-llama-8b.mdx
new file mode 100644
index 0000000000..bde51b9140
--- /dev/null
+++ b/pages/managed-inference/reference-content/deepseek-r1-distill-llama-8b.mdx
@@ -0,0 +1,82 @@
+---
+meta:
+  title: Understanding the DeepSeek-R1-Distill-Llama-8B model
+  description: Deploy your own secure DeepSeek-R1-Distill-Llama-8B model with Scaleway Managed Inference. Privacy-focused, fully managed.
+content:
+  h1: Understanding the DeepSeek-R1-Distill-Llama-8B model
+  paragraph: This page provides information on the DeepSeek-R1-Distill-Llama-8B model
+tags:
+dates:
+  validation: 2025-02-06
+  posted: 2025-02-06
+categories:
+  - ai-data
+---
+
+## Model overview
+
+| Attribute | Details |
+|-----------------|------------------------------------|
+| Provider | [Deepseek](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
+| License | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) |
+| Compatible Instances | L4, H100 (BF16) |
+| Context Length | up to 131k tokens |
+
+## Model names
+
+```bash
+deepseek/deepseek-r1-distill-llama-8b:bf16
+```
+
+## Compatible Instances
+
+| Instance type | Max context length |
+| ------------- |-------------|
+| L4 | 39k (BF16) |
+| H100 | 131k (BF16) |
+
+## Model introduction
+
+Released on January 21, 2025, DeepSeek's R1-Distill-Llama-8B is a Llama-family model distilled from DeepSeek R1.
+DeepSeek R1 Distill Llama 8B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.
+
+## Why is it useful?
+
+It is great to see DeepSeek improving open(weight) models, and we are excited to fully support their mission with integration in the Scaleway ecosystem.
+
+- DeepSeek-R1-Distill-Llama was optimized to reach accuracy close to DeepSeek-R1 on tasks like mathematics and coding, while keeping inference costs low and token generation fast.
+- DeepSeek-R1-Distill-Llama supports a context window of up to 131K tokens and tool calling, enabling interaction with other components.
+
+## How to use it
+
+### Sending Managed Inference requests
+
+To perform inference tasks with your DeepSeek R1 Distill Llama model deployed on Scaleway, use the following command:
+
+```bash
+curl -s \
+-H "Authorization: Bearer <IAM API key>" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions" \
+--data '{"model":"deepseek/deepseek-r1-distill-llama-8b:bf16", "messages":[{"role": "user","content": "There is a llama in my garden, what should I do?"}], "max_tokens": 500, "temperature": 0.7, "stream": false}'
+```
+
+Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.
+
+<Message type="tip">
+  Ensure that the `messages` array is properly formatted with roles (user, assistant) and content.
+</Message>
+
+<Message type="note">
+  This model is best used without a system prompt, as suggested by the model provider.
+</Message>
+
+### Receiving inference responses
+
+Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the Managed Inference server.
+Process the output data according to your application's needs. The response will contain the output generated by the LLM model based on the input provided in the request.
+
+<Message type="note">
+  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
+</Message>
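DeepSeek R1 distilled models typically emit their chain of thought between `<think>` tags before the final answer. The sketch below shows one way to separate the two when processing a chat-completions response; the sample payload is made up for illustration, and the tag convention should be verified against your deployment's actual output.

```python
import json

# Illustrative response: real Managed Inference responses follow the
# OpenAI-compatible chat-completions schema, but this payload is fabricated.
raw = json.dumps({
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "<think>The user has a llama in the garden...</think>Stay calm and contact the owner."
        }
    }]
})

def split_reasoning(content: str) -> tuple[str, str]:
    """Separate the model's <think> reasoning from its final answer, if present."""
    start, end = content.find("<think>"), content.find("</think>")
    if start != -1 and end != -1:
        reasoning = content[start + len("<think>"):end].strip()
        answer = content[end + len("</think>"):].strip()
        return reasoning, answer
    return "", content.strip()

content = json.loads(raw)["choices"][0]["message"]["content"]
reasoning, answer = split_reasoning(content)
print(answer)  # final answer without the reasoning trace
```

This keeps the reasoning trace available for logging or debugging while showing only the final answer to end users.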
+
diff --git a/pages/managed-inference/reference-content/function-calling-support.mdx b/pages/managed-inference/reference-content/function-calling-support.mdx
index b544ffb3c3..e273ec3277 100644
--- a/pages/managed-inference/reference-content/function-calling-support.mdx
+++ b/pages/managed-inference/reference-content/function-calling-support.mdx
@@ -32,6 +32,8 @@ The following models in Scaleway's Managed Inference library can call tools as p
 * mistral/mistral-nemo-instruct-2407
 * mistral/pixtral-12b-2409
 * nvidia/llama-3.1-nemotron-70b-instruct
+* deepseek/deepseek-r1-distill-llama-70b
+* deepseek/deepseek-r1-distill-llama-8b
 
 ## Understanding function calling
diff --git a/pages/managed-inference/reference-content/llama-3-8b-instruct.mdx b/pages/managed-inference/reference-content/llama-3-8b-instruct.mdx
index d4054e4097..d018be991c 100644
--- a/pages/managed-inference/reference-content/llama-3-8b-instruct.mdx
+++ b/pages/managed-inference/reference-content/llama-3-8b-instruct.mdx
@@ -30,8 +30,6 @@ meta/llama-3-8b-instruct:fp8
 
 ## Compatible Instances
 
-## Compatible Instances
-
 | Instance type | Max context length |
 | ------------- |-------------|
 | L4 | 8192 (FP8, BF16) |
@@ -86,4 +84,4 @@ Process the output data according to your application's needs. The response will
   Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
-</Message>
\ No newline at end of file
+</Message>