From 55b6131dd7a9c25d10f8eab62b82bced2a35a264 Mon Sep 17 00:00:00 2001
From: Quentin Maire
Date: Wed, 13 Aug 2025 10:06:50 +0200
Subject: [PATCH 1/3] FR Add AI Endpoints guide for virtual models

---
 pages/index.md     |   1 +
 .../guide.en-gb.md | 144 ++++++++++++++++++
 .../guide.fr-fr.md | 144 ++++++++++++++++++
 .../meta.yaml      |   2 +
 4 files changed, 291 insertions(+)
 create mode 100644 pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.en-gb.md
 create mode 100644 pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.fr-fr.md
 create mode 100644 pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/meta.yaml

diff --git a/pages/index.md b/pages/index.md
index 0dd66cf442f..074fca35370 100644
--- a/pages/index.md
+++ b/pages/index.md
@@ -1202,6 +1202,7 @@
+ [AI Endpoints - Billing and lifecycle](public_cloud/ai_machine_learning/endpoints_guide_04_billing_concept)
+ [AI Endpoints - Structured Output](public_cloud/ai_machine_learning/endpoints_guide_05_structured_output)
+ [AI Endpoints - Function Calling](public_cloud/ai_machine_learning/endpoints_guide_06_function_calling)
+ [AI Endpoints - Virtual Models](public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models)
+ [Tutorials](public-cloud-ai-and-machine-learning-ai-endpointstutorials)
+ [AI Endpoints - Create your own audio summarizer](public_cloud/ai_machine_learning/endpoints_tuto_01_audio_summarizer)
+ [AI Endpoints - Create your own voice assistant](public_cloud/ai_machine_learning/endpoints_tuto_02_voice_virtual_assistant)

diff --git a/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.en-gb.md b/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.en-gb.md
new file mode 100644
index 00000000000..8f15cf18d33
--- /dev/null
+++ b/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.en-gb.md
@@ -0,0 +1,144 @@
---
title: AI Endpoints - Using Virtual Models
excerpt: Learn how to use OVHcloud AI Endpoints Virtual Models
updated: 2025-08-06
---

> [!primary]
>
> AI Endpoints is covered by the **[OVHcloud AI Endpoints Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/48743bf-AI_Endpoints-ALL-1.1.pdf)** and the **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
>

## Introduction

Choosing the right Large Language Model (LLM) is not always straightforward. Models vary in strengths, performance, cost, and licensing, and new ones appear regularly—often outperforming previous options. This rapid evolution makes it essential to match your choice to your specific needs, while staying ready to adapt as better models emerge.

To make this easier, we developed a system of virtual models where instead of requesting a hard-coded model, you specify the expected specifications of the model you need (size, price, ...) and we resolve it to the currently best matching model in our catalog. In this guide, we'll see the different capabilities of this feature and how to use it with your OpenAI-compatible code.

## Requirements

The examples provided in this guide can be used with one of the following environments:

> [!tabs]
> **Python**
>>
>> A [Python](https://www.python.org/) environment with the [openai client](https://pypi.org/project/openai/).
>>
>> ```sh
>> pip install openai
>> ```
>>
>> **Curl**
>>
>> A standard terminal with [curl](https://curl.se/) installed on the system.
>>

### Authentication & rate limiting

Most of the examples provided in this guide use anonymous authentication, which makes them simpler to run but may cause rate-limiting issues.
If you wish to enable authentication using your own token, simply specify your API key within the requests.

Follow the instructions in the [AI Endpoints - Getting Started](/pages/public_cloud/ai_machine_learning/endpoints_guide_01_getting_started) guide for more information on authentication.

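For reference, an authenticated request simply adds a bearer token header. The sketch below assumes your key is exported as the `OVH_AI_ENDPOINTS_ACCESS_TOKEN` environment variable (the same variable the Python examples below read), and that the endpoint accepts the standard `Authorization: Bearer` header used by OpenAI-compatible clients:

```sh
# Sketch of an authenticated request; assumes OVH_AI_ENDPOINTS_ACCESS_TOKEN holds a valid API key
curl https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OVH_AI_ENDPOINTS_ACCESS_TOKEN" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "model": "mistral@latest"
  }'
```
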
+>> +>> ```sh +>> pip install openai +>> ``` +>> +>> **Curl** +>> +>> A standard terminal, with [curl](https://curl.se/) installed on the system. +>> + +### Authentication & rate limiting + +Most of the examples provided in this guide use anonymous authentication, which makes it simpler to use but may cause rate limiting issues. +If you wish to enable authentication using your own token, simply specify your API key within the requests. + +Follow the instructions in the [AI Endpoints - Getting Started](/pages/public_cloud/ai_machine_learning/endpoints_guide_01_getting_started) guide for more information on authentication. + +## Model DSL + +When you request a LLM generation through our unified endpoint, you can provide in the OpenAI-compliant `model` field a model DSL query instead of a hardcoded model name. +These queries are divided into three parts: tag, ranker, and condition. +- **Tag**: A tag can be a model series (llama, mistral, codestral, ...), a publisher (meta-llama, mistralai, ...) or use case tag (code_chat, code_completion, summarization, ...) +- **Ranker**: The ranker defines a model's capability compared to other models. We support multiple rankers such as fastest, cheapest, biggest, latest or smallest +- **Condition**: The condition allows you to filter models based on strict requirements on some of the model specifications such as context_size, max_tokens and input_cost. These conditions support basic operators (<, >, =). + +Below are some example queries and the models they currently resolve to. Please note that the resolved model can change, as we continuously update our catalog with new model releases. + +| Model Query | Current Target Model | Usage | +|-----------|-----------|-----------| +| code_chat@latest | Qwen3-32B | The most recently released model optimized for code chat tasks | +| meta-llama@latest | Llama-3.1-8B-Instruct | The latest Meta-released LLaMA model | +| mistral@latest?context_size > 100000 | Mistral-Small-3.2-24B-Instruct-2506 | The latest Mistral model with a context window greater than 100k tokens | +| llama@biggest?input_cost<0.5 | Llama-3.1-8B-Instruct | The largest LLaMA model whose input token cost is under €0.50 per 1M tokens | + +You can visit our [catalog](https://endpoints.ai.cloud.ovh.net/catalog) to learn more about the different model specifications. + +## Example Usage + +The following code samples provide a simple example on how to query our API with a model query. + +> [!tabs] +> **Python** +>> ```python +>> import os +>> from openai import OpenAI +>> +>> url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" +>> +>> client = OpenAI( +>> base_url=url, +>> api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN") +>> ) +>> +>> def chat_completion(new_message: str) -> str: +>> history_openai_format = [{"role": "user", "content": new_message}] +>> result = client.chat.completions.create( +>> model="mistral@latest", +>> messages=history_openai_format, +>> temperature=0, +>> max_tokens=1024 +>> ) +>> return result.model, result.choices.pop().message.content +>> +>> if __name__ == '__main__': +>> model, message = chat_completion("Explain gravity for a 6 years old") +>> print(f"Model: {model}:", message) +>> ``` +>> +>> Output: +>> +>> ```sh +>> Model: Mistral-Small-3.2-24B-Instruct-2506: Sure! Here’s a simple way to explain gravity to a 6-year-old: +>> +>> **"Gravity is like a big, invisible hug from the Earth! It’s what keeps us from floating away into space. When you jump, gravity pulls you back down to the ground. 
## Example Usage

The following code samples provide a simple example of how to query our API with a model query.

> [!tabs]
> **Python**
>> ```python
>> import os
>> from openai import OpenAI
>>
>> url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"
>>
>> client = OpenAI(
>>     base_url=url,
>>     api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
>> )
>>
>> def chat_completion(new_message: str) -> tuple[str, str]:
>>     history_openai_format = [{"role": "user", "content": new_message}]
>>     result = client.chat.completions.create(
>>         model="mistral@latest",  # a virtual model query instead of a hardcoded model name
>>         messages=history_openai_format,
>>         temperature=0,
>>         max_tokens=1024
>>     )
>>     # result.model holds the concrete model the query resolved to
>>     return result.model, result.choices[0].message.content
>>
>> if __name__ == '__main__':
>>     model, message = chat_completion("Explain gravity to a 6-year-old")
>>     print(f"Model: {model}:", message)
>> ```
>>
>> Output:
>>
>> ```sh
>> Model: Mistral-Small-3.2-24B-Instruct-2506: Sure! Here’s a simple way to explain gravity to a 6-year-old:
>>
>> **"Gravity is like a big, invisible hug from the Earth! It’s what keeps us from floating away into space. When you jump, gravity pulls you back down to the ground. It’s also why things fall—like when you drop your toy, it doesn’t float up, it falls down because of gravity. Even the Moon stays close to Earth because of gravity!"**
>>
>> You can also add:
>> - *"If you throw a ball up, gravity pulls it back down."*
>> - *"Without gravity, we’d all be floating like astronauts in space!"*
>>
>> Would you like a fun example or a little experiment to go with it? 😊
>> ```
>>
> **Curl**
>>
>> ```sh
>> curl https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions \
>>   -H "Content-Type: application/json" \
>>   -d '{
>>     "messages": [{"role": "user", "content": "What is the capital of France?"}],
>>     "model": "llama@biggest?input_cost<0.5",
>>     "seed": 21,
>>     "max_tokens": 20
>>   }'
>> ```
>>
>> Output response:
>>
>> ```sh
>> {"id":"chatcmpl-5df407ebdf23464f891b1b1d8cb3e76d","object":"chat.completion","created":1755007630,"model":"Llama-3.1-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The capital of France is Paris."},"finish_reason":"stop","logprobs":null}],"usage":{"prompt_tokens":42,"completion_tokens":8,"total_tokens":50}}
>> ```
>>

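Because a virtual model query can resolve to a different model as the catalog evolves, you may want to log or pin the concrete model that served a given request, for reproducibility or debugging. The following is a small sketch, assuming the concrete name returned in the response's `model` field can itself be passed back as the `model` parameter:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1",
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
)

# Resolve the virtual model query once...
resolved = client.chat.completions.create(
    model="mistral@latest",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=1
).model  # e.g. "Mistral-Small-3.2-24B-Instruct-2506"

# ...then reuse the concrete name so later calls hit the same model.
result = client.chat.completions.create(
    model=resolved,
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=20
)
print(resolved, "->", result.choices[0].message.content)
```
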
## Conclusion

Using OVHcloud AI Endpoints with virtual models allows you to stay up to date with the best available LLMs without having to change your code whenever a new release arrives.
By defining your requirements through tags, rankers, and conditions, you can ensure your application always runs on the most suitable model for your needs—whether you prioritize speed, cost, size, or capabilities. This flexibility makes it easier to build, maintain, and scale AI-powered solutions over time.

## Go further

Browse the full [AI Endpoints documentation](/products/public-cloud-ai-and-machine-learning-ai-endpoints) to further understand the main concepts and get started.

To discover how to build complete and powerful applications using AI Endpoints, explore our dedicated [AI Endpoints guides](/products/public-cloud-ai-and-machine-learning-ai-endpoints).

If you need training or technical assistance to implement our solutions, contact your sales representative or click on [this link](/links/professional-services) to get a quote and ask our Professional Services experts for a custom analysis of your project.

## Feedback

Please send us your questions, feedback and suggestions to improve the service:

- On the OVHcloud [Discord server](https://discord.gg/ovhcloud).

diff --git a/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.fr-fr.md b/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.fr-fr.md
new file mode 100644
index 00000000000..e7ecdcaf38d
--- /dev/null
+++ b/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.fr-fr.md
@@ -0,0 +1,144 @@
---
title: AI Endpoints - Modèles virtuels
excerpt: Découvrez comment utiliser les modèles virtuels d'AI Endpoints
updated: 2025-08-06
---

> [!primary]
>
> AI Endpoints is covered by the **[OVHcloud AI Endpoints Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/48743bf-AI_Endpoints-ALL-1.1.pdf)** and the **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
>

## Introduction

Choosing the right Large Language Model (LLM) is not always straightforward. Models vary in strengths, performance, cost, and licensing, and new ones appear regularly—often outperforming previous options. This rapid evolution makes it essential to match your choice to your specific needs, while staying ready to adapt as better models emerge.

To make this easier, we developed a system of virtual models where instead of requesting a hard-coded model, you specify the expected specifications of the model you need (size, price, ...) and we resolve it to the currently best matching model in our catalog. In this guide, we'll see the different capabilities of this feature and how to use it with your OpenAI-compatible code.

## Requirements

The examples provided in this guide can be used with one of the following environments:

> [!tabs]
> **Python**
>>
>> A [Python](https://www.python.org/) environment with the [openai client](https://pypi.org/project/openai/).
>>
>> ```sh
>> pip install openai
>> ```
>>
>> **Curl**
>>
>> A standard terminal with [curl](https://curl.se/) installed on the system.
>>

### Authentication & rate limiting

Most of the examples provided in this guide use anonymous authentication, which makes them simpler to run but may cause rate-limiting issues.
If you wish to enable authentication using your own token, simply specify your API key within the requests.

Follow the instructions in the [AI Endpoints - Getting Started](/pages/public_cloud/ai_machine_learning/endpoints_guide_01_getting_started) guide for more information on authentication.

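For reference, an authenticated request simply adds a bearer token header. The sketch below assumes your key is exported as the `OVH_AI_ENDPOINTS_ACCESS_TOKEN` environment variable (the same variable the Python examples below read), and that the endpoint accepts the standard `Authorization: Bearer` header used by OpenAI-compatible clients:

```sh
# Sketch of an authenticated request; assumes OVH_AI_ENDPOINTS_ACCESS_TOKEN holds a valid API key
curl https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OVH_AI_ENDPOINTS_ACCESS_TOKEN" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "model": "mistral@latest"
  }'
```
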
## Model DSL

When you request an LLM generation through our unified endpoint, you can provide a model DSL query in the OpenAI-compliant `model` field instead of a hardcoded model name.
These queries are divided into three parts: tag, ranker, and condition.
- **Tag**: A tag can be a model series (llama, mistral, codestral, ...), a publisher (meta-llama, mistralai, ...) or a use case tag (code_chat, code_completion, summarization, ...)
- **Ranker**: The ranker orders matching models relative to one another. We support multiple rankers such as fastest, cheapest, biggest, latest or smallest
- **Condition**: The condition allows you to filter models based on strict requirements on some of the model specifications such as context_size, max_tokens and input_cost. These conditions support basic operators (<, >, =).

Below are some example queries and the models they currently resolve to. Please note that the resolved model can change, as we continuously update our catalog with new model releases.

| Model Query | Current Target Model | Usage |
|-----------|-----------|-----------|
| code_chat@latest | Qwen3-32B | The most recently released model optimized for code chat tasks |
| meta-llama@latest | Llama-3.1-8B-Instruct | The latest Meta-released LLaMA model |
| mistral@latest?context_size > 100000 | Mistral-Small-3.2-24B-Instruct-2506 | The latest Mistral model with a context window greater than 100k tokens |
| llama@biggest?input_cost<0.5 | Llama-3.1-8B-Instruct | The largest LLaMA model whose input token cost is under €0.50 per 1M tokens |

You can visit our [catalog](https://endpoints.ai.cloud.ovh.net/catalog) to learn more about the different model specifications.

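All the queries above share the same general shape. The sketch below is inferred from these examples rather than an official grammar: the condition part is optional, and operators may be surrounded by spaces.

```
<tag>@<ranker>[?<condition>]

meta-llama@latest
mistral@latest?context_size > 100000
```
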
## Example Usage

The following code samples provide a simple example of how to query our API with a model query.

> [!tabs]
> **Python**
>> ```python
>> import os
>> from openai import OpenAI
>>
>> url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"
>>
>> client = OpenAI(
>>     base_url=url,
>>     api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
>> )
>>
>> def chat_completion(new_message: str) -> tuple[str, str]:
>>     history_openai_format = [{"role": "user", "content": new_message}]
>>     result = client.chat.completions.create(
>>         model="mistral@latest",  # a virtual model query instead of a hardcoded model name
>>         messages=history_openai_format,
>>         temperature=0,
>>         max_tokens=1024
>>     )
>>     # result.model holds the concrete model the query resolved to
>>     return result.model, result.choices[0].message.content
>>
>> if __name__ == '__main__':
>>     model, message = chat_completion("Explain gravity to a 6-year-old")
>>     print(f"Model: {model}:", message)
>> ```
>>
>> Output:
>>
>> ```sh
>> Model: Mistral-Small-3.2-24B-Instruct-2506: Sure! Here’s a simple way to explain gravity to a 6-year-old:
>>
>> **"Gravity is like a big, invisible hug from the Earth! It’s what keeps us from floating away into space. When you jump, gravity pulls you back down to the ground. It’s also why things fall—like when you drop your toy, it doesn’t float up, it falls down because of gravity. Even the Moon stays close to Earth because of gravity!"**
>>
>> You can also add:
>> - *"If you throw a ball up, gravity pulls it back down."*
>> - *"Without gravity, we’d all be floating like astronauts in space!"*
>>
>> Would you like a fun example or a little experiment to go with it? 😊
>> ```
>>
> **Curl**
>>
>> ```sh
>> curl https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions \
>>   -H "Content-Type: application/json" \
>>   -d '{
>>     "messages": [{"role": "user", "content": "What is the capital of France?"}],
>>     "model": "llama@biggest?input_cost<0.5",
>>     "seed": 21,
>>     "max_tokens": 20
>>   }'
>> ```
>>
>> Output response:
>>
>> ```sh
>> {"id":"chatcmpl-5df407ebdf23464f891b1b1d8cb3e76d","object":"chat.completion","created":1755007630,"model":"Llama-3.1-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The capital of France is Paris."},"finish_reason":"stop","logprobs":null}],"usage":{"prompt_tokens":42,"completion_tokens":8,"total_tokens":50}}
>> ```
>>

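Because a virtual model query can resolve to a different model as the catalog evolves, you may want to log or pin the concrete model that served a given request, for reproducibility or debugging. The following is a small sketch, assuming the concrete name returned in the response's `model` field can itself be passed back as the `model` parameter:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1",
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
)

# Resolve the virtual model query once...
resolved = client.chat.completions.create(
    model="mistral@latest",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=1
).model  # e.g. "Mistral-Small-3.2-24B-Instruct-2506"

# ...then reuse the concrete name so later calls hit the same model.
result = client.chat.completions.create(
    model=resolved,
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=20
)
print(resolved, "->", result.choices[0].message.content)
```
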
## Conclusion

Using OVHcloud AI Endpoints with virtual models allows you to stay up to date with the best available LLMs without having to change your code whenever a new release arrives.
By defining your requirements through tags, rankers, and conditions, you can ensure your application always runs on the most suitable model for your needs—whether you prioritize speed, cost, size, or capabilities. This flexibility makes it easier to build, maintain, and scale AI-powered solutions over time.

## Go further

Browse the full [AI Endpoints documentation](/products/public-cloud-ai-and-machine-learning-ai-endpoints) to further understand the main concepts and get started.

To discover how to build complete and powerful applications using AI Endpoints, explore our dedicated [AI Endpoints guides](/products/public-cloud-ai-and-machine-learning-ai-endpoints).

If you need training or technical assistance to implement our solutions, contact your sales representative or click on [this link](/links/professional-services) to get a quote and ask our Professional Services experts for a custom analysis of your project.

## Feedback

Please send us your questions, feedback and suggestions to improve the service:

- On the OVHcloud [Discord server](https://discord.gg/ovhcloud).

diff --git a/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/meta.yaml b/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/meta.yaml
new file mode 100644
index 00000000000..2ebf59a4894
--- /dev/null
+++ b/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/meta.yaml
@@ -0,0 +1,2 @@
id: 59fdc7b6-933e-4884-9834-c73f007ee512
full_slug: public-cloud-ai-endpoints-virtual-models
\ No newline at end of file

From ea022c47d5921976b411c3bedc5cc4c0e4654a76 Mon Sep 17 00:00:00 2001
From: Montrealhub <89825661+Jessica41@users.noreply.github.com>
Date: Thu, 14 Aug 2025 15:51:37 -0400
Subject: [PATCH 2/3] proofread guide.en-gb.md

---
 .../guide.en-gb.md | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.en-gb.md b/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.en-gb.md
index 8f15cf18d33..b8cf02ac58d 100644
--- a/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.en-gb.md
+++ b/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.en-gb.md
@@ -1,7 +1,7 @@
---
title: AI Endpoints - Using Virtual Models
excerpt: Learn how to use OVHcloud AI Endpoints Virtual Models
updated: 2025-08-14
---

> [!primary]
>
> AI Endpoints is covered by the **[OVHcloud AI Endpoints Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/48743bf-AI_Endpoints-ALL-1.1.pdf)** and the **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
>

## Introduction

Choosing the right Large Language Model (LLM) is not always straightforward. Models vary in strengths, performance, cost, and licensing, and new ones appear regularly—often outperforming previous options. This rapid evolution makes it essential to match your choice to your specific needs, while staying ready to adapt as better models emerge.

To make this easier, we developed a system of virtual models where instead of requesting a hard-coded model, you specify the expected specifications of the model you need (size, price, etc.) and we resolve it to the currently best matching model in our catalog. In this guide, we'll see the different capabilities of this feature and how to use it with your OpenAI-compatible code.

## Requirements

The examples provided in this guide can be used with one of the following environments:

> [!tabs]
> **Python**
>>
>> A [Python](https://www.python.org/) environment with the [openai client](https://pypi.org/project/openai/).
>>
>> ```sh
>> pip install openai
>> ```
>>
> **Curl**
>>
>> A standard terminal with [curl](https://curl.se/) installed on the system.
>>

### Authentication & rate limiting

Most of the examples provided in this guide use anonymous authentication, which makes them simpler to run but may cause rate-limiting issues.
If you wish to enable authentication using your own token, simply specify your API key within the requests.

Follow the instructions in the [AI Endpoints - Getting Started](/pages/public_cloud/ai_machine_learning/endpoints_guide_01_getting_started) guide for more information on authentication.

## Model DSL

When you request an LLM generation through our unified endpoint, you can provide a model DSL query in the OpenAI-compliant `model` field instead of a hardcoded model name.

These queries are divided into three parts: tag, ranker, and condition:

- **Tag**: A tag can be a model series (llama, mistral, codestral, ...), a publisher (meta-llama, mistralai, ...) or a use case tag (code_chat, code_completion, summarization, etc.)
- **Ranker**: The ranker orders matching models relative to one another. We support multiple rankers such as fastest, cheapest, biggest, latest or smallest.
- **Condition**: The condition allows you to filter models based on strict requirements on some of the model specifications such as context_size, max_tokens and input_cost. These conditions support basic operators (<, >, =).

Below are some example queries and the models they currently resolve to. Please note that the resolved model can change, as we continuously update our catalog with new model releases.

| Model Query | Current Target Model | Usage |
|-----------|-----------|-----------|
| code_chat@latest | Qwen3-32B | The most recently released model optimized for code chat tasks |
| meta-llama@latest | Llama-3.1-8B-Instruct | The latest Meta-released LLaMA model |
| mistral@latest?context_size > 100000 | Mistral-Small-3.2-24B-Instruct-2506 | The latest Mistral model with a context window greater than 100k tokens |
| llama@biggest?input_cost<0.5 | Llama-3.1-8B-Instruct | The largest LLaMA model whose input token cost is under €0.50 per 1M tokens |

You can visit our [catalog](https://endpoints.ai.cloud.ovh.net/catalog) to learn more about the different model specifications.

### Example Usage

The following code samples provide a simple example of how to query our API with a model query.

From fb2d171fb094bf76029992f1ecd9e2eb9bcf4cdc Mon Sep 17 00:00:00 2001
From: Montrealhub <89825661+Jessica41@users.noreply.github.com>
Date: Thu, 14 Aug 2025 15:53:39 -0400
Subject: [PATCH 3/3] proofread guide.fr-fr.md

---
 .../guide.fr-fr.md | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.fr-fr.md b/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.fr-fr.md
index e7ecdcaf38d..f0eae408b37 100644
--- a/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.fr-fr.md
+++ b/pages/public_cloud/ai_machine_learning/endpoints_guide_07_virtual_models/guide.fr-fr.md
@@ -1,7 +1,7 @@
---
title: AI Endpoints - Modèles virtuels
excerpt: Découvrez comment utiliser les modèles virtuels d'AI Endpoints
updated: 2025-08-14
---

> [!primary]
>
> AI Endpoints is covered by the **[OVHcloud AI Endpoints Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/48743bf-AI_Endpoints-ALL-1.1.pdf)** and the **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
>

## Introduction

Choosing the right Large Language Model (LLM) is not always straightforward. Models vary in strengths, performance, cost, and licensing, and new ones appear regularly—often outperforming previous options. This rapid evolution makes it essential to match your choice to your specific needs, while staying ready to adapt as better models emerge.

To make this easier, we developed a system of virtual models where instead of requesting a hard-coded model, you specify the expected specifications of the model you need (size, price, etc.) and we resolve it to the currently best matching model in our catalog. In this guide, we'll see the different capabilities of this feature and how to use it with your OpenAI-compatible code.

## Requirements

The examples provided in this guide can be used with one of the following environments:

> [!tabs]
> **Python**
>>
>> A [Python](https://www.python.org/) environment with the [openai client](https://pypi.org/project/openai/).
>>
>> ```sh
>> pip install openai
>> ```
>>
> **Curl**
>>
>> A standard terminal with [curl](https://curl.se/) installed on the system.
>>

### Authentication & rate limiting

Most of the examples provided in this guide use anonymous authentication, which makes them simpler to run but may cause rate-limiting issues.
If you wish to enable authentication using your own token, simply specify your API key within the requests.

Follow the instructions in the [AI Endpoints - Getting Started](/pages/public_cloud/ai_machine_learning/endpoints_guide_01_getting_started) guide for more information on authentication.

## Model DSL

When you request an LLM generation through our unified endpoint, you can provide a model DSL query in the OpenAI-compliant `model` field instead of a hardcoded model name.

These queries are divided into three parts: tag, ranker, and condition:

- **Tag**: A tag can be a model series (llama, mistral, codestral, ...), a publisher (meta-llama, mistralai, ...) or a use case tag (code_chat, code_completion, summarization, etc.)
- **Ranker**: The ranker orders matching models relative to one another. We support multiple rankers such as fastest, cheapest, biggest, latest or smallest.
- **Condition**: The condition allows you to filter models based on strict requirements on some of the model specifications such as context_size, max_tokens and input_cost. These conditions support basic operators (<, >, =).

Below are some example queries and the models they currently resolve to. Please note that the resolved model can change, as we continuously update our catalog with new model releases.

| Model Query | Current Target Model | Usage |
|-----------|-----------|-----------|
| code_chat@latest | Qwen3-32B | The most recently released model optimized for code chat tasks |
| meta-llama@latest | Llama-3.1-8B-Instruct | The latest Meta-released LLaMA model |
| mistral@latest?context_size > 100000 | Mistral-Small-3.2-24B-Instruct-2506 | The latest Mistral model with a context window greater than 100k tokens |
| llama@biggest?input_cost<0.5 | Llama-3.1-8B-Instruct | The largest LLaMA model whose input token cost is under €0.50 per 1M tokens |

You can visit our [catalog](https://endpoints.ai.cloud.ovh.net/catalog) to learn more about the different model specifications.

### Example Usage

The following code samples provide a simple example of how to query our API with a model query.