
Commit 102bf43

Merge branch 'fb-drop-categories' into move-into-pages
2 parents fc3c292 + 98f8282 commit 102bf43


1,697 files changed

+74300
-3
lines changed

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
---
meta:
  title: Generative APIs - API/CLI
  description: Generative APIs API/CLI
content:
  h1: Generative APIs - API/CLI
  paragraph: Generative APIs API/CLI
---
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
---
meta:
  title: Understanding errors
  description: This page explains how to understand errors with Generative APIs
content:
  h1: Understanding errors
  paragraph: This page explains how to understand errors with Generative APIs
tags: generative-apis ai-data understanding-data
dates:
  validation: 2024-10-31
  posted: 2024-09-02
---

Scaleway uses conventional HTTP response codes to indicate the success or failure of an API request.
In general, codes in the 2xx range indicate success, codes in the 4xx range indicate an error caused by the information provided, and codes in the 5xx range indicate an error on Scaleway's servers.

If the response code is not within the 2xx range, the response will contain an error object structured as follows:

```
{
    "error": string,
    "status": number,
    "message": string
}
```

Below are the most common HTTP error codes:

- 400 - **Bad Request**: The format or content of your payload is incorrect. The body may be too large, may fail to parse, or the content type may not match.
- 401 - **Unauthorized**: The `authorization` header is missing. Find the required headers on [this page](/generative-apis/api-cli/using-generative-apis/).
- 403 - **Forbidden**: Your API key does not exist or does not have the necessary permissions to access the requested resource. Find the required permission sets on [this page](/generative-apis/api-cli/using-generative-apis/).
- 404 - **Route Not Found**: The requested resource could not be found. Check that your request is being made to the correct endpoint.
- 422 - **Model Not Found**: The `model` key is present in the request payload, but the corresponding model is not found.
- 422 - **Missing Model**: The `model` key is missing from the request payload.
- 429 - **Too Many Requests**: You are exceeding your current quota for the requested model, calculated in requests per minute. Find the rate limits on [this page](/generative-apis/reference-content/rate-limits/).
- 429 - **Too Many Tokens**: You are exceeding your current quota for the requested model, calculated in tokens per minute. Find the rate limits on [this page](/generative-apis/reference-content/rate-limits/).
- 500 - **API error**: An unexpected internal error has occurred within Scaleway's systems. If the issue persists, please [open a support ticket](https://console.scaleway.com/support/tickets/create).

For streaming responses via SSE, 5xx errors may occur after a 200 response has been returned.
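A client can handle the error object above generically, for example by retrying only transient errors (429 and 5xx). A minimal sketch in Python; the helper names are illustrative, not part of any Scaleway SDK:

```python
# Decide how to react to an error object shaped like
# {"error": string, "status": number, "message": string}.
def should_retry(status: int) -> bool:
    """429 (rate limits) and 5xx errors are usually transient and worth retrying."""
    return status == 429 or 500 <= status < 600

def describe_error(body: dict) -> str:
    """Format the error object returned for non-2xx responses."""
    return f"{body.get('status')} {body.get('error')}: {body.get('message')}"

# Example error object, as it might be returned for a rate-limited request:
rate_limited = {"error": "Too Many Requests", "status": 429, "message": "quota exceeded"}
print(describe_error(rate_limited))   # 429 Too Many Requests: quota exceeded
print(should_retry(rate_limited["status"]))  # True
print(should_retry(404))              # False
```

4xx errors other than 429 indicate a problem with the request itself, so retrying them unchanged will not help.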
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
---
meta:
  title: Using Chat API
  description: This page explains how to use the Chat API to query models
content:
  h1: Using Chat API
  paragraph: This page explains how to use the Chat API to query models
tags: generative-apis ai-data chat-api
dates:
  validation: 2024-09-03
  posted: 2024-09-03
---

Scaleway Generative APIs are designed as a drop-in replacement for the OpenAI APIs. If you have an LLM-driven application that uses one of OpenAI's client libraries, you can easily configure it to point to the Scaleway Chat API, and get your existing applications running on open-weight instruct models hosted at Scaleway.

## Create chat completion

Creates a model response for the given chat conversation.

**Request sample:**

```
curl --request POST \
  --url https://api.scaleway.ai/v1/chat/completions \
  --header "Authorization: Bearer ${SCW_SECRET_KEY}" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "system",
        "content": "<string>"
      },
      {
        "role": "user",
        "content": "<string>"
      }
    ],
    "max_tokens": integer,
    "temperature": float,
    "top_p": float,
    "presence_penalty": float,
    "stop": "<string>",
    "stream": boolean
  }'
```

## Headers

Find the required headers on [this page](/generative-apis/api-cli/using-generative-apis/).

## Body

### Required parameters

| Param | Type | Description |
| ------------- |-------------|-------------|
| **messages** | array of objects | A list of messages comprising the conversation so far. |
| **model** | string | The name of the model to query. |

Our Chat API is OpenAI compatible. Use OpenAI's [API reference](https://platform.openai.com/docs/api-reference/chat/create) for more detailed information on the usage.

### Supported parameters

- temperature
- top_p
- max_tokens
- stream
- stream_options
- presence_penalty
- [response_format](/generative-apis/how-to/use-structured-outputs)
- logprobs
- stop
- seed
- [tools](/generative-apis/how-to/use-function-calling)
- [tool_choice](/generative-apis/how-to/use-function-calling)

### Unsupported parameters

- frequency_penalty
- n
- top_logprobs
- logit_bias
- user

If you have a use case requiring one of these unsupported parameters, please [contact us via Slack](https://slack.scaleway.com/) on the #ai channel.

## Going further

1. [Python code examples](/generative-apis/how-to/query-language-models/#querying-language-models-via-api) to query text models using Scaleway's Chat API
2. [How to use structured outputs](/generative-apis/how-to/use-structured-outputs) with the `response_format` parameter
3. [How to use function calling](/generative-apis/how-to/use-function-calling) with `tools` and `tool_choice`
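The request body shown in the curl sample can also be assembled programmatically before being sent with any HTTP client. A minimal sketch in Python; the helper name and default values are illustrative:

```python
import json

def build_chat_payload(system_prompt: str, user_prompt: str,
                       model: str = "llama-3.1-8b-instruct",
                       max_tokens: int = 512, temperature: float = 0.7,
                       stream: bool = False) -> str:
    """Assemble a chat-completion request body as a JSON string."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": stream,
    }
    return json.dumps(payload)

body = build_chat_payload("You are a helpful assistant.", "Say hello.")
print(json.loads(body)["model"])  # llama-3.1-8b-instruct
```

Serializing with `json.dumps` rather than hand-writing the string avoids the kind of trailing-comma and quoting mistakes that trigger a 400 Bad Request.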
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
---
meta:
  title: Using Embeddings API
  description: This page explains how to use the Embeddings API
content:
  h1: Using Embeddings API
  paragraph: This page explains how to use the Embeddings API
tags: generative-apis ai-data embeddings-api
dates:
  validation: 2024-09-03
  posted: 2024-09-03
---

Scaleway Generative APIs are designed as a drop-in replacement for the OpenAI APIs. If you have clustering or classification tasks already using one of OpenAI's client libraries, you can easily configure it to point to the Scaleway Embeddings API, and get your existing applications running with open-weight embedding models hosted at Scaleway.

## Create embeddings

Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.

**Request sample:**

```
curl --request POST \
  --url https://api.scaleway.ai/v1/embeddings \
  --header "Authorization: Bearer ${SCW_SECRET_KEY}" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "sentence-t5-xxl",
    "input": "<string>"
  }'
```

## Headers

Find the required headers on [this page](/generative-apis/api-cli/using-generative-apis/).

## Body

### Required parameters

| Param | Type | Description |
| ------------- |-------------|-------------|
| **input** | string or array | Input text to embed, encoded as a string or array of strings. It cannot be an empty string. |
| **model** | string | The name of the model to query. |

Our Embeddings API is OpenAI compatible. Use OpenAI's [API reference](https://platform.openai.com/docs/api-reference/embeddings) for more detailed information on the usage.

### Unsupported parameters

- encoding_format (default float)
- dimensions

If you have a use case requiring one of these unsupported parameters, please [contact us via Slack](https://slack.scaleway.com/) on the #ai channel.

<Message type="note">
  Check our [Python code examples](/generative-apis/how-to/query-embedding-models/#querying-embedding-models-via-api) to query embedding models using Scaleway's Embeddings API.
</Message>
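Once you have embedding vectors back from the API, a common next step is similarity matching. A minimal cosine-similarity sketch in Python; the sample vectors are made up for illustration (real models return much longer vectors):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings": v1 and v2 point in similar directions,
# v3 does not, so v1 is more similar to v2 than to v3.
v1 = [0.1, 0.9, 0.2]
v2 = [0.1, 0.8, 0.3]
v3 = [-0.9, 0.1, 0.0]
print(cosine_similarity(v1, v2) > cosine_similarity(v1, v3))  # True
```

Cosine similarity ranges from -1 to 1, with values close to 1 indicating semantically similar inputs, which makes it a convenient ranking score for clustering and retrieval.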
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
---
meta:
  title: Using Generative APIs
  description: This page explains how to use Generative APIs
content:
  h1: Using Generative APIs
  paragraph: This page explains how to use Generative APIs
tags: generative-apis ai-data embeddings-api
dates:
  validation: 2024-08-28
  posted: 2024-08-28
---

## Access

- A valid [API key](/iam/how-to/create-api-keys/) is needed.

## Authentication

All requests to the Scaleway Generative APIs must include an `Authorization` HTTP header with your API key prefixed by `Bearer`.

We recommend exporting your secret key as an environment variable, which you can then pass directly in your curl request as follows. Remember to replace the example value with *your own API secret key*.

```
export SCW_SECRET_KEY=720438f9-fcb9-4ebb-80a7-808ebf15314b
```

Run the following curl request once you have exported your environment variable:

```
curl -X GET \
  -H "Authorization: Bearer ${SCW_SECRET_KEY}" \
  "https://api.scaleway.ai/v1/models"
```

When using the OpenAI Python SDK, the API key is set once during client initialization, and the SDK automatically manages the inclusion of the `Authorization` header in all API requests.
In contrast, when integrating directly with the Scaleway Generative APIs, you are responsible for setting the `Authorization` header with the API key on each request to ensure proper authentication.

## Content types

Scaleway Generative APIs accept JSON in request bodies and return JSON in response bodies.
Send the `Content-Type: application/json` HTTP header in your POST requests.

```
curl --request POST \
  --url https://api.scaleway.ai/v1/chat/completions \
  --header "Authorization: Bearer ${SCW_SECRET_KEY}" \
  --header "Content-Type: application/json" \
  --data '{}'
```

## Permissions

Permissions define the actions a user or an application can perform on Scaleway Generative APIs. They are managed using Scaleway's [Identity and Access Management](/iam/quickstart/) interface.

[Owner](/iam/concepts/#owner) status or certain [IAM permissions](/iam/concepts/#permission) allow you to perform actions in the intended Organization.

Querying AI models hosted by Scaleway Generative APIs requires any of the following [permission sets](/iam/concepts/#permission-set):

- **GenerativeApisModelAccess**
- **GenerativeApisFullAccess**
- **AllProductsFullAccess**

## Projects

You can scope your Generative APIs consumption to a [Project](/iam/concepts/#project). This is helpful to restrict IAM users' access to only the Project they are working on, or to isolate your bills between Projects.

1. Find your Project ID in your [Project settings](https://console.scaleway.com/project/settings)
2. Insert your Project ID in the Generative APIs service URL, for example:

```
https://api.scaleway.ai/78e655b5-feb0-417c-bb3f-8c448bd0e8da/v1
```

The Project ID is hidden for the default Project.
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
---
meta:
  title: Using Models API
  description: This page explains how to use the Models API
content:
  h1: Using Models API
  paragraph: This page explains how to use the Models API
tags: generative-apis ai-data embeddings-api
dates:
  validation: 2024-09-02
  posted: 2024-09-02
---

Scaleway Generative APIs are designed as a drop-in replacement for the OpenAI APIs.
The Models API allows you to easily list the various AI models available at Scaleway.

## List models

Lists the available models and provides basic information about each one.

**Request sample:**

```
curl -s \
  --url "https://api.scaleway.ai/v1/models" \
  --header "Authorization: Bearer ${SCW_SECRET_KEY}"
```
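The response body can then be parsed to extract the model identifiers. A minimal sketch in Python; the response snippet is hypothetical, assuming the OpenAI-style list shape (`{"object": "list", "data": [{"id": ...}, ...]}`) that OpenAI-compatible APIs return:

```python
import json

# Hypothetical response snippet from the models endpoint, for illustration only.
sample_response = json.dumps({
    "object": "list",
    "data": [
        {"id": "llama-3.1-8b-instruct", "object": "model"},
        {"id": "sentence-t5-xxl", "object": "model"},
    ],
})

def model_ids(raw: str) -> list[str]:
    """Extract model identifiers from a models-list response body."""
    return [m["id"] for m in json.loads(raw)["data"]]

print(model_ids(sample_response))  # ['llama-3.1-8b-instruct', 'sentence-t5-xxl']
```

The returned identifiers are the values to pass as the `model` parameter in Chat and Embeddings requests.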
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
---
meta:
  title: Generative APIs - Concepts
  description: This page explains all the concepts related to Generative APIs
content:
  h1: Generative APIs - Concepts
  paragraph: This page explains all the concepts related to Generative APIs
tags:
dates:
  validation: 2024-08-27
categories:
  - ai-data
---

## API rate limits

API rate limits define the maximum number of requests a user can make to the Generative APIs within a specific time frame. Rate limiting helps to manage resource allocation, prevent abuse, and ensure fair access for all users. Understanding and adhering to these limits is essential for maintaining optimal application performance when using these APIs.

## Context window

A context window is the maximum amount of prompt data considered by the model to generate a response. Using models with a high context length, you can provide more information to generate relevant responses. The context is measured in tokens.

## Function calling

Function calling allows a large language model (LLM) to interact with external tools or APIs, executing specific tasks based on user requests. The LLM identifies the appropriate function, extracts the required parameters, and returns the results as structured data, typically in JSON format.

## Embeddings

Embeddings are numerical representations of text data that capture semantic information in a dense vector format. In Generative APIs, embeddings are essential for tasks such as similarity matching, clustering, and serving as inputs for downstream models. These vectors enable the model to understand and generate text based on the underlying meaning rather than just the surface-level words.

## Error handling

Error handling refers to the strategies and mechanisms in place to manage and respond to errors during API requests. This includes handling network issues, invalid inputs, or server-side errors. Proper error handling ensures that applications using Generative APIs can gracefully recover from failures and provide meaningful feedback to users.

## Parameters

Parameters are settings that control the behavior and performance of generative models. These include temperature, max tokens, and top-p sampling, among others. Adjusting parameters allows users to tweak the model's output, balancing factors like creativity, accuracy, and response length to suit specific use cases.

## Inter-token Latency (ITL)

The inter-token latency (ITL) corresponds to the average time elapsed between two generated tokens. It is usually expressed in milliseconds.

## JSON mode

JSON mode allows you to guide the language model in outputting well-structured JSON data.
To activate JSON mode, provide the `response_format` parameter with `{"type": "json_object"}`.
JSON mode is useful for applications like chatbots or APIs, where a machine-readable format is essential for easy processing.
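For example, a JSON-mode chat request simply carries the `response_format` field alongside the usual messages. A minimal payload sketch in Python; the model name and prompt are illustrative:

```python
import json

# Illustrative chat-completion payload with JSON mode enabled via response_format.
payload = {
    "model": "llama-3.1-8b-instruct",
    "messages": [
        {"role": "user", "content": "List three colors as a JSON object."},
    ],
    "response_format": {"type": "json_object"},
}
body = json.dumps(payload)
print(json.loads(body)["response_format"]["type"])  # json_object
```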
## Prompt engineering

Prompt engineering involves crafting specific and well-structured inputs (prompts) to guide the model towards generating the desired output. Effective prompt design is crucial for generating relevant responses, particularly in complex or creative tasks. It often requires experimentation to find the right balance between specificity and flexibility.

## Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) is a technique that enhances generative models by integrating information retrieval methods. By fetching relevant data from external sources before generating a response, RAG ensures that the output is more accurate and contextually relevant, especially in scenarios requiring up-to-date or specific information.

## Stop words

Stop words are a parameter that tells the model to stop generating further tokens after one or more chosen strings have been generated. This is useful for controlling the end of the model output, as it will cut off at the first occurrence of any of these strings.

## Streaming

Streaming is a parameter allowing responses to be delivered in real time, showing parts of the output as they are generated rather than waiting for the full response. Scaleway follows the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events) standard. This behavior usually enhances the user experience by providing immediate feedback and a more interactive conversation.

## Structured outputs

Structured outputs enable you to format the model's responses to suit specific use cases. To activate structured outputs, provide the `response_format` parameter with `"type": "json_schema"` and define its `"json_schema": {}`.
By customizing the structure, such as using lists, tables, or key-value pairs, you ensure that the data returned is in a form that is easy to extract and process.
By specifying the expected response format through the API, you can make the model consistently deliver the output your system requires.
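A structured-outputs request attaches a schema under `response_format`. A minimal sketch in Python, assuming the OpenAI-style `json_schema` wrapper (a named schema object); the schema itself is illustrative:

```python
# Illustrative response_format value for structured outputs: a JSON Schema
# describing the shape the model should return.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "color_list",
        "schema": {
            "type": "object",
            "properties": {
                "colors": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["colors"],
        },
    },
}
print(response_format["type"])  # json_schema
```

Constraining the output this way makes the model's replies parseable by downstream code without fragile post-processing.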
## Temperature

Temperature is a parameter that controls the randomness of the model's output during text generation. A higher temperature produces more creative and diverse outputs, while a lower temperature makes the model's responses more deterministic and focused. Adjusting the temperature allows users to balance creativity with coherence in the generated text.

## Time to First Token (TTFT)

Time to First Token (TTFT) measures the time elapsed from the moment a request is made to the point when the first token of the generated text is returned. TTFT is a crucial performance metric for evaluating the responsiveness of generative models, especially in interactive applications where users expect immediate feedback.

## Tokens

Tokens are the basic units of text that a generative model processes. Depending on the tokenization strategy, these can be words, subwords, or even characters. The number of tokens directly affects the context window size and the computational cost of using the model. Understanding token usage is essential for optimizing API requests and managing costs effectively.
