
Commit 102bf43

Merge branch 'fb-drop-categories' into move-into-pages
2 parents fc3c292 + 98f8282 commit 102bf43


1,697 files changed

+74300
-3
lines changed

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
---
meta:
  title: Generative APIs - API/CLI
  description: Generative APIs API/CLI
content:
  h1: Generative APIs - API/CLI
  paragraph: Generative APIs API/CLI
---
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
---
meta:
  title: Understanding errors
  description: This page explains how to understand errors with Generative APIs
content:
  h1: Understanding errors
  paragraph: This page explains how to understand errors with Generative APIs
tags: generative-apis ai-data understanding-data
dates:
  validation: 2024-10-31
  posted: 2024-09-02
---

Scaleway uses conventional HTTP response codes to indicate the success or failure of an API request.
In general, codes in the 2xx range indicate success, codes in the 4xx range indicate an error caused by the information provided, and codes in the 5xx range indicate an error on Scaleway's servers.

If the response code is not within the 2xx range, the response will contain an error object structured as follows:

```
{
    "error": string,
    "status": number,
    "message": string
}
```

Below are the most common HTTP error codes:

- 400 - **Bad Request**: The format or content of your payload is incorrect. The body may be too large, may fail to parse, or the content type may not match.
- 401 - **Unauthorized**: The `authorization` header is missing. Find the required headers on [this page](/generative-apis/api-cli/using-generative-apis/).
- 403 - **Forbidden**: Your API key does not exist or does not have the necessary permissions to access the requested resource. Find the required permission sets on [this page](/generative-apis/api-cli/using-generative-apis/).
- 404 - **Route Not Found**: The requested resource could not be found. Check that your request is being made to the correct endpoint.
- 422 - **Model Not Found**: The `model` key is present in the request payload, but the corresponding model is not found.
- 422 - **Missing Model**: The `model` key is missing from the request payload.
- 429 - **Too Many Requests**: You are exceeding your current quota for the requested model, calculated in requests per minute. Find the rate limits on [this page](/generative-apis/reference-content/rate-limits/).
- 429 - **Too Many Tokens**: You are exceeding your current quota for the requested model, calculated in tokens per minute. Find the rate limits on [this page](/generative-apis/reference-content/rate-limits/).
- 500 - **API error**: An unexpected internal error has occurred within Scaleway's systems. If the issue persists, please [open a support ticket](https://console.scaleway.com/support/tickets/create).

For streaming responses via SSE, 5xx errors may occur after a 200 response has been returned.
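A client can handle the error object above generically, for example by retrying only transient errors (429 and 5xx). A minimal sketch in Python; the helper names are illustrative, not part of any Scaleway SDK:

```python
# Decide how to react to an error object shaped like
# {"error": string, "status": number, "message": string}.
def should_retry(status: int) -> bool:
    """429 (rate limits) and 5xx errors are usually transient and worth retrying."""
    return status == 429 or 500 <= status < 600

def describe_error(body: dict) -> str:
    """Format the error object returned for non-2xx responses."""
    return f"{body.get('status')} {body.get('error')}: {body.get('message')}"

# Example error object, as it might be returned for a rate-limited request:
rate_limited = {"error": "Too Many Requests", "status": 429, "message": "quota exceeded"}
print(describe_error(rate_limited))   # 429 Too Many Requests: quota exceeded
print(should_retry(rate_limited["status"]))  # True
print(should_retry(404))              # False
```

4xx errors other than 429 indicate a problem with the request itself, so retrying them unchanged will not help.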
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
---
meta:
  title: Using Chat API
  description: This page explains how to use the Chat API to query models
content:
  h1: Using Chat API
  paragraph: This page explains how to use the Chat API to query models
tags: generative-apis ai-data chat-api
dates:
  validation: 2024-09-03
  posted: 2024-09-03
---

Scaleway Generative APIs are designed as a drop-in replacement for the OpenAI APIs. If you have an LLM-driven application that uses one of OpenAI's client libraries, you can easily configure it to point to the Scaleway Chat API, and get your existing applications running on open-weight instruct models hosted at Scaleway.

## Create chat completion

Creates a model response for the given chat conversation.

**Request sample:**

```
curl --request POST \
  --url https://api.scaleway.ai/v1/chat/completions \
  --header "Authorization: Bearer ${SCW_SECRET_KEY}" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "system",
        "content": "<string>"
      },
      {
        "role": "user",
        "content": "<string>"
      }
    ],
    "max_tokens": integer,
    "temperature": float,
    "top_p": float,
    "presence_penalty": float,
    "stop": "<string>",
    "stream": boolean
  }'
```

## Headers

Find the required headers on [this page](/generative-apis/api-cli/using-generative-apis/).

## Body

### Required parameters

| Param | Type | Description |
| ------------- |-------------|-------------|
| **messages** | array of objects | A list of messages comprising the conversation so far. |
| **model** | string | The name of the model to query. |

Our Chat API is OpenAI compatible. Use OpenAI's [API reference](https://platform.openai.com/docs/api-reference/chat/create) for more detailed information on the usage.

### Supported parameters

- temperature
- top_p
- max_tokens
- stream
- stream_options
- presence_penalty
- [response_format](/generative-apis/how-to/use-structured-outputs)
- logprobs
- stop
- seed
- [tools](/generative-apis/how-to/use-function-calling)
- [tool_choice](/generative-apis/how-to/use-function-calling)

### Unsupported parameters

- frequency_penalty
- n
- top_logprobs
- logit_bias
- user

If you have a use case requiring one of these unsupported parameters, please [contact us via Slack](https://slack.scaleway.com/) on the #ai channel.

## Going further

1. [Python code examples](/generative-apis/how-to/query-language-models/#querying-language-models-via-api) to query text models using Scaleway's Chat API
2. [How to use structured outputs](/generative-apis/how-to/use-structured-outputs) with the `response_format` parameter
3. [How to use function calling](/generative-apis/how-to/use-function-calling) with `tools` and `tool_choice`
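The request body shown in the curl sample can also be assembled programmatically before being sent with any HTTP client. A minimal sketch in Python; the helper name and default values are illustrative:

```python
import json

def build_chat_payload(system_prompt: str, user_prompt: str,
                       model: str = "llama-3.1-8b-instruct",
                       max_tokens: int = 512, temperature: float = 0.7,
                       stream: bool = False) -> str:
    """Assemble a chat-completion request body as a JSON string."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": stream,
    }
    return json.dumps(payload)

body = build_chat_payload("You are a helpful assistant.", "Say hello.")
print(json.loads(body)["model"])  # llama-3.1-8b-instruct
```

Serializing with `json.dumps` rather than hand-writing the string avoids the kind of trailing-comma and quoting mistakes that trigger a 400 Bad Request.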
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
---
meta:
  title: Using Embeddings API
  description: This page explains how to use the Embeddings API
content:
  h1: Using Embeddings API
  paragraph: This page explains how to use the Embeddings API
tags: generative-apis ai-data embeddings-api
dates:
  validation: 2024-09-03
  posted: 2024-09-03
---

Scaleway Generative APIs are designed as a drop-in replacement for the OpenAI APIs. If you have clustering or classification tasks already using one of OpenAI's client libraries, you can easily configure it to point to the Scaleway Embeddings API, and get your existing applications running with open-weight embedding models hosted at Scaleway.

## Create embeddings

Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.

**Request sample:**

```
curl --request POST \
  --url https://api.scaleway.ai/v1/embeddings \
  --header "Authorization: Bearer ${SCW_SECRET_KEY}" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "sentence-t5-xxl",
    "input": "<string>"
  }'
```

## Headers

Find the required headers on [this page](/generative-apis/api-cli/using-generative-apis/).

## Body

### Required parameters

| Param | Type | Description |
| ------------- |-------------|-------------|
| **input** | string or array | Input text to embed, encoded as a string or array of strings. It cannot be an empty string. |
| **model** | string | The name of the model to query. |

Our Embeddings API is OpenAI compatible. Use OpenAI's [API reference](https://platform.openai.com/docs/api-reference/embeddings) for more detailed information on the usage.

### Unsupported parameters

- encoding_format (default float)
- dimensions

If you have a use case requiring one of these unsupported parameters, please [contact us via Slack](https://slack.scaleway.com/) on the #ai channel.

<Message type="note">
  Check our [Python code examples](/generative-apis/how-to/query-embedding-models/#querying-embedding-models-via-api) to query embedding models using Scaleway's Embeddings API.
</Message>
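Once you have embedding vectors back from the API, a common next step is similarity matching. A minimal cosine-similarity sketch in Python; the sample vectors are made up for illustration (real models return much longer vectors):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings": v1 and v2 point in similar directions,
# v3 does not, so v1 is more similar to v2 than to v3.
v1 = [0.1, 0.9, 0.2]
v2 = [0.1, 0.8, 0.3]
v3 = [-0.9, 0.1, 0.0]
print(cosine_similarity(v1, v2) > cosine_similarity(v1, v3))  # True
```

Cosine similarity ranges from -1 to 1, with values close to 1 indicating semantically similar inputs, which makes it a convenient ranking score for clustering and retrieval.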
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
---
meta:
  title: Using Generative APIs
  description: This page explains how to use Generative APIs
content:
  h1: Using Generative APIs
  paragraph: This page explains how to use Generative APIs
tags: generative-apis ai-data embeddings-api
dates:
  validation: 2024-08-28
  posted: 2024-08-28
---

## Access

- A valid [API key](/iam/how-to/create-api-keys/) is needed.

## Authentication

All requests to the Scaleway Generative APIs must include an `Authorization` HTTP header with your API key prefixed by `Bearer`.

We recommend exporting your secret key as an environment variable, which you can then pass directly in your curl request as follows. Remember to replace the example value with *your own API secret key*.

```
export SCW_SECRET_KEY=720438f9-fcb9-4ebb-80a7-808ebf15314b
```

Run the following curl request once you have exported your environment variable:

```
curl -X GET \
  -H "Authorization: Bearer ${SCW_SECRET_KEY}" \
  "https://api.scaleway.ai/v1/models"
```

When using the OpenAI Python SDK, the API key is set once during client initialization, and the SDK automatically manages the inclusion of the `Authorization` header in all API requests.
In contrast, when integrating directly with the Scaleway Generative APIs, you are responsible for setting the `Authorization` header with the API key on each request to ensure proper authentication.

## Content types

Scaleway Generative APIs accept JSON in request bodies and return JSON in response bodies.
Send the `Content-Type: application/json` HTTP header in your POST requests.

```
curl --request POST \
  --url https://api.scaleway.ai/v1/chat/completions \
  --header "Authorization: Bearer ${SCW_SECRET_KEY}" \
  --header "Content-Type: application/json" \
  --data '{}'
```

## Permissions

Permissions define the actions a user or an application can perform on Scaleway Generative APIs. They are managed using Scaleway's [Identity and Access Management](/iam/quickstart/) interface.

[Owner](/iam/concepts/#owner) status or certain [IAM permissions](/iam/concepts/#permission) allow you to perform actions in the intended Organization.

Querying AI models hosted by Scaleway Generative APIs requires any of the following [permission sets](/iam/concepts/#permission-set):

- **GenerativeApisModelAccess**
- **GenerativeApisFullAccess**
- **AllProductsFullAccess**

## Projects

You can scope your Generative APIs consumption to a [Project](/iam/concepts/#project). This is helpful to restrict IAM users' access to only the Project they are working on, or to isolate your bills between Projects.

1. Find your Project ID in your [Project settings](https://console.scaleway.com/project/settings)
2. Insert your Project ID in the Generative APIs service URL, for example:

```
https://api.scaleway.ai/78e655b5-feb0-417c-bb3f-8c448bd0e8da/v1
```

The Project ID is hidden for the default Project.
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
---
meta:
  title: Using Models API
  description: This page explains how to use the Models API
content:
  h1: Using Models API
  paragraph: This page explains how to use the Models API
tags: generative-apis ai-data embeddings-api
dates:
  validation: 2024-09-02
  posted: 2024-09-02
---

Scaleway Generative APIs are designed as a drop-in replacement for the OpenAI APIs.
The Models API allows you to easily list the various AI models available at Scaleway.

## List models

Lists the available models and provides basic information about each one.

**Request sample:**

```
curl -s \
  --url "https://api.scaleway.ai/v1/models" \
  --header "Authorization: Bearer ${SCW_SECRET_KEY}"
```
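The response body can then be parsed to extract the model identifiers. A minimal sketch in Python; the response snippet is hypothetical, assuming the OpenAI-style list shape (`{"object": "list", "data": [{"id": ...}, ...]}`) that OpenAI-compatible APIs return:

```python
import json

# Hypothetical response snippet from the models endpoint, for illustration only.
sample_response = json.dumps({
    "object": "list",
    "data": [
        {"id": "llama-3.1-8b-instruct", "object": "model"},
        {"id": "sentence-t5-xxl", "object": "model"},
    ],
})

def model_ids(raw: str) -> list[str]:
    """Extract model identifiers from a models-list response body."""
    return [m["id"] for m in json.loads(raw)["data"]]

print(model_ids(sample_response))  # ['llama-3.1-8b-instruct', 'sentence-t5-xxl']
```

The returned identifiers are the values to pass as the `model` parameter in Chat and Embeddings requests.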
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
---
meta:
  title: Generative APIs - Concepts
  description: This page explains all the concepts related to Generative APIs
content:
  h1: Generative APIs - Concepts
  paragraph: This page explains all the concepts related to Generative APIs
tags:
dates:
  validation: 2024-08-27
categories:
  - ai-data
---

## API rate limits

API rate limits define the maximum number of requests a user can make to the Generative APIs within a specific time frame. Rate limiting helps to manage resource allocation, prevent abuse, and ensure fair access for all users. Understanding and adhering to these limits is essential for maintaining optimal application performance when using these APIs.

## Context window

A context window is the maximum amount of prompt data considered by the model to generate a response. Using models with a high context length, you can provide more information to generate relevant responses. The context is measured in tokens.

## Function calling

Function calling allows a large language model (LLM) to interact with external tools or APIs, executing specific tasks based on user requests. The LLM identifies the appropriate function, extracts the required parameters, and returns the results as structured data, typically in JSON format.

## Embeddings

Embeddings are numerical representations of text data that capture semantic information in a dense vector format. In Generative APIs, embeddings are essential for tasks such as similarity matching, clustering, and serving as inputs for downstream models. These vectors enable the model to understand and generate text based on the underlying meaning rather than just the surface-level words.

## Error handling

Error handling refers to the strategies and mechanisms in place to manage and respond to errors during API requests. This includes handling network issues, invalid inputs, or server-side errors. Proper error handling ensures that applications using Generative APIs can gracefully recover from failures and provide meaningful feedback to users.

## Parameters

Parameters are settings that control the behavior and performance of generative models. These include temperature, max tokens, and top-p sampling, among others. Adjusting parameters allows users to tweak the model's output, balancing factors like creativity, accuracy, and response length to suit specific use cases.

## Inter-token Latency (ITL)

The inter-token latency (ITL) corresponds to the average time elapsed between two generated tokens. It is usually expressed in milliseconds.

## JSON mode

JSON mode allows you to guide the language model in outputting well-structured JSON data.
To activate JSON mode, provide the `response_format` parameter with `{"type": "json_object"}`.
JSON mode is useful for applications like chatbots or APIs, where a machine-readable format is essential for easy processing.
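For example, a JSON-mode chat request simply carries the `response_format` field alongside the usual messages. A minimal payload sketch in Python; the model name and prompt are illustrative:

```python
import json

# Illustrative chat-completion payload with JSON mode enabled via response_format.
payload = {
    "model": "llama-3.1-8b-instruct",
    "messages": [
        {"role": "user", "content": "List three colors as a JSON object."},
    ],
    "response_format": {"type": "json_object"},
}
body = json.dumps(payload)
print(json.loads(body)["response_format"]["type"])  # json_object
```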
## Prompt engineering

Prompt engineering involves crafting specific and well-structured inputs (prompts) to guide the model towards generating the desired output. Effective prompt design is crucial for generating relevant responses, particularly in complex or creative tasks. It often requires experimentation to find the right balance between specificity and flexibility.

## Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) is a technique that enhances generative models by integrating information retrieval methods. By fetching relevant data from external sources before generating a response, RAG ensures that the output is more accurate and contextually relevant, especially in scenarios requiring up-to-date or specific information.

## Stop words

Stop words are a parameter that tells the model to stop generating further tokens after one or more chosen strings have been generated. This is useful for controlling the end of the model output, as it will cut off at the first occurrence of any of these strings.

## Streaming

Streaming is a parameter allowing responses to be delivered in real time, showing parts of the output as they are generated rather than waiting for the full response. Scaleway follows the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events) standard. This behavior usually enhances the user experience by providing immediate feedback and a more interactive conversation.

## Structured outputs

Structured outputs enable you to format the model's responses to suit specific use cases. To activate structured outputs, provide the `response_format` parameter with `"type": "json_schema"` and define its `"json_schema": {}`.
By customizing the structure, such as using lists, tables, or key-value pairs, you ensure that the data returned is in a form that is easy to extract and process.
By specifying the expected response format through the API, you can make the model consistently deliver the output your system requires.
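A structured-outputs request attaches a schema under `response_format`. A minimal sketch in Python, assuming the OpenAI-style `json_schema` wrapper (a named schema object); the schema itself is illustrative:

```python
# Illustrative response_format value for structured outputs: a JSON Schema
# describing the shape the model should return.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "color_list",
        "schema": {
            "type": "object",
            "properties": {
                "colors": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["colors"],
        },
    },
}
print(response_format["type"])  # json_schema
```

Constraining the output this way makes the model's replies parseable by downstream code without fragile post-processing.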
## Temperature

Temperature is a parameter that controls the randomness of the model's output during text generation. A higher temperature produces more creative and diverse outputs, while a lower temperature makes the model's responses more deterministic and focused. Adjusting the temperature allows users to balance creativity with coherence in the generated text.

## Time to First Token (TTFT)

Time to First Token (TTFT) measures the time elapsed from the moment a request is made to the point when the first token of the generated text is returned. TTFT is a crucial performance metric for evaluating the responsiveness of generative models, especially in interactive applications where users expect immediate feedback.

## Tokens

Tokens are the basic units of text that a generative model processes. Depending on the tokenization strategy, these can be words, subwords, or even characters. The number of tokens directly affects the context window size and the computational cost of using the model. Understanding token usage is essential for optimizing API requests and managing costs effectively.
