pages/generative-apis/faq.mdx
Lines changed: 16 additions & 6 deletions
@@ -81,7 +81,7 @@ The exact token count and definition depend on the [tokenizer](https://huggingfa
 You can see your token consumption in [Scaleway Cockpit](/cockpit/). You can access it from the Scaleway console under the [Metrics tab](https://console.scaleway.com/generative-api/metrics).
 Note that:
 - Cockpits are isolated by Project, hence you first need to select the right Project in the Scaleway console before accessing Cockpit to see your token consumption for this Project (you can see the `project_id` in the Cockpit URL: `https://{project_id}.dashboard.obs.fr-par.scw.cloud/`).
-- Cockpit graphs can take up to 1 hour to update token consumption. See [Troubleshooting](/generative-apis/troubleshooting/fixing-common-issues/#tokens-consumption-is-not-displayed-in-cockpit-metrics) for further details.
+- Cockpit graphs can take up to 5 minutes to update token consumption. See [Troubleshooting](/generative-apis/troubleshooting/fixing-common-issues/#tokens-consumption-is-not-displayed-in-cockpit-metrics) for further details.
 
 ### Can I configure a maximum billing threshold?
 Currently, you cannot configure a specific threshold after which your usage will be blocked. However:
@@ -92,7 +92,6 @@ Currently, you cannot configure a specific threshold after which your usage will
 ### How can I give access to token consumption to my users outside Scaleway?
 If your users do not have a Scaleway account, you can still give them access to their Generative API usage consumption by either:
 
-- Providing them with access to Grafana inside [Cockpit](https://console.scaleway.com/cockpit/overview). You can create dedicated [Grafana users](https://console.scaleway.com/cockpit/users) with read-only access (**Viewer** Role). Note that these users will still have access to all other Cockpit dashboards for this project.
 - Collecting consumption data from the [Billing API](https://www.scaleway.com/en/developers/api/billing/#path-consumption-get-monthly-consumption) and exposing it to your users. Consumption can be detailed by Projects.
 - Collecting consumption data from [Cockpit data sources](https://console.scaleway.com/cockpit/dataSource) and exposing it to your users. As an example, you can query consumption using the following query:
 ```curl
@@ -111,15 +110,26 @@ Make sure that you replace the following values:
 You can see your token consumption in [Scaleway Cockpit](https://console.scaleway.com/cockpit/). You can access it from the Scaleway console under the [Metrics tab](https://console.scaleway.com/generative-api/metrics).
 Note that:
 - Cockpits are isolated by Projects. You first need to select the right Project in the Scaleway console before accessing Cockpit to see your token consumption for the desired Project (you can see the `project_id` in the Cockpit URL: `https://{project_id}.dashboard.obs.fr-par.scw.cloud/`).
-- Cockpit graphs can take up to 1 hour to update token consumption. See [Troubleshooting](/generative-apis/troubleshooting/fixing-common-issues/#tokens-consumption-is-not-displayed-in-cockpit-metrics) for further details.
+- Cockpit graphs can take up to 5 minutes to update token consumption. See [Troubleshooting](/generative-apis/troubleshooting/fixing-common-issues/#tokens-consumption-is-not-displayed-in-cockpit-metrics) for further details.
 
 ## Specifications
 
 ### What are the SLAs applicable to Generative APIs?
-We are currently working on defining our SLAs for Generative APIs. We will provide more information on this topic soon.
+Generative APIs targets a 99.9% monthly availability rate, as detailed in the [Service Level Agreement for Generative APIs](https://www.scaleway.com/en/generative-apis/sla/).
 
 ### What are the performance guarantees (vs Managed Inference)?
-We are currently working on defining our performance guarantees for Generative APIs. We will provide more information on this topic soon.
+Generative APIs is optimized and monitored to provide reliable performance in most use cases, but does not strictly guarantee performance, as it depends on many client-side parameters. We recommend using Managed Inference (dedicated deployment capacity) for applications with critical performance requirements.
+
+As an order of magnitude, for Chat models, when performing requests with `stream` enabled:
+- Time to first token should be less than `1` second for most standard queries (with fewer than 1000 input tokens)
+- Output token generation speed should be above `100` tokens per second for recent small to medium-sized models (such as `gpt-oss-120b` or `mistral-small-3.2-24b-instruct-2506`)
+
+Exact performance will still vary based mainly on the following factors:
+- Model size and architecture: Smaller and more recent models usually provide better performance.
+- Model type:
+  - Chat models' time to first token increases proportionally to the input context size after a certain threshold (usually above `1 000` tokens).
+  - Audio transcription models' time to first token remains mostly constant, as they only need to process a small number of input tokens (a `30`-second audio chunk) to generate a first output.
+- Input and output size: In rough terms, total processing time is proportional to input and output size. However, for larger queries (usually above `10 000` tokens), processing speed may degrade with query size. For optimal performance, we recommend splitting queries into the smallest meaningful parts (`10` queries with `1 000` input tokens and `100` output tokens will be processed faster than `1` query with `10 000` input tokens and `1 000` output tokens).
 
 ## Quotas and limitations
 
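The two streaming figures quoted in the performance answer (time to first token and output tokens per second) can be checked against a client's own measurements. The sketch below is only an illustration: the function name `streaming_metrics` and the sample timings are hypothetical, and it assumes each streamed chunk carries exactly one output token, which a real API response may not guarantee.

```python
from typing import Sequence, Tuple


def streaming_metrics(request_start: float, chunk_times: Sequence[float]) -> Tuple[float, float]:
    """Return (time_to_first_token_s, output_tokens_per_s).

    Assumption: each entry of chunk_times is the arrival time, in seconds,
    of exactly one output token of a streamed response.
    """
    if not chunk_times:
        raise ValueError("no chunks received")
    # Time to first token: delay between sending the request and the first chunk.
    ttft = chunk_times[0] - request_start
    generation_time = chunk_times[-1] - chunk_times[0]
    if generation_time <= 0:
        # A single chunk (or identical timestamps) gives no measurable speed.
        return ttft, float("inf")
    # Tokens received after the first one, divided by the time spent generating them.
    speed = (len(chunk_times) - 1) / generation_time
    return ttft, speed


# Hypothetical timings: first token after 0.5 s, then one token every 8 ms.
times = [0.5 + 0.008 * i for i in range(101)]
ttft, speed = streaming_metrics(0.0, times)
```

With these made-up timings, the result sits within both orders of magnitude given above (under `1` second to first token, above `100` tokens per second).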
@@ -150,4 +160,4 @@ Yes, you need to comply with model licenses when using Generative APIs. Applicab
 ## Privacy and security
 
 ### Where can I find the privacy policy regarding Generative APIs?
-You can find the privacy policy applicable to all use of Generative APIs [here](/generative-apis/reference-content/data-privacy/).
+You can find the privacy policy applicable to all use of Generative APIs [here](/generative-apis/reference-content/data-privacy/).
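For a sense of scale on the 99.9% monthly availability target added in the SLA answer of this file, the arithmetic can be sketched as follows. This is plain percentage arithmetic under an assumed 30-day month; the function name `allowed_downtime_minutes` is hypothetical, and how availability is actually measured is defined only in the linked Service Level Agreement.

```python
def allowed_downtime_minutes(availability_pct: float, days_in_month: int = 30) -> float:
    """Downtime budget implied by an availability percentage.

    Assumption: availability is counted over every minute of the month.
    """
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - availability_pct) / 100.0


# A 99.9% target over a 30-day month leaves roughly 43.2 minutes of downtime.
budget = allowed_downtime_minutes(99.9)
```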
pages/generative-apis/troubleshooting/fixing-common-issues.mdx
Lines changed: 1 addition & 1 deletion
@@ -155,7 +155,7 @@ For queries where the model enters an infinite loop (more frequent when using **
 
 ### Causes
 - Cockpit is isolated by `project_id` and only displays token consumption related to one Project.
-- Cockpit `Tokens Processed` graphs along time can take up to an hour to update (to provide more accurate average consumptions over time). The overall `Tokens Processed` counter is updated in real-time.
+- Cockpit `Tokens Processed` graphs along time can take up to 5 minutes to update (to provide more accurate average consumptions over time). The overall `Tokens Processed` counter is updated in real-time.
 
 ### Solution
 - Ensure you are connecting to the Cockpit corresponding to your Project. Cockpits are currently isolated by `project_id`, which you can see in their URL: `https://PROJECT_ID.dashboard.obs.fr-par.scw.cloud/`. This Project should correspond to the one used in the URL you used to perform Generative APIs requests, such as `https://api.scaleway.ai/{PROJECT_ID}/v1/chat/completions`. You can list your projects and their IDs in your [Organization dashboard](https://console.scaleway.com/organization/projects).
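The Project check described in the solution above can be sketched in a few lines of Python. The two URL patterns are the ones quoted in that bullet; the helper names and the sample Project ID are hypothetical and only serve the illustration.

```python
from urllib.parse import urlparse


def project_from_cockpit_url(url: str) -> str:
    """Project ID is the first label of the Cockpit dashboard hostname."""
    host = urlparse(url).hostname or ""
    return host.split(".")[0]


def project_from_api_url(url: str) -> str:
    """Project ID is the first path segment of the Generative APIs URL."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    return segments[0] if segments else ""


def same_project(cockpit_url: str, api_url: str) -> bool:
    """True when both URLs point at the same, non-empty Project ID."""
    cockpit_id = project_from_cockpit_url(cockpit_url)
    return bool(cockpit_id) and cockpit_id == project_from_api_url(api_url)


# Hypothetical Project ID, for illustration only.
pid = "11111111-2222-3333-4444-555555555555"
match = same_project(
    f"https://{pid}.dashboard.obs.fr-par.scw.cloud/",
    f"https://api.scaleway.ai/{pid}/v1/chat/completions",
)
```

If `match` is false, the Cockpit being viewed belongs to a different Project than the one receiving the API requests, which is the first cause listed above.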