Commit 5a8024c

feat(genapi): update faq cockpit and performance
1 parent 7c09b00 commit 5a8024c

1 file changed, +16 -6 lines changed

pages/generative-apis/faq.mdx

Lines changed: 16 additions & 6 deletions
@@ -81,7 +81,7 @@ The exact token count and definition depend on the [tokenizer](https://huggingfa
8181
You can see your token consumption in [Scaleway Cockpit](/cockpit/). You can access it from the Scaleway console under the [Metrics tab](https://console.scaleway.com/generative-api/metrics).
8282
Note that:
8383
- Cockpits are isolated by Project, hence you first need to select the right Project in the Scaleway console before accessing Cockpit to see your token consumption for this Project (you can see the `project_id` in the Cockpit URL: `https://{project_id}.dashboard.obs.fr-par.scw.cloud/`).
84-
- Cockpit graphs can take up to 1 hour to update token consumption. See [Troubleshooting](/generative-apis/troubleshooting/fixing-common-issues/#tokens-consumption-is-not-displayed-in-cockpit-metrics) for further details.
84+
- Cockpit graphs can take up to 5 minutes to update token consumption. See [Troubleshooting](/generative-apis/troubleshooting/fixing-common-issues/#tokens-consumption-is-not-displayed-in-cockpit-metrics) for further details.
8585

8686
### Can I configure a maximum billing threshold?
8787
Currently, you cannot configure a specific threshold after which your usage will be blocked. However:
@@ -92,7 +92,6 @@ Currently, you cannot configure a specific threshold after which your usage will
9292
### How can I give access to token consumption to my users outside Scaleway?
9393
If your users do not have a Scaleway account, you can still give them access to their Generative API usage consumption by either:
9494

95-
- Providing them with access to Grafana inside [Cockpit](https://console.scaleway.com/cockpit/overview). You can create dedicated [Grafana users](https://console.scaleway.com/cockpit/users) with read-only access (**Viewer** Role). Note that these users will still have access to all other Cockpit dashboards for this project.
9695
- Collecting consumption data from the [Billing API](https://www.scaleway.com/en/developers/api/billing/#path-consumption-get-monthly-consumption) and exposing it to your users. Consumption can be detailed by Projects.
9796
- Collecting consumption data from [Cockpit data sources](https://console.scaleway.com/cockpit/dataSource) and exposing it to your users. As an example, you can query consumption using the following query:
9897
```curl
@@ -111,15 +110,26 @@ Make sure that you replace the following values:
111110
You can see your token consumption in [Scaleway Cockpit](https://console.scaleway.com/cockpit/). You can access it from the Scaleway console under the [Metrics tab](https://console.scaleway.com/generative-api/metrics).
112111
Note that:
113112
- Cockpits are isolated by Projects. You first need to select the right Project in the Scaleway console before accessing Cockpit to see your token consumption for the desired Project (you can see the `project_id` in the Cockpit URL: `https://{project_id}.dashboard.obs.fr-par.scw.cloud/`).
114-
- Cockpit graphs can take up to 1 hour to update token consumption. See [Troubleshooting](/generative-apis/troubleshooting/fixing-common-issues/#tokens-consumption-is-not-displayed-in-cockpit-metrics) for further details.
113+
- Cockpit graphs can take up to 5 minutes to update token consumption. See [Troubleshooting](/generative-apis/troubleshooting/fixing-common-issues/#tokens-consumption-is-not-displayed-in-cockpit-metrics) for further details.
115114

116115
## Specifications
117116

118117
### What are the SLAs applicable to Generative APIs?
119-
We are currently working on defining our SLAs for Generative APIs. We will provide more information on this topic soon.
118+
Generative APIs targets a 99.9% monthly availability rate detailed in [Service Level Agreement for Generative APIs](https://www.scaleway.com/en/generative-apis/sla/).
120119

121120
### What are the performance guarantees (vs Managed Inference)?
122-
We are currently working on defining our performance guarantees for Generative APIs. We will provide more information on this topic soon.
121+
Generative APIs is optimized and monitored to provide reliable performance in most use cases but does not strictly guarantee performance as it depends on many client-side parameters. We recommend using Managed Inference (dedicated deployment capacity) for applications with critical performance requirements.
122+
123+
As an order of magnitude, for Chat models, when performing requests with `stream` enabled:
124+
- time to first token should be less than `1` second for most standard queries (with less than 1000 input tokens)
125+
- output tokens generation speed should be above `100` tokens per second for recent small to medium size models (such as `gpt-oss-120b` or `mistral-small-3.2-24b-instruct-2506`)
126+
127+
Exact performance will still vary based on these main factors:
128+
- Model size and architecture: Smaller and more recent models usually provide better performance.
129+
- Model type:
130+
  - For Chat models, time to first token increases proportionally to the input context size above a certain threshold (usually above `1 000` tokens).
131+
  - For Audio transcription models, time to first token remains mostly constant, as they only need to process a small number of input tokens (a `30`-second audio chunk) to generate a first output.
132+
- Input and output size: As a first approximation, total processing time is proportional to input and output size. However, for large queries (usually above `10 000` tokens), processing speed may degrade with query size. For optimal performance, we recommend splitting queries into the smallest meaningful parts (`10` queries with `1 000` input tokens and `100` output tokens will be processed faster than `1` query with `10 000` input tokens and `1 000` output tokens).
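
The two figures quoted above (time to first token and output tokens per second) can be checked client-side from the arrival times of streamed chunks. The following is a minimal sketch, not part of the committed FAQ: `stream_stats` and `StreamStats` are hypothetical names, and it assumes you have already recorded, for each streamed chunk, its arrival time in seconds and the number of output tokens it carried (how you obtain these depends on your client library).

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class StreamStats:
    time_to_first_token_s: float  # delay between sending the request and the first token
    output_tokens_per_s: float    # generation speed measured after the first token


def stream_stats(request_start_s: float,
                 chunks: Iterable[Tuple[float, int]]) -> StreamStats:
    """Compute streaming metrics from (arrival_time_s, token_count) pairs.

    Hypothetical helper: the chunk timings and token counts are assumed
    to have been collected by the caller while reading a streamed response.
    """
    first_arrival = None
    last_arrival = None
    tokens_after_first = 0
    for arrival_s, n_tokens in chunks:
        if n_tokens <= 0:
            continue  # ignore empty keep-alive or metadata chunks
        if first_arrival is None:
            first_arrival = arrival_s
        else:
            tokens_after_first += n_tokens
        last_arrival = arrival_s
    if first_arrival is None:
        raise ValueError("no tokens received")
    generation_time_s = last_arrival - first_arrival
    # Speed is undefined when only one chunk was received.
    speed = (tokens_after_first / generation_time_s
             if generation_time_s > 0 else float("nan"))
    return StreamStats(first_arrival - request_start_s, speed)


# Illustrative timings only (not measured values).
stats = stream_stats(0.0, [(0.5, 1), (1.0, 50), (1.5, 50)])
print(stats.time_to_first_token_s, stats.output_tokens_per_s)
```

Generation speed is measured from the first token onward, so that the input-dependent time to first token does not skew the tokens-per-second figure.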
123133

124134
## Quotas and limitations
125135

@@ -150,4 +160,4 @@ Yes, you need to comply with model licenses when using Generative APIs. Applicab
150160
## Privacy and security
151161

152162
### Where can I find the privacy policy regarding Generative APIs?
153-
You can find the privacy policy applicable to all use of Generative APIs [here](/generative-apis/reference-content/data-privacy/).
163+
You can find the privacy policy applicable to all use of Generative APIs [here](/generative-apis/reference-content/data-privacy/).
