## Which models are supported by Generative APIs?
Our Generative APIs support a range of popular models, including:

- Chat / Text Generation models: Refer to our dedicated [documentation](/generative-apis/reference-content/supported-models/#chat-models) for a list of supported chat models.
- Vision models: Refer to our dedicated [documentation](/generative-apis/reference-content/supported-models/#vision-models) for a list of supported vision models.
- Embedding models: Refer to our dedicated [documentation](/generative-apis/reference-content/supported-models/#embedding-models) for a list of supported embedding models.
## How does the free tier work?
The free tier allows you to process up to 1,000,000 tokens without incurring any costs. After reaching this limit, you are charged per million tokens processed. Free tier usage is calculated by adding all input and output tokens consumed across all models used.

For more information, refer to our [pricing page](https://www.scaleway.com/en/pricing/model-as-a-service/#generative-apis).
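As an illustration of that accounting rule, the snippet below sums input and output tokens across several calls. The field names follow the OpenAI-style `usage` object returned with each response, and the numbers are made up:

```python
FREE_TIER_TOKENS = 1_000_000  # free allowance stated above

def free_tier_usage(calls: list[dict]) -> int:
    """Sum input and output tokens across every call, whatever the model."""
    return sum(c["prompt_tokens"] + c["completion_tokens"] for c in calls)

# Hypothetical usage reported by two calls to two different models:
calls = [
    {"model": "model-a", "prompt_tokens": 1_200, "completion_tokens": 800},
    {"model": "model-b", "prompt_tokens": 500, "completion_tokens": 1_500},
]
used = free_tier_usage(calls)                # 4000 tokens
remaining = max(0, FREE_TIER_TOKENS - used)  # 996000 tokens still free
```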
## How can I monitor my token consumption?
You can see your token consumption in Scaleway Cockpit, which you can access from the Scaleway console under the [Metrics tab](https://console.scaleway.com/generative-api/metrics).

Note that:

- Cockpits are isolated by Project, so you first need to select the right Project in the Scaleway console before accessing Cockpit to see the token consumption for that Project (the `project_id` appears in the Cockpit URL: `https://{project_id}.dashboard.obs.fr-par.scw.cloud/`).
- Cockpit graphs can take up to 1 hour to update token consumption; see [Troubleshooting](https://www.scaleway.com/en/docs/generative-apis/troubleshooting/fixing-common-issues/#tokens-consumption-is-not-displayed-in-cockpit-metrics) for further details.
## How can I access and use the Generative APIs?
Access is open to all Scaleway customers. You can start by using the Generative APIs Playground in the Scaleway console to experiment with different models. For integration into applications, you can use the OpenAI-compatible APIs provided by Scaleway. Detailed instructions are available in our [Quickstart guide](/generative-apis/quickstart/).
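Because the endpoints are OpenAI-compatible, a request can be assembled with nothing but the Python standard library. A minimal sketch, assuming the `https://api.scaleway.ai/v1` base URL and a hypothetical model name (check the Quickstart guide for the current values):

```python
import json
import urllib.request

API_BASE = "https://api.scaleway.ai/v1"  # assumed base URL; see the Quickstart

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-compatible /chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # a Scaleway IAM secret key
            "Content-Type": "application/json",
        },
        method="POST",
    )

def ask(api_key: str, model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply."""
    req = build_chat_request(api_key, model, prompt)
    with urllib.request.urlopen(req) as resp:  # the actual network call
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same request shape works unchanged with the official `openai` Python client by pointing its `base_url` at the Scaleway endpoint.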
Yes, Scaleway's Generative APIs are designed to be compatible with OpenAI libraries and SDKs, including the OpenAI Python client library and LangChain SDKs. This allows for seamless integration with existing workflows.
## What is the difference between Generative APIs and Managed Inference?
- **Generative APIs**: A serverless service providing access to pre-configured AI models via API, billed per token usage.
- **Managed Inference**: Allows deployment of curated or custom models with chosen quantization and instances, offering predictable throughput and enhanced security features such as private network isolation and access control. Managed Inference is billed by hourly usage, whether or not the provisioned capacity is receiving traffic.
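The billing difference can be made concrete with a back-of-the-envelope comparison. The prices below are purely illustrative, not Scaleway's actual rates:

```python
def breakeven_tokens_per_hour(price_per_million_tokens: float, price_per_hour: float) -> float:
    """Sustained tokens/hour above which an hourly-billed deployment
    costs less than per-token billing (illustrative prices only)."""
    return price_per_hour / price_per_million_tokens * 1_000_000

# e.g. a made-up 0.20 EUR per million tokens vs 1.00 EUR per hour:
threshold = breakeven_tokens_per_hour(0.20, 1.00)  # 5,000,000 tokens/hour
```

Below that sustained throughput, per-token billing is cheaper; above it, dedicated capacity wins.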
## How do I get started with Generative APIs?
To get started, explore the [Generative APIs Playground](/generative-apis/quickstart/#start-with-the-generative-apis-playground) in the Scaleway console. For application integration, refer to our [Quickstart guide](/generative-apis/quickstart/), which provides step-by-step instructions on accessing, configuring, and using a Generative APIs endpoint.
## Are there any rate limits for API usage?
Yes, API rate limits define the maximum number of requests a user can make within a specific time frame, ensuring fair access and resource allocation between users. If you require increased rate limits (by a factor of 2 to 5), you can request them by [creating a ticket](https://console.scaleway.com/support/tickets/create). If you require even higher rate limits, especially to absorb infrequent peak loads, we recommend using [Managed Inference](https://console.scaleway.com/inference/deployments) instead, with dedicated provisioned capacity.
Refer to our dedicated [documentation](/generative-apis/reference-content/rate-limits/) for more information on rate limits.
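When a request exceeds the limits, an OpenAI-compatible endpoint typically answers HTTP 429. A common client-side mitigation is to retry with exponential backoff; a minimal sketch, where the retry count and delays are arbitrary choices rather than Scaleway recommendations:

```python
import time
import urllib.error

def with_retries(send, max_retries: int = 5, base_delay: float = 1.0, cap: float = 30.0):
    """Run `send()` and retry on HTTP 429, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries - 1:
                raise  # not a rate limit, or out of retries
            time.sleep(min(cap, base_delay * 2 ** attempt))
```

Here `send` is any zero-argument callable performing the HTTP request, such as a wrapped chat-completion call.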
## What is the model lifecycle for Generative APIs?
Scaleway is dedicated to updating and offering the latest versions of generative AI models, ensuring improvements in capabilities, accuracy, and safety. As new versions of models are introduced, you can explore them through the Scaleway console. Learn more in our dedicated [documentation](/generative-apis/reference-content/model-lifecycle/).
We are currently working on defining our SLAs for Generative APIs. We will provide more information on this topic soon.
## What are the Performance guarantees (vs Managed Inference)?
We are currently working on defining our performance guarantees for Generative APIs. We will provide more information on this topic soon.
## Do model licenses apply when using Generative APIs?

Yes, you need to comply with model licenses when using Generative APIs. The applicable licenses are available for [each model in our documentation](https://www.scaleway.com/en/docs/generative-apis/reference-content/supported-models/#vision-models) and in the console Playground.