`pages/generative-apis/faq.mdx` (7 additions, 7 deletions)
@@ -118,18 +118,18 @@ Note that:
 Generative APIs targets a 99.9% monthly availability rate detailed in [Service Level Agreement for Generative APIs](https://www.scaleway.com/en/generative-apis/sla/).
 
 ### What are the performance guarantees (vs Managed Inference)?
-Generative APIs is optimized and monitored to provide reliable performance in most use cases but does not strictly guarantee performance as it depends on many client-side parameters. We recommend using Managed Inference (dedicated deployment capacity) for applications with critical performance requirements.
+Generative APIs is optimized and monitored to provide reliable performance in most use cases, but does not strictly guarantee performance as it depends on many client-side parameters. We recommend using Managed Inference (dedicated deployment capacity) for applications with critical performance requirements.
 
 As an order of magnitude, for Chat models, when performing request with `stream` activated:
-- time to first token should be less than `1` second for most standard queries (with less than 1000 input tokens)
-- output tokens generation speed should be above `100` tokens per second for recent small to medium size models (such as `gpt-oss-120b` or `mistral-small-3.2-24b-instruct-2506`)
+- Time to first token should be less than `1` second for most standard queries (with less than 1000 input tokens)
+- Output token generation speed should be above `100` tokens per second for recent small to medium size models (such as `gpt-oss-120b` or `mistral-small-3.2-24b-instruct-2506`)
 
-Exact performance will still vary based on these main factors:
+Exact performance will still vary based mainly on the following factors:
 - Model size and architecture: Smaller and more recent models usually provide better performance.
 - Model type:
-  - Chat models time to first token increase proportionally to the input context size after a certain threshold (usually above `1 000` tokens).
-  - Audio transcription models time to first token remains mostly constant, as they only need to process small number of input tokens (`30` seconds audio chunk) to generate a first output.
-- Input and output size: As a first approximation, total processing time is proportionnal to input and output size. However, for significant size queries (usually above `10 000` tokens), processing speed may degrade with query size. For optimal performance, we recommend splitting queries in the smallest meaningful part (`10` queries with `1 000` input tokens and `100` output tokens will be processed faster than `1` query with `10 000` input tokens and `1 000` output tokens).
+  - Chat models' time to first token increases proportionally to the input context size after a certain threshold (usually above `1 000` tokens).
+  - Audio transcription models' time to first token remains mostly constant, as they only need to process small numbers of input tokens (`30` seconds audio chunk) to generate a first output.
+- Input and output size: In rough terms, total processing time is proportional to input and output size. However, for larger queries (usually above `10 000` tokens), processing speed may degrade with query size. For optimal performance, we recommend splitting queries into the smallest meaningful parts (`10` queries with `1 000` input tokens and `100` output tokens will be processed faster than `1` query with `10 000` input tokens and `1 000` output tokens).
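
To illustrate the streaming behavior described in the updated FAQ text, the following minimal sketch measures time to first token and output speed for a streamed chat completion. It assumes an OpenAI-compatible endpoint at `https://api.scaleway.ai/v1`, an API key in a `SCW_SECRET_KEY` environment variable, and the `openai` Python package; these details are assumptions for the example, not part of the diff.

```python
import os
import time

from openai import OpenAI

# Assumed endpoint and credential variable; adjust to your own configuration.
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",
    api_key=os.environ["SCW_SECRET_KEY"],
)

start = time.perf_counter()
first_token_at = None
parts = []

# Streamed request: chunks arrive as they are generated.
stream = client.chat.completions.create(
    model="mistral-small-3.2-24b-instruct-2506",  # model name taken from the FAQ text
    messages=[{"role": "user", "content": "Explain streaming responses in two sentences."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    if delta and first_token_at is None:
        first_token_at = time.perf_counter()  # time to first token
    parts.append(delta)

end = time.perf_counter()
output = "".join(parts)

print(f"Time to first token: {first_token_at - start:.2f} s")
# Rough output-speed estimate in characters per second; counting tokens exactly
# would require the model's tokenizer.
print(f"Output speed: {len(output) / (end - first_token_at):.0f} chars/s")
```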
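
The recommendation to split large queries into the smallest meaningful parts could look like the sketch below, reusing the `client` object from the previous example. The documents and prompt are placeholders, not part of the FAQ.

```python
# Placeholder inputs: several short documents instead of one very large prompt.
documents = ["<document 1>", "<document 2>", "<document 3>"]

summaries = []
for doc in documents:
    # One request per document (around 1 000 input tokens each) rather than a single
    # request concatenating everything (around 10 000 input tokens).
    response = client.chat.completions.create(
        model="mistral-small-3.2-24b-instruct-2506",
        messages=[{"role": "user", "content": f"Summarize the following text:\n\n{doc}"}],
    )
    summaries.append(response.choices[0].message.content)

print(summaries)
```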