Commit fa88580

feat(inference): add monitoring performance to faq (#4567)
1 parent 04f7ec7 commit fa88580

File tree

1 file changed: +3 -0 lines changed

faq/managed-inference.mdx

Lines changed: 3 additions & 0 deletions
@@ -35,6 +35,9 @@ We are currently working on defining our SLAs for Managed Inference. We will pro
 Managed Inference provides dedicated resources, ensuring predictable performance and lower latency compared to Generative APIs, which are a shared, serverless offering optimized for infrequent traffic with moderate peak loads. Managed Inference is ideal for workloads that require consistent response times, high availability, or custom hardware configurations, or that generate extreme peak loads during a narrow period of time.
 Compared to Generative APIs, no usage quota is applied to the number of tokens generated per second, since output is limited by the size and number of GPU Instances in your Managed Inference deployment.
 
+## How can I monitor performance?
+Managed Inference metrics and logs are available in [Scaleway Cockpit](https://console.scaleway.com/cockpit/overview). You can follow your deployment metrics in real time, such as token throughput, request latency, GPU power usage, and GPU VRAM usage.
+
 ## What types of models can I deploy with Managed Inference?
 You can deploy a variety of models, including:
 * Large language models (LLMs)
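
The new FAQ entry points readers to Cockpit dashboards for deployment metrics. As a rough illustration of how such metrics could also be pulled programmatically, here is a minimal sketch that runs an instant query against a Prometheus-compatible data source, which Cockpit exposes. The endpoint URL, auth header, and metric name below are assumptions for illustration only; they are not part of this commit, so check your own Cockpit configuration and Grafana dashboards for the real values.

```python
# Hypothetical sketch: polling one deployment metric from a
# Prometheus-compatible Cockpit data source.
import os

import requests

# Assumed query endpoint; replace with the data source URL shown in your
# Cockpit configuration.
COCKPIT_QUERY_URL = "https://metrics.cockpit.fr-par.scw.cloud/prometheus/api/v1/query"
# A Cockpit token with permission to query metrics, read from the environment.
TOKEN = os.environ["COCKPIT_TOKEN"]

def query_metric(promql: str) -> list:
    """Run an instant PromQL query and return the result vector."""
    resp = requests.get(
        COCKPIT_QUERY_URL,
        params={"query": promql},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    # Standard Prometheus instant-query response shape:
    # {"status": "success", "data": {"resultType": "vector", "result": [...]}}
    return resp.json()["data"]["result"]

# "inference_tokens_per_second" is a placeholder metric name; look up the
# actual series names in the Cockpit dashboards for your deployment.
for series in query_metric("inference_tokens_per_second"):
    print(series["metric"], series["value"])
```

The same pattern works for any of the metrics the FAQ mentions (latency, GPU power, VRAM); only the PromQL expression changes.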
