Commit da563ec

feat(infr): add faq
1 parent 7d07487 commit da563ec

File tree

1 file changed: +69 −0 lines changed

faq/managed-inference.mdx

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
---
meta:
  title: Managed Inference FAQ
  description: Get answers to the most frequently asked questions about Scaleway Managed Inference.
content:
  h1: Managed Inference
dates:
  validation: 2025-02-12
category: ai-data
productIcon: InferenceProductIcon
---

## What is Scaleway Managed Inference?
Scaleway's Managed Inference is a fully managed service that allows you to deploy, run, and scale AI models in a dedicated environment.
It provides optimized infrastructure, customizable deployment options, and secure access controls to meet the needs of enterprises and developers looking for high-performance inference solutions.

## Where are the inference servers located?
All models are currently hosted in a secure data center located in Paris, France, operated by [OPCORE](https://www.opcore.com/). This ensures low latency for European users and compliance with European data privacy regulations.

## What is the difference between Managed Inference and Generative APIs?
- **Managed Inference**: Lets you deploy curated or custom models with the quantization and Instance type of your choice, offering predictable throughput and enhanced security features such as private network isolation and access control.
- **Generative APIs**: A serverless service providing access to pre-configured AI models via API, billed per token.

## Where can I find information regarding the data, privacy, and security policies applied to Scaleway's AI services?
You can find detailed information regarding the policies applied to Scaleway's AI services in our [Data, privacy, and security for Scaleway's AI services](/managed-inference/reference-content/data-privacy-security-scaleway-ai-services/) documentation.

## Is Managed Inference compatible with OpenAI APIs?
Managed Inference aims to achieve seamless compatibility with OpenAI APIs. You can find detailed information in the following documentation: [Scaleway Managed Inference as a drop-in replacement for the OpenAI APIs](/managed-inference/reference-content/openai-compatibility/).

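For instance, a standard OpenAI client can be pointed at a deployment's endpoint. The sketch below assumes a hypothetical deployment URL, model name, and IAM API key (all placeholders, not real values):

```python
# Minimal sketch: querying a Managed Inference deployment through its
# OpenAI-compatible chat completions endpoint. The endpoint URL, model
# name, and API key are placeholders -- replace them with your own values.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-id>.ifr.fr-par.scaleway.com/v1",  # hypothetical deployment endpoint
    api_key="<your-iam-api-key>",  # IAM API key allowed to access the deployment
)

response = client.chat.completions.create(
    model="<model-name>",  # the model served by your deployment
    messages=[{"role": "user", "content": "Summarize what Managed Inference does."}],
)
print(response.choices[0].message.content)
```
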
## What are the SLAs applicable to Managed Inference?
We are currently working on defining our SLAs for Managed Inference. We will provide more information on this topic soon.

## What are the performance guarantees (vs. Generative APIs)?
Managed Inference provides dedicated resources, ensuring predictable performance and lower latency compared to Generative APIs, which are a shared, serverless offering optimized for scalability. Managed Inference is ideal for workloads that require consistent response times, high availability, and custom hardware configurations.

## What types of models can I deploy with Managed Inference?
You can deploy a variety of models, including:
* Large language models (LLMs)
* Image processing models
* Audio recognition models
* Custom AI models

Managed Inference supports both open-source models and proprietary models that you upload.

## How do I deploy a model using Managed Inference?
Deployment is done through Scaleway's [console](https://console.scaleway.com/inference/deployments) or [API](https://www.scaleway.com/en/developers/api/inference/). You can choose a model from Scaleway’s selection or import your own directly from Hugging Face's repositories, configure [Instance types](/gpu/reference-content/choosing-gpu-instance-type/), set up networking options, and start inference with minimal setup.

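As a rough illustration of the API route, the sketch below creates a deployment with a plain HTTP call. The path segment and request body fields are assumptions based on the API reference linked above, not a verified schema; check the reference before relying on them.

```python
# Rough sketch of creating a Managed Inference deployment over HTTP.
# The route and body fields are assumptions -- verify them against the
# Inference API reference before use.
import os
import requests

region = "fr-par"
url = f"https://api.scaleway.com/inference/v1/regions/{region}/deployments"  # assumed route

payload = {
    "name": "my-llm-deployment",                          # hypothetical deployment name
    "project_id": os.environ["SCW_DEFAULT_PROJECT_ID"],   # target Scaleway Project
    "model_id": "<model-id>",                             # curated or imported model (assumed field)
    "node_type_name": "L4",                               # GPU node type (assumed field and value)
}

response = requests.post(
    url,
    json=payload,
    headers={"X-Auth-Token": os.environ["SCW_SECRET_KEY"]},  # Scaleway API authentication
)
response.raise_for_status()
print(response.json())
```
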
## Can I fine-tune or retrain my models within Managed Inference?
Managed Inference is primarily designed for deploying and running inference workloads. If you need to fine-tune or retrain models, you may need to use a separate training environment, such as [Scaleway’s GPU Instances](/gpu/quickstart/), and then deploy the trained model in Managed Inference.

## What Instance types are available for inference?
Managed Inference offers Instance types from Scaleway's [GPU Instances](/gpu/reference-content/choosing-gpu-instance-type/) range, optimized for different workloads.
You can select the Instance type based on your model’s computational needs and compatibility.

## How is Managed Inference billed?
Billing is based on the Instance type and usage duration. Unlike [Generative APIs](/generative-apis/quickstart/), which are billed per token, Managed Inference provides predictable costs based on the allocated infrastructure.
Pricing details can be found on the [Scaleway pricing page](https://www.scaleway.com/en/pricing/model-as-a-service/#managed-inference).

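As a back-of-the-envelope comparison of the two billing models (all prices below are hypothetical placeholders, not Scaleway's actual rates):

```python
# Back-of-the-envelope cost comparison. All prices are hypothetical
# placeholders -- see the Scaleway pricing page for real figures.
HOURS_PER_MONTH = 730

# Managed Inference: pay for the dedicated Instance, regardless of traffic.
instance_price_per_hour = 1.00              # hypothetical price per hour
managed_inference_monthly = instance_price_per_hour * HOURS_PER_MONTH

# Generative APIs: pay per token processed.
price_per_million_tokens = 0.50             # hypothetical price per 1M tokens
tokens_per_month = 2_000_000_000            # example volume: 2B tokens
generative_apis_monthly = price_per_million_tokens * tokens_per_month / 1_000_000

print(f"Managed Inference (dedicated): ~{managed_inference_monthly:.2f}/month, flat")
print(f"Generative APIs (per token):   ~{generative_apis_monthly:.2f}/month at this volume")
```
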
## Can I run inference on private models?
Yes, Managed Inference allows you to deploy private models with access control settings. You can restrict access to specific users, teams, or networks.

## Does Managed Inference support model quantization?
Yes, Scaleway Managed Inference supports model [quantization](/managed-inference/concepts/#quantization) to optimize performance and reduce inference latency. You can select different quantization options depending on your accuracy and efficiency requirements.

## Is Managed Inference suitable for real-time applications?
Yes, Managed Inference is designed for low-latency, high-throughput applications, making it suitable for real-time use cases such as chatbots, recommendation systems, fraud detection, and live video processing.

## Can I use Managed Inference with other Scaleway services?
Absolutely. Managed Inference integrates seamlessly with other Scaleway services, such as [Object Storage](/object-storage/quickstart/) for model hosting, [Kubernetes](/kubernetes/quickstart/) for containerized applications, and [Scaleway IAM](/iam/quickstart/) for access management.
