---

meta:
  title: Managed Inference FAQ
  description: Get answers to the most frequently asked questions about Scaleway Managed Inference.
content:
  h1: Managed Inference
dates:
  validation: 2025-02-12
category: ai-data
productIcon: InferenceProductIcon
---

## What is Scaleway Managed Inference?
Scaleway's Managed Inference is a fully managed service that allows you to deploy, run, and scale AI models in a dedicated environment.
It provides optimized infrastructure, customizable deployment options, and secure access controls to meet the needs of enterprises and developers looking for high-performance inference solutions.

## Where are the inference servers located?
All models are currently hosted in a secure data center located in Paris, France, operated by [OPCORE](https://www.opcore.com/). This ensures low latency for European users and compliance with European data privacy regulations.

## What is the difference between Managed Inference and Generative APIs?
- **Managed Inference**: Allows deployment of curated or custom models with chosen quantization and instances, offering predictable throughput and enhanced security features like private network isolation and access control.
- **Generative APIs**: A serverless service providing access to pre-configured AI models via API, billed per token usage.

## Where can I find information regarding the data, privacy, and security policies applied to Scaleway's AI services?
You can find detailed information regarding the policies applied to Scaleway's AI services in our [Data, privacy, and security for Scaleway's AI services](/managed-inference/reference-content/data-privacy-security-scaleway-ai-services/) documentation.

## Is Managed Inference compatible with OpenAI APIs?
Managed Inference aims for seamless compatibility with OpenAI APIs. You can find detailed information in the following documentation: [Scaleway Managed Inference as drop-in replacement for the OpenAI APIs](/managed-inference/reference-content/openai-compatibility/).

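As an illustration, the sketch below queries a deployment through its OpenAI-compatible chat completions endpoint using the official `openai` Python client. The endpoint URL, API key, and model name are placeholders, not real values: copy the actual endpoint and model name from your deployment in the console.

```python
# Minimal sketch: calling a Managed Inference deployment through its
# OpenAI-compatible API. base_url, api_key, and model are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<deployment-id>.ifr.fr-par.scaleway.com/v1",  # placeholder endpoint
    api_key="<SCW_SECRET_KEY>",  # IAM API key with access to the deployment
)

response = client.chat.completions.create(
    model="<model-name>",  # the model served by your deployment
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
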
## What are the SLAs applicable to Managed Inference?
We are currently working on defining our SLAs for Managed Inference. We will provide more information on this topic soon.

## What are the performance guarantees (vs. Generative APIs)?
Managed Inference provides dedicated resources, ensuring predictable performance and lower latency compared to Generative APIs, which are a shared, serverless offering optimized for scalability. Managed Inference is ideal for workloads that require consistent response times, high availability, and custom hardware configurations.

## What types of models can I deploy with Managed Inference?
You can deploy a variety of models, including:
* Large language models (LLMs)
* Image processing models
* Audio recognition models
* Custom AI models

Managed Inference supports both open-source models and proprietary models that you upload.

## How do I deploy a model using Managed Inference?
Deployment is done through Scaleway's [console](https://console.scaleway.com/inference/deployments) or [API](https://www.scaleway.com/en/developers/api/inference/). You can choose a model from Scaleway’s selection or import your own directly from Hugging Face's repositories, configure [Instance types](/gpu/reference-content/choosing-gpu-instance-type/), set up networking options, and start inference with minimal setup.

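For reference, a hedged sketch of creating a deployment over plain HTTP follows. The path, API version, and payload field names are assumptions made for illustration; the [API reference](https://www.scaleway.com/en/developers/api/inference/) is authoritative.

```python
# Hedged sketch: creating a Managed Inference deployment via the Scaleway
# HTTP API. Path, version, and payload fields are assumptions; check the
# API reference before use. All <...> values are placeholders.
import requests

resp = requests.post(
    "https://api.scaleway.com/inference/v1/regions/fr-par/deployments",  # assumed path
    headers={"X-Auth-Token": "<SCW_SECRET_KEY>"},  # IAM secret key
    json={
        "project_id": "<project-id>",
        "name": "my-inference-deployment",
        "model_id": "<model-id>",   # curated model or your Hugging Face import
        "node_type_name": "L4",     # assumed field name for the Instance type
        "min_size": 1,              # assumed autoscaling bounds
        "max_size": 1,
    },
)
resp.raise_for_status()
print(resp.json())
```
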
## Can I fine-tune or retrain my models within Managed Inference?
Managed Inference is primarily designed for deploying and running inference workloads. If you need to fine-tune or retrain models, you may need to use a separate training environment, such as [Scaleway’s GPU Instances](/gpu/quickstart/), and then deploy the trained model in Managed Inference.

## What Instance types are available for inference?
Managed Inference offers different Instance types optimized for various workloads from Scaleway's [GPU Instances](/gpu/reference-content/choosing-gpu-instance-type/) range.
You can select the Instance type based on your model’s computational needs and compatibility.

## How is Managed Inference billed?
Billing is based on the Instance type and usage duration. Unlike [Generative APIs](/generative-apis/quickstart/), which are billed per token, Managed Inference provides predictable costs based on the allocated infrastructure.
Pricing details can be found on the [Scaleway pricing page](https://www.scaleway.com/en/pricing/model-as-a-service/#managed-inference).

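As a back-of-the-envelope illustration of the two billing models (all prices below are invented placeholders, not Scaleway's actual rates; see the pricing page above):

```python
# Toy cost comparison. HOURLY_RATE and PRICE_PER_MILLION_TOKENS are
# invented placeholders, not real Scaleway prices.
HOURLY_RATE = 1.00      # placeholder: price per hour for a dedicated deployment
HOURS_PER_MONTH = 730   # average hours in a month

# Managed Inference: flat cost for the allocated infrastructure.
print(f"Dedicated: ~{HOURLY_RATE * HOURS_PER_MONTH:.2f}/month, whatever the token volume")

# Generative APIs: cost scales with the tokens consumed.
PRICE_PER_MILLION_TOKENS = 0.20  # placeholder
tokens = 500_000_000             # example monthly usage
print(f"Per-token: ~{tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS:.2f}/month")
```
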
## Can I run inference on private models?
Yes, Managed Inference allows you to deploy private models with access control settings. You can restrict access to specific users, teams, or networks.

## Does Managed Inference support model quantization?
Yes, Scaleway Managed Inference supports model [quantization](/managed-inference/concepts/#quantization) to optimize performance and reduce inference latency. You can select different quantization options depending on your accuracy and efficiency requirements.

## Is Managed Inference suitable for real-time applications?
Yes, Managed Inference is designed for low-latency, high-throughput applications, making it suitable for real-time use cases such as chatbots, recommendation systems, fraud detection, and live video processing.

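For chat-style real-time use, a common pattern is to stream tokens as they are generated instead of waiting for the full completion. The sketch below reuses the hypothetical OpenAI-compatible endpoint from the earlier example; all `<...>` values are placeholders.

```python
# Streaming sketch: render partial output as soon as tokens arrive,
# which keeps perceived latency low for chatbots. Placeholders as before.
from openai import OpenAI

client = OpenAI(
    base_url="https://<deployment-id>.ifr.fr-par.scaleway.com/v1",  # placeholder
    api_key="<SCW_SECRET_KEY>",
)

stream = client.chat.completions.create(
    model="<model-name>",
    messages=[{"role": "user", "content": "Draft a short status update."}],
    stream=True,  # incremental chunks instead of one final response
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
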
## Can I use Managed Inference with other Scaleway services?
Absolutely. Managed Inference integrates seamlessly with other Scaleway services, such as [Object Storage](/object-storage/quickstart/) for model hosting, [Kubernetes](/kubernetes/quickstart/) for containerized applications, and [Scaleway IAM](/iam/quickstart/) for access management.