---

meta:
  title: Managed Inference FAQ
  description: Get answers to the most frequently asked questions about Scaleway Managed Inference.
content:
  h1: Managed Inference
dates:
  validation: 2025-02-12
category: ai-data
productIcon: InferenceProductIcon
---

## What is Scaleway Managed Inference?
Scaleway's Managed Inference is a fully managed service that allows you to deploy, run, and scale AI models in a dedicated environment.
It provides optimized infrastructure, customizable deployment options, and secure access controls to meet the needs of enterprises and developers looking for high-performance inference solutions.

## Where are the inference servers located?
All models are currently hosted in a secure data center located in Paris, France, operated by [OPCORE](https://www.opcore.com/). This ensures low latency for European users and compliance with European data privacy regulations.

## What is the difference between Managed Inference and Generative APIs?
- **Managed Inference**: Allows deployment of curated or custom models with chosen quantization and instances, offering predictable throughput and enhanced security features like private network isolation and access control.
- **Generative APIs**: A serverless service providing access to pre-configured AI models via API, billed per token usage.

## Where can I find information regarding the data, privacy, and security policies applied to Scaleway's AI services?
You can find detailed information regarding the policies applied to Scaleway's AI services in our [Data, privacy, and security for Scaleway's AI services](/managed-inference/reference-content/data-privacy-security-scaleway-ai-services/) documentation.

## Is Managed Inference compatible with OpenAI APIs?
Managed Inference aims for seamless compatibility with OpenAI APIs. You can find detailed information in the following documentation: [Scaleway Managed Inference as drop-in replacement for the OpenAI APIs](/managed-inference/reference-content/openai-compatibility/).

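As an illustration, the sketch below queries a deployment through its OpenAI-compatible chat completions endpoint using the official `openai` Python client. The endpoint URL, API key, and model name are placeholders, not real values: copy the actual endpoint and model name from your deployment in the console.

```python
# Minimal sketch: calling a Managed Inference deployment through its
# OpenAI-compatible API. base_url, api_key, and model are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<deployment-id>.ifr.fr-par.scaleway.com/v1",  # placeholder endpoint
    api_key="<SCW_SECRET_KEY>",  # IAM API key with access to the deployment
)

response = client.chat.completions.create(
    model="<model-name>",  # the model served by your deployment
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
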
## What are the SLAs applicable to Managed Inference?
We are currently working on defining our SLAs for Managed Inference. We will provide more information on this topic soon.

## What are the performance guarantees (vs. Generative APIs)?
Managed Inference provides dedicated resources, ensuring predictable performance and lower latency compared to Generative APIs, which are a shared, serverless offering optimized for scalability. Managed Inference is ideal for workloads that require consistent response times, high availability, and custom hardware configurations.

## What types of models can I deploy with Managed Inference?
You can deploy a variety of models, including:
* Large language models (LLMs)
* Image processing models
* Audio recognition models
* Custom AI models

Managed Inference supports both open-source models and proprietary models that you upload.

## How do I deploy a model using Managed Inference?
Deployment is done through Scaleway's [console](https://console.scaleway.com/inference/deployments) or [API](https://www.scaleway.com/en/developers/api/inference/). You can choose a model from Scaleway’s selection or import your own directly from Hugging Face's repositories, configure [Instance types](/gpu/reference-content/choosing-gpu-instance-type/), set up networking options, and start inference with minimal setup.

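For reference, a hedged sketch of creating a deployment over plain HTTP follows. The path, API version, and payload field names are assumptions made for illustration; the [API reference](https://www.scaleway.com/en/developers/api/inference/) is authoritative.

```python
# Hedged sketch: creating a Managed Inference deployment via the Scaleway
# HTTP API. Path, version, and payload fields are assumptions; check the
# API reference before use. All <...> values are placeholders.
import requests

resp = requests.post(
    "https://api.scaleway.com/inference/v1/regions/fr-par/deployments",  # assumed path
    headers={"X-Auth-Token": "<SCW_SECRET_KEY>"},  # IAM secret key
    json={
        "project_id": "<project-id>",
        "name": "my-inference-deployment",
        "model_id": "<model-id>",   # curated model or your Hugging Face import
        "node_type_name": "L4",     # assumed field name for the Instance type
        "min_size": 1,              # assumed autoscaling bounds
        "max_size": 1,
    },
)
resp.raise_for_status()
print(resp.json())
```
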
## Can I fine-tune or retrain my models within Managed Inference?
Managed Inference is primarily designed for deploying and running inference workloads. If you need to fine-tune or retrain models, you may need to use a separate training environment, such as [Scaleway’s GPU Instances](/gpu/quickstart/), and then deploy the trained model in Managed Inference.

## What Instance types are available for inference?
Managed Inference offers different Instance types optimized for various workloads from Scaleway's [GPU Instances](/gpu/reference-content/choosing-gpu-instance-type/) range.
You can select the Instance type based on your model’s computational needs and compatibility.

## How is Managed Inference billed?
Billing is based on the Instance type and usage duration. Unlike [Generative APIs](/generative-apis/quickstart/), which are billed per token, Managed Inference provides predictable costs based on the allocated infrastructure.
Pricing details can be found on the [Scaleway pricing page](https://www.scaleway.com/en/pricing/model-as-a-service/#managed-inference).

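As a back-of-the-envelope illustration of the two billing models (all prices below are invented placeholders, not Scaleway's actual rates; see the pricing page above):

```python
# Toy cost comparison. HOURLY_RATE and PRICE_PER_MILLION_TOKENS are
# invented placeholders, not real Scaleway prices.
HOURLY_RATE = 1.00      # placeholder: price per hour for a dedicated deployment
HOURS_PER_MONTH = 730   # average hours in a month

# Managed Inference: flat cost for the allocated infrastructure.
print(f"Dedicated: ~{HOURLY_RATE * HOURS_PER_MONTH:.2f}/month, whatever the token volume")

# Generative APIs: cost scales with the tokens consumed.
PRICE_PER_MILLION_TOKENS = 0.20  # placeholder
tokens = 500_000_000             # example monthly usage
print(f"Per-token: ~{tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS:.2f}/month")
```
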
## Can I run inference on private models?
Yes, Managed Inference allows you to deploy private models with access control settings. You can restrict access to specific users, teams, or networks.

## Does Managed Inference support model quantization?
Yes, Scaleway Managed Inference supports model [quantization](/managed-inference/concepts/#quantization) to optimize performance and reduce inference latency. You can select different quantization options depending on your accuracy and efficiency requirements.

## Is Managed Inference suitable for real-time applications?
Yes, Managed Inference is designed for low-latency, high-throughput applications, making it suitable for real-time use cases such as chatbots, recommendation systems, fraud detection, and live video processing.

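For chat-style real-time use, a common pattern is to stream tokens as they are generated instead of waiting for the full completion. The sketch below reuses the hypothetical OpenAI-compatible endpoint from the earlier example; all `<...>` values are placeholders.

```python
# Streaming sketch: render partial output as soon as tokens arrive,
# which keeps perceived latency low for chatbots. Placeholders as before.
from openai import OpenAI

client = OpenAI(
    base_url="https://<deployment-id>.ifr.fr-par.scaleway.com/v1",  # placeholder
    api_key="<SCW_SECRET_KEY>",
)

stream = client.chat.completions.create(
    model="<model-name>",
    messages=[{"role": "user", "content": "Draft a short status update."}],
    stream=True,  # incremental chunks instead of one final response
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
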
## Can I use Managed Inference with other Scaleway services?
Absolutely. Managed Inference integrates seamlessly with other Scaleway services, such as [Object Storage](/object-storage/quickstart/) for model hosting, [Kubernetes](/kubernetes/quickstart/) for containerized applications, and [Scaleway IAM](/iam/quickstart/) for access management.