# feat(infr): add faq #4410 (merged)

---
meta:
  title: Managed Inference FAQ
  description: Get answers to the most frequently asked questions about Scaleway Managed Inference.
content:
  h1: Managed Inference
dates:
  validation: 2025-02-12
category: ai-data
productIcon: InferenceProductIcon
---

## What is Scaleway Managed Inference?
Scaleway's Managed Inference is a fully managed service that allows you to deploy, run, and scale AI models in a dedicated environment.
It provides optimized infrastructure, customizable deployment options, and secure access controls to meet the needs of enterprises and developers looking for high-performance inference solutions.

## Where are the inference servers located?
All models are currently hosted in a secure data center located in Paris, France, operated by [OPCORE](https://www.opcore.com/). This ensures low latency for European users and compliance with European data privacy regulations.

## What is the difference between Managed Inference and Generative APIs?
- **Managed Inference**: Lets you deploy curated or custom models with the quantization and Instance types of your choice, offering predictable throughput and enhanced security features such as Private Network isolation and access control. Managed Inference is billed hourly, whether the provisioned capacity receives traffic or not.
- **Generative APIs**: A serverless service providing access to pre-configured AI models via API, billed per token consumed.

## Where can I find information regarding the data, privacy, and security policies applied to Scaleway's AI services?
You can find detailed information regarding the policies applied to Scaleway's AI services in our [Data, privacy, and security for Scaleway's AI services](/managed-inference/reference-content/data-privacy-security-scaleway-ai-services/) documentation.

## Is Managed Inference compatible with OpenAI APIs?
Managed Inference aims to achieve seamless compatibility with OpenAI APIs. You can find detailed information in the following documentation: [Scaleway Managed Inference as drop-in replacement for the OpenAI APIs](/managed-inference/reference-content/openai-compatibility/).

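As a quick illustration of this compatibility, the sketch below points the official OpenAI Python client at a Managed Inference deployment. The base URL and model name are placeholders, not authoritative values: use the endpoint URL and model shown on your deployment's page in the Scaleway console, and an IAM API secret key that has access to the deployment.

```python
# Minimal sketch, assuming an OpenAI-compatible chat completions endpoint.
# base_url and model below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<deployment-endpoint>/v1",  # your deployment's endpoint URL
    api_key="<SCW_SECRET_KEY>",                   # IAM API secret key
)

response = client.chat.completions.create(
    model="<model-name>",  # the model served by your deployment
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
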
## What are the SLAs applicable to Managed Inference?
We are currently working on defining our SLAs for Managed Inference. We will provide more information on this topic soon.

## What are the performance guarantees (vs. Generative APIs)?
Managed Inference provides dedicated resources, ensuring predictable performance and lower latency compared to Generative APIs, which are a shared, serverless offering optimized for infrequent traffic with moderate peak loads. Managed Inference is ideal for workloads that require consistent response times, high availability, or custom hardware configurations, or that generate extreme peak loads during a narrow period of time.
Compared to Generative APIs, no usage quota applies to the number of tokens generated per second, since throughput is limited only by the size and number of GPU Instances in your Managed Inference deployment.

## What types of models can I deploy with Managed Inference?
You can deploy a variety of models, including:
* Large language models (LLMs)
* Image processing models
* Audio recognition models
* Custom AI models (currently available through the API only)

Managed Inference supports both open-source models and proprietary models that you upload.

## How do I deploy a model using Managed Inference?
Deployment is done through Scaleway's [console](https://console.scaleway.com/inference/deployments) or [API](https://www.scaleway.com/en/developers/api/inference/). You can choose a model from Scaleway's selection or import your own directly from Hugging Face's repositories, configure [Instance types](/gpu/reference-content/choosing-gpu-instance-type/), set up networking options, and start inference with minimal setup.

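For the API route, the following is a rough sketch of creating a deployment with Python's `requests` library. The endpoint path, payload field names, and the model and Instance identifiers are assumptions for illustration only; check the API reference linked above for the exact schema.

```python
# Hedged sketch of creating a Managed Inference deployment via the
# Scaleway API. Path and field names are assumed, not authoritative.
import requests

SCW_SECRET_KEY = "<your-IAM-secret-key>"
PROJECT_ID = "<your-project-id>"

resp = requests.post(
    # Region and version segment assumed; see the API reference for the real path.
    "https://api.scaleway.com/inference/v1/regions/fr-par/deployments",
    headers={"X-Auth-Token": SCW_SECRET_KEY},
    json={
        "project_id": PROJECT_ID,
        "name": "my-llm-deployment",
        "model_name": "<model-name>",         # a model from the catalog or your import
        "node_type_name": "<instance-type>",  # a GPU Instance type
        "min_size": 1,
        "max_size": 1,
    },
)
resp.raise_for_status()
print(resp.json())
```
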
## Can I fine-tune or retrain my models within Managed Inference?
Managed Inference is primarily designed for deploying and running inference workloads. If you need to fine-tune or retrain models, you may need to use a separate training environment, such as [Scaleway's GPU Instances](/gpu/quickstart/), and then deploy the trained model in Managed Inference.

## What Instance types are available for inference?
Managed Inference offers a choice of Instance types from Scaleway's [GPU Instances](/gpu/reference-content/choosing-gpu-instance-type/) range, optimized for various workloads.
You can select the Instance type based on your model's computational needs and compatibility.

## How is Managed Inference billed?
Billing is based on the Instance type and usage duration. Unlike [Generative APIs](/generative-apis/quickstart/), which are billed per token, Managed Inference provides predictable costs based on the allocated infrastructure.
Pricing details can be found on the [Scaleway pricing page](https://www.scaleway.com/en/pricing/model-as-a-service/#managed-inference).

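As a back-of-the-envelope illustration (the hourly rate below is a made-up placeholder, not an actual Scaleway price; see the pricing page for real rates), the monthly cost of a deployment is simply rate × Instance count × hours provisioned:

```python
# Illustrative only: the rate is a hypothetical placeholder, not a real price.
hourly_rate_eur = 1.00   # per GPU Instance per hour (placeholder value)
instances = 2            # Instances in the deployment
hours = 24 * 30          # provisioned for a 30-day month, traffic or not

print(f"Estimated monthly cost: EUR {hourly_rate_eur * instances * hours:.2f}")
# -> Estimated monthly cost: EUR 1440.00
```
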
## Can I run inference on private models?
Yes, Managed Inference allows you to deploy private models with access control settings. You can restrict access to specific users, teams, or networks.

## Does Managed Inference support model quantization?
Yes, Scaleway Managed Inference supports model [quantization](/managed-inference/concepts/#quantization) to optimize performance and reduce inference latency. You can select different quantization options depending on your accuracy and efficiency requirements.

## Is Managed Inference suitable for real-time applications?
Yes, Managed Inference is designed for low-latency, high-throughput applications, making it suitable for real-time use cases such as chatbots, recommendation systems, fraud detection, and live video processing.

## Can I use Managed Inference with other Scaleway services?
Absolutely. Managed Inference integrates seamlessly with other Scaleway services, such as [Object Storage](/object-storage/quickstart/) for model hosting, [Kubernetes](/kubernetes/quickstart/) for containerized applications, and [Scaleway IAM](/iam/quickstart/) for access management.

## Do model licenses apply when using Managed Inference?
Yes, you must comply with the applicable model licenses when using Managed Inference. The license for [each model is listed in our documentation](/managed-inference/reference-content/).
- For models provided in the Scaleway catalog, you need to accept the license (including any EULA) before creating a Managed Inference deployment.
- For custom models you import to Scaleway, you are responsible for complying with the model license (as with any software you install on a GPU Instance, for example).