
Commit a69ff80

Merge pull request #1789 from MicrosoftDocs/main
12/3 11:00 AM IST Publish
2 parents 3b6e85f + e1ca57f commit a69ff80

2 files changed: +11 −5 lines changed


articles/ai-services/openai/quotas-limits.md

Lines changed: 4 additions & 4 deletions
@@ -24,7 +24,7 @@ The following sections provide you with a quick guide to the default quotas and
 
 | Limit Name | Limit Value |
 |--|--|
-| OpenAI resources per region per Azure subscription | 30 |
+| Azure OpenAI resources per region per Azure subscription | 30 |
 | Default DALL-E 2 quota limits | 2 concurrent requests |
 | Default DALL-E 3 quota limits| 2 capacity units (6 requests per minute)|
 | Default Whisper quota limits | 3 requests per minute |
@@ -44,8 +44,8 @@ The following sections provide you with a quick guide to the default quotas and
 | Max number of `/chat/completions` functions | 128 |
 | Max number of `/chat completions` tools | 128 |
 | Maximum number of Provisioned throughput units per deployment | 100,000 |
-| Max files per Assistant/thread | 10,000 when using the API or AI Foundry. 20 when using Azure OpenAI Studio.|
-| Max file size for Assistants & fine-tuning | 512 MB |
+| Max files per Assistant/thread | 10,000 when using the API or Azure AI Foundry portal. In Azure OpenAI Studio the limit was 20.|
+| Max file size for Assistants & fine-tuning | 512 MB<br/><br/>200 MB via Azure AI Foundry portal |
 | Max size for all uploaded files for Assistants |100 GB |
 | Assistants token limit | 2,000,000 token limit |
 | GPT-4o max images per request (# of images in the messages array/conversation history) | 50 |
@@ -181,7 +181,7 @@ To minimize issues related to rate limits, it's a good idea to use the following
 
 ### How to request increases to the default quotas and limits
 
-Quota increase requests can be submitted from the [Quotas](./how-to/quota.md) page of Azure AI Foundry. Due to high demand, quota increase requests are being accepted and will be filled in the order they're received. Priority is given to customers who generate traffic that consumes the existing quota allocation, and your request might be denied if this condition isn't met.
+Quota increase requests can be submitted from the [Quotas](./how-to/quota.md) page in the Azure AI Foundry portal. Due to high demand, quota increase requests are being accepted and will be filled in the order they're received. Priority is given to customers who generate traffic that consumes the existing quota allocation, and your request might be denied if this condition isn't met.
 
 For other rate limits, [submit a service request](../cognitive-services-support-options.md?context=/azure/ai-services/openai/context/context).
 

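The hunk context above references the page's guidance on minimizing rate-limit issues. A minimal client-side sketch of one such practice, retrying with exponential backoff on 429 responses, assuming the `openai` Python package (v1 or later); the endpoint, API version, and deployment name are placeholders, not values from the source:

```python
import time

from openai import AzureOpenAI, RateLimitError

# Placeholder connection details; substitute your own resource values.
client = AzureOpenAI(
    api_key="<your-api-key>",
    api_version="2024-02-01",
    azure_endpoint="https://<your-resource>.openai.azure.com",
)


def chat_with_backoff(messages, max_retries=5):
    """Call chat completions, backing off exponentially on 429 responses."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="<your-deployment-name>",  # Azure deployment name
                messages=messages,
            )
        except RateLimitError:
            # Wait 1s, 2s, 4s, ... before retrying.
            time.sleep(2 ** attempt)
    raise RuntimeError("Still rate limited after all retries")
```

The same pattern applies to other endpoints; honoring the `retry-after` header, when present, is a common refinement.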
articles/machine-learning/how-to-auto-train-forecast.md

Lines changed: 7 additions & 1 deletion
@@ -1314,7 +1314,13 @@ For a more detailed example, see the [demand forecasting with many models notebo
 
 #### Training considerations for a many models run
 
-The many models training and inference components conditionally partition your data according to the `partition_column_names` setting so each partition is in its own file. This process can be very slow or fail when data is very large. The recommendation is to partition your data manually before you run many models training or inference.
+- The many models training and inference components conditionally partition your data according to the `partition_column_names` setting so that each partition ends up in its own file. This process can be very slow or fail when the data is very large. The recommendation is to partition your data manually before you run many models training or inference (a minimal partitioning sketch follows this diff).
+
+- During many models training, models are automatically registered in the workspace, so manual registration isn't required. Models are named based on the partition they were trained on, and neither the names nor the tags are customizable; these properties are used to automatically detect models during inference.
+
+- Deploying each model individually doesn't scale, so `PipelineComponentBatchDeployment` is provided to ease the deployment process. See the [demand forecasting with many models notebook](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/1k_demand_forecast_pipeline/aml-demand-forecast-mm-pipeline/aml-demand-forecast-mm-pipeline.ipynb) for this in action.
+
+- During inference, the appropriate models (latest version) are automatically selected based on the partition present in the inference data. By default, the latest models are selected from an experiment by providing `training_experiment_name`, but you can override this to select models from a particular training run by also providing `train_run_id`.
 
 > [!NOTE]
 > The default parallelism limit for a many models run within a subscription is set to 320. If your workload requires a higher limit, you can contact Microsoft support.
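
The first bullet above recommends partitioning data manually before a many models run. A minimal sketch of one way to do that with pandas, assuming a hypothetical `sales.csv` input and placeholder partition columns `store` and `brand` (none of these names come from the source):

```python
from pathlib import Path

import pandas as pd

# Hypothetical input file and partition columns for illustration.
partition_column_names = ["store", "brand"]
df = pd.read_csv("sales.csv", parse_dates=["date"])

out_dir = Path("partitioned_data")
out_dir.mkdir(exist_ok=True)

# Write one file per unique combination of the partition columns,
# mirroring what the many models components would otherwise do at run time.
for keys, part in df.groupby(partition_column_names):
    keys = (keys,) if not isinstance(keys, tuple) else keys
    name = "_".join(str(k) for k in keys)
    part.to_csv(out_dir / f"{name}.csv", index=False)
```

Pointing the training input at the resulting folder of per-partition files avoids the slow conditional partitioning step described above.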
