articles/machine-learning/how-to-use-low-priority-batch.md
+16 -10 (16 additions & 10 deletions)
@@ -29,22 +29,30 @@ Low priority virtual machines are offered at a reduced price compared with dedic
29
29
30
30
Azure Machine Learning Batch Deployments provides several capabilities that make it easy to consume and benefit from low priority VMs:
31
31
32
-
- Batch deployment jobs consume low priority VMs by running on Azure Machine Learning compute clusters created with low priority VMs. After a deployment is associated with a low priority VMs' cluster, all the jobs produced by such deployment use low priority VMs. Per-job configuration isn't possible.
32
+
- Batch deployment jobs consume low priority VMs by running on Azure Machine Learning compute clusters created with low priority VMs. After a deployment is associated with a low priority VMs cluster, all the jobs produced by such deployment use low priority VMs. Per-job configuration isn't possible.
33
33
- Batch deployment jobs automatically seek the target number of VMs in the available compute cluster based on the number of tasks to submit. If VMs are preempted or unavailable, batch deployment jobs attempt to replace the lost capacity by queuing the failed tasks to the cluster.
34
34
- Low priority VMs have a separate vCPU quota that differs from the one for dedicated VMs. Low-priority cores per region have a default limit of 100 to 3,000, depending on your subscription. The number of low-priority cores per subscription can be increased and is a single value across VM families. See [Azure Machine Learning compute quotas](how-to-manage-quotas.md#azure-machine-learning-compute).
35
35
36
-
## Considerations and use cases
36
+
### Considerations and use cases
37
37
38
38
Many batch workloads are a good fit for low priority VMs. Using low priority VMs can introduce execution delays when deallocation of VMs occurs. If you have flexibility in the time jobs have to finish, you might tolerate the potential drops in capacity.
39
39
40
40
When you deploy models under batch endpoints, rescheduling can be done at the minibatch level. That approach has the benefit that deallocation only impacts those minibatches that are currently being processed and not finished on the affected node. All completed progress is kept.
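
The rescheduling behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how Azure Machine Learning is implemented: the function `run_with_preemption`, the round-based scheduling model, and the node and mini-batch names are all hypothetical. It shows that a preemption only sends the in-flight mini-batch on the lost node back to the queue, while completed mini-batches are kept.

```python
from collections import deque


def run_with_preemption(mini_batches, nodes, preempt_at):
    """Simulate mini-batch scheduling on a cluster whose nodes can be preempted.

    mini_batches: list of mini-batch ids to process.
    nodes: list of node names available at the start.
    preempt_at: dict mapping a round number to the node preempted in that round.
    Returns (completed mini-batches in completion order, number of reschedules).
    """
    queue = deque(mini_batches)
    active = list(nodes)
    completed = []
    rescheduled = 0
    round_no = 0

    while queue:
        if not active:
            # Hypothetical stand-in for the job being cancelled for lack of capacity.
            raise RuntimeError("no capacity left; the job must be resubmitted")

        # Hand one mini-batch to each active node for this round.
        assignments = {}
        for node in active:
            if not queue:
                break
            assignments[node] = queue.popleft()

        lost = preempt_at.get(round_no)
        for node, mini_batch in assignments.items():
            if node == lost:
                # No checkpointing: the in-flight mini-batch starts over later.
                queue.append(mini_batch)
                rescheduled += 1
            else:
                completed.append(mini_batch)

        if lost in active:
            active.remove(lost)
        round_no += 1

    return completed, rescheduled


# Two nodes; node-b is preempted during the first round.
done, retries = run_with_preemption(
    ["mb0", "mb1", "mb2", "mb3"], ["node-a", "node-b"], {0: "node-b"}
)
print(done, retries)  # ['mb0', 'mb2', 'mb3', 'mb1'] 1
```

In this model, preempting the only node of a single-node cluster leaves no capacity and raises an error, which mirrors the case where the job is cancelled and has to be resubmitted.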
41
41
42
-
## Creating batch deployments with low priority VMs
42
+
### Limitations
43
+
44
+
- After a deployment is associated with a low priority VMs cluster, all the jobs produced by such deployment use low priority VMs. Per-job configuration isn't possible.
45
+
- Rescheduling is done at the mini-batch level, regardless of the progress. No checkpointing capability is provided.
46
+
47
+
> [!WARNING]
48
+
> In the cases where the entire cluster is preempted or running on a single-node cluster, the job is cancelled because there is no capacity available for it to run. Resubmitting is required in this case.
49
+
50
+
## Create batch deployments that use low priority VMs
43
51
44
52
Batch deployment jobs consume low priority VMs by running on Azure Machine Learning compute clusters created with low priority VMs.
45
53
46
54
> [!NOTE]
47
-
> After a deployment is associated with a low priority VMs' cluster, all the jobs produced by such deployment use low priority VMs. Per-job configuration is not possible.
55
+
> After a deployment is associated with a low priority VMs cluster, all the jobs produced by such deployment use low priority VMs. Per-job configuration is not possible.
48
56
49
57
You can create a low priority Azure Machine Learning compute cluster as follows:
50
58
@@ -161,10 +169,8 @@ To view these metrics in the Azure portal:
161
169
162
170
:::image type="content" source="./media/how-to-use-low-priority-batch/metrics.png" lightbox="./media/how-to-use-low-priority-batch/metrics.png" alt-text="Screenshot of the metrics section in the resource monitoring pane that shows the relevant metrics for low priority VMs.":::
163
171
164
-
## Limitations
165
-
166
-
- After a deployment is associated with a low priority VMs' cluster, all the jobs produced by such deployment use low priority VMs. Per-job configuration isn't possible.
167
-
- Rescheduling is done at the mini-batch level, regardless of the progress. No checkpointing capability is provided.
172
+
## Related content
168
173
169
-
> [!WARNING]
170
-
> In the cases where the entire cluster is preempted or running on a single-node cluster, the job is cancelled because there is no capacity available for it to run. Resubmitting is required in this case.
174
+
- [Create an Azure Machine Learning compute cluster](how-to-create-attach-compute-cluster.md)
175
+
- [Deploy MLflow models in batch deployments](how-to-mlflow-batch.md)
176
+
- [Manage compute resources for model training](how-to-create-attach-compute-studio.md)