articles/machine-learning/resource-known-issues.md
Lines changed: 17 additions & 32 deletions
@@ -131,7 +131,6 @@ These are known issues for Azure Machine Learning Datasets.

 If you don't include the leading forward slash, '/', you'll need to prefix the working directory, e.g. `/mnt/batch/.../tmp/dataset`, on the compute target to indicate where you want the dataset to be mounted.

-
 ### Data labeling projects issues

 Known issues with labeling projects.
@@ -152,48 +151,35 @@ Known issues with the designer.

 ## Train models

-### ModuleErrors (No module named)
-
-If you run into ModuleErrors while submitting experiments in Azure ML, the training script expects a package to be installed, but the package hasn't been added to the environment. Once you provide the package name, Azure ML installs the package in the environment used for your training run.
+* **ModuleErrors (No module named)**: If you run into ModuleErrors while submitting experiments in Azure ML, the training script expects a package to be installed, but the package hasn't been added to the environment. Once you provide the package name, Azure ML installs the package in the environment used for your training run.

-If you are using [Estimators](concept-azure-machine-learning-architecture.md#estimators) to submit experiments, you can specify a package name via the `pip_packages` or `conda_packages` parameter of the estimator, depending on the source from which you want to install the package. You can also specify a yml file with all your dependencies using `conda_dependencies_file` or list all your pip requirements in a txt file using the `pip_requirements_file` parameter. If you have your own Azure ML Environment object that you want to use in place of the default image used by the estimator, you can specify that environment via the `environment` parameter of the estimator constructor.
+If you are using [Estimators](concept-azure-machine-learning-architecture.md#estimators) to submit experiments, you can specify a package name via the `pip_packages` or `conda_packages` parameter of the estimator, depending on the source from which you want to install the package. You can also specify a yml file with all your dependencies using `conda_dependencies_file` or list all your pip requirements in a txt file using the `pip_requirements_file` parameter. If you have your own Azure ML Environment object that you want to use in place of the default image used by the estimator, you can specify that environment via the `environment` parameter of the estimator constructor.

-Azure ML also provides framework-specific estimators for TensorFlow, PyTorch, Chainer and SKLearn. Using these estimators makes sure that the core framework dependencies are installed on your behalf in the environment used for training. You have the option to specify extra dependencies as described above.
+Azure ML also provides framework-specific estimators for TensorFlow, PyTorch, Chainer and SKLearn. Using these estimators makes sure that the core framework dependencies are installed on your behalf in the environment used for training. You have the option to specify extra dependencies as described above.

-Azure ML maintained Docker images and their contents can be seen in [AzureML Containers](https://github.com/Azure/AzureML-Containers).
-Framework-specific dependencies are listed in the respective framework documentation - [Chainer](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.chainer?view=azure-ml-py#remarks), [PyTorch](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.pytorch?view=azure-ml-py#remarks), [TensorFlow](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py#remarks), [SKLearn](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.sklearn.sklearn?view=azure-ml-py#remarks).
+Azure ML maintained Docker images and their contents can be seen in [AzureML Containers](https://github.com/Azure/AzureML-Containers).
+Framework-specific dependencies are listed in the respective framework documentation - [Chainer](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.chainer?view=azure-ml-py#remarks), [PyTorch](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.pytorch?view=azure-ml-py#remarks), [TensorFlow](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py#remarks), [SKLearn](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.sklearn.sklearn?view=azure-ml-py#remarks).

-> [!Note]
-> If you think a particular package is common enough to be added to Azure ML maintained images and environments, please raise a GitHub issue in [AzureML Containers](https://github.com/Azure/AzureML-Containers).
+> [!Note]
+> If you think a particular package is common enough to be added to Azure ML maintained images and environments, please raise a GitHub issue in [AzureML Containers](https://github.com/Azure/AzureML-Containers).
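
As a rough illustration of the `pip_packages` and `conda_packages` options described in the ModuleErrors item, the sketch below only assembles keyword arguments, so it runs without the Azure ML SDK installed. The helper name `build_estimator_kwargs`, the folder `./src`, the script name `train.py`, and the example package names are hypothetical. `source_directory` and `entry_script` are assumed here to be the estimator arguments that locate the training script.

```python
def build_estimator_kwargs(pip_packages=None, conda_packages=None):
    """Collect package settings for an estimator (hypothetical helper)."""
    kwargs = {
        "source_directory": "./src",  # hypothetical folder holding the script
        "entry_script": "train.py",   # hypothetical training script
    }
    # Only pass the package lists that were actually provided.
    if pip_packages:
        kwargs["pip_packages"] = list(pip_packages)
    if conda_packages:
        kwargs["conda_packages"] = list(conda_packages)
    return kwargs


# Example: the run failed with "No module named 'seaborn'".
estimator_kwargs = build_estimator_kwargs(
    pip_packages=["seaborn"],
    conda_packages=["scikit-learn"],
)
# With the SDK installed, these would be passed to the estimator
# constructor together with a compute target.
```

Keeping the package lists in one place makes it easier to add the missing package and resubmit the run.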

-### NameError (Name not defined), AttributeError (Object has no attribute)
-
-This exception should come from your training scripts. You can look at the log files from the Azure portal to get more information about the specific name not defined or attribute error. From the SDK, you can use `run.get_details()` to look at the error message. This will also list all the log files generated for your run. Please make sure to take a look at your training script and fix the error before resubmitting your run.
-
-### Horovod has been shut down
-
-In most cases, if you encounter "AbortedError: Horovod has been shut down", this exception means there was an underlying exception in one of the processes that caused Horovod to shut down. Each rank in the MPI job gets its own dedicated log file in Azure ML. These logs are named `70_driver_logs`. In case of distributed training, the log names are suffixed with `_rank` to make it easier to differentiate the logs. To find the exact error that caused Horovod to shut down, go through all the log files and look for `Traceback` at the end of the driver_log files. One of these files will give you the actual underlying exception.
-
-
+* **NameError (Name not defined), AttributeError (Object has no attribute)**: This exception should come from your training scripts. You can look at the log files from the Azure portal to get more information about the specific name not defined or attribute error. From the SDK, you can use `run.get_details()` to look at the error message. This will also list all the log files generated for your run. Please make sure to take a look at your training script and fix the error before resubmitting your run.
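
A minimal sketch of inspecting the result of `run.get_details()`, assuming it returns a dictionary. The key names `status`, `error`, and `logFiles` are assumptions about that dictionary rather than something this article states, and the helper `summarize_run_details` and the sample values are made up for illustration.

```python
def summarize_run_details(details):
    """Pull the status, error text, and log file names out of a details dict."""
    error = details.get("error")  # assumed key; may be absent for successful runs
    log_files = details.get("logFiles", {})  # assumed key: log name -> location
    return {
        "status": details.get("status"),
        "error": str(error) if error else None,
        "log_names": sorted(log_files),
    }


# Made-up sample standing in for the result of run.get_details().
sample_details = {
    "status": "Failed",
    "error": "NameError: name 'lr' is not defined",
    "logFiles": {"70_driver_log.txt": "<location>"},
}
summary = summarize_run_details(sample_details)
print(summary["status"], summary["error"], summary["log_names"])
```

Using `.get` keeps the helper from failing when a key is missing, which matters because the exact layout of the details dictionary is not confirmed here.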

-### Run or experiment deletion
+* **Horovod has been shut down**: In most cases, if you encounter "AbortedError: Horovod has been shut down", this exception means there was an underlying exception in one of the processes that caused Horovod to shut down. Each rank in the MPI job gets its own dedicated log file in Azure ML. These logs are named `70_driver_logs`. In case of distributed training, the log names are suffixed with `_rank` to make it easier to differentiate the logs. To find the exact error that caused Horovod to shut down, go through all the log files and look for `Traceback` at the end of the driver_log files. One of these files will give you the actual underlying exception.
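
The search for `Traceback` described in the Horovod item can be sketched as a small script, assuming the per-rank driver logs have already been downloaded to a local folder. The function name `find_tracebacks`, the glob pattern `*driver_log*`, the 50-line tail, and the sample file names are assumptions made for illustration, not part of Azure ML.

```python
import tempfile
from pathlib import Path


def find_tracebacks(log_dir, tail_lines=50):
    """Return {log file name: tail text} for driver logs whose tail has a Traceback."""
    hits = {}
    for path in sorted(Path(log_dir).rglob("*driver_log*")):
        if not path.is_file():
            continue
        lines = path.read_text(errors="replace").splitlines()
        tail = lines[-tail_lines:]  # only inspect the end of each log
        if any("Traceback" in line for line in tail):
            hits[path.name] = "\n".join(tail)
    return hits


# Usage with two made-up log files standing in for downloaded logs.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "70_driver_log_rank_0.txt").write_text("epoch 1 done\n")
    Path(tmp, "70_driver_log_rank_1.txt").write_text(
        "Traceback (most recent call last):\nValueError: bad batch size\n"
    )
    hits = find_tracebacks(tmp)
    for name in hits:
        print(name)
```

Only the tail of each file is checked because the text above says the traceback appears at the end of the driver logs.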

-Experiments can be archived by using the [Experiment.archive](https://docs.microsoft.com/python/api/azureml-core/azureml.core.experiment(class)?view=azure-ml-py#archive--)
+* **Run or experiment deletion**: Experiments can be archived by using the [Experiment.archive](https://docs.microsoft.com/python/api/azureml-core/azureml.core.experiment(class)?view=azure-ml-py#archive--)
 method, or from the Experiment tab view in Azure Machine Learning studio client via the "Archive experiment" button. This action hides the experiment from list queries and views, but does not delete it.

-Permanent deletion of individual experiments or runs is not currently supported. For more information on deleting Workspace assets, see [Export or delete your Machine Learning service workspace data](how-to-export-delete-data.md).
+Permanent deletion of individual experiments or runs is not currently supported. For more information on deleting Workspace assets, see [Export or delete your Machine Learning service workspace data](how-to-export-delete-data.md).

-### Metric Document is too large
+* **Metric Document is too large**: Azure Machine Learning has internal limits on the size of metric objects that can be logged at once from a training run. If you encounter a "Metric Document is too large" error when logging a list-valued metric, try splitting the list into smaller chunks, for example:

-Azure Machine Learning has internal limits on the size of metric objects that can be logged at once from a training run. If you encounter a "Metric Document is too large" error when logging a list-valued metric, try splitting the list into smaller chunks, for example:
-
-```python
-run.log_list("my metric name", my_metric[:N])
-run.log_list("my metric name", my_metric[N:])
-```
-
-Internally, Azure ML concatenates the blocks with the same metric name into a contiguous list.
+```python
+run.log_list("my metric name", my_metric[:N])
+run.log_list("my metric name", my_metric[N:])
+```

+Internally, Azure ML concatenates the blocks with the same metric name into a contiguous list.
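
The two-way split above can be generalized to any number of chunks. The following is a sketch rather than an SDK feature: the helper `log_list_in_chunks` and the stand-in class `FakeRun` are hypothetical, and the default `chunk_size` of 250 is an arbitrary choice, since the actual size limit is not stated here.

```python
def log_list_in_chunks(run, name, values, chunk_size=250):
    """Log a long list as several smaller log_list calls under one metric name."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    values = list(values)
    for start in range(0, len(values), chunk_size):
        run.log_list(name, values[start:start + chunk_size])


class FakeRun:
    """Stand-in for a run object so the sketch works without the SDK."""

    def __init__(self):
        self.calls = []

    def log_list(self, name, value):
        self.calls.append((name, value))


fake_run = FakeRun()
log_list_in_chunks(fake_run, "my metric name", range(10), chunk_size=4)
print([len(chunk) for _, chunk in fake_run.calls])
```

In a real training script, the run object would be passed in place of `fake_run`, and every chunk is logged under the same metric name so the blocks can be concatenated as described above.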

 ## Automated machine learning

@@ -205,7 +191,6 @@ Internally, Azure ML concatenates the blocks with the same metric name into a co

 * **Databricks >10 iterations for automated machine learning**: In automated machine learning settings, if you have more than 10 iterations, set `show_output` to `False` when you submit the run.

-
 * **Databricks widget for the Azure Machine Learning SDK and automated machine learning**: The Azure Machine Learning SDK widget isn't supported in a Databricks notebook because the notebooks can't parse HTML widgets. You can view the widget in the portal by using this Python code in your Azure Databricks notebook cell: