
Commit 88b309c

links+section update
1 parent defe5ba commit 88b309c

3 files changed: +9 additions, -9 deletions


articles/machine-learning/concept-data.md

Lines changed: 1 addition & 1 deletion

@@ -70,7 +70,7 @@ Datasets can be created from local files, public urls, [Azure Open Datasets](htt
We support 2 types of datasets:
+ A [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. You can load a TabularDataset into a Pandas or Spark DataFrame for further manipulation and cleansing. For a complete list of data formats you can create TabularDatasets from, see the [TabularDatasetFactory class](https://aka.ms/tabulardataset-api-reference).
- + A [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. You can [download or mount files](how-to-train-with-datasets.md#option-2--mount-files-to-a-remote-compute-target) referenced by FileDatasets to your compute target.
+ + A [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. You can [download or mount files](how-to-train-with-datasets.md#mount-files-to-a-remote-compute-target) referenced by FileDatasets to your compute target.
Additional datasets capabilities can be found in the following documentation:
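The changed link above points the FileDataset bullet at the renamed mount section. As a quick illustration of the two dataset types this article contrasts, here is a minimal sketch assuming the azureml-core v1 Python SDK, a workspace `config.json`, and a hypothetical datastore path; it is not code from the article itself.

```python
# Minimal sketch of the two dataset types (assumes azureml-core v1 SDK and a
# workspace config.json; the datastore name and glob path are hypothetical).
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()

# TabularDataset: parses delimited files and loads them into a pandas DataFrame.
web_path = 'https://dprepdata.blob.core.windows.net/demo/Titanic.csv'
titanic_ds = Dataset.Tabular.from_delimited_files(path=web_path)
df = titanic_ds.to_pandas_dataframe()

# FileDataset: references files in a datastore; download (or mount) them for training.
datastore = Datastore.get(ws, 'workspaceblobstore')
file_ds = Dataset.File.from_files(path=(datastore, 'images/**'))
local_paths = file_ds.download(target_path='./data', overwrite=True)
```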

articles/machine-learning/how-to-configure-auto-train.md

Lines changed: 1 addition & 1 deletion

@@ -105,7 +105,7 @@ For remote executions, training data must be accessible from the remote compute.
* easily transfer data from static files or URL sources into your workspace
* make your data available to training scripts when running on cloud compute resources
- See the [how-to](how-to-train-with-datasets.md#option-2--mount-files-to-a-remote-compute-target) for an example of using the `Dataset` class to mount data to your compute target.
+ See the [how-to](how-to-train-with-datasets.md#mount-files-to-a-remote-compute-target) for an example of using the `Dataset` class to mount data to your compute target.
## Train and validation data
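The bullets above describe moving data from static files or URL sources into the workspace so remote automated ML runs can reach it. The sketch below shows one way that transfer can look with the azureml-core v1 SDK: create a TabularDataset from a public URL and register it by name. The registered name and `config.json` workspace setup are assumptions, not part of the article.

```python
# Sketch: move data from a URL source into the workspace by creating and
# registering a TabularDataset (assumes azureml-core v1 SDK and config.json;
# the registered name 'titanic' is an arbitrary choice).
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()

web_path = 'https://dprepdata.blob.core.windows.net/demo/Titanic.csv'
titanic_ds = Dataset.Tabular.from_delimited_files(path=web_path)

# Registration makes the dataset retrievable by name from remote compute.
titanic_ds = titanic_ds.register(workspace=ws, name='titanic', create_new_version=True)
```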

articles/machine-learning/how-to-train-with-datasets.md

Lines changed: 7 additions & 7 deletions

@@ -36,9 +36,9 @@ To create and train with datasets, you need:
> [!Note]
> Some Dataset classes have dependencies on the [azureml-dataprep](https://docs.microsoft.com/python/api/azureml-dataprep/?view=azure-ml-py) package. For Linux users, these classes are supported only on the following distributions: Red Hat Enterprise Linux, Ubuntu, Fedora, and CentOS.
- ## Work with datasets
+ ## Access and explore input datasets
- You can access existing datasets across experiments within your workspace, and load them into a pandas dataframe for further exploration on your local environment.
+ You can access an existing TabularDataset from the training script of an experiment on your workspace, and load that dataset into a pandas dataframe for further exploration on your local environment.
The following code uses the [`get_context()`]() method in the [`Run`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.run.run?view=azure-ml-py) class to access the existing input TabularDataset, `titanic`, in the training script. Then uses the [`to_pandas_dataframe()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset#to-pandas-dataframe-on-error--null---out-of-range-datetime--null--) method to load that dataset into a pandas dataframe for further data exploration and preparation prior to training.

@@ -55,7 +55,7 @@ dataset = run.input_datasets['titanic']
df = dataset.to_pandas_dataframe()
```
- If you need to load the prepared data into a new dataset from an in memory pandas dataframe, write the data to a local file, like a parquet, and create a new dataset from that file.
+ If you need to load the prepared data into a new dataset from an in memory pandas dataframe, write the data to a local file, like a parquet, and create a new dataset from that file. You can also create datasets from local files or paths in datastores. Learn more about [how to create datasets](https://aka.ms/azureml/howto/createdatasets).
## Use datasets directly in training scripts
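The updated paragraph in this hunk tells readers to persist an in-memory pandas dataframe to a local file, such as parquet, and build a new dataset from it. A hedged sketch of that round trip, with a hypothetical datastore name, target path, and stand-in dataframe:

```python
# Sketch: persist a prepared in-memory dataframe to parquet, upload it to a
# datastore, and build a new TabularDataset from it. Datastore name, target
# path, and the stand-in dataframe are hypothetical.
import pandas as pd
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, 'workspaceblobstore')

# Stand-in for the dataframe prepared in the training script above.
df = pd.DataFrame({'PassengerId': [1, 2], 'Survived': [0, 1]})
df.to_parquet('titanic_prepared.parquet')

datastore.upload_files(files=['titanic_prepared.parquet'],
                       target_path='prepared/', overwrite=True)
prepared_ds = Dataset.Tabular.from_parquet_files(
    path=(datastore, 'prepared/titanic_prepared.parquet'))
```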

@@ -65,7 +65,7 @@ In this example, you create a [TabularDataset](https://docs.microsoft.com/python
### Create a TabularDataset
- The following code creates an unregistered TabularDataset from a web url. You can also create datasets from local files or paths in datastores. Learn more about [how to create datasets](https://aka.ms/azureml/howto/createdatasets).
+ The following code creates an unregistered TabularDataset from a web url.
```Python
from azureml.core.dataset import Dataset

@@ -74,7 +74,7 @@ web_path ='https://dprepdata.blob.core.windows.net/demo/Titanic.csv'
titanic_ds = Dataset.Tabular.from_delimited_files(path=web_path)
```
- TabularDataset objects provide the ability to load the data into a pandas or spark DataFrame so that you can work with familiar data preparation and training libraries without having to leave your notebook. To leverage this capability, see [work with datasets](#work-with-datasets).
+ TabularDataset objects provide the ability to load the data into a pandas or spark DataFrame so that you can work with familiar data preparation and training libraries without having to leave your notebook. To leverage this capability, see [access and explore input datasets](#access-and-explore-input-datasets).
### Configure the estimator
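The context line above leads into the article's "Configure the estimator" section. As a rough sketch of how an unregistered TabularDataset is typically handed to an estimator so that `run.input_datasets['titanic']` resolves in the training script, under assumed folder, script, and compute-target names (not the article's exact configuration):

```python
# Sketch: pass the TabularDataset to an estimator as a named input so the
# training script can read it back via run.input_datasets['titanic'].
# Folder, script, and compute-target names are assumptions.
from azureml.core import Workspace, Dataset, Experiment
from azureml.train.estimator import Estimator

ws = Workspace.from_config()
web_path = 'https://dprepdata.blob.core.windows.net/demo/Titanic.csv'
titanic_ds = Dataset.Tabular.from_delimited_files(path=web_path)

est = Estimator(source_directory='train-dataset',      # hypothetical folder
                entry_script='train_titanic.py',       # hypothetical script
                compute_target='cpu-cluster',          # hypothetical compute
                inputs=[titanic_ds.as_named_input('titanic')],
                pip_packages=['azureml-dataprep[pandas]'])

run = Experiment(ws, 'train-with-datasets').submit(est)
run.wait_for_completion(show_output=True)
```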

@@ -103,7 +103,7 @@ experiment_run.wait_for_completion(show_output=True)
## Mount files to remote compute targets
- If you have unstructured data, create a [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py) and either mount or download your data files to make them available to your remote compute target for training. Learn about when to use [mount vs. download](#mount-vs.-download) for your remote training experiments.
+ If you have unstructured data, create a [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py) and either mount or download your data files to make them available to your remote compute target for training. Learn about when to use [mount vs. download](#mount-vs-download) for your remote training experiments.
The following example creates a FileDataset and mounts the dataset to the compute target by passing it as an argument in the estimator for training.
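The hunk above describes creating a FileDataset and mounting it on the compute target by passing it to the estimator. A minimal sketch of that flow; the example file URLs, folder, script, and compute names are assumptions rather than the article's exact values:

```python
# Sketch: reference unstructured files with a FileDataset and mount them on the
# remote compute by passing the dataset into the estimator. The file URLs,
# folder, script, and compute names are assumptions, not the article's values.
from azureml.core import Workspace, Dataset, Experiment
from azureml.train.estimator import Estimator

ws = Workspace.from_config()

web_paths = ['https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',
             'https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz']
mnist_ds = Dataset.File.from_files(path=web_paths)

est = Estimator(source_directory='train-mnist',        # hypothetical folder
                entry_script='train_mnist.py',         # hypothetical script
                compute_target='gpu-cluster',          # hypothetical compute
                # as_mount() exposes the files on the compute target without copying them first
                inputs=[mnist_ds.as_named_input('mnist').as_mount()])

run = Experiment(ws, 'train-with-filedataset').submit(est)
```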

@@ -186,7 +186,7 @@ y_test = load_data(y_test, True).reshape(-1)
```
- ## Mount vs. download
+ ## Mount vs download
Mounting or downloading files of any format are supported for datasets created from Azure Blob storage, Azure Files, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Database for PostgreSQL.
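The renamed "Mount vs download" section compares the two access modes. A hedged sketch of what each looks like from a FileDataset, assuming `azureml-dataprep[fuse]` is installed for mounting and using a hypothetical datastore path:

```python
# Sketch: the two ways a FileDataset's files can be made available locally.
# Mounting requires azureml-dataprep[fuse] on Linux compute; the datastore name
# and path are hypothetical.
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, 'workspaceblobstore')
file_ds = Dataset.File.from_files(path=(datastore, 'images/**'))

# Download: copy every file to local disk up front (fine for small datasets).
local_paths = file_ds.download(target_path='./data', overwrite=True)

# Mount: stream files on demand through a mount point (better for large datasets).
with file_ds.mount() as mount_context:
    print(mount_context.mount_point)  # files are readable under this path
```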
