Commit 77a40b3

Init
1 parent c6d1353 commit 77a40b3

4 files changed

+117
-1
lines changed

articles/machine-learning/algorithm-module-reference/import-data.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ Use this module to load data into a machine learning pipeline from existing clou
2222
> After you register a dataset, you can find it in the **Datasets** -> **My Datasets** category in the designer interface. This module is reserved for Studio (classic) users, to give them a familiar experience.
2323
>
2424
25-
First, choose the source you are reading from, and finish the additional settings. The **Import Data** module support read data from following sources:
25+
The **Import Data** module supports reading data from the following sources:
2626

2727
- URL via HTTP
2828
- Azure cloud storage through [**Datastores**](../how-to-access-data.md)
Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
1+
---
2+
title: Import data
3+
titleSuffix: Azure Machine Learning
4+
description: Learn how to import your data into Azure Machine Learning designer from various data sources.
5+
services: machine-learning
6+
ms.service: machine-learning
7+
ms.subservice: core
8+
ms.topic: how-to
9+
10+
author: peterclu
11+
ms.author: peterlu
12+
ms.date: 01/06/2020
13+
---
14+
15+
# Import your data into Azure Machine Learning designer (preview)
16+
17+
You can use your own data in Azure Machine Learning designer to create predictive analytics solutions. You can import data into the designer in one of two ways:
18+
19+
* **Azure Machine Learning datasets** - Register [datasets](concept-data.md#datasets) in Azure Machine Learning to help you manage datasets and use advanced features.
20+
* **Import Data module** - Use the [Import Data](algorithm-module-reference/import-data.md) module to directly access data from online data sources.
21+
22+
To learn more about the differences between datasets and datastores, see [Data access in Azure Machine Learning](concept-data.md).
23+
24+
## Import data using datasets
25+
26+
We recommend that you use [Azure Machine Learning datasets](concept-data.md#datasets) when you import data into the designer. When you register a dataset in Azure Machine Learning, you can take full advantage of advanced features like [versioning and tracking](how-to-version-track-datasets.md) and [data monitoring](how-to-monitor-datasets.md) to accelerate your machine learning workflows.
27+
28+
29+
### Register a dataset
30+
31+
Register a dataset [programmatically with the SDK](how-to-create-register-datasets.md#use-the-sdk) or [visually in Azure Machine Learning studio](how-to-create-register-datasets.md#use-the-ui).
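
The SDK route can be sketched roughly as follows. This is a minimal illustration that assumes the `azureml-core` Python package is installed and that a workspace `config.json` is available locally. The function name and the example arguments are hypothetical and aren't part of this commit.

```python
def register_csv_as_tabular_dataset(datastore_name, relative_path, dataset_name):
    """Register a delimited file that already sits in a datastore as a tabular dataset."""
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from azureml.core import Dataset, Datastore, Workspace

    # Assumption: a config.json describing the workspace is discoverable from here.
    ws = Workspace.from_config()
    datastore = Datastore.get(ws, datastore_name)

    # Tabular datasets are the type the designer can process.
    dataset = Dataset.Tabular.from_delimited_files(path=[(datastore, relative_path)])
    return dataset.register(workspace=ws, name=dataset_name, create_new_version=True)


# Hypothetical usage; the datastore, path, and dataset names are placeholders:
# register_csv_as_tabular_dataset("workspaceblobstore", "data/sample.csv", "sample-data")
```

After registration, the dataset should show up in the designer alongside any other dataset registered to the workspace.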
32+
33+
You can also register the output of any module as a dataset directly in the designer.
34+
35+
1. Select the module that outputs the data you want to register.
36+
37+
1. In the properties pane, select **Outputs** > **Register dataset**.
38+
39+
![Screenshot showing how to navigate to the Register Dataset option](media/how-to-designer-import-data/register-dataset-designer.png)
40+
41+
### Use datasets
42+
43+
Any dataset registered to your workspace appears in the designer. You aren't limited to datasets created in the designer.
44+
45+
> [!NOTE]
46+
> The designer currently supports processing only [tabular datasets](how-to-create-register-datasets.md#dataset-types). To work with data that requires [file datasets](how-to-create-register-datasets.md#dataset-types), use the Azure Machine Learning SDK available for Python or R.
47+
48+
Registered datasets can be found in the module palette, under **Datasets** > **My Datasets**. To use a dataset, drag and drop the dataset onto the pipeline canvas. Then, connect the output port of the dataset to other modules in the palette.
49+
50+
![Screenshot showing location of saved datasets in the designer palette](media/how-to-designer-import-data/use-datasets-designer.png)
51+
52+
## Import data using the Import Data module
53+
54+
You can also use the [Import Data](algorithm-module-reference/import-data.md) module to import data directly from Azure Machine Learning [datastores](concept-data.md#datastores) or HTTP URLs. However, we recommend you create a dataset first to take full advantage of features such as versioning and monitoring.
55+
56+
> [!NOTE]
57+
> Pipelines converted from the visual interface will default to the **Import Data** module. If you are using a converted visual interface pipeline, we recommend creating a dataset and importing data via the dataset method.
58+
59+
### Create a new datastore
60+
61+
Create a datastore [programmatically with the SDK](how-to-access-data.md#create-and-register-datastores) or [visually in Azure Machine Learning studio](how-to-access-data.md#azure-machine-learning-studio).
62+
63+
You can also create a datastore directly in the designer through the **Import Data** module.
64+
65+
1. Drag and drop an **Import Data** module to the pipeline canvas.
66+
1. Select the **Import Data** module.
67+
1. In the properties pane, select **New datastore**.
68+
1. Select the datastore type.
69+
1. Provide valid authentication.
70+
71+
> [!NOTE]
72+
> You may be asked for different authentication information depending on the type of data source you are connecting to.
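
As an alternative to the designer steps above, creating a datastore with the SDK might look like the sketch below. It assumes the `azureml-core` package and an Azure Blob container as the data source. The function name and the `STORAGE_ACCOUNT_KEY` environment variable are hypothetical, chosen only to keep the secret out of the code.

```python
import os


def register_blob_datastore(datastore_name, container_name, account_name):
    """Register an Azure Blob container as a datastore in the current workspace."""
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from azureml.core import Datastore, Workspace

    # Assumption: a config.json describing the workspace is discoverable from here.
    ws = Workspace.from_config()

    return Datastore.register_azure_blob_container(
        workspace=ws,
        datastore_name=datastore_name,
        container_name=container_name,
        account_name=account_name,
        # Hypothetical environment variable holding the storage account key.
        account_key=os.environ["STORAGE_ACCOUNT_KEY"],
    )
```

Other data source types use different registration methods and ask for different credentials, which mirrors the authentication note above.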
73+
74+
### Import Data
75+
76+
For more information on how to use the Import Data module, see its [algorithm module reference page](algorithm-module-reference/import-data.md).
77+
78+
79+
## Supported data sources
80+
81+
The designer supports the following data sources:
82+
83+
* Azure Blob Container
84+
* Azure File Share
85+
* Azure Data Lake
86+
* Azure Data Lake Gen2
87+
* Azure SQL Database
88+
* Azure Database for PostgreSQL
89+
* Databricks File System
90+
* Azure Database for MySQL
91+
* Local file (TSV, CSV)
92+
* Web file (TSV, CSV)
93+
94+
If you import data in a format such as ARFF that includes metadata, the designer uses this metadata to define the heading and data type of each column. If you import data in a format such as TSV or CSV that doesn't include this metadata, the designer infers the data type for each column by sampling the data.
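
To make the idea of inferring types by sampling concrete, here is a small standard-library sketch. It is not the designer's actual algorithm: the type labels, the sample size, the order of the checks, and the ISO date format are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime
from itertools import islice


def _parses(parser, value):
    """Return True if parser(value) succeeds without raising ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Pick the first type label, from narrowest to widest, that fits every non-empty value."""
    values = [v.strip() for v in values if v.strip()]
    if not values:
        return "String"
    if all(v.lower() in ("true", "false") for v in values):
        return "Boolean"
    if all(_parses(int, v) for v in values):
        return "Integer"
    if all(_parses(float, v) for v in values):
        return "Decimal"
    # Assumption: dates are written as YYYY-MM-DD.
    if all(_parses(lambda s: datetime.strptime(s, "%Y-%m-%d"), v) for v in values):
        return "Date"
    return "String"


def infer_column_types(text, delimiter=",", sample_size=100):
    """Read the header plus up to sample_size rows and infer a type per column."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    sample = list(islice(reader, sample_size))
    columns = zip(*sample)  # transpose the sampled rows into columns
    return {name: infer_type(column) for name, column in zip(header, columns)}


sample_csv = (
    "name,age,score,active,joined\n"
    "Ada,36,91.5,true,2020-01-06\n"
    "Bob,41,78,false,2019-12-31\n"
)
print(infer_column_types(sample_csv))
```

Because only a sample is inspected, a value further down the file can contradict the inferred type, which is one reason to set column types explicitly when they matter.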
95+
96+
You can explicitly specify or change column headings and data types using the [Edit Metadata](algorithm-module-reference/edit-metadata.md) module.
97+
98+
## Supported data types
99+
100+
The designer recognizes the following data types:
101+
102+
* String
103+
* Integer
104+
* Decimal
105+
* Boolean
106+
* Date
107+
108+
The designer uses an internal data type called ***data table*** to pass data between modules. You can explicitly convert your data into data table format using the [Convert to Dataset][convert-to-dataset] module.
109+
110+
Any module that accepts formats other than data table silently converts the data to data table format before passing it to the next module.
111+
112+
## Data capacities
113+
114+
Modules in Azure Machine Learning designer are limited by the size of the compute target. For larger datasets, you should use a larger Azure Machine Learning compute resource. For more information on Azure Machine Learning compute, see [What are compute targets in Azure Machine Learning?](concept-compute-target.md#azure-machine-learning-compute-managed)
115+
116+
## Next steps