| 1 | +--- |
| 2 | +title: Import data |
| 3 | +titleSuffix: Azure Machine Learning |
| 4 | +description: Learn how to import your data into Azure Machine Learning designer from various data sources. |
| 5 | +services: machine-learning |
| 6 | +ms.service: machine-learning |
| 7 | +ms.subservice: core |
| 8 | +ms.topic: how-to |
| 9 | + |
| 10 | +author: peterclu |
| 11 | +ms.author: peterlu |
| 12 | +ms.date: 01/06/2020 |
| 13 | +--- |
| 14 | + |
| 15 | +# Import your data into Azure Machine Learning designer (preview) |
| 16 | + |
| 17 | +You can use your own data in Azure Machine Learning designer to create predictive analytics solutions. You can import data into the designer in one of two ways: |
| 18 | + |
| 19 | +* **Azure Machine Learning datasets** - Register [datasets](concept-data.md#datasets) in Azure Machine Learning to help you manage datasets and use advanced features. |
| 20 | +* **Import Data module** - Use the [Import Data](algorithm-module-reference/import-data.md) module to directly access data from online data sources. |
| 21 | + |
| 22 | +To learn more about the differences between datasets and datastores, see [Data access in Azure Machine Learning](concept-data.md). |
| 23 | + |
| 24 | +## Import data using datasets |
| 25 | + |
| 26 | +We recommend that you use [Azure Machine Learning datasets](concept-data.md#datasets) when you import data into the designer. When you register a dataset in Azure Machine Learning, you can take full advantage of advanced features like [versioning and tracking](how-to-version-track-datasets.md) and [data monitoring](how-to-monitor-datasets.md) to accelerate your machine learning workflows. |
| 27 | + |
| 28 | + |
| 29 | +### Register a dataset |
| 30 | + |
| 31 | +Register a dataset [programmatically with the SDK](how-to-create-register-datasets.md#use-the-sdk) or [visually in Azure Machine Learning studio](how-to-create-register-datasets.md#use-the-ui). |
| 32 | + |
| 33 | +You can also register the output for any module as a dataset directly in the designer. |
| 34 | + |
| 35 | +1. Select the module that outputs the data you want to register. |
| 36 | + |
| 37 | +1. In the properties pane, select **Outputs** > **Register dataset**. |
| 38 | + |
| 39 | +  |
| 40 | + |
| 41 | +### Use datasets |
| 42 | + |
| 43 | +Any dataset registered to your workspace appears in the module palette; you aren't limited to datasets created in the designer. |
| 44 | + |
| 45 | +> [!NOTE] |
| 46 | +> The designer currently supports only [tabular datasets](how-to-create-register-datasets.md#dataset-types). If your data requires a [file dataset](how-to-create-register-datasets.md#dataset-types), use the Azure Machine Learning SDK available for Python or R. |
| 47 | + |
| 48 | +Registered datasets can be found in the module palette, under **Datasets** > **My Datasets**. To use a dataset, drag and drop the dataset onto the pipeline canvas. Then, connect the output port of the dataset to other modules on the canvas. |
| 49 | + |
| 50 | + |
| 51 | + |
| 52 | +## Import data using the Import Data module |
| 53 | + |
| 54 | +You can also use the [Import Data](algorithm-module-reference/import-data.md) module to import data directly from Azure Machine Learning [datastores](concept-data.md#datastores) or HTTP URLs. However, we recommend you create a dataset first to take full advantage of features such as versioning and monitoring. |
| 55 | + |
| 56 | +> [!NOTE] |
| 57 | +> Pipelines converted from the visual interface will default to the **Import Data** module. If you are using a converted visual interface pipeline, we recommend creating a dataset and importing data via the dataset method. |
| 58 | + |
| 59 | +### Create a new datastore |
| 60 | + |
| 61 | +You can create a datastore [programmatically with the SDK](how-to-access-data.md#create-and-register-datastores) or [visually in Azure Machine Learning studio](how-to-access-data.md#azure-machine-learning-studio). |
| 62 | + |
| 63 | +You can also create a datastore directly in the designer through the **Import Data** module. |
| 64 | + |
| 65 | +1. Drag and drop an **Import Data** module to the pipeline canvas. |
| 66 | +1. Select the **Import Data** module. |
| 67 | +1. In the properties pane, select **New datastore**. |
| 68 | +1. Select the datastore type. |
| 69 | +1. Provide valid authentication. |
| 70 | + |
| 71 | + > [!NOTE] |
| 72 | + > You may be asked for different authentication information depending on the type of data source you are connecting to. |
| 73 | + |
| 74 | +### Import Data |
| 75 | + |
| 76 | +For more information on how to use the Import Data module, see its [algorithm module reference page](algorithm-module-reference/import-data.md). |
| 77 | + |
| 78 | + |
| 79 | +## Supported data sources |
| 80 | + |
| 81 | +The designer supports the following data sources: |
| 82 | + |
| 83 | +* Azure Blob Container |
| 84 | +* Azure File Share |
| 85 | +* Azure Data Lake |
| 86 | +* Azure Data Lake Gen2 |
| 87 | +* Azure SQL Database |
| 88 | +* Azure Database for PostgreSQL |
| 89 | +* Databricks File System |
| 90 | +* Azure Database for MySQL |
| 91 | +* Local file (TSV, CSV) |
| 92 | +* Web file (TSV, CSV) |
| 93 | + |
| 94 | +If you import data in a format such as ARFF that includes metadata, the designer uses this metadata to define the heading and data type of each column. If you import data such as TSV or CSV format that doesn't include this metadata, the designer infers the data type for each column by sampling the data. |
| 95 | + |
| 96 | +You can explicitly specify or change column headings and data types using the [Edit Metadata](algorithm-module-reference/edit-metadata.md) module. |
| 97 | + |
| 98 | +## Supported data types |
| 99 | + |
| 100 | +The designer recognizes the following data types: |
| 101 | + |
| 102 | +* String |
| 103 | +* Integer |
| 104 | +* Decimal |
| 105 | +* Boolean |
| 106 | +* Date |
| 107 | + |
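As a rough illustration of how sampling-based type inference for a CSV or TSV file might map columns onto the five types listed above, here is a minimal sketch in plain Python. It is not the designer's actual algorithm, which this article does not describe. The function names (`infer_column_types`, `_infer_value_type`), the sample size, the single `YYYY-MM-DD` date format, and the rule that mixed columns fall back to String are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime


def _infer_value_type(value: str) -> str:
    """Classify one non-empty cell (hypothetical helper, not a designer API)."""
    text = value.strip()
    if text.lower() in ("true", "false"):
        return "Boolean"
    try:
        int(text)
        return "Integer"
    except ValueError:
        pass
    try:
        float(text)
        return "Decimal"
    except ValueError:
        pass
    try:
        # Assumption: only ISO-style dates are recognized in this sketch.
        datetime.strptime(text, "%Y-%m-%d")
        return "Date"
    except ValueError:
        return "String"


def infer_column_types(csv_text: str, sample_size: int = 100) -> dict:
    """Guess a type per column by looking at the first `sample_size` rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {name: set() for name in (reader.fieldnames or [])}
    for i, row in enumerate(reader):
        if i >= sample_size:
            break
        for name, value in row.items():
            # Skip missing cells, empty cells, and overflow fields.
            if name in seen and value and value.strip():
                seen[name].add(_infer_value_type(value))

    result = {}
    for name, kinds in seen.items():
        if len(kinds) == 1:
            result[name] = next(iter(kinds))
        elif kinds == {"Integer", "Decimal"}:
            # A column mixing whole and fractional numbers is widened.
            result[name] = "Decimal"
        else:
            # Empty or mixed columns fall back to String (assumed rule).
            result[name] = "String"
    return result


sample = (
    "id,price,in_stock,added,label\n"
    "1,9.99,true,2020-01-06,widget\n"
    "2,12,false,2020-01-07,gadget\n"
)
column_types = infer_column_types(sample)
```

Because the guess depends only on the sampled rows, a value further down the file can contradict it, which is one reason to set column types explicitly when the inferred ones are wrong.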
| 108 | +The designer uses an internal data type called ***data table*** to pass data between modules. You can explicitly convert your data into data table format using the [Convert to Dataset][convert-to-dataset] module. |
| 109 | + |
| 110 | +Any module that accepts formats other than data table silently converts the data to data table before passing it to the next module. |
| 111 | + |
| 112 | +## Data capacities |
| 113 | + |
| 114 | +Modules in Azure Machine Learning designer are limited by the size of the compute target. For larger datasets, you should use a larger Azure Machine Learning compute resource. For more information on Azure Machine Learning compute, see [What are compute targets in Azure Machine Learning?](concept-compute-target.md#azure-machine-learning-compute-managed) |
| 115 | + |
| 116 | +## Next steps |