This article describes what datasets are, how they’re defined in JSON format, and how they’re used in Azure Data Factory and Synapse pipelines.

If you’re new to Data Factory, see [Introduction to Azure Data Factory](introduction.md) for an overview. For more information about Azure Synapse, see [What is Azure Synapse](../synapse-analytics/overview-what-is.md).
## Overview
An Azure Data Factory or Synapse workspace can have one or more pipelines. A **pipeline** is a logical grouping of **activities** that together perform a task. The activities in a pipeline define actions to perform on your data. A **dataset** is a named view of data that simply points to or references the data you want to use in your **activities** as inputs and outputs. Datasets identify data within different data stores, such as tables, files, folders, and documents. For example, an Azure Blob dataset specifies the blob container and folder in Blob Storage from which the activity should read the data.

Before you create a dataset, you must create a [**linked service**](concepts-linked-services.md) to link your data store to the service. Linked services are much like connection strings, which define the connection information needed for the service to connect to external resources. Think of it this way: the dataset represents the structure of the data within the linked data store, and the linked service defines the connection to the data source. For example, an Azure Storage linked service links a storage account to the service. An Azure Blob dataset represents the blob container and the folder within that Azure Storage account that contains the input blobs to be processed.
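For illustration, a minimal Azure Blob Storage linked service definition might look like the following sketch. The linked service name and the connection string values are placeholders, not values taken from this article:

```json
{
    "name": "AzureBlobStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountName>;AccountKey=<accountKey>"
        }
    }
}
```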
Here’s a sample scenario. To copy data from Blob Storage to a SQL database, you create two linked services: Azure Blob Storage and Azure SQL Database. Then you create two datasets: a Delimited Text dataset (which refers to the Azure Blob Storage linked service, assuming you have text files as the source) and an Azure SQL Table dataset (which refers to the Azure SQL Database linked service). The Azure Blob Storage and Azure SQL Database linked services contain connection strings that the service uses at runtime to connect to your Azure Storage and Azure SQL Database, respectively. The Delimited Text dataset specifies the blob container and blob folder that contains the input blobs in your Blob Storage, along with format-related settings. The Azure SQL Table dataset specifies the SQL table in your SQL database to which the data is to be copied.
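Continuing that scenario, the Azure SQL Table dataset could be sketched as follows. The dataset name, the linked service name (`AzureSqlDatabaseLinkedService`), and the schema and table values are hypothetical placeholders:

```json
{
    "name": "AzureSqlOutputDataset",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "AzureSqlDatabaseLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "schema": "dbo",
            "table": "<table name>"
        }
    }
}
```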
The following diagram shows the relationships among pipeline, activity, dataset, and linked services:
:::image type="content" source="media/concepts-datasets-linked-services/relationship-between-data-factory-entities.png" alt-text="Relationship between pipeline, activity, dataset, linked services":::
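To see how these pieces reference each other in JSON, here’s a rough sketch of a Copy activity that reads from a source dataset and writes to a sink dataset by name. The activity and dataset names are hypothetical, and the enclosing pipeline definition is omitted:

```json
{
    "name": "CopyFromBlobToSql",
    "type": "Copy",
    "inputs": [
        {
            "referenceName": "DelimitedTextInputDataset",
            "type": "DatasetReference"
        }
    ],
    "outputs": [
        {
            "referenceName": "AzureSqlOutputDataset",
            "type": "DatasetReference"
        }
    ],
    "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink": { "type": "AzureSqlSink" }
    }
}
```

At runtime, the service resolves each dataset reference to its dataset definition, and the dataset’s linked service supplies the connection information.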
## Create a dataset with UI
# [Azure Data Factory](#tab/data-factory)
To create a dataset with Azure Data Factory Studio, select the Author tab (with the pencil icon), then the plus sign icon, and then choose **Dataset**.
:::image type="content" source="media/concepts-datasets-linked-services/create-dataset.png" alt-text="Shows the Author tab of the Azure Data Factory Studio with the new dataset button selected.":::
You’ll see the new dataset window, where you can choose any of the connectors available in Azure Data Factory to set up an existing or new linked service.
:::image type="content" source="media/concepts-datasets-linked-services/choose-dataset-source.png" alt-text="Shows the new dataset window where you can choose the type of linked service to any of the supported data factory connectors.":::
Next, you’ll be prompted to choose the dataset format.
:::image type="content" source="media/concepts-datasets-linked-services/choose-dataset-format.png" alt-text="Shows the dataset format window allowing you to choose a format for the new dataset.":::
Finally, you can choose an existing linked service of the type you selected for the dataset, or create a new one if one isn’t already defined.
:::image type="content" source="media/concepts-datasets-linked-services/choose-or-define-linked-service.png" alt-text="Shows the set properties window where you can choose an existing dataset of the type selected previously, or create a new one.":::
Once you create the dataset, you can use it within any pipeline in the data factory.
# [Synapse Analytics](#tab/synapse-analytics)
To create a dataset with Synapse Studio, select the Data tab, then the plus sign icon, and then choose **Integration dataset**.
:::image type="content" source="media/concepts-datasets-linked-services/create-dataset-synapse.png" alt-text="Shows the Author tab of Synapse Studio with the new integration dataset button selected.":::
You’ll see the new integration dataset window, where you can choose any of the connectors available in Azure Synapse to set up an existing or new linked service.
:::image type="content" source="media/concepts-datasets-linked-services/choose-dataset-source-synapse.png" alt-text="Shows the new integration dataset window where you can choose the type of linked service to any of the supported Azure Synapse connectors.":::
Next, you’ll be prompted to choose the dataset format.
:::image type="content" source="media/concepts-datasets-linked-services/choose-dataset-format.png" alt-text="Shows the dataset format window allowing you to choose a format for the new dataset.":::
Finally, you can choose an existing linked service of the type you selected for the dataset, or create a new one if one isn’t already defined.
:::image type="content" source="media/concepts-datasets-linked-services/choose-or-define-linked-service.png" alt-text="Shows the set properties window where you can choose an existing dataset of the type selected previously, or create a new one.":::
Once you create the dataset, you can use it within any pipeline in the Synapse workspace.
---
## Dataset JSON
A dataset is defined in the following JSON format:
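The exact properties depend on the dataset type, but the general shape is roughly as follows; the angle-bracket values are placeholders:

```json
{
    "name": "<name of dataset>",
    "properties": {
        "type": "<type of dataset, for example DelimitedText or AzureSqlTable>",
        "linkedServiceName": {
            "referenceName": "<name of linked service>",
            "type": "LinkedServiceReference"
        },
        "schema": [
        ],
        "typeProperties": {
            "<type-specific property>": "<value>"
        }
    }
}
```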
In Data Flow, datasets are used in source and sink transformations.
## Dataset type
The service supports many different types of datasets, depending on the data stores you use. You can find the list of supported data stores in the [Connector overview](connector-overview.md) article. Select a data store to learn how to create a linked service and a dataset for it.
For example, for a Delimited Text dataset, the dataset type is set to **DelimitedText** as shown in the following JSON sample:
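A minimal sketch of such a dataset, assuming a hypothetical Azure Blob Storage linked service named `AzureBlobStorageLinkedService` and placeholder location and format values:

```json
{
    "name": "DelimitedTextInputDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "<container name>",
                "folderPath": "<folder path>",
                "fileName": "<file name>"
            },
            "columnDelimiter": ",",
            "quoteChar": "\"",
            "firstRowAsHeader": true
        },
        "schema": []
    }
}
```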
Here are some differences between datasets in the current version of Data Factory (and Azure Synapse) and datasets in the legacy Data Factory version 1:
- The external property isn’t supported in the current version. It's replaced by a [trigger](concepts-pipeline-execution-triggers.md).
- The policy and availability properties aren’t supported in the current version. The start time for a pipeline depends on [triggers](concepts-pipeline-execution-triggers.md).
- Scoped datasets (datasets defined in a pipeline) aren’t supported in the current version.
## Next steps
See the following tutorial for step-by-step instructions for creating pipelines and datasets by using one of these tools or SDKs.