Commit 759fc5a

Merge pull request #186687 from jonburchel/2022-01-28-adds-ui-to-concepts-datasets-linked-services
Adds UI to concepts-datasets-linked-services.md
2 parents aeebd93 + 9efdc9b commit 759fc5a

7 files changed

+52
-9
lines changed

articles/data-factory/concepts-datasets-linked-services.md

Lines changed: 52 additions & 9 deletions
@@ -9,7 +9,7 @@ ms.service: data-factory
 ms.subservice: data-movement
 ms.topic: conceptual
 ms.custom: synapse
-ms.date: 09/09/2021
+ms.date: 01/28/2022
 ---

 # Datasets in Azure Data Factory and Azure Synapse Analytics
@@ -20,21 +20,64 @@ ms.date: 09/09/2021
 [!INCLUDE[appliesto-adf-asa-md](includes/appliesto-adf-asa-md.md)]


-This article describes what datasets are, how they are defined in JSON format, and how they are used in Azure Data Factory and Synapse pipelines.
+This article describes what datasets are, how they’re defined in JSON format, and how they’re used in Azure Data Factory and Synapse pipelines.

-If you are new to Data Factory, see [Introduction to Azure Data Factory](introduction.md) for an overview. For more information about Azure Synapse see [What is Azure Synapse](../synapse-analytics/overview-what-is.md)
+If you’re new to Data Factory, see [Introduction to Azure Data Factory](introduction.md) for an overview. For more information about Azure Synapse, see [What is Azure Synapse](../synapse-analytics/overview-what-is.md)

 ## Overview
-A data factory or Synapse workspace can have one or more pipelines. A **pipeline** is a logical grouping of **activities** that together perform a task. The activities in a pipeline define actions to perform on your data. Now, a **dataset** is a named view of data that simply points or references the data you want to use in your **activities** as inputs and outputs. Datasets identify data within different data stores, such as tables, files, folders, and documents. For example, an Azure Blob dataset specifies the blob container and folder in Blob storage from which the activity should read the data.
+An Azure Data Factory or Synapse workspace can have one or more pipelines. A **pipeline** is a logical grouping of **activities** that together perform a task. The activities in a pipeline define actions to perform on your data. Now, a **dataset** is a named view of data that simply points or references the data you want to use in your **activities** as inputs and outputs. Datasets identify data within different data stores, such as tables, files, folders, and documents. For example, an Azure Blob dataset specifies the blob container and folder in Blob Storage from which the activity should read the data.

 Before you create a dataset, you must create a [**linked service**](concepts-linked-services.md) to link your data store to the service. Linked services are much like connection strings, which define the connection information needed for the service to connect to external resources. Think of it this way; the dataset represents the structure of the data within the linked data stores, and the linked service defines the connection to the data source. For example, an Azure Storage linked service links a storage account. An Azure Blob dataset represents the blob container and the folder within that Azure Storage account that contains the input blobs to be processed.

-Here is a sample scenario. To copy data from Blob storage to a SQL Database, you create two linked services: Azure Blob Storage and Azure SQL Database. Then, create two datasets: Delimited Text dataset (which refers to the Azure Blob Storage linked service, assuming you have text files as source) and Azure SQL Table dataset (which refers to the Azure SQL Database linked service). The Azure Blob Storage and Azure SQL Database linked services contain connection strings that the service uses at runtime to connect to your Azure Storage and Azure SQL Database, respectively. The Delimited Text dataset specifies the blob container and blob folder that contains the input blobs in your Blob storage, along with format-related settings. The Azure SQL Table dataset specifies the SQL table in your SQL Database to which the data is to be copied.
+Here’s a sample scenario. To copy data from Blob storage to a SQL Database, you create two linked services: Azure Blob Storage and Azure SQL Database. Then, create two datasets: Delimited Text dataset (which refers to the Azure Blob Storage linked service, assuming you have text files as source) and Azure SQL Table dataset (which refers to the Azure SQL Database linked service). The Azure Blob Storage and Azure SQL Database linked services contain connection strings that the service uses at runtime to connect to your Azure Storage and Azure SQL Database, respectively. The Delimited Text dataset specifies the blob container and blob folder that contains the input blobs in your Blob Storage, along with format-related settings. The Azure SQL Table dataset specifies the SQL table in your SQL Database to which the data is to be copied.

 The following diagram shows the relationships among pipeline, activity, dataset, and linked services:

 :::image type="content" source="media/concepts-datasets-linked-services/relationship-between-data-factory-entities.png" alt-text="Relationship between pipeline, activity, dataset, linked services":::

+## Create a dataset with UI
+
+# [Azure Data Factory](#tab/data-factory)
+
+To create a dataset with the Azure Data Factory Studio, select the Author tab (with the pencil icon), and then the plus sign icon, to choose **Dataset**.
+
+:::image type="content" source="media/concepts-datasets-linked-services/create-dataset.png" alt-text="Shows the Author tab of the Azure Data Factory Studio with the new dataset button selected.":::
+
+You’ll see the new dataset window to choose any of the connectors available in Azure Data Factory, to set up an existing or new linked service.
+
+:::image type="content" source="media/concepts-datasets-linked-services/choose-dataset-source.png" alt-text="Shows the new dataset window where you can choose the type of linked service to any of the supported data factory connectors.":::
+
+Next you’ll be prompted to choose the dataset format.
+
+:::image type="content" source="media/concepts-datasets-linked-services/choose-dataset-format.png" alt-text="Shows the dataset format window allowing you to choose a format for the new dataset.":::
+
+Finally, you can choose an existing linked service of the type you selected for the dataset, or create a new one if one isn’t already defined.
+
+:::image type="content" source="media/concepts-datasets-linked-services/choose-or-define-linked-service.png" alt-text="Shows the set properties window where you can choose an existing dataset of the type selected previously, or create a new one.":::
+
+Once you create the dataset, you can use it within any pipelines in the Azure Data Factory.
+
+# [Synapse Analytics](#tab/synapse-analytics)
+
+To create a dataset with the Synapse Studio, select the Data tab, and then the plus sign icon, to choose **Integration dataset**.
+
+:::image type="content" source="media/concepts-datasets-linked-services/create-dataset-synapse.png" alt-text="Shows the Author tab of Synapse Studio with the new integration dataset button selected.":::
+
+You’ll see the new integration dataset window to choose any of the connectors available in Azure Synapse, to set up an existing or new linked service.
+
+:::image type="content" source="media/concepts-datasets-linked-services/choose-dataset-source-synapse.png" alt-text="Shows the new integration dataset window where you can choose the type of linked service to any of the supported Azure Synapse connectors.":::
+
+Next you’ll be prompted to choose the dataset format.
+
+:::image type="content" source="media/concepts-datasets-linked-services/choose-dataset-format.png" alt-text="Shows the dataset format window allowing you to choose a format for the new dataset.":::
+
+Finally, you can choose an existing linked service of the type you selected for the dataset, or create a new one if one isn’t already defined.
+
+:::image type="content" source="media/concepts-datasets-linked-services/choose-or-define-linked-service.png" alt-text="Shows the set properties window where you can choose an existing dataset of the type selected previously, or create a new one.":::
+
+Once you create the dataset, you can use it within any pipelines within the Synapse workspace.
+
+---

 ## Dataset JSON
 A dataset is defined in the following JSON format:
@@ -75,7 +118,7 @@ In Data Flow, datasets are used in source and sink transformations. The datasets

 ## Dataset type

-The service supports many different types of datasets, depending on the data stores you use. You can find the list of supported data stores from [Connector overview](connector-overview.md) article. Click a data store to learn how to create a linked service and a dataset for it.
+The service supports many different types of datasets, depending on the data stores you use. You can find the list of supported data stores from [Connector overview](connector-overview.md) article. Select a data store to learn how to create a linked service and a dataset for it.

 For example, for a Delimited Text dataset, the dataset type is set to **DelimitedText** as shown in the following JSON sample:

@@ -112,9 +155,9 @@ You can create datasets by using one of these tools or SDKs: [.NET API](quicksta

 Here are some differences between datasets in Data Factory current version (and Azure Synapse), and the legacy Data Factory version 1:

-- The external property is not supported in the current version. It's replaced by a [trigger](concepts-pipeline-execution-triggers.md).
-- The policy and availability properties are not supported in the current version. The start time for a pipeline depends on [triggers](concepts-pipeline-execution-triggers.md).
-- Scoped datasets (datasets defined in a pipeline) are not supported in the current version.
+- The external property isn’t supported in the current version. It's replaced by a [trigger](concepts-pipeline-execution-triggers.md).
+- The policy and availability properties aren’t supported in the current version. The start time for a pipeline depends on [triggers](concepts-pipeline-execution-triggers.md).
+- Scoped datasets (datasets defined in a pipeline) aren’t supported in the current version.

 ## Next steps
 See the following tutorial for step-by-step instructions for creating pipelines and datasets by using one of these tools or SDKs.
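
Reader note (not part of the commit): the sample scenario in the Overview section describes a Delimited Text dataset that refers to an Azure Blob Storage linked service and names the container and folder holding the input blobs, along with format-related settings. A minimal sketch of such a dataset definition might look like the following. The dataset name, linked service name, container, folder path, and format settings are hypothetical placeholders, and the article's own JSON samples (elided from this diff) remain the authoritative reference.

```json
{
    "name": "ExampleDelimitedTextDataset",
    "properties": {
        "description": "Hypothetical example; all names and paths are placeholders.",
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "ExampleAzureBlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "example-container",
                "folderPath": "example/input"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}
```

The `linkedServiceName` reference is what ties the dataset to the connection information, while `typeProperties` carries the location and format details that the dataset itself is responsible for.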