articles/synapse-analytics/quickstart-data-flow.md
---
title: "Quickstart: Transform data using a mapping data flow"
description: This tutorial provides step-by-step instructions for using Azure Synapse Analytics to transform data with mapping data flow.
author: kromerm
ms.author: makromer
ms.reviewer: makromer
ms.service: azure-synapse-analytics
ms.subservice: pipeline
ms.topic: quickstart
ms.date: 02/15/2022
---
* **Azure Synapse workspace**: Create a Synapse workspace using the Azure portal following the instructions in [Quickstart: Create a Synapse workspace](quickstart-create-workspace.md).
* **Azure storage account**: You use ADLS storage as *source* and *sink* data stores. If you don't have a storage account, see [Create an Azure storage account](../storage/common/storage-account-create.md) for steps to create one.

The file that we're transforming in this tutorial is MoviesDB.csv, which can be found [here](https://raw.githubusercontent.com/djpmsft/adf-ready-demo/master/moviesDB.csv). To retrieve the file from GitHub, copy the contents to a text editor of your choice and save it locally as a .csv file. To upload the file to your storage account, see [Upload blobs with the Azure portal](../storage/blobs/storage-quickstart-blobs-portal.md). The examples reference a container named 'sample-data'.

### Navigate to the Synapse Studio
In this quickstart, we use the workspace named "adftest2020" as an example.

A pipeline contains the logical flow for an execution of a set of activities. In this section, you'll create a pipeline that contains a Data Flow activity.

1. Go to the **Integrate** tab. Select the plus icon next to the pipelines header and select **Pipeline**.

    

1. In the **Properties** settings page of the pipeline, enter **TransformMovies** for **Name**.

1. Under *Move and Transform* in the *Activities* pane, drag **Data flow** onto the pipeline canvas.

1. In the **Adding data flow** pop-up, select **Create new data flow** > **Data flow**. Select **OK** when done.

    
Once you create your Data Flow, you'll be automatically sent to the data flow canvas.

1. In the data flow canvas, add a source by selecting the **Add Source** box.

1. Name your source **MoviesDB**. Select **New** to create a new source dataset.

    

1. Choose **Azure Data Lake Storage Gen2**. Select **Continue**.

    

1. Choose **DelimitedText**. Select **Continue**.

1. Name your dataset **MoviesDB**. In the linked service dropdown, choose **New**.

1. In the linked service creation screen, name your ADLS Gen2 linked service **ADLSGen2** and specify your authentication method. Then enter your connection credentials. In this quickstart, we're using Account key to connect to our storage account. You can select **Test connection** to verify your credentials were entered correctly. Select **Create** when finished.

    

1. Once you're back at the dataset creation screen, under the **File path** field, enter where your file is located. In this quickstart, the file "MoviesDB.csv" is located in container "sample-data". As the file has headers, check **First row as header**. Select **From connection/store** to import the header schema directly from the file in storage. Select **OK** when done.

1. If your debug cluster has started, go to the **Data Preview** tab of the source transformation and select **Refresh** to get a snapshot of the data. You can use data preview to verify your transformation is configured correctly.

1. Next to your source node on the data flow canvas, select the plus icon to add a new transformation. The first transformation you're adding is a **Filter**.

    

1. Name your filter transformation **FilterYears**. Select the expression box next to **Filter on** to open the expression builder. Here you'll specify your filtering condition.

1. The data flow expression builder lets you interactively build expressions to use in various transformations. Expressions can include built-in functions, columns from the input schema, and user-defined parameters. For more information on how to build expressions, see [Data Flow expression builder](../data-factory/concepts-data-flow-expression-builder.md?toc=%2fazure%2fsynapse-analytics%2ftoc.json).
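As an illustration only (the transformation name **FilterYears** suggests filtering on the movie's release year; `year` and `genres` here are assumed column names from MoviesDB.csv), a condition that keeps twentieth-century comedies might look like:

```
toInteger(year) >= 1910 && toInteger(year) <= 2000 && rlike(genres, 'Comedy')
```

Because the source columns arrive as strings, `toInteger()` converts them before the numeric comparison.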
    If you have a debug cluster active, you can verify your logic by selecting **Refresh** to see the expression output compared to the inputs used. There's more than one right answer for how you can accomplish this logic using the data flow expression language.

    Select **Save and Finish** once you're done with your expression.

1. Fetch a **Data Preview** to verify the filter is working correctly.
1. Go to the **Aggregates** tab. In the left text box, name the aggregate column **AverageComedyRating**. Select the right expression box to enter the aggregate expression via the expression builder.

1. To get the average of column **Rating**, use the ```avg()``` aggregate function. As **Rating** is a string and ```avg()``` takes a numerical input, we must convert the value to a number via the ```toInteger()``` function. This expression looks like:
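Combining the two functions just described, the aggregate expression entered in the expression builder reads:

```
avg(toInteger(Rating))
```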


1. Name your sink **Sink**. Select **New** to create your sink dataset.

1. Choose **Azure Data Lake Storage Gen2**. Select **Continue**.

1. Choose **DelimitedText**. Select **Continue**.

1. Name your sink dataset **MoviesSink**. For linked service, choose the ADLS Gen2 linked service you created in step 7. Enter an output folder to write your data to. In this quickstart, we're writing to the folder 'output' in container 'sample-data'. The folder doesn't need to exist beforehand and can be dynamically created. Set **First row as header** to true and select **None** for **Import schema**. Select **OK** when done.
Now you've finished building your data flow. You're ready to run it in your pipeline.

You can debug a pipeline before you publish it. In this step, you're going to trigger a debug run of the data flow pipeline. While data preview doesn't write data, a debug run will write data to your sink destination.

1. Go to the pipeline canvas. Select **Debug** to trigger a debug run.

1. Pipeline debug of Data Flow activities uses the active debug cluster but still takes at least a minute to initialize. You can track the progress via the **Output** tab. Once the run is successful, select the eyeglasses icon to open the monitoring pane.

If you followed this quickstart correctly, you should have written 83 rows and 2 columns into your sink folder. You can verify the data by checking your blob storage.
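End to end, the flow filters rows and writes a small per-group aggregate to the sink. As a rough sanity check of the same logic outside Synapse (a sketch with hypothetical column names and an assumed year/comedy filter and group-by-year aggregate; the real moviesDB.csv schema may differ), the filter-then-aggregate steps in pandas look like:

```python
import pandas as pd

# Hypothetical miniature of moviesDB.csv; real column names may differ.
df = pd.DataFrame({
    "title":  ["A", "B", "C", "D", "E"],
    "genres": ["Comedy", "Comedy|Drama", "Drama", "Comedy", "Comedy"],
    "year":   ["1995", "1960", "1999", "2005", "1995"],
    "Rating": ["7", "8", "9", "6", "9"],
})

# FilterYears: keep movies released between 1910 and 2000 (assumed condition).
years = df["year"].astype(int)
filtered = df[(years >= 1910) & (years <= 2000)]

# Aggregate: average Rating of comedies per year, mirroring avg(toInteger(Rating)).
comedies = filtered[filtered["genres"].str.contains("Comedy")].copy()
comedies["Rating"] = comedies["Rating"].astype(int)
result = (comedies.groupby("year", as_index=False)["Rating"]
          .mean()
          .rename(columns={"Rating": "AverageComedyRating"}))
print(result)  # one row per year that has a comedy; two columns, like the sink output
```

The two-column shape (year plus **AverageComedyRating**) matches the "83 rows and 2 columns" check above, one row per year in the filtered range.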
## Next steps

Advance to the following articles to learn about Azure Synapse Analytics support: