Commit 1a8dccc

Acrolinx and pattern compliance

1 parent 1415a8b commit 1a8dccc

File tree

1 file changed: +24 -25 lines

articles/synapse-analytics/quickstart-data-flow.md

Lines changed: 24 additions & 25 deletions
@@ -1,12 +1,12 @@
 ---
-title: "Quickstart: Transform data using a mapping data flow"
+title: Quickstart: Transform data using a mapping data flow
 description: This tutorial provides step-by-step instructions for using Azure Synapse Analytics to transform data with mapping data flow.
 author: kromerm
 ms.author: makromer
 ms.reviewer: makromer
 ms.service: azure-synapse-analytics
 ms.subservice: pipeline
-ms.topic: conceptual
+ms.topic: quickstart
 ms.date: 02/15/2022
 ---

@@ -28,7 +28,7 @@ In this quickstart, you do the following steps:
 * **Azure Synapse workspace**: Create a Synapse workspace using the Azure portal following the instructions in [Quickstart: Create a Synapse workspace](quickstart-create-workspace.md).
 * **Azure storage account**: You use ADLS storage as *source* and *sink* data stores. If you don't have a storage account, see [Create an Azure storage account](../storage/common/storage-account-create.md) for steps to create one.

-The file that we are transforming in this tutorial is MoviesDB.csv, which can be found [here](https://raw.githubusercontent.com/djpmsft/adf-ready-demo/master/moviesDB.csv). To retrieve the file from GitHub, copy the contents to a text editor of your choice to save locally as a .csv file. To upload the file to your storage account, see [Upload blobs with the Azure portal](../storage/blobs/storage-quickstart-blobs-portal.md). The examples will be referencing a container named 'sample-data'.
+The file that we're transforming in this tutorial is MoviesDB.csv, which can be found [here](https://raw.githubusercontent.com/djpmsft/adf-ready-demo/master/moviesDB.csv). To retrieve the file from GitHub, copy the contents to a text editor of your choice to save locally as a .csv file. To upload the file to your storage account, see [Upload blobs with the Azure portal](../storage/blobs/storage-quickstart-blobs-portal.md). The examples will be referencing a container named 'sample-data'.
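As an alternative to copy-pasting the file contents from the browser, the sample file can be fetched with a short script. This is a minimal sketch using only the Python standard library; the URL is the one linked above, and the local filename is an arbitrary choice:

```python
import urllib.request

def download_sample(url: str, dest: str) -> str:
    """Fetch a file over HTTP(S) and save it locally, returning the saved path."""
    urllib.request.urlretrieve(url, dest)
    return dest

# Example call (network access required):
# download_sample(
#     "https://raw.githubusercontent.com/djpmsft/adf-ready-demo/master/moviesDB.csv",
#     "moviesDB.csv")
```

After saving the file, upload it to your storage account as described in the portal link above.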

 ### Navigate to the Synapse Studio

@@ -45,15 +45,15 @@ In this quickstart, we use the workspace named "adftest2020" as an example. It w

 A pipeline contains the logical flow for an execution of a set of activities. In this section, you'll create a pipeline that contains a Data Flow activity.

-1. Go to the **Integrate** tab. Select on the plus icon next to the pipelines header and select Pipeline.
+1. Go to the **Integrate** tab. Select the plus icon next to the pipelines header and select Pipeline.

    ![Create a new pipeline](media/doc-common-process/new-pipeline.png)

 1. In the **Properties** settings page of the pipeline, enter **TransformMovies** for **Name**.

 1. Under *Move and Transform* in the *Activities* pane, drag **Data flow** onto the pipeline canvas.

-1. In the **Adding data flow** page pop-up, select **Create new data flow** -> **Data flow**. Click **OK** when done.
+1. In the **Adding data flow** page pop-up, select **Create new data flow** -> **Data flow**. Select **OK** when done.

    ![Create a data flow](media/quickstart-data-flow/new-data-flow.png)

@@ -69,35 +69,35 @@ Once you create your Data Flow, you'll be automatically sent to the data flow ca

 1. In the data flow canvas, add a source by clicking on the **Add Source** box.

-1. Name your source **MoviesDB**. Click on **New** to create a new source dataset.
+1. Name your source **MoviesDB**. Select **New** to create a new source dataset.

    ![Create a new source dataset](media/quickstart-data-flow/new-source-dataset.png)

-1. Choose **Azure Data Lake Storage Gen2**. Click Continue.
+1. Choose **Azure Data Lake Storage Gen2**. Select Continue.

    ![Choose Azure Data Lake Storage Gen2](media/quickstart-data-flow/select-source-dataset.png)

-1. Choose **DelimitedText**. Click Continue.
+1. Choose **DelimitedText**. Select Continue.

 1. Name your dataset **MoviesDB**. In the linked service dropdown, choose **New**.

-1. In the linked service creation screen, name your ADLS Gen2 linked service **ADLSGen2** and specify your authentication method. Then enter your connection credentials. In this quickstart, we're using Account key to connect to our storage account. You can click **Test connection** to verify your credentials were entered correctly. Click **Create** when finished.
+1. In the linked service creation screen, name your ADLS Gen2 linked service **ADLSGen2** and specify your authentication method. Then enter your connection credentials. In this quickstart, we're using Account key to connect to our storage account. You can select **Test connection** to verify your credentials were entered correctly. Select **Create** when finished.

    ![Create a source linked service](media/quickstart-data-flow/adls-gen2-linked-service.png)

-1. Once you're back at the dataset creation screen, under the **File path** field, enter where your file is located. In this quickstart, the file "MoviesDB.csv" is located in container "sample-data". As the file has headers, check **First row as header**. Select **From connection/store** to import the header schema directly from the file in storage. Click **OK** when done.
+1. Once you're back at the dataset creation screen, under the **File path** field, enter where your file is located. In this quickstart, the file "MoviesDB.csv" is located in container "sample-data". As the file has headers, check **First row as header**. Select **From connection/store** to import the header schema directly from the file in storage. Select **OK** when done.

    ![Source dataset settings](media/quickstart-data-flow/source-dataset-properties.png)

-1. If your debug cluster has started, go to the **Data Preview** tab of the source transformation and click **Refresh** to get a snapshot of the data. You can use data preview to verify your transformation is configured correctly.
+1. If your debug cluster has started, go to the **Data Preview** tab of the source transformation and select **Refresh** to get a snapshot of the data. You can use data preview to verify your transformation is configured correctly.

    ![Data preview](media/quickstart-data-flow/data-preview.png)

-1. Next to your source node on the data flow canvas, click on the plus icon to add a new transformation. The first transformation you're adding is a **Filter**.
+1. Next to your source node on the data flow canvas, select the plus icon to add a new transformation. The first transformation you're adding is a **Filter**.

    ![Add a filter](media/quickstart-data-flow/add-filter.png)

-1. Name your filter transformation **FilterYears**. Click on the expression box next to **Filter on** to open the expression builder. Here you'll specify your filtering condition.
+1. Name your filter transformation **FilterYears**. Select the expression box next to **Filter on** to open the expression builder. Here you'll specify your filtering condition.

 1. The data flow expression builder lets you interactively build expressions to use in various transformations. Expressions can include built-in functions, columns from the input schema, and user-defined parameters. For more information on how to build expressions, see [Data Flow expression builder](../data-factory/concepts-data-flow-expression-builder.md?toc=%2fazure%2fsynapse-analytics%2ftoc.json).

@@ -111,9 +111,9 @@ Once you create your Data Flow, you'll be automatically sent to the data flow ca

    ![Specify filtering condition](media/quickstart-data-flow/visual-expression-builder.png)

-   If you've a debug cluster active, you can verify your logic by clicking **Refresh** to see expression output compared to the inputs used. There's more than one right answer on how you can accomplish this logic using the data flow expression language.
+   If you have a debug cluster active, you can verify your logic by clicking **Refresh** to see expression output compared to the inputs used. There's more than one right answer on how you can accomplish this logic using the data flow expression language.

-   Click **Save and Finish** once you're done with your expression.
+   Select **Save and Finish** once you're done with your expression.

 1. Fetch a **Data Preview** to verify the filter is working correctly.

@@ -125,15 +125,15 @@ Once you create your Data Flow, you'll be automatically sent to the data flow ca

    ![Aggregate settings 1](media/quickstart-data-flow/aggregate-settings.png)

-1. Go to the **Aggregates** tab. In the left text box, name the aggregate column **AverageComedyRating**. Click on the right expression box to enter the aggregate expression via the expression builder.
+1. Go to the **Aggregates** tab. In the left text box, name the aggregate column **AverageComedyRating**. Select the right expression box to enter the aggregate expression via the expression builder.

    ![Aggregate settings 2](media/quickstart-data-flow/aggregate-settings-2.png)

 1. To get the average of column **Rating**, use the ```avg()``` aggregate function. As **Rating** is a string and ```avg()``` takes in a numerical input, we must convert the value to a number via the ```toInteger()``` function. This expression looks like:

    `avg(toInteger(Rating))`

-   Click **Save and Finish** when done.
+   Select **Save and Finish** when done.

    ![Average comedy rating](media/quickstart-data-flow/average-comedy-rating.png)
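The filter-then-aggregate logic above can be sketched in plain Python for intuition. The sample rows below are invented for illustration; `int(...)` plays the role of `toInteger()`, and the list comprehension stands in for the Filter transformation:

```python
# Rough Python analogue of the data flow: keep comedies, then compute
# the equivalent of avg(toInteger(Rating)) over the filtered rows.
rows = [  # invented sample rows standing in for moviesDB.csv
    {"title": "Movie A", "genres": "Comedy", "Rating": "4"},
    {"title": "Movie B", "genres": "Comedy", "Rating": "2"},
    {"title": "Movie C", "genres": "Drama", "Rating": "5"},
]

comedies = [r for r in rows if "Comedy" in r["genres"]]  # Filter transformation
average_comedy_rating = sum(int(r["Rating"]) for r in comedies) / len(comedies)  # Aggregate
print(average_comedy_rating)  # 3.0
```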

@@ -145,13 +145,13 @@ Once you create your Data Flow, you'll be automatically sent to the data flow ca

    ![Add a Sink](media/quickstart-data-flow/add-sink.png)

-1. Name your sink **Sink**. Click **New** to create your sink dataset.
+1. Name your sink **Sink**. Select **New** to create your sink dataset.

-1. Choose **Azure Data Lake Storage Gen2**. Click Continue.
+1. Choose **Azure Data Lake Storage Gen2**. Select Continue.

-1. Choose **DelimitedText**. Click Continue.
+1. Choose **DelimitedText**. Select Continue.

-1. Name your sink dataset **MoviesSink**. For linked service, choose the ADLS Gen2 linked service you created in step 7. Enter an output folder to write your data to. In this quickstart, we're writing to folder 'output' in container 'sample-data'. The folder doesn't need to exist beforehand and can be dynamically created. Set **First row as header** as true and select **None** for **Import schema**. Click **OK** when done.
+1. Name your sink dataset **MoviesSink**. For linked service, choose the ADLS Gen2 linked service you created in step 7. Enter an output folder to write your data to. In this quickstart, we're writing to folder 'output' in container 'sample-data'. The folder doesn't need to exist beforehand and can be dynamically created. Set **First row as header** as true and select **None** for **Import schema**. Select **OK** when done.

    ![Sink dataset properties](media/quickstart-data-flow/sink-dataset-properties.png)

@@ -161,25 +161,24 @@ Now you've finished building your data flow. You're ready to run it in your pipe

 You can debug a pipeline before you publish it. In this step, you're going to trigger a debug run of the data flow pipeline. While data preview doesn't write data, a debug run will write data to your sink destination.

-1. Go to the pipeline canvas. Click **Debug** to trigger a debug run.
+1. Go to the pipeline canvas. Select **Debug** to trigger a debug run.

    ![Debug pipeline](media/quickstart-data-flow/debug-pipeline.png)

-1. Pipeline debug of Data Flow activities uses the active debug cluster but still take at least a minute to initialize. You can track the progress via the **Output** tab. Once the run is successful, click on the eyeglasses icon to open the monitoring pane.
+1. Pipeline debug of Data Flow activities uses the active debug cluster but still take at least a minute to initialize. You can track the progress via the **Output** tab. Once the run is successful, select the eyeglasses icon to open the monitoring pane.

    ![Debugging output](media/quickstart-data-flow/debugging-output.png)

 1. In the monitoring pane, you can see the number of rows and time spent in each transformation step.

    ![Transformation monitoring](media/quickstart-data-flow/4-transformations.png)

-1. Click on a transformation to get detailed information about the columns and partitioning of the data.
+1. Select a transformation to get detailed information about the columns and partitioning of the data.

    ![Transformation details](media/quickstart-data-flow/transformation-details.png)

 If you followed this quickstart correctly, you should have written 83 rows and 2 columns into your sink folder. You can verify the data by checking your blob storage.
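One way to sanity-check the sink output locally is to download the written CSV from the 'output' folder and count its rows and columns. A minimal sketch follows; the part-file path in the comment is hypothetical, and per the quickstart you'd expect 83 data rows and 2 columns:

```python
import csv
import io

def count_rows_and_columns(csv_text: str) -> tuple[int, int]:
    """Return (data_row_count, column_count) for CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return sum(1 for _ in reader), len(header)

# Example with stand-in content; for the real check, pass the text of the
# downloaded part file, e.g. open("output/part-00000.csv").read().
sample = "year,AverageComedyRating\n1920,3\n1930,4\n"
print(count_rows_and_columns(sample))  # (2, 2)
```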

-
 ## Next steps

 Advance to the following articles to learn about Azure Synapse Analytics support:
