articles/data-factory/tutorial-data-flow-adventure-works-retail-template.md
+4 -4 (4 additions & 4 deletions)
@@ -14,7 +14,7 @@ ms.date: 09/26/2022
This document explains how to set up and use Microsoft's AdventureWorks pipeline template to jump-start exploration of the AdventureWorks dataset using Azure Synapse Analytics and the Retail database template.
## Overview
-AdventureWorks is a fictional sports equipment retailer that is used to demo Microsoft applications. In this case, they are being used as an example for how to use Synapse Pipelines to map retail data to the Retail database template for further analysis within Azure Synapse.
+AdventureWorks is a fictional sports equipment retailer that is used to demo Microsoft applications. In this case, they're being used as an example for how to use Synapse Pipelines to map retail data to the Retail database template for further analysis within Azure Synapse.
## Prerequisites
@@ -34,7 +34,7 @@ Follow these steps to locate the template.
These steps open the template overview page.
## Configure the template
-The template is designed to require minimal configuration. From the template overview page you can see a preview of the initial starting configuration of the pipeline, and click **Open pipeline** to create the resources in your own workspace. You will get a notification that all 31 resources in the template have been created, and can review these before committing or publishing them. You will find the below components of the template:
+The template is designed to require minimal configuration. From the template overview page you can see a preview of the initial starting configuration of the pipeline, and select **Open pipeline** to create the resources in your own workspace. You'll get a notification that all 31 resources in the template have been created, and can review these before committing or publishing them. You'll find the following components in the template:
* 17 pipelines: These are scheduled to ensure the data loads into the target tables correctly, and include one pipeline per source table plus the scheduling ones.
* 14 data flows: These contain the logic to load from the source system and land the data into the target database.
@@ -43,15 +43,15 @@ If you have the AdventureWorks dataset loaded into a different database, you can
## Dataset and source/target models
-The AdventureWorks dataset in Excel format can be downloaded from this [GitHub site](https://github.com/kromerm/adfdataflowdocs/blob/master/sampledata/AdventureWorks%20Data.zip). In addition, you can access the [schema definition for both the source and target databases](https://github.com/kromerm/adfdataflowdocs/blob/master/sampledata/AdventureWorksSchemas.xlsx). Using the database designer in Synapse, recreate the source and target databases with the schema in the Excel you downloaded earlier. For more details on the database designer, see this [documentation](../synapse-analytics/database-designer/concepts-database-templates.md).
+The AdventureWorks dataset in Excel format can be downloaded from this [GitHub site](https://github.com/kromerm/adfdataflowdocs/blob/master/sampledata/AdventureWorks%20Data.zip). In addition, you can access the [schema definition for both the source and target databases](https://github.com/kromerm/adfdataflowdocs/blob/master/sampledata/AdventureWorksSchemas.xlsx). Using the database designer in Synapse, recreate the source and target databases with the schema in the Excel file you downloaded earlier. For more information on the database designer, see this [documentation](../synapse-analytics/database-designer/concepts-database-templates.md).
With the databases created, ensure the dataflows are pointing to the correct tables by editing the dropdowns in the Workspace DB source and sink settings. You can load the data into the source model by placing the CSV files provided in the example dataset in the correct folders specified by the tables. Once that is done, all that's required is to run the pipelines.
## Troubleshoot the pipelines
If the pipeline fails to run successfully, there are a few main things to check for errors.
* Dataset schema. Make sure the data settings for the CSV files are accurate. If you included row headers, make sure the row headers option is checked on the database table.
-* Data flow sources. If you used different column or table names than what were provided in the example schema, you will need to step through the data flows to verify that the columns are mapped correctly.
+* Data flow sources. If you used different column or table names than what were provided in the example schema, you'll need to step through the data flows to verify that the columns are mapped correctly.
* Data flow sink. The schema and data format configurations on the target database will need to match the data flow template. Like above, if any changes were made, those items will need to be aligned.
-Many times, when processing data for ETL jobs, you will need to change the column names before writing the results. Sometimes this is needed to align column names to a well-known target schema. Other times, you may need to set column names at runtime based on evolving schemas. In this tutorial, you'll learn how to use data flows to set column names for your destination files and database tables dynamically using external configuration files and parameters.
+Many times, when processing data for ETL jobs, you'll need to change the column names before writing the results. Sometimes this is needed to align column names to a well-known target schema. Other times, you may need to set column names at runtime based on evolving schemas. In this tutorial, you'll learn how to use data flows to set column names for your destination files and database tables dynamically using external configuration files and parameters.
If you're new to Azure Data Factory, see [Introduction to Azure Data Factory](introduction.md).
@@ -51,7 +51,7 @@ In this step, you'll create a pipeline that contains a data flow activity.
1. In the **Activities** pane, expand the **Move and Transform** accordion. Drag and drop the **Data Flow** activity from the pane to the pipeline canvas.
:::image type="content" source="media/tutorial-data-flow/activity1.png" alt-text="Screenshot that shows the pipeline canvas where you can drop the Data Flow activity.":::
-1. In the **Adding Data Flow** pop-up, select **Create new Data Flow** and then name your data flow **DynaCols**. Click Finish when done.
+1. In the **Adding Data Flow** pop-up, select **Create new Data Flow** and then name your data flow **DynaCols**. Select **Finish** when done.
## Build dynamic column mapping in data flows
@@ -71,10 +71,10 @@ You'll learn how to dynamically set column names using a data flow
First, let's set up the data flow environment for each of the mechanisms described below for landing data in ADLS Gen2.
-1. Click on the source transformation and call it ```movies1```.
-1. Click the new button next to dataset in the bottom panel.
+1. Select the source transformation and call it ```movies1```.
+1. Select the new button next to the dataset in the bottom panel.
1. Choose either Blob or ADLS Gen2 depending on where you stored the moviesDB.csv file from above.
-1. Add a 2nd source, which we will use to source the configuration JSON file to lookup field mappings.
+1. Add a second source, which we'll use to source the configuration JSON file to look up field mappings.
1. Call this source ```columnmappings```.
1. For the dataset, point to a new JSON file that will store a configuration for column mapping. You can paste the configuration into the JSON file for this tutorial example (a hypothetical sample is sketched after this list):
```
@@ -84,29 +84,29 @@ First, let's set up the data flow environment for each of the mechanisms describ
]
```
-1. Set this source settings to ```array of documents```.
-1. Add a 3rd source and call it ```movies2```. Configure this exactly the same as ```movies1```.
+1. Set this source setting to ```array of documents```.
+1. Add a third source and call it ```movies2```. Configure this exactly the same as ```movies1```.
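The contents of the column-mapping JSON file referenced a few steps above fall in the elided region between the hunks, so they aren't shown here. Purely as an illustration of the shape the later lookup steps assume (an array of documents with ```prevcolumn``` and ```newcolumn``` properties), a hypothetical configuration might look like the following; the column names are examples only, not the tutorial's actual file:

```
[
    { "prevcolumn": "movie", "newcolumn": "movieId" },
    { "prevcolumn": "title", "newcolumn": "movietitle" },
    { "prevcolumn": "genres", "newcolumn": "genrelist" }
]
```

Each document pairs an existing column name (```prevcolumn```) with the name it should be renamed to (```newcolumn```), which is why the source is read as ```array of documents``` and why ```prevcolumn``` is later used as the cache key.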
### Parameterized column mapping
-In this first scenario, you will set output column names in you data flow by setting the column mapping based on matching incoming fields with a parameter that is a string array of columns and match each array index with the incoming column ordinal position. When executing this data flow from a pipeline, you will be able to set different column names on each pipeline execution by sending in this string array parameter to the data flow activity.
+In this first scenario, you'll set output column names in your data flow by defining a column mapping that matches each incoming field, by its ordinal position, to the corresponding index of a string array parameter. When executing this data flow from a pipeline, you can set different column names on each pipeline run by sending in this string array parameter to the data flow activity.
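As a purely illustrative sketch of that ordinal matching (the incoming column names come from the moviesDB source and the output names from the ```(a,b,c)``` example later in this section), position 1 of the parameter renames the first incoming column, position 2 the second, and so on:

```
{
    "incomingColumnsByPosition": ["movie", "title", "genres"],
    "columnNamesParameter": ["a", "b", "c"]
}
```

So ```movie``` is written out as ```a```, ```title``` as ```b```, and ```genres``` as ```c```. Passing a different string array on the next pipeline run produces different output column names without editing the data flow itself.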
-1. Click on the Inspect and Data Preview tabs of the Select transformation to view the new column name values ```(a,b,c)``` replace the original movie, title, genres column names
+1. Select the Inspect and Data Preview tabs of the Select transformation to view the new column name values ```(a,b,c)``` replacing the original movie, title, genres column names.
### Create a cached lookup of external column mappings
@@ -116,16 +116,16 @@ Next, we'll create a cached sink for a later lookup. The cache will read an exte
1. Set sink type to ```Cache```.
1. Under Settings, choose ```prevcolumn``` as the key column.
-### Lookup columns names from cached sink
+### Look up column names from cached sink
Now that you've stored the configuration file contents in memory, you can dynamically map incoming column names to new outgoing column names.
-1. Go back to the data flow designer and edit the data flow create above. Click on the ```movies2``` source transformation.
+1. Go back to the data flow designer and edit the data flow created above. Select the ```movies2``` source transformation.
1. Add a Select transformation. This time, we'll use the Select transformation to rename column names based on the target name in the JSON configuration file that is being stored in the cached sink.
1. Add a rule-based mapping. For the Matching Condition, use this formula: ```!isNull(cachedSink#lookup(name).prevcolumn)```.
1. For the output column name, use this formula: ```cachedSink#lookup($$).newcolumn```.
1. What we've done is find all column names that match the ```prevcolumn``` property from the external JSON configuration file and rename each match to the new ```newcolumn``` name (a worked example follows this list).
-1. Click on the Data Preview and Inspect tabs in the Select transformation and you should now see the new column names from the external mapping file.
+1. Select the Data Preview and Inspect tabs in the Select transformation, and you should now see the new column names from the external mapping file.
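To make the two formulas above concrete, here's how a single column would resolve against the hypothetical configuration sketched earlier (the names are illustrative assumptions, not the tutorial's actual data):

```
{
    "incomingColumn": "genres",
    "cachedConfigEntry": { "prevcolumn": "genres", "newcolumn": "genrelist" },
    "matchingConditionResult": true,
    "outputColumnName": "genrelist"
}
```

The lookup on ```genres``` finds a cached entry keyed by ```prevcolumn```, so the matching condition evaluates to true and the output name is taken from that entry's ```newcolumn```. A column with no entry in the cached sink returns null from the lookup and simply doesn't match the rule.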