Commit 6e24f7a

Acrolinx improvements
1 parent e2deec0 commit 6e24f7a

File tree: 2 files changed (+19, -19 lines)

articles/data-factory/tutorial-data-flow-adventure-works-retail-template.md

Lines changed: 4 additions & 4 deletions
@@ -14,7 +14,7 @@ ms.date: 09/26/2022
This document explains how to set up and use Microsoft's AdventureWorks pipeline template to jump-start the exploration of the AdventureWorks dataset using Azure Synapse Analytics and the Retail database template.

## Overview
- AdventureWorks is a fictional sports equipment retailer that is used to demo Microsoft applications. In this case, they are being used as an example for how to use Synapse Pipelines to map retail data to the Retail database template for further analysis within Azure Synapse.
+ AdventureWorks is a fictional sports equipment retailer that is used to demo Microsoft applications. In this case, they're being used as an example for how to use Synapse Pipelines to map retail data to the Retail database template for further analysis within Azure Synapse.

## Prerequisites

@@ -34,7 +34,7 @@ Follow these steps to locate the template.
These steps open the template overview page.

## Configure the template
- The template is designed to require minimal configuration. From the template overview page you can see a preview of the initial starting configuration of the pipeline, and click **Open pipeline** to create the resources in your own workspace. You will get a notification that all 31 resources in the template have been created, and can review these before committing or publishing them. You will find the below components of the template:
+ The template is designed to require minimal configuration. From the template overview page, you can see a preview of the initial starting configuration of the pipeline, and select **Open pipeline** to create the resources in your own workspace. You'll get a notification that all 31 resources in the template have been created, and can review these before committing or publishing them. You'll find the following components in the template:

* 17 pipelines: These are scheduled to ensure the data loads into the target tables correctly, and include one pipeline per source table plus the scheduling ones.
* 14 data flows: These contain the logic to load from the source system and land the data into the target database.
@@ -43,15 +43,15 @@ If you have the AdventureWorks dataset loaded into a different database, you can


## Dataset and source/target models
- The AdventureWorks dataset in Excel format can be downloaded from this [GitHub site](https://github.com/kromerm/adfdataflowdocs/blob/master/sampledata/AdventureWorks%20Data.zip). In addition, you can access the [schema definition for both the source and target databases](https://github.com/kromerm/adfdataflowdocs/blob/master/sampledata/AdventureWorksSchemas.xlsx). Using the database designer in Synapse, recreate the source and target databases with the schema in the Excel you downloaded earlier. For more details on the database designer, see this [documentation](../synapse-analytics/database-designer/concepts-database-templates.md).
+ The AdventureWorks dataset in Excel format can be downloaded from this [GitHub site](https://github.com/kromerm/adfdataflowdocs/blob/master/sampledata/AdventureWorks%20Data.zip). In addition, you can access the [schema definition for both the source and target databases](https://github.com/kromerm/adfdataflowdocs/blob/master/sampledata/AdventureWorksSchemas.xlsx). Using the database designer in Synapse, recreate the source and target databases with the schema in the Excel you downloaded earlier. For more information on the database designer, see this [documentation](../synapse-analytics/database-designer/concepts-database-templates.md).

With the databases created, ensure the dataflows are pointing to the correct tables by editing the dropdowns in the Workspace DB source and sink settings. You can load the data into the source model by placing the CSV files provided in the example dataset in the correct folders specified by the tables. Once that is done, all that's required is to run the pipelines.

## Troubleshoot the pipelines
If the pipeline fails to run successfully, there are a few main things to check for errors.

* Dataset schema. Make sure the data settings for the CSV files are accurate. If you included row headers, make sure the row headers option is checked on the database table.
- * Data flow sources. If you used different column or table names than what were provided in the example schema, you will need to step through the data flows to verify that the columns are mapped correctly.
+ * Data flow sources. If you used different column or table names than what were provided in the example schema, you'll need to step through the data flows to verify that the columns are mapped correctly.
* Data flow sink. The schema and data format configurations on the target database will need to match the data flow template. Like above, if any changes were made, those items will need to be aligned.

## Next steps

articles/data-factory/tutorial-data-flow-dynamic-columns.md

Lines changed: 15 additions & 15 deletions
@@ -14,7 +14,7 @@ ms.date: 09/26/2022

[!INCLUDE[appliesto-adf-asa-md](includes/appliesto-adf-asa-md.md)]

- Many times, when processing data for ETL jobs, you will need to change the column names before writing the results. Sometimes this is needed to align column names to a well-known target schema. Other times, you may need to set column names at runtime based on evolving schemas. In this tutorial, you'll learn how to use data flows to set column names for your destination files and database tables dynamically using external configuration files and parameters.
+ Many times, when processing data for ETL jobs, you'll need to change the column names before writing the results. Sometimes this is needed to align column names to a well-known target schema. Other times, you may need to set column names at runtime based on evolving schemas. In this tutorial, you'll learn how to use data flows to set column names for your destination files and database tables dynamically using external configuration files and parameters.

If you're new to Azure Data Factory, see [Introduction to Azure Data Factory](introduction.md).

@@ -51,7 +51,7 @@ In this step, you'll create a pipeline that contains a data flow activity.
1. In the **Activities** pane, expand the **Move and Transform** accordion. Drag and drop the **Data Flow** activity from the pane to the pipeline canvas.

:::image type="content" source="media/tutorial-data-flow/activity1.png" alt-text="Screenshot that shows the pipeline canvas where you can drop the Data Flow activity.":::
- 1. In the **Adding Data Flow** pop-up, select **Create new Data Flow** and then name your data flow **DynaCols**. Click Finish when done.
+ 1. In the **Adding Data Flow** pop-up, select **Create new Data Flow** and then name your data flow **DynaCols**. Select Finish when done.

## Build dynamic column mapping in data flows

@@ -71,10 +71,10 @@ You'll learn how to dynamically set column names using a data flow

First, let's set up the data flow environment for each of the mechanisms described below for landing data in ADLS Gen2.

- 1. Click on the source transformation and call it ```movies1```.
- 1. Click the new button next to dataset in the bottom panel.
+ 1. Select the source transformation and call it ```movies1```.
+ 1. Select the new button next to dataset in the bottom panel.
1. Choose either Blob or ADLS Gen2 depending on where you stored the moviesDB.csv file from above.
- 1. Add a 2nd source, which we will use to source the configuration JSON file to lookup field mappings.
+ 1. Add a second source, which we'll use to source the configuration JSON file to look up field mappings.
1. Call it ```columnmappings```.
1. For the dataset, point to a new JSON file that will store a configuration for column mapping. You can paste the following into the JSON file for this tutorial example (a hypothetical sample is also sketched after this list):
```
@@ -84,29 +84,29 @@ First, let's set up the data flow environment for each of the mechanisms describ
    ]
```

- 1. Set this source settings to ```array of documents```.
- 1. Add a 3rd source and call it ```movies2```. Configure this exactly the same as ```movies1```.
+ 1. Set this source setting to ```array of documents```.
+ 1. Add a third source and call it ```movies2```. Configure this exactly the same as ```movies1```.
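For reference, here's a minimal sketch of what the ```columnmappings``` configuration file could contain, formatted as an array of documents with the ```prevcolumn``` and ```newcolumn``` properties used later in this tutorial. The source columns come from the movies dataset; the target names here are hypothetical, so substitute whatever output names you need:

```
[
    { "prevcolumn": "movie",  "newcolumn": "movieId" },
    { "prevcolumn": "title",  "newcolumn": "movieTitle" },
    { "prevcolumn": "genres", "newcolumn": "movieGenres" }
]
```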

### Parameterized column mapping

- In this first scenario, you will set output column names in you data flow by setting the column mapping based on matching incoming fields with a parameter that is a string array of columns and match each array index with the incoming column ordinal position. When executing this data flow from a pipeline, you will be able to set different column names on each pipeline execution by sending in this string array parameter to the data flow activity.
+ In this first scenario, you'll set output column names in your data flow by matching incoming fields to a parameter that is a string array of column names, mapping each array index to the incoming column's ordinal position. When executing this data flow from a pipeline, you'll be able to set different column names on each pipeline execution by sending in this string array parameter to the data flow activity.

:::image type="content" source="media/data-flow/dynacols-3.png" alt-text="Parameters":::

1. Go back to the data flow designer and edit the data flow created above.
- 1. Click on the parameters tab
+ 1. Select the parameters tab.
1. Create a new parameter and choose string array data type.
1. For the default value, enter ```['a','b','c']```.
1. Use the top ```movies1``` source to modify the column names to map to these array values.
1. Add a Select transformation. The Select transformation will be used to map incoming columns to new column names for output.
- 1. We're going to change the first 3 column names to the new names defined in the parameter
- 1. To do this, add 3 rule-based mapping entries in the bottom pane
+ 1. We're going to change the first three column names to the new names defined in the parameter.
+ 1. To do this, add three rule-based mapping entries in the bottom pane.
1. For the first column, the matching rule will be ```position==1``` and the name will be ```$parameter1[1]```.
1. Follow the same pattern for columns 2 and 3 (the full set of rules is sketched after this list).

:::image type="content" source="media/data-flow/dynacols-4.png" alt-text="Select transformation":::

- 1. Click on the Inspect and Data Preview tabs of the Select transformation to view the new column name values ```(a,b,c)``` replace the original movie, title, genres column names
+ 1. Select the Inspect and Data Preview tabs of the Select transformation to verify that the new column name values ```(a,b,c)``` replace the original movie, title, genres column names.
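Taken together, the three rule-based mapping entries look like this sketch (matching condition on the left, output column name on the right):

```
position==1   ->   $parameter1[1]
position==2   ->   $parameter1[2]
position==3   ->   $parameter1[3]
```

Because data flow arrays are one-based, index ```[1]``` lines up with the first incoming column.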

### Create a cached lookup of external column mappings

@@ -116,16 +116,16 @@ Next, we'll create a cached sink for a later lookup. The cache will read an exte
1. Set sink type to ```Cache```.
1. Under Settings, choose ```prevcolumn``` as the key column.

- ### Lookup columns names from cached sink
+ ### Look up column names from cached sink

Now that you've stored the configuration file contents in memory, you can dynamically map incoming column names to new outgoing column names.

- 1. Go back to the data flow designer and edit the data flow create above. Click on the ```movies2``` source transformation.
+ 1. Go back to the data flow designer and edit the data flow created above. Select the ```movies2``` source transformation.
1. Add a Select transformation. This time, we'll use the Select transformation to rename column names based on the target name in the JSON configuration file that is being stored in the cached sink.
1. Add a rule-based mapping. For the Matching Condition, use this formula: ```!isNull(cachedSink#lookup(name).prevcolumn)```.
1. For the output column name, use this formula: ```cachedSink#lookup($$).newcolumn``` (both formulas are sketched together after this list).
1. What we've done is to find all column names that match the ```prevcolumn``` property from the external JSON configuration file and rename each match to the new ```newcolumn``` name.
- 1. Click on the Data Preview and Inspect tabs in the Select transformation and you should now see the new column names from the external mapping file.
+ 1. Select the Data Preview and Inspect tabs in the Select transformation and you should now see the new column names from the external mapping file.
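As a sketch of how the two formulas fit together in the rule-based mapping: ```name``` is the incoming column name being tested by the matching condition, and ```$$``` refers to the column that matched the rule, so each matched column is renamed through the cached configuration file:

```
Matching condition:  !isNull(cachedSink#lookup(name).prevcolumn)
Output column name:  cachedSink#lookup($$).newcolumn
```

Columns with no matching ```prevcolumn``` entry in the cached file fail the ```!isNull``` check and aren't picked up by this rule.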

:::image type="content" source="media/data-flow/dynacols-2.png" alt-text="Source 2":::
