
Commit 2076dcb

Merge pull request #146671 from kromerm/adfdocsmark
added new tutorial and transform
2 parents bc0af53 + c41db1c commit 2076dcb

12 files changed: +245 −0 lines

articles/data-factory/TOC.yml (4 additions, 0 deletions)

```diff
@@ -105,6 +105,8 @@
   items:
   - name: Transform data with mapping data flows
     items:
+    - name: Transform data in the lake with Delta Lake
+      href: tutorial-data-flow-delta-lake.md
     - name: Transform data with mapping data flows
       href: tutorial-data-flow.md
     - name: Mapping data flow video tutorials
@@ -556,6 +558,8 @@
     href: data-flow-lookup.md
   - name: New branch
     href: data-flow-new-branch.md
+  - name: Parse
+    href: data-flow-parse.md
   - name: Pivot
     href: data-flow-pivot.md
   - name: Rank
```
New file (data-flow-parse.md): 117 additions, 0 deletions
---
title: Parse data transformation in mapping data flow
description: Parse embedded column documents
author: kromerm
ms.author: makromer
ms.service: data-factory
ms.topic: conceptual
ms.date: 02/08/2021
---

# Parse transformation in mapping data flow

[!INCLUDE[appliesto-adf-asa-md](includes/appliesto-adf-asa-md.md)]

Use the Parse transformation to parse columns in your data that are in document form. The currently supported types of embedded documents that can be parsed are JSON and delimited text.

## Configuration

In the parse transformation configuration panel, first pick the type of data contained in the columns that you want to parse inline. The parse transformation also contains the following configuration settings.

![Parse settings](media/data-flow/data-flow-parse-1.png "Parse")

### Column

Similar to derived columns and aggregates, this is where you either modify an existing column by selecting it from the drop-down picker, or type in the name of a new column. ADF stores the parsed source data in this column.
### Expression

Use the expression builder to set the source for your parsing. This can be as simple as selecting the source column with the self-contained data that you wish to parse, or you can create complex expressions to parse.

### Output column type

Here is where you configure the target output schema from the parsing that will be written into a single column.
![Parse example](media/data-flow/data-flow-parse-2.png "Parse example")

In this example, we have defined parsing of the incoming field "jsonString", which is plain text but formatted as a JSON structure. We're going to store the parsed results as JSON in a new column called "json" with this schema:

```(trade as boolean, customers as string[])```

Refer to the inspect tab and data preview to verify that your output is mapped properly.
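Outside ADF, the effect of this Parse configuration can be sketched in plain Python. This is a hypothetical illustration of the same idea, not ADF's implementation: a JSON-formatted string column is turned into typed fields matching the `(trade as boolean, customers as string[])` schema.

```python
import json

def parse_json_column(json_string: str) -> dict:
    """Parse a JSON-formatted string into typed fields, mirroring the
    target schema (trade as boolean, customers as string[])."""
    doc = json.loads(json_string)
    return {
        "trade": bool(doc["trade"]),
        "customers": [str(c) for c in doc["customers"]],
    }

# A row whose "jsonString" column is plain text formatted as JSON.
row = {"jsonString": '{"trade": true, "customers": ["Mark", "Joe"]}'}
row["json"] = parse_json_column(row["jsonString"])
```

The parsed result lands in a new key, just as ADF stores the parsed output in the column you name in the Column setting.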
## Examples

```
source(output(
		name as string,
		location as string,
		satellites as string[],
		goods as (trade as boolean, customers as string[], orders as (orderId as string, orderTotal as double, shipped as (orderItems as (itemName as string, itemQty as string)[]))[])
	),
	allowSchemaDrift: true,
	validateSchema: false,
	ignoreNoFilesFound: false,
	documentForm: 'documentPerLine') ~> JsonSource
source(output(
		movieId as string,
		title as string,
		genres as string
	),
	allowSchemaDrift: true,
	validateSchema: false,
	ignoreNoFilesFound: false) ~> CsvSource
JsonSource derive(jsonString = toString(goods)) ~> StringifyJson
StringifyJson parse(json = jsonString ? (trade as boolean,
		customers as string[]),
	format: 'json',
	documentForm: 'arrayOfDocuments') ~> ParseJson
CsvSource derive(csvString = 'Id|name|year\n\'1\'|\'test1\'|\'1999\'') ~> CsvString
CsvString parse(csv = csvString ? (id as integer,
		name as string,
		year as string),
	format: 'delimited',
	columnNamesAsHeader: true,
	columnDelimiter: '|',
	nullValue: '',
	documentForm: 'documentPerLine') ~> ParseCsv
ParseJson select(mapColumn(
		jsonString,
		json
	),
	skipDuplicateMapInputs: true,
	skipDuplicateMapOutputs: true) ~> KeepStringAndParsedJson
ParseCsv select(mapColumn(
		csvString,
		csv
	),
	skipDuplicateMapInputs: true,
	skipDuplicateMapOutputs: true) ~> KeepStringAndParsedCsv
```
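The JsonSource branch above stringifies a nested column with `derive(jsonString = toString(goods))` and then parses the string back into typed fields. A rough Python analogue of that round trip, for intuition only (the `goods` value is a made-up example, not data from the article):

```python
import json

# A nested value shaped like the "goods" column in the JsonSource schema.
goods = {"trade": True, "customers": ["Mark", "Joe"]}

# Analogue of derive(jsonString = toString(goods)):
# serialize the structure into a plain-text column.
json_string = json.dumps(goods)

# Analogue of parse(json = jsonString ? (trade as boolean, customers as string[])):
# recover typed fields from the embedded document.
parsed = json.loads(json_string)
```

The round trip is lossless here, which is why the select at the end of the script can keep both the string column and its parsed counterpart.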

## Data flow script

### Syntax

### Examples

```
parse(json = jsonString ? (trade as boolean,
		customers as string[]),
	format: 'json',
	documentForm: 'singleDocument') ~> ParseJson

parse(csv = csvString ? (id as integer,
		name as string,
		year as string),
	format: 'delimited',
	columnNamesAsHeader: true,
	columnDelimiter: '|',
	nullValue: '',
	documentForm: 'documentPerLine') ~> ParseCsv
```
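For comparison, the delimited parse above (first line as header, `|` as delimiter, empty string as null) can be sketched in Python with the standard `csv` module. This is a hedged illustration of the behavior, not ADF code; the function name is made up.

```python
import csv
import io

def parse_delimited(csv_string: str, delimiter: str = "|") -> list:
    """Parse a delimited string whose first line holds the column names
    (columnNamesAsHeader: true), treating '' as null (nullValue: '')."""
    reader = csv.DictReader(io.StringIO(csv_string), delimiter=delimiter)
    rows = []
    for record in reader:
        rows.append({k: (None if v == "" else v) for k, v in record.items()})
    return rows

# Same shape of input as the csvString derived in the example script.
parsed = parse_delimited("Id|name|year\n1|test1|1999")
```

Each parsed row becomes a dictionary keyed by the header names, analogous to the typed columns named in the `csvString ? (...)` schema.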

## Next steps

* Use the [Flatten transformation](data-flow-flatten.md) to pivot rows to columns.
* Use the [Derived column transformation](data-flow-derived-column.md) to pivot columns to rows.