
Commit aa91c36

Update data-flow-script.md
1 parent 742e664 commit aa91c36

File tree

1 file changed (+35, -1 lines)

articles/data-factory/data-flow-script.md

Lines changed: 35 additions & 1 deletion
@@ -6,7 +6,7 @@ ms.author: nimoolen
ms.service: data-factory
ms.topic: conceptual
ms.custom: seo-lt-2019
-ms.date: 11/10/2019
+ms.date: 03/24/2020
---

# Data flow script (DFS)
@@ -133,6 +133,40 @@ derive1 sink(allowSchemaDrift: true,
    validateSchema: false) ~> sink1
```

## Script snippets

### Aggregated summary stats

Add an Aggregate transformation to your data flow called "SummaryStats", and then paste the code below into the aggregate function in your script, replacing the existing SummaryStats definition. This provides a generic pattern for data profile summary statistics.

```
aggregate(each(match(true()), $$+'_NotNull' = countIf(!isNull($$)), $$+'_Null' = countIf(isNull($$))),
    each(match(type=='double'||type=='integer'||type=='short'||type=='decimal'), $$+'_stddev' = round(stddev($$),2), $$+'_min' = min($$), $$+'_max' = max($$), $$+'_average' = round(avg($$),2), $$+'_variance' = round(variance($$),2)),
    each(match(type=='string'), $$+'_maxLength' = max(length($$)))) ~> SummaryStats
```
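To make the pattern concrete, here is the same per-column logic sketched in plain Python for a single hypothetical numeric column (the column name and values are invented for illustration, and DFS's `stddev`/`variance` are assumed here to behave like sample statistics):

```python
import statistics

# Hypothetical "price" column; None stands in for nulls
price = [10.0, 12.5, None, 9.0, 14.5]

non_null = [v for v in price if v is not None]

summary = {
    "price_NotNull": len(non_null),                        # countIf(!isNull($$))
    "price_Null": len(price) - len(non_null),              # countIf(isNull($$))
    "price_stddev": round(statistics.stdev(non_null), 2),  # round(stddev($$),2)
    "price_min": min(non_null),                            # min($$)
    "price_max": max(non_null),                            # max($$)
    "price_average": round(statistics.mean(non_null), 2),  # round(avg($$),2)
    "price_variance": round(statistics.variance(non_null), 2),
}
print(summary)
```

In the DFS snippet, `each(match(...))` applies this same set of expressions to every column whose type matches the pattern, so the output gains one `_stddev`, `_min`, `_max`, `_average`, and `_variance` column per numeric input column.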

You can also use the sample below to count the number of unique values and the number of distinct values in your data. Paste it into a data flow that has an Aggregate transformation called ValueDistAgg. This example uses a column called "title"; be sure to replace "title" with the string column in your data that you wish to use for value counts.
```
149+
aggregate(groupBy(title),
150+
countunique = count()) ~> ValueDistAgg
151+
ValueDistAgg aggregate(numofunique = countIf(countunique==1),
152+
numofdistinct = countDistinct(title)) ~> UniqDist
153+
```
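The two-step aggregate above can be cross-checked with a small Python sketch (the sample titles are hypothetical): the first step counts rows per title, the second counts how many titles occur exactly once (`numofunique`) and how many distinct titles exist (`numofdistinct`).

```python
from collections import Counter

# Hypothetical "title" column values
titles = ["Alien", "Gladiator", "Alien", "Up", "Up", "Heat"]

# Step 1: groupBy(title) with count() per group
counts = Counter(titles)

# Step 2: countIf(countunique == 1) and countDistinct(title)
numofunique = sum(1 for c in counts.values() if c == 1)
numofdistinct = len(counts)

print(numofunique, numofdistinct)  # 2 4
```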

### Include all columns in an aggregate

This is a generic aggregate pattern that demonstrates how to keep the remaining columns in your output metadata when you build aggregates. In this case, we use the `first()` function to choose the first value in every column whose name is not "movie". To use this, create an Aggregate transformation called DistinctRows, and then paste this into your script over the existing DistinctRows aggregate script.

```
aggregate(groupBy(movie),
    each(match(name!='movie'), $$ = first($$))) ~> DistinctRows
```
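The effect of this pattern, keeping one row per grouping key and taking the first value of every other column, can be sketched in Python (the row data is hypothetical):

```python
# Hypothetical rows; "movie" is the grouping key
rows = [
    {"movie": "Alien", "year": 1979, "rating": 8.5},
    {"movie": "Alien", "year": 1986, "rating": 8.6},
    {"movie": "Up", "year": 2009, "rating": 8.3},
]

# groupBy(movie) with $$ = first($$) for every non-key column:
# keep the first row encountered for each movie
distinct = {}
for row in rows:
    distinct.setdefault(row["movie"], row)

distinct_rows = list(distinct.values())
print(distinct_rows)
```

Because every non-key column is carried through via `first()`, the output keeps the full schema instead of collapsing to just the grouping column.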

### Create row hash fingerprint

Use this code in your data flow script to create a new derived column called `DWhash` that produces a `sha1` hash of three columns.

```
derive(DWhash = sha1(Name,ProductNumber,Color))
```
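A rough Python equivalent using `hashlib` (the column values are invented, and the exact way DFS combines the inputs before hashing is an assumption in this sketch):

```python
import hashlib

# Hypothetical values for the Name, ProductNumber, and Color columns
name, product_number, color = "Mountain Bike", "BK-M18S-42", "Silver"

# Concatenate the column values and take their sha1 hex digest,
# yielding a stable 40-character fingerprint for the row
dw_hash = hashlib.sha1((name + product_number + color).encode("utf-8")).hexdigest()
print(dw_hash)
```

A fingerprint like this is useful for change detection in data-warehouse loads: if any of the hashed columns changes, the hash changes.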

## Next steps

Explore Data Flows by starting with the [data flows overview article](concepts-data-flow-overview.md).
