MicrosoftDocs
diff --git a/articles/data-factory/data-flow-aggregate.md b/articles/data-factory/data-flow-aggregate.md
Lines changed: 12 additions & 2 deletions
diff --git a/articles/data-factory/data-flow-lookup.md b/articles/data-factory/data-flow-lookup.md
Lines changed: 1 addition & 1 deletion
diff --git a/articles/data-factory/media/data-flow/agg-dedupe.png b/articles/data-factory/media/data-flow/agg-dedupe.png
89.9 KB
diff --git a/articles/data-factory/media/data-flow/lookup-dsl-example.png b/articles/data-factory/media/data-flow/lookup-dsl-example.png
110 KB
@@ -7,7 +7,7 @@ ms.reviewer: daperlov
 ms.service: data-factory
 ms.topic: conceptual
 ms.custom: seo-lt-2019
-ms.date: 10/15/2019
+ms.date: 03/24/2020
 ---
 
 # Aggregate transformation in mapping data flow 
@@ -41,6 +41,16 @@ Aggregate transformations are similar to SQL aggregate select queries. Columns t
 * Use an aggregate function such as `last()` or `first()` to include that additional column.
 * Rejoin the columns to your output stream using the [self join pattern](https://mssqldude.wordpress.com/2018/12/20/adf-data-flows-self-join/).
 
+## Removing duplicate rows
+
+A common use of the aggregate transformation is removing or identifying duplicate entries in source data. This process is known as deduplication. Based upon a set of group by keys, use a heuristic of your choosing to determine which duplicate row to keep. Common heuristics are `first()`, `last()`, `max()`, and `min()`. Use [column patterns](concepts-data-flow-column-pattern.md) to apply the rule to every column except for the group by columns.
+
+![Deduplication](media/data-flow/agg-dedupe.png "Deduplication")
+
+In the above example, columns `ProductID` and `Name` are being used for grouping. If two rows have the same values for those two columns, they're considered duplicates. In this aggregate transformation, the values of the first matched row are kept and all other rows are dropped. Using column pattern syntax, all columns whose names aren't `ProductID` and `Name` are mapped to their existing column names and given the value of the first matched row. The output schema is the same as the input schema.
+
+For data validation scenarios, the `count()` function can be used to count how many duplicates there are.
+
 ## Data flow script
 
 ### Syntax
@@ -77,7 +87,7 @@ The data flow script for this transformation is in the snippet below.
 ```
 MoviesYear aggregate(
                 groupBy(year),
-	            avgrating = avg(toInteger(Rating))
+                avgrating = avg(toInteger(Rating))
             ) ~> AvgComedyRatingByYear
 ```
 
 
@@ -70,7 +70,7 @@ Enabling broadcasting pushes the entire dataset into memory. For smaller dataset
 ```
 ### Example
 
-![Lookup Transformation](media/data-flow/lookup1.png "Lookup")
+![Lookup Transformation](media/data-flow/lookup-dsl-example.png "Lookup")
 
 The data flow script for the above lookup configuration is in the code snippet below.
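The deduplication pattern added to `data-flow-aggregate.md` above (group by `ProductID` and `Name`, keep the first matched row, and optionally use `count()` to find duplicates) can be approximated outside Azure Data Factory to check expectations about its output. The Python below is a minimal, hypothetical sketch on plain dicts: the functions `dedupe_first`, `count_duplicates`, and `group_key` and the `Color` column are illustrative assumptions, not part of any ADF API. "First" here simply means first in input order.

```python
# Hypothetical sketch: emulates aggregate-based deduplication on plain
# Python dicts. This is NOT an Azure Data Factory API.
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def group_key(row: Row, keys: Tuple[str, ...]) -> Key:
    """Build the group-by key from the chosen columns."""
    return tuple(row[c] for c in keys)


def dedupe_first(rows: Iterable[Row], keys: Tuple[str, ...]) -> List[Row]:
    """Keep the first row seen for each group-by key; drop later duplicates.

    Every non-key column takes its value from the first matched row,
    similar to applying first() to each of them, so the output schema
    equals the input schema.
    """
    kept: Dict[Key, Row] = {}  # dicts preserve insertion order (Python 3.7+)
    for row in rows:
        k = group_key(row, keys)
        if k not in kept:
            kept[k] = dict(row)  # shallow copy so output rows are independent dicts
    return list(kept.values())


def count_duplicates(rows: Iterable[Row], keys: Tuple[str, ...]) -> Dict[Key, int]:
    """Count rows per group-by key, analogous to a count() aggregate."""
    counts: Dict[Key, int] = {}
    for row in rows:
        k = group_key(row, keys)
        counts[k] = counts.get(k, 0) + 1
    return counts


if __name__ == "__main__":
    rows = [
        {"ProductID": 1, "Name": "Bike", "Color": "Red"},
        {"ProductID": 1, "Name": "Bike", "Color": "Blue"},
        {"ProductID": 2, "Name": "Helmet", "Color": "Black"},
    ]
    keys = ("ProductID", "Name")
    # [{'ProductID': 1, 'Name': 'Bike', 'Color': 'Red'}, {'ProductID': 2, 'Name': 'Helmet', 'Color': 'Black'}]
    print(dedupe_first(rows, keys))
    # {(1, 'Bike'): 2, (2, 'Helmet'): 1}
    print(count_duplicates(rows, keys))
```

For validation, keys whose count is greater than 1 identify the duplicated groups, for example `{k: n for k, n in counts.items() if n > 1}`.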