
Commit 1bb067b

Update concepts-data-flow-performance.md
1 parent 58c3e5f commit 1bb067b

File tree: 1 file changed (+7, −1 lines)


articles/data-factory/concepts-data-flow-performance.md

Lines changed: 7 additions & 1 deletion
```diff
@@ -6,7 +6,7 @@ ms.topic: conceptual
 ms.author: makromer
 ms.service: data-factory
 ms.custom: seo-lt-2019
-ms.date: 12/19/2019
+ms.date: 01/24/2020
 ---
 
 # Mapping data flows performance and tuning guide
```
```diff
@@ -125,6 +125,12 @@ Setting throughput and batch properties on CosmosDB sinks only take effect durin
 * Throughput: Set a higher throughput setting here to allow documents to write faster to CosmosDB. Keep in mind the higher RU costs that come with a high throughput setting.
 * Write Throughput Budget: Use a value that is smaller than the total RUs per minute. If you have a data flow with a high number of Spark partitions, setting a budget throughput allows more balance across those partitions.
```
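As a hedged aside (not part of the commit), the arithmetic behind choosing a Write Throughput Budget below the container's total RUs per minute can be sketched as follows; the 10,000 RU/s figure and the 20% reserve are made-up example values:

```python
# Illustrative only: pick a Write Throughput Budget below the container's
# total RUs per minute, leaving headroom for other workloads.
provisioned_ru_per_sec = 10_000                 # example container setting
total_ru_per_min = provisioned_ru_per_sec * 60  # 600,000 RU/min
reserve_fraction = 0.20                         # keep 20% for other callers
write_throughput_budget = int(total_ru_per_min * (1 - reserve_fraction))
print(write_throughput_budget)  # 480000, safely below 600000
```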
```diff
 
+## Join performance
+
+Managing the performance of joins in your data flow is a common task that you will perform throughout the lifecycle of your data transformations. In ADF, data flows do not require data to be sorted before joins, because these operations are performed as hash joins in Spark. However, you can benefit from improved performance with the "Broadcast" join optimization, which avoids shuffles by pushing down the contents of one side of your join relationship to every Spark node. This works well for smaller tables that are used for reference lookups. Larger tables that may not fit into a node's memory are not good candidates for broadcast optimization.
```
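To make the broadcast behavior concrete, here is a minimal pure-Python sketch (not ADF or Spark code; all table and column names are invented) of what a broadcast hash join does: the small reference side becomes an in-memory hash table available everywhere, and the large side streams against it without a shuffle:

```python
def broadcast_hash_join(large_rows, small_rows, large_key, small_key):
    # "Broadcast": build a hash table from the small side once;
    # in Spark this table would be shipped to every worker node.
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[small_key], []).append(row)
    # Stream the large side; each row is matched locally, no shuffle.
    joined = []
    for row in large_rows:
        for match in lookup.get(row[large_key], []):
            joined.append({**row, **match})
    return joined

orders = [  # invented "large" side
    {"order_id": 1, "country_code": "US"},
    {"order_id": 2, "country_code": "DE"},
]
countries = [  # small reference lookup table: a good broadcast candidate
    {"code": "US", "name": "United States"},
    {"code": "DE", "name": "Germany"},
]

result = broadcast_hash_join(orders, countries, "country_code", "code")
```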
```diff
+
+Another join optimization is to build your joins in a way that avoids Spark's tendency to implement cross joins. For example, when you include literal values in your join conditions, Spark may treat that as a requirement to perform a full cartesian product first and then filter out the joined values. But if you ensure that you have column values on both sides of your join condition, you can avoid this Spark-induced cartesian product and improve the performance of your joins and data flows.
```
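A hedged illustration (plain Python with invented data, not Spark) of the difference: a predicate built only from a literal gives the engine no column pair to hash on, so every row combination must be materialized and filtered, while a column-to-column equi-join lets a hash table do the matching:

```python
import itertools

# Invented sample data.
left = [{"id": i, "tier": "gold" if i % 2 else "silver"} for i in range(4)]
right = [{"id": i, "ref": f"r{i}"} for i in range(3)]

# Join condition containing only a literal (akin to ON left.tier = 'gold'):
# the engine falls back to a cartesian product of all rows, then filters.
cartesian_then_filter = [
    (a, b) for a, b in itertools.product(left, right) if a["tier"] == "gold"
]

# Column = column join, with the literal moved to a pre-filter: the
# engine can hash one side and probe it, never building all pairs.
gold = [a for a in left if a["tier"] == "gold"]
by_id = {b["id"]: b for b in right}
equi_join = [(a, by_id[a["id"]]) for a in gold if a["id"] in by_id]
```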
```diff
+
 ## Next steps
 
 See other Data Flow articles related to performance:
```
