
Commit dd482bc

Merge pull request #97439 from kromerm/dataflow-1
Dataflow 1
2 parents ba672df + 93a87b1 commit dd482bc


2 files changed, +15 -1 lines changed

articles/data-factory/concepts-data-flow-expression-builder.md

Lines changed: 7 additions & 1 deletion
@@ -99,7 +99,13 @@ If you put a comment at the top of your expression, it will appear in the transf

```toString(toTimestamp('12/31/2016T00:12:00', 'MM/dd/yyyy\'T\'HH:mm:ss'), 'MM/dd/yyyy\'T\'HH:mm:ss')```

-Note that to include string literals in your timestamp output, you need to wrap your conversion inside of toString()
+Note that to include string literals in your timestamp output, you need to wrap your conversion inside of ```toString()```.
+
+Here is how to convert seconds since the Unix epoch to a date or timestamp:
+
+```toTimestamp(1574127407*1000l)```
+
+Notice the trailing "l" at the end of the expression above. It marks the literal as a long, using in-line syntax.
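For instance, a minimal sketch combining the two functions shown above (the epoch value and output pattern are illustrative, not from the original article) converts epoch milliseconds and formats the result in one expression:
```toString(toTimestamp(1574127407*1000l), 'yyyy-MM-dd HH:mm:ss')```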

## Handling column names with special characters

articles/data-factory/concepts-data-flow-performance.md

Lines changed: 8 additions & 0 deletions
@@ -116,6 +116,14 @@ For example, if you have a list of data files from July 2019 that you wish to pr

By using wildcarding, your pipeline will only contain one Data Flow activity. This will perform better than a Lookup against the Blob Store that then iterates across all matched files using a ForEach with an Execute Data Flow activity inside.
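For instance, assuming a hypothetical folder layout such as sales/2019/07/ (not from the original article), a single source wildcard path like ```sales/2019/07/*.csv``` would match every July 2019 file within that one Data Flow activity.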

+### Optimizing for CosmosDB
+
+Setting throughput and batch properties on CosmosDB sinks takes effect only during the execution of that data flow from a pipeline data flow activity. CosmosDB honors the original collection settings again after your data flow execution finishes.
+
+* Batch size: Calculate the rough row size of your data, and make sure that rowSize * batch size is less than two million. If it is, increase the batch size to get better throughput (see the worked example after this list).
+* Throughput: Set a higher throughput setting here to allow documents to write faster to CosmosDB. Keep in mind the higher RU costs that come with a high throughput setting.
+* Write Throughput Budget: Use a value smaller than the total RUs per minute. If your data flow has a high number of Spark partitions, setting a budget throughput allows more balance across those partitions.
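For example (an illustrative calculation, not taken from the original article): if each row serializes to roughly 1 KB, a batch size of 1,000 keeps rowSize * batch size at about 1,000,000, comfortably under the two million limit, so the batch size could be raised toward roughly 2,000 before reaching it.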
## Next steps

See other Data Flow articles related to performance:
