Commit 2de497d

Merge pull request #265264 from AjayBathini-MSFT/patch-142
(AzureCXP) fixes MicrosoftDocs/azure-docs#119494
2 parents e65cc8a + b565ddc commit 2de497d

File tree: 1 file changed, +4 −4 lines


articles/data-factory/concepts-integration-runtime-performance.md

Lines changed: 4 additions & 4 deletions
@@ -29,11 +29,11 @@ If your data flow has many joins and lookups, you may want to use a **memory opt
 
 ## Cluster size
 
-Data flows distribute the data processing over different nodes in a Spark cluster to perform operations in parallel. A Spark cluster with more cores increases the number of nodes in the compute environment. More nodes increase the processing power of the data flow. Increasing the size of the cluster is often an easy way to reduce the processing time.
+Data flows distribute the data processing over different cores in a Spark cluster to perform operations in parallel. A Spark cluster with more cores increases the number of cores in the compute environment. More cores increase the processing power of the data flow. Increasing the size of the cluster is often an easy way to reduce the processing time.
 
-The default cluster size is four driver nodes and four worker nodes (small). As you process more data, larger clusters are recommended. Below are the possible sizing options:
+The default cluster size is four driver cores and four worker cores (small). As you process more data, larger clusters are recommended. Below are the possible sizing options:
 
-| Worker Nodes | Driver Nodes | Total Nodes | Notes |
+| Worker Cores | Driver Cores | Total Cores | Notes |
 | ------------ | ------------ | ----------- | ----- |
 | 4 | 4 | 8 | Small |
 | 8 | 8 | 16 | Medium |
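The sizing table above pairs with the article's vcore-hour pricing model: cost scales with total cores multiplied by execution time. A minimal sketch of that arithmetic (the helper name and sample runtimes are illustrative assumptions, not Azure pricing data):

```python
def vcore_hours(total_cores: int, runtime_minutes: float) -> float:
    """Billed compute for one data flow run: cores x hours."""
    return total_cores * runtime_minutes / 60.0

# If doubling the cluster roughly halves the runtime (well-partitioned
# data), the billed vcore-hours stay about the same:
small = vcore_hours(total_cores=8, runtime_minutes=60)    # 8.0 vcore-hours
medium = vcore_hours(total_cores=16, runtime_minutes=30)  # 8.0 vcore-hours
```

This is why scaling up can cut wall-clock time without proportionally raising cost, as the pricing note in the diff below the table states.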
@@ -46,7 +46,7 @@ The default cluster size is four driver cores and four worker cores (small). As
 Data flows are priced at vcore-hrs meaning that both cluster size and execution-time factor into this. As you scale up, your cluster cost per minute will increase, but your overall time will decrease.
 
 > [!TIP]
-> There is a ceiling on how much the size of a cluster affects the performance of a data flow. Depending on the size of your data, there is a point where increasing the size of a cluster will stop improving performance. For example, If you have more nodes than partitions of data, adding additional nodes won't help.
+> There is a ceiling on how much the size of a cluster affects the performance of a data flow. Depending on the size of your data, there is a point where increasing the size of a cluster will stop improving performance. For example, if you have more cores than partitions of data, adding additional cores won't help.
 A best practice is to start small and scale up to meet your performance needs.
 
 ## Custom shuffle partition
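The ceiling described in the tip can be sketched as a simple `min()` relationship: Spark runs at most one task per data partition at a time, so cores beyond the partition count sit idle. The helper below is a hypothetical illustration, not part of any Azure or Spark API:

```python
def effective_parallelism(worker_cores: int, partitions: int) -> int:
    """Tasks that can actually run concurrently: capped by partition count."""
    return min(worker_cores, partitions)

effective_parallelism(8, 100)  # 8  -> more cores would still help here
effective_parallelism(64, 16)  # 16 -> cores beyond 16 add cost, not speed
```

This is the quantitative reason behind the "start small and scale up" best practice: scaling helps only while partitions outnumber cores.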
