Commit b07f43e

Freshness, acrolinx, and formatting
1 parent d0040a0 commit b07f43e

1 file changed: +4 -4 lines changed

articles/synapse-analytics/spark/apache-spark-pool-configurations.md

Lines changed: 4 additions & 4 deletions
@@ -1,14 +1,14 @@
 ---
 title: Apache Spark pool concepts
 description: Introduction to Apache Spark pool sizes and configurations in Azure Synapse Analytics.
-ms.topic: conceptual
+ms.topic: concept-article
 ms.service: azure-synapse-analytics
 ms.subservice: spark
 ms.custom: references_regions
 author: guyhay
 ms.author: guyhay
 ms.reviewer: whhender
-ms.date: 09/07/2022
+ms.date: 12/06/2024
 ---
 
 # Apache Spark pool configurations in Azure Synapse Analytics
@@ -53,7 +53,7 @@ Autoscale for Apache Spark pools allows automatic scale up and down of compute r
 Apache Spark pools now support elastic pool storage. Elastic pool storage allows the Spark engine to monitor worker node temporary storage and attach extra disks if needed. Apache Spark pools utilize temporary disk storage while the pool is instantiated. Spark jobs write shuffle map outputs, shuffle data and spilled data to local VM disks. Examples of operations that could utilize local disk are sort, cache, and persist. When temporary VM disk space runs out, Spark jobs could fail due to “Out of Disk Space” error (java.io.IOException: No space left on device). With “Out of Disk Space” errors, much of the burden to prevent jobs from failing shifts to the customer to reconfigure the Spark jobs (for example, tweak the number of partitions) or clusters (for example, add more nodes to the cluster). These errors might not be consistent, and the user might end up experimenting heavily by running production jobs. This process can be expensive for the user in multiple dimensions:
 
 * Wasted time. Customers are required to experiment heavily with job configurations via trial and error and are expected to understand Spark’s internal metrics to make the correct decision.
-* Wasted resources. Since production jobs can process varying amount of data, Spark jobs can fail non-deterministically if resources aren't over-provisioned. For instance, consider the problem of data skew, which could result in a few nodes requiring more disk space than others. Currently in Synapse, each node in a cluster gets the same size of disk space and increasing disk space across all nodes isn't an ideal solution and leads to tremendous waste.
+* Wasted resources. Since production jobs can process varying amount of data, Spark jobs can fail nondeterministically if resources aren't over-provisioned. For instance, consider the problem of data skew, which could result in a few nodes requiring more disk space than others. Currently in Synapse, each node in a cluster gets the same size of disk space and increasing disk space across all nodes isn't an ideal solution and leads to tremendous waste.
 * Slowdown in job execution. In the hypothetical scenario where we solve the problem by autoscaling nodes (assuming costs aren't an issue to the end customer), adding a compute node is still expensive (takes a few minutes) as opposed to adding storage (takes a few seconds).
 
 No action is required by you, plus you should see fewer job failures as a result.
@@ -65,7 +65,7 @@ No action is required by you, plus you should see fewer job failures as a result
 
 The automatic pause feature releases resources after a set idle period, reducing the overall cost of an Apache Spark pool. The number of minutes of idle time can be set once this feature is enabled. The automatic pause feature is independent of the autoscale feature. Resources can be paused whether the autoscale is enabled or disabled. This setting can be altered after pool creation although active sessions will need to be restarted.
 
-## Next steps
+## Related content
 
 * [Azure Synapse Analytics](../index.yml)
 * [Apache Spark Documentation](https://spark.apache.org/docs/3.2.1/)
