
Commit cb97fc2

Update apache-spark-performance.md
1 parent 2e28ef8 commit cb97fc2

File tree: 1 file changed (+4, -1 lines)

articles/synapse-analytics/spark/apache-spark-performance.md

Lines changed: 4 additions & 1 deletion
@@ -57,7 +57,10 @@ Spark provides its own native caching mechanisms, which can be used through diff
 Spark operates by placing data in memory, so managing memory resources is a key aspect of optimizing the execution of Spark jobs. There are several techniques you can apply to use your cluster's memory efficiently.
 
 * Prefer smaller data partitions and account for data size, types, and distribution in your partitioning strategy.
-* Consider the newer, more efficient [Kryo data serialization](https://github.com/EsotericSoftware/kryo), rather than the default Java serialization.
+* In Synapse Spark (Runtime 3.1 or higher), [Kryo data serialization](https://github.com/EsotericSoftware/kryo) is enabled by default.
+* You can customize the Kryo serializer buffer size based on your requirements:
+
+  `spark.conf.set("spark.kryoserializer.buffer.max", "256m")`
 * Monitor and tune Spark configuration settings.
 
 For your reference, the Spark memory structure and some key executor memory parameters are shown in the next image.
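The properties touched by this change can also be supplied when the session or job is launched, rather than via `spark.conf.set`. Below is a minimal sketch of that idea in plain Python: the `spark.serializer` and `spark.kryoserializer.buffer.max` property names are real Spark settings, but the `as_submit_flags` helper is hypothetical, included only to show how the same key/value pairs map onto `spark-submit --conf` arguments.

```python
# Kryo-related Spark properties from the doc change, collected as a dict that
# could be passed to SparkSession.builder.config(...) or spark-submit --conf.
kryo_conf = {
    # Explicitly selects Kryo; on Synapse Runtime 3.1+ this is the default,
    # so it is shown here mainly for older runtimes or vanilla Spark.
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    # Raise the maximum Kryo serialization buffer; "256m" means 256 MiB.
    "spark.kryoserializer.buffer.max": "256m",
}


def as_submit_flags(conf):
    """Hypothetical helper: render properties as spark-submit --conf flags."""
    return " ".join(f"--conf {k}={v}" for k, v in sorted(conf.items()))


print(as_submit_flags(kryo_conf))
# prints:
# --conf spark.kryoserializer.buffer.max=256m --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
```

Passing the properties at launch time avoids relying on runtime reconfiguration, since some serializer settings are only read when the JVM executors start.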
