wForget (Member) commented Jan 7, 2026

What changes were proposed in this pull request?

Make UnsafeInMemorySorter.freeMemory thread-safe.

Why are the changes needed?

I encountered a SparkOutOfMemoryError. The logs show that the memory is held by UnsafeExternalSorter, even though UnsafeExternalSorter is supposed to trigger a spill and release that memory. UnsafeInMemorySorter.freeMemory is not thread-safe, and UnsafeExternalSorter.spill may be called concurrently by multiple threads, which could lead to a memory leak. Related logs:

26/01/05 23:04:53 INFO UnsafeExternalSorter: Thread 152 spilling sort data of 2.1 GiB to disk (0  time so far)
...
26/01/05 23:05:10 INFO UnsafeExternalSorter: Thread 152 spilling sort data of 303.0 KiB to disk (0  time so far)
26/01/05 23:05:10 INFO TaskMemoryManager: Memory used in task 172030
26/01/05 23:05:10 INFO TaskMemoryManager: Acquired by org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@72075707: 303.0 KiB
26/01/05 23:05:10 INFO TaskMemoryManager: Acquired by org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@3e02e759: 2.1 GiB
26/01/05 23:05:10 INFO TaskMemoryManager: 0 bytes of memory were used by task 172030 but are not associated with specific consumers
26/01/05 23:05:10 INFO TaskMemoryManager: 2214902769 bytes of memory are used for execution and 87455554 bytes of memory are used for storage
26/01/05 23:05:10 ERROR Executor: Exception in task 5038.0 in stage 185.0 (TID 172030)
org.apache.spark.memory.SparkOutOfMemoryError: [UNABLE_TO_ACQUIRE_MEMORY] Unable to acquire 16384 bytes of memory, got 0.
	at org.apache.spark.errors.SparkCoreErrors$.outOfMemoryError(SparkCoreErrors.scala:467)
	at org.apache.spark.errors.SparkCoreErrors.outOfMemoryError(SparkCoreErrors.scala)
	at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.growPointerArrayIfNecessary(UnsafeExternalSorter.java:384)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.allocateMemoryForRecordIfNecessary(UnsafeExternalSorter.java:467)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:487)
	at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray.$anonfun$add$2(ExternalAppendOnlyUnsafeRowArray.scala:149)
	at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray.$anonfun$add$2$adapted(ExternalAppendOnlyUnsafeRowArray.scala:143)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
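
To make the concern in the description concrete, here is a minimal sketch of the general pattern: a "free" method that checks and clears a buffer under a lock, so that concurrent callers release the memory exactly once. It is not Spark's code and not the patch in this PR. ToyMemoryPool, ToySorterBuffer and FreeMemoryDemo are hypothetical names invented for illustration, and the 8-bytes-per-slot accounting is an assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a memory manager: tracks bytes currently held. */
final class ToyMemoryPool {
    private final AtomicLong used = new AtomicLong();
    void acquire(long bytes) { used.addAndGet(bytes); }
    void release(long bytes) { used.addAndGet(-bytes); }
    long used() { return used.get(); }
}

/** Hypothetical sorter buffer; NOT Spark's UnsafeInMemorySorter. */
final class ToySorterBuffer {
    private final ToyMemoryPool pool;
    private long[] array; // guarded by "this"

    ToySorterBuffer(ToyMemoryPool pool, int slots) {
        this.pool = pool;
        this.array = new long[slots];
        pool.acquire(slots * 8L);
    }

    /** The null check and the release happen under one lock, so the buffer is freed at most once. */
    synchronized long freeMemory() {
        if (array == null) {
            return 0L; // another caller already freed it
        }
        long bytes = array.length * 8L;
        array = null;
        pool.release(bytes);
        return bytes;
    }
}

final class FreeMemoryDemo {
    /** Calls freeMemory() from several threads at once and returns the total bytes reported freed. */
    static long freeConcurrently(ToySorterBuffer buffer, int threads) {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicLong freed = new AtomicLong();
        for (int i = 0; i < threads; i++) {
            exec.submit(() -> {
                try {
                    start.await(); // line all workers up before racing
                    freed.addAndGet(buffer.freeMemory());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        exec.shutdown();
        try {
            if (!exec.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return freed.get();
    }

    public static void main(String[] args) {
        ToyMemoryPool pool = new ToyMemoryPool();
        ToySorterBuffer buffer = new ToySorterBuffer(pool, 1024);
        long freed = freeConcurrently(buffer, 8);
        System.out.println("freed=" + freed + " stillUsed=" + pool.used());
    }
}
```

Without the synchronized keyword, two callers could both pass the null check before either clears the field, and the pool's accounting would no longer match what is actually held. Whether that is what happened in the task above is exactly the open question in this thread.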

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test.

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the CORE label Jan 7, 2026

github-actions bot commented Jan 7, 2026

JIRA Issue Information

=== Improvement SPARK-54934 ===
Summary: Memory Leak Caused by thread-unsafe UnsafeInMemorySorter.freeMemory
Assignee: None
Status: Open
Affected: ["3.5.1","4.2.0"]


This comment was automatically generated by GitHub Actions

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Ngone51 (Member) left a comment

Thanks for working on this!

The log shows that both spills were triggered by the same thread (152), so it doesn't make sense to me that the issue is caused by concurrent threads.
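
One way to check this against the complete executor log is to collect the distinct thread ids that appear in spill messages. The sketch below assumes the message format seen in the excerpt quoted in the description ("UnsafeExternalSorter: Thread N spilling sort data"); the SpillThreads class and its method are hypothetical helpers, not part of Spark.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpillThreads {
    // Assumed format, taken from the log lines quoted in this PR.
    private static final Pattern SPILL =
        Pattern.compile("UnsafeExternalSorter: Thread (\\d+) spilling sort data");

    /** Returns the distinct thread ids found in spill messages, in ascending order. */
    static Set<Long> spillingThreads(Iterable<String> logLines) {
        Set<Long> ids = new TreeSet<>();
        for (String line : logLines) {
            Matcher m = SPILL.matcher(line);
            if (m.find()) {
                ids.add(Long.parseLong(m.group(1)));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "26/01/05 23:04:53 INFO UnsafeExternalSorter: Thread 152 spilling sort data of 2.1 GiB to disk (0  time so far)",
            "26/01/05 23:05:10 INFO UnsafeExternalSorter: Thread 152 spilling sort data of 303.0 KiB to disk (0  time so far)");
        System.out.println(spillingThreads(lines)); // prints [152]
    }
}
```

If more than one id shows up for the same task in the full log, that would support the concurrency theory; a single id, as in the quoted excerpt, would not.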

Ngone51 (Member) commented Jan 8, 2026

Could you share the complete logs, or do you have a reproducible example?
