[SPARK-55619][SQL] Fix custom metrics in case of coalesced partitions#54396

Closed
peter-toth wants to merge 1 commit into apache:master from peter-toth:SPARK-55619-fix-coalesced-partitions-custom-metrics

@peter-toth
Contributor

What changes were proposed in this pull request?

This PR reverts SPARK-52809 / #51503 and offers a new way to track the most recently created reader and do a final metrics update in a task completion listener. The changes are required because DataSourceRDD.compute() can be called multiple times in the same thread/task when the DataSourceRDD gets coalesced. Please find an example in the new test.
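
A toy Python model of the situation described above, not Spark code: `ToyReader`, `ToyTask`, `compute` and `flush` are hypothetical names, and folding the previous reader's count into the total when it is replaced is an assumption made for illustration. It only sketches the idea of keeping one "most recently created reader" per task and doing the final update in a single completion listener, even when `compute()` runs several times in the same task.

```python
# Toy model only: these classes mimic the shape of the problem,
# they are NOT Spark's DataSourceRDD, TaskContext, or PartitionReader.

class ToyReader:
    """Stands in for a partition reader that exposes one custom metric."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_read = 0

    def read_all(self):
        for _ in self.rows:
            self.rows_read += 1


class ToyTask:
    """Stands in for a task that runs its completion listeners once."""

    def __init__(self):
        self.listeners = []
        self.metric_total = 0
        self.latest_reader = None  # most recently created reader in this task

    def add_completion_listener(self, fn):
        self.listeners.append(fn)

    def complete(self):
        for fn in self.listeners:
            fn()


def flush(task):
    """Fold the current reader's count into the task-level metric."""
    task.metric_total += task.latest_reader.rows_read
    task.latest_reader.rows_read = 0


def compute(task, rows):
    """Called once per input partition; a coalesced task calls it several times."""
    if task.latest_reader is None:
        # First call in this task: register a single listener for the final update.
        task.add_completion_listener(lambda: flush(task))
    else:
        # Assumption: the previous reader is done, so account for it before replacing it.
        flush(task)
    task.latest_reader = ToyReader(rows)
    task.latest_reader.read_all()


# Two partitions coalesced into one task: compute() runs twice.
task = ToyTask()
compute(task, [1, 2, 3])
compute(task, [4, 5])
task.complete()
print(task.metric_total)  # 5
```

Registering the listener only on the first call keeps the final update from running once per coalesced partition, which is the kind of double handling the description is concerned with.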

Why are the changes needed?

To calculate custom metrics correctly when partitions are coalesced.

Does this PR introduce any user-facing change?

It fixes metrics calculation.

How was this patch tested?

New UT is added.

Was this patch authored or co-authored using generative AI tooling?

No.

@peter-toth
Contributor Author

This PR was extracted from #54330 after discussion in #54330 (comment).

cc @szehon-ho, @viirya, @dongjoon-hyun

@dongjoon-hyun
Member

Since this includes the revert of SPARK-52809, I'll leave this to @viirya .

@viirya
Member

viirya commented Feb 20, 2026

Proposed another approach without ThreadLocal: #54399. I added @peter-toth as co-author.

@peter-toth
Contributor Author

Thanks @viirya, I'm ok with using ConcurrentHashMap instead of ThreadLocal. Let me close this PR in favour of #54399.

@peter-toth closed this Feb 20, 2026