[SPARK-54929][PYTHON] Fix taskContext._resources reset in loop causes only last resource to be kept #53707
What changes were proposed in this pull request?
This PR fixes a bug in python/pyspark/worker.py where taskContext._resources is incorrectly reset inside the for loop, causing only the last resource to be retained when a task has multiple resource types.

The bug was introduced in SPARK-28234 (2019) and has been present since then. The existing test only covered the single-resource case, so the bug was not caught.
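A minimal sketch of the pattern being fixed (function names and the entries list are illustrative, not the actual worker.py code): resetting the dictionary inside the loop discards every resource except the last one.

```python
def collect_resources_buggy(entries):
    for key, addresses in entries:
        resources = {}           # bug: dict re-created on every iteration
        resources[key] = addresses
    return resources             # only the last resource survives

def collect_resources_fixed(entries):
    resources = {}               # fix: initialize once, before the loop
    for key, addresses in entries:
        resources[key] = addresses
    return resources             # all resources are retained

entries = [("gpu", ["0", "1"]), ("fpga", ["f0"])]
assert collect_resources_buggy(entries) == {"fpga": ["f0"]}
assert collect_resources_fixed(entries) == {"gpu": ["0", "1"], "fpga": ["f0"]}
```

With a single resource type the two versions behave identically, which is why the original single-resource test never exposed the bug.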
Why are the changes needed?
When a task has multiple resources (e.g., GPU + FPGA), only the last resource is preserved due to the dictionary being reset on each loop iteration. This breaks the multi-resource functionality that was intended in SPARK-28234.
Does this PR introduce any user-facing change?
Yes. Tasks with multiple resource types will now correctly have all resources available via TaskContext.get().resources().

How was this patch tested?
Modified the existing TaskContextTestsWithResources test class to configure and verify multiple resources (GPU + FPGA).

Was this patch authored or co-authored using generative AI tooling?
Yes.
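For reference, the multi-resource check the modified test performs can be sketched in plain Python. FakeTaskContext and load_resources below are illustrative stand-ins, not Spark classes; the real test runs against a configured Spark cluster.

```python
import unittest

class FakeTaskContext:
    """Illustrative stand-in for pyspark.TaskContext (not the real class)."""
    def __init__(self):
        self._resources = {}

    def resources(self):
        return self._resources

def load_resources(ctx, entries):
    # Mirrors the fixed worker.py pattern: create the dict once, then
    # populate it for every resource allocated to the task.
    ctx._resources = {}
    for key, addresses in entries:
        ctx._resources[key] = addresses

class MultiResourceTest(unittest.TestCase):
    def test_gpu_and_fpga_both_retained(self):
        ctx = FakeTaskContext()
        load_resources(ctx, [("gpu", ["0", "1"]), ("fpga", ["f0"])])
        # Before the fix, only the last resource type would be present.
        self.assertEqual(set(ctx.resources()), {"gpu", "fpga"})
```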