
@Yicong-Huang
Contributor

What changes were proposed in this pull request?

This PR fixes a bug in python/pyspark/worker.py where taskContext._resources is incorrectly reset inside the for loop, causing only the last resource to be retained when a task has multiple resource types.

The bug was introduced in SPARK-28234 (2019) and has been present ever since. The existing test only covered the single-resource case, so the bug went undetected.
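A minimal sketch of the bug pattern and its fix. This is illustrative only: the `collect_resources_*` helpers and the sample data are hypothetical, not the actual worker.py code, which assembles `taskContext._resources` from resource information sent by the JVM.

```python
def collect_resources_buggy(resource_list):
    """Rebuilds the dict on every iteration, so earlier entries are lost."""
    for name, addresses in resource_list:
        resources = {}  # bug: the dict is reset inside the loop
        resources[name] = addresses
    return resources


def collect_resources_fixed(resource_list):
    """Initializes the dict once, so all entries are retained."""
    resources = {}  # fix: initialize the dict before the loop
    for name, addresses in resource_list:
        resources[name] = addresses
    return resources


task_resources = [("gpu", ["0", "1"]), ("fpga", ["f1"])]
print(collect_resources_buggy(task_resources))  # {'fpga': ['f1']}
print(collect_resources_fixed(task_resources))  # {'gpu': ['0', '1'], 'fpga': ['f1']}
```

With the fix, a task configured with both GPU and FPGA resources sees both entries in `TaskContext.get().resources()` instead of just the last one processed.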

Why are the changes needed?

When a task has multiple resource types (e.g., GPU + FPGA), the resources dictionary is re-created on each loop iteration, so only the last resource is preserved. This breaks the multi-resource support that SPARK-28234 was intended to provide.

Does this PR introduce any user-facing change?

Yes. Tasks with multiple resource types will now correctly have all resources available via TaskContext.get().resources().

How was this patch tested?

Modified the existing TaskContextTestsWithResources test class to configure and verify multiple resources (GPU + FPGA).

Was this patch authored or co-authored using generative AI tooling?

Yes.

@github-actions

github-actions bot commented Jan 7, 2026

JIRA Issue Information

=== Bug SPARK-54929 ===
Summary: taskContext._resources reset in loop causes only last resource to be kept
Assignee: None
Status: Open
Affected: ["4.1.0"]


This comment was automatically generated by GitHub Actions

Member

@HyukjinKwon left a comment


cc @tgravescs Would be great to take a very quick look .. but LGTM

@HyukjinKwon
Member

Merged to master.
