Refactor _SharedCache to handle context vs non-context ownership#38620
Refactor _SharedCache to handle context vs non-context ownership#38620shunping wants to merge 3 commits into
Conversation
Refactors `_SharedCache` to support an `is_context` flag for registered owners. This prevents independent non-context subprocesses (e.g., prism runner and expansion service) from sharing ownership of each other's keys, while allowing context wrappers to properly track all active subprocesses.
0fdbbb3 to
2b18beb
Compare
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a resource leak issue in the subprocess management logic where independent subprocess owners were incorrectly sharing ownership of cached keys. By introducing a distinction between context-based and independent owners, the system now ensures that resources are correctly tracked and cleaned up, preventing zombie processes and port exhaustion. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request updates the _SharedCache implementation in subprocess_server.py to transition _live_owners from a set to a dictionary, allowing the system to distinguish between context and non-context owners. It also updates the get and register methods to handle this ownership model and adds corresponding unit tests. The review feedback correctly identifies a critical race condition and a potential resource leak in the get method, where the _live_owners check is performed outside the lock and the cache entry is created before validating the requesting owner. Implementing the suggested fix to perform these checks inside the lock is highly recommended.
|
assign set of reviewers |
|
Assigning reviewers: R: @jrmccluskey for label python. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Currently,
_SharedCacheregisters all live owners as joint owners of every cached subprocess key. This works fine when all owners are within a unified context (e.g. a test class wrapping multiple runs), but it causes unintended resource sharing/leakage when there are concurrent, independent, non-context owners (such as a long-lived runner).Here is a resource leak scenario:
owner_1.owner_2and requests its startup command key.get()automatically added all currently registered live owners to the cache key's owner set.beam/sdks/python/apache_beam/utils/subprocess_server.py
Lines 117 to 118 in 361c2c4
As a result,
owner_1(the Prism runner) was registered as a joint owner of the short-lived Expansion Service's cache key.owner_2), the short-lived Expansion Service subprocess was never terminated becauseowner_1was still registered as an owner (entry.ownersis not empty).beam/sdks/python/apache_beam/utils/subprocess_server.py
Lines 104 to 105 in 361c2c4
This PR refactors
_SharedCacheinsubprocess_server.pyto explicitly distinguish between context owners (e.g., cache_subprocesses which should hold ownership over everything created under its block) and non-context owners (e.g., individual subprocess server instances like PrismRunner or expansion service subprocesses).Additionally, get() now takes an optional owner arg. Context owners automatically own all created keys, while independent non-context owners only own the keys they explicitly request.
For example, we run YAML example test with
pytest apache_beam/yaml/examples/testing/examples_test.py --log-cli-level=INFO -s.