-
Notifications
You must be signed in to change notification settings - Fork 187
refactor(api,robot-server): Upgrade anyio 3.7.1 -> 4.9.0 #19071
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do this even if it doesn't fix the issues all on its own - it is a good idea.
Resolve conflicts in: * api/Pipfile.lock * robot-server/Pipfile.lock * system-server/Pipfile * system-server/Pipfile.lock
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## edge #19071 +/- ##
=======================================
Coverage 24.89% 24.89%
=======================================
Files 3371 3371
Lines 296350 296350
Branches 31444 31444
=======================================
Hits 73773 73773
Misses 222553 222553
Partials 24 24
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
Uhh will this fix snapshot tests?
This comment was marked as resolved.
This comment was marked as resolved.
…col runs (#19108) Closes [RQA-3917](https://opentrons.atlassian.net/browse/RQA-3917) <!-- Thanks for taking the time to open a Pull Request (PR)! Please make sure you've read the "Opening Pull Requests" section of our Contributing Guide: https://github.com/Opentrons/opentrons/blob/edge/CONTRIBUTING.md#opening-pull-requests GitHub provides robust markdown to format your PR. Links, diagrams, pictures, and videos along with text formatting make it possible to create a rich and informative PR. For more information on GitHub markdown, see: https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax To ensure your code is reviewed quickly and thoroughly, please fill out the sections below to the best of your ability! --> # Overview Although protocol engine is eventually dereferenced during the protocol run lifecycle, there exists a circular reference between command history and additional objects that isn't well captured by the gc lib or memray call stack analysis. Fully clearing the run's `CommandHistory` before dereferencing the run orchestrator eliminates a memory leak. <!-- Describe your PR at a high level. State acceptance criteria and how this PR fits into other work. Link issues, PRs, and other relevant resources. --> ## Test Plan and Hands on Testing - See ticket for the wonderful script made by @SyntaxColoring, which was modified to run a step-intensive protocol 25 times in a simulated environment. The following outputs were generated with memray, passing no additional flags when generating the bin file. To convert the bin file to an HTML file, the following was run: `memray flamegraph --split-threads --temporal /path/to/file`. ### Twenty-Five Simulated Protocol Runs (with PR only) <img width="1451" height="411" alt="Screenshot 2025-08-01 at 4 20 37 PM" src="https://github.com/user-attachments/assets/577ea481-8106-43e9-9ff8-8f41e3dbcae3" /> Note the total memory usage (which may vary by robot). Compare with `edge` output, below. We expect to see some increase in total heap usage up to a point as various caching occurs. After run 17, there is no more apparent memory increase. ### Twenty-Five Simulated Protocol Runs (`edge` prior to any recent memory fixes, without PR) <img width="1475" height="380" alt="Screenshot 2025-08-01 at 10 48 33 PM" src="https://github.com/user-attachments/assets/95c8b516-011e-4f8c-a890-505ad57bd1e2" /> Note that after the 25th run, total `opentrons-robot-server` heap allocation is substantially greater than the above case. ### Twenty-Five Simulated Protocol Runs (with PR), No LRU Caching, #19107 Cherry-Pick Included <img width="1464" height="358" alt="Screenshot 2025-08-01 at 4 28 56 PM" src="https://github.com/user-attachments/assets/df9ad20c-ddfd-48c4-8b15-f3f350d68a5a" /> Effectively no increase in memory utilization after initialization and the completion of the second protocol run. ### Two Real Protocol Runs (with PR), No LRU Caching Included <img width="1459" height="370" alt="Screenshot 2025-08-01 at 4 30 29 PM" src="https://github.com/user-attachments/assets/da3f2fa8-294e-4639-a551-423e86ed375f" /> The various spikes during the run are because of camera captures via HTTP. ### Six Real Protocol Runs (with PR, #19107, #19110, #19109, #19071) <img width="1458" height="361" alt="Screenshot 2025-08-01 at 11 03 41 PM" src="https://github.com/user-attachments/assets/5bd18f3f-4a78-4540-8502-b5ec0a5b2e9f" /> Run between 10-40 minutes. The end of run memory for run 2 is 504MB, which is equivalent to the end of run 6 memory. The memray HTML analysis file is too large to attach directly on github, but it's included in the ticket. <!-- Describe your testing of the PR. Emphasize testing not reflected in the code. Attach protocols, logs, screenshots and any other assets that support your testing. --> ## Changelog - Fixed command history accumulating in memory across protocol runs. <!-- List changes introduced by this PR considering future developers and the end user. Give careful thought and clear documentation to breaking changes. --> <!-- - What do you need from reviewers to feel confident this PR is ready to merge? - Ask questions. --> ## Risk assessment low - we are clearing state exactly before we dereference the run orchestrator, at which point we don't expect this state to be available, anyway. <!-- - Indicate the level of attention this PR needs. - Provide context to guide reviewers. - Discuss trade-offs, coupling, and side effects. - Look for the possibility, even if you think it's small, that your change may affect some other part of the system. - For instance, changing return tip behavior may also change the behavior of labware calibration. - How do your unit tests and on hands on testing mitigate this PR's risks and the risk of future regressions? - Especially in high risk PRs, explain how you know your testing is enough. --> [RQA-3917]: https://opentrons.atlassian.net/browse/RQA-3917?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ --------- Co-authored-by: Max Marrone <[email protected]> Co-authored-by: Seth Foster <[email protected]>
Overview
This upgrades our dependency on
anyio
in all Python code that runs on robots.The latest release, 4.9.0 has a memory usage fix that may be important to us.
anyio.to_thread.run_sync()
, a helper that we use a lot, was needlessly retaining references to certain objects, preventing them from being garbage-collected.This alone doesn't seem to fix our memory usage problems, but it's probably a necessary component of the fix.
Test Plan and Hands on Testing
Just CI.
Risk assessment
Medium.
There is one breaking change that is relevant to us, relating to how exceptions are raised out of task groups. I think this affects at least this call site:
opentrons/robot-server/robot_server/protocols/router.py
Lines 395 to 404 in e493012
If it matters, we'll address it in a separate PR.