Skip to content

Conversation

SyntaxColoring
Copy link
Contributor

@SyntaxColoring SyntaxColoring commented Jul 29, 2025

Overview

This upgrades our dependency on anyio in all Python code that runs on robots.

The latest release, 4.9.0 has a memory usage fix that may be important to us. anyio.to_thread.run_sync(), a helper that we use a lot, was needlessly retaining references to certain objects, preventing them from being garbage-collected.

This alone doesn't seem to fix our memory usage problems, but it's probably a necessary component of the fix.

Test Plan and Hands on Testing

Just CI.

Risk assessment

Medium.

There is one breaking change that is relevant to us, relating to how exceptions are raised out of task groups. I think this affects at least this call site:

try:
source = await protocol_reader.save(
files=buffered_files,
directory=protocol_directory / protocol_id,
content_hash=content_hash,
)
except ProtocolFilesInvalidError as e:
raise ProtocolFilesInvalid(detail=str(e)).as_error(
status.HTTP_422_UNPROCESSABLE_ENTITY
) from e

If it matters, we'll address it in a separate PR.

Copy link
Member

@sfoster1 sfoster1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do this even if it doesn't fix the issues all on its own - it is a good idea.

@sfoster1
Copy link
Member

Opentrons/buildroot#251

Resolve conflicts in:
* api/Pipfile.lock
* robot-server/Pipfile.lock
* system-server/Pipfile
* system-server/Pipfile.lock
Copy link

codecov bot commented Jul 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 24.89%. Comparing base (eddea9e) to head (4cf6789).
⚠️ Report is 3 commits behind head on edge.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             edge   #19071   +/-   ##
=======================================
  Coverage   24.89%   24.89%           
=======================================
  Files        3371     3371           
  Lines      296350   296350           
  Branches    31444    31444           
=======================================
  Hits        73773    73773           
  Misses     222553   222553           
  Partials       24       24           
Flag Coverage Δ
protocol-designer 18.91% <ø> (ø)
step-generation 5.36% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@SyntaxColoring SyntaxColoring marked this pull request as ready for review July 31, 2025 16:58
Uhh will this fix snapshot tests?
@SyntaxColoring SyntaxColoring added the gen-analyses-snapshot-pr Generate a healing PR if the analyses snapshot test fails label Jul 31, 2025

This comment was marked as resolved.

@SyntaxColoring SyntaxColoring removed the gen-analyses-snapshot-pr Generate a healing PR if the analyses snapshot test fails label Jul 31, 2025
@SyntaxColoring SyntaxColoring merged commit d1663b7 into edge Aug 1, 2025
50 checks passed
@SyntaxColoring SyntaxColoring deleted the anyio_upgrades branch August 1, 2025 12:49
sfoster1 added a commit that referenced this pull request Aug 4, 2025
…col runs (#19108)

Closes [RQA-3917](https://opentrons.atlassian.net/browse/RQA-3917)

<!--
Thanks for taking the time to open a Pull Request (PR)! Please make sure
you've read the "Opening Pull Requests" section of our Contributing
Guide:


https://github.com/Opentrons/opentrons/blob/edge/CONTRIBUTING.md#opening-pull-requests

GitHub provides robust markdown to format your PR. Links, diagrams,
pictures, and videos along with text formatting make it possible to
create a rich and informative PR. For more information on GitHub
markdown, see:


https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax

To ensure your code is reviewed quickly and thoroughly, please fill out
the sections below to the best of your ability!
-->

# Overview

Although protocol engine is eventually dereferenced during the protocol
run lifecycle, there exists a circular reference between command history
and additional objects that isn't well captured by the gc lib or memray
call stack analysis. Fully clearing the run's `CommandHistory` before
dereferencing the run orchestrator eliminates a memory leak.

<!--
Describe your PR at a high level. State acceptance criteria and how this
PR fits into other work. Link issues, PRs, and other relevant resources.
-->

## Test Plan and Hands on Testing
- See ticket for the wonderful script made by @SyntaxColoring, which was
modified to run a step-intensive protocol 25 times in a simulated
environment.

The following outputs were generated with memray, passing no additional
flags when generating the bin file. To convert the bin file to an HTML
file, the following was run: `memray flamegraph --split-threads
--temporal /path/to/file`.

### Twenty-Five Simulated Protocol Runs (with PR only)

<img width="1451" height="411" alt="Screenshot 2025-08-01 at 4 20 37 PM"
src="https://github.com/user-attachments/assets/577ea481-8106-43e9-9ff8-8f41e3dbcae3"
/>

Note the total memory usage (which may vary by robot). Compare with
`edge` output, below. We expect to see some increase in total heap usage
up to a point as various caching occurs. After run 17, there is no more
apparent memory increase.

### Twenty-Five Simulated Protocol Runs (`edge` prior to any recent
memory fixes, without PR)

<img width="1475" height="380" alt="Screenshot 2025-08-01 at 10 48
33 PM"
src="https://github.com/user-attachments/assets/95c8b516-011e-4f8c-a890-505ad57bd1e2"
/>

Note that after the 25th run, total `opentrons-robot-server` heap
allocation is substantially greater than the above case.

### Twenty-Five Simulated Protocol Runs (with PR), No LRU Caching,
#19107 Cherry-Pick Included

<img width="1464" height="358" alt="Screenshot 2025-08-01 at 4 28 56 PM"
src="https://github.com/user-attachments/assets/df9ad20c-ddfd-48c4-8b15-f3f350d68a5a"
/>

Effectively no increase in memory utilization after initialization and
the completion of the second protocol run.

### Two Real Protocol Runs (with PR), No LRU Caching Included

<img width="1459" height="370" alt="Screenshot 2025-08-01 at 4 30 29 PM"
src="https://github.com/user-attachments/assets/da3f2fa8-294e-4639-a551-423e86ed375f"
/>

The various spikes during the run are because of camera captures via
HTTP.

### Six Real Protocol Runs (with PR, #19107, #19110, #19109, #19071)

<img width="1458" height="361" alt="Screenshot 2025-08-01 at 11 03
41 PM"
src="https://github.com/user-attachments/assets/5bd18f3f-4a78-4540-8502-b5ec0a5b2e9f"
/>

Run between 10-40 minutes. The end of run memory for run 2 is 504MB,
which is equivalent to the end of run 6 memory. The memray HTML analysis
file is too large to attach directly on github, but it's included in the
ticket.

<!--
Describe your testing of the PR. Emphasize testing not reflected in the
code. Attach protocols, logs, screenshots and any other assets that
support your testing.
-->

## Changelog
- Fixed command history accumulating in memory across protocol runs.
<!--
List changes introduced by this PR considering future developers and the
end user. Give careful thought and clear documentation to breaking
changes.
-->

<!--
- What do you need from reviewers to feel confident this PR is ready to
merge?
- Ask questions.
-->

## Risk assessment
low - we are clearing state exactly before we dereference the run
orchestrator, at which point we don't expect this state to be available,
anyway.
<!--
- Indicate the level of attention this PR needs.
- Provide context to guide reviewers.
- Discuss trade-offs, coupling, and side effects.
- Look for the possibility, even if you think it's small, that your
change may affect some other part of the system.
- For instance, changing return tip behavior may also change the
behavior of labware calibration.
- How do your unit tests and on hands on testing mitigate this PR's
risks and the risk of future regressions?
- Especially in high risk PRs, explain how you know your testing is
enough.
-->


[RQA-3917]:
https://opentrons.atlassian.net/browse/RQA-3917?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

---------

Co-authored-by: Max Marrone <[email protected]>
Co-authored-by: Seth Foster <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants