Skip to content

Conversation

@JamesWrigley
Copy link
Member

Quick explanation of what the GC tests for RemoteChannels test does:

  1. Create RemoteChannels rr and fstore on worker 1 and worker 2 respectively from the master process. At this point only the master process knows about rr and fstore.
  2. Master process calls put!(fstore, rr), i.e. we remotecall worker 2 and put rr (which is owned worker 1 but is currently only known about by the master) into fstore.
  3. Remotecall into worker 1 and check that it knows about rr.

Step 3 should succeed despite us never previously explicitly communicating with worker 1, because serialize(::ClusterSerializer, ::RemoteChannel) will send a message to the owner of the RemoteChannel to inform them of its existence (see send_add_client()). This happens asynchronously in step 2, and on rare occasions worker 1 would not process that message before step 3, causing the test to fail.

Now we give the check 10s to succeed.

Cherry-picked from JuliaParallel/DistributedNext.jl#8.

@JamesWrigley JamesWrigley self-assigned this Jan 6, 2025
@codecov
Copy link

codecov bot commented Jan 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.24%. Comparing base (3a43532) to head (f9d3d89).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #118      +/-   ##
==========================================
- Coverage   79.33%   79.24%   -0.09%     
==========================================
  Files          10       10              
  Lines        1916     1913       -3     
==========================================
- Hits         1520     1516       -4     
- Misses        396      397       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JamesWrigley
Copy link
Member Author

(bump)

Copy link
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment? I always find timeouts scary, since it may be able to fail.

Quick explanation of what the `GC tests for RemoteChannels` test does:
1. Create `RemoteChannel`s `rr` and `fstore` on worker 1 and worker 2
   respectively from the master process. At this point only the master process
   knows about `rr` and `fstore`.
2. Master process calls `put!(fstore, rr)`, i.e. we remotecall worker 2 and put
   `rr` (which is owned worker 1 but is currently only known about by the
   master) into `fstore`.
3. Remotecall into worker 1 and check that it knows about `rr`.

Step 3 should succeed despite us never previously explicitly communicating with
worker 1, because `serialize(::ClusterSerializer, ::RemoteChannel)` will send a
message to the owner of the `RemoteChannel` to inform them of its existence (see
`send_add_client()`). This happens asynchronously in step 2, and on rare
occasions worker 1 would not process that message before step 3, causing the
test to fail.

Now we give the check 10s to succeed.
@JamesWrigley
Copy link
Member Author

Sure, added in f9d3d89.

@JamesWrigley JamesWrigley merged commit 3b9e4fd into master Jan 15, 2025
6 checks passed
@JamesWrigley JamesWrigley deleted the racey-tests branch January 15, 2025 22:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants