Fix deadlocks and crashes #87

Merged
bbockelm merged 3 commits into PelicanPlatform:main from bbockelm:crash_fixups
Feb 1, 2025

@bbockelm (Collaborator) commented Feb 1, 2025

This fixes two issues found so far while stress testing:

  1. A crash that occurs if the file is closed while reads are pending.
  2. A deadlock that occurs if the state mutex is held while a read request is being submitted to a full queue.

I can no longer get the plugin to crash under Pelican downloads of the test data. However, it still periodically "stalls out" -- not returning any data from AWS -- so there's still something to hunt down.

When XRootD closed the file while the cache was doing readahead,
all the callback objects were deleted, causing a segfault when
the readahead finished.

This adds a check during the cache destructor to ensure there are
no pending operations.
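
One way such a destructor check could look is sketched below. This is an illustration under assumptions, not the plugin's code: `ReadaheadCache`, `StartReadahead`, `OnReadDone`, and `RunCloseWhilePending` are hypothetical names, and waiting on a condition variable is just one possible way to make sure nothing is pending before the object goes away.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the plugin's cache; none of these names come
// from the actual code.
class ReadaheadCache {
public:
    explicit ReadaheadCache(std::atomic<int> &completed)
        : m_completed(completed) {}

    ~ReadaheadCache() {
        {
            // Wait until every in-flight read has run its callback, so no
            // callback can touch this object after it is gone.
            std::unique_lock<std::mutex> lock(m_mutex);
            m_cv.wait(lock, [this] { return m_pending == 0; });
            assert(m_pending == 0);
        }
        // The simulated reads run on threads owned by this object.
        for (auto &worker : m_workers) {
            worker.join();
        }
    }

    // Simulate an asynchronous readahead that completes on another thread.
    void StartReadahead() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            ++m_pending;
        }
        m_workers.emplace_back([this] {
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
            OnReadDone();
        });
    }

private:
    // Completion callback: it needs the cache's state to still exist.
    void OnReadDone() {
        std::lock_guard<std::mutex> lock(m_mutex);
        ++m_completed;
        --m_pending;
        m_cv.notify_all();
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    int m_pending = 0;
    std::atomic<int> &m_completed;
    std::vector<std::thread> m_workers;
};

// Destroy the cache (the "file close") while three reads are still pending.
int RunCloseWhilePending() {
    std::atomic<int> completed{0};
    {
        ReadaheadCache cache(completed);
        for (int i = 0; i < 3; ++i) {
            cache.StartReadahead();
        }
    } // The destructor blocks here until all three callbacks have run.
    return completed.load();
}
```

In this sketch, `RunCloseWhilePending()` returns only after every callback has finished, so none of them can run against a destroyed object.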

The mutex protecting the state variables of the cache was held while
the read request was submitted to the queue.  This worked when the
queue was not at capacity, but deadlocked when the submission to the
queue blocked: the callback function needs the same mutex to finish
operations, so in-flight requests could not complete while the
submitter was blocked.

The fix was to drop the mutex while entering the queue and to
acquire it again afterward.
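
The unlock-around-submit pattern described above can be sketched as follows. Everything here is assumed for illustration: `BoundedQueue`, `Cache`, `Read`, `OnReadDone`, and `RunReads` are hypothetical names, and the queue is modeled as a cap on in-flight requests whose slot is freed only after the callback returns.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical bounded queue: Push() blocks while `capacity` requests are
// in flight, and a slot is freed only after the request's callback returns.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : m_capacity(capacity) {}

    void Push(std::function<void()> job) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_not_full.wait(lock, [this] { return m_in_flight < m_capacity; });
        ++m_in_flight;
        m_jobs.push(std::move(job));
        m_not_empty.notify_one();
    }

    // Take one request, run its callback, then free its slot.
    void RunOne() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_not_empty.wait(lock, [this] { return !m_jobs.empty(); });
        std::function<void()> job = std::move(m_jobs.front());
        m_jobs.pop();
        lock.unlock();
        job();
        lock.lock();
        --m_in_flight;
        m_not_full.notify_one();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_not_full;
    std::condition_variable m_not_empty;
    std::queue<std::function<void()>> m_jobs;
    std::size_t m_capacity;
    std::size_t m_in_flight = 0;
};

// Hypothetical cache whose completion callback needs the state mutex.
class Cache {
public:
    explicit Cache(BoundedQueue &queue) : m_queue(queue) {}

    void Read() {
        std::unique_lock<std::mutex> lock(m_state_mutex);
        ++m_pending;
        // Drop the state mutex before a call that may block. If it were
        // held here while the queue is full, OnReadDone() could never run
        // and no slot would ever be freed.
        lock.unlock();
        m_queue.Push([this] { OnReadDone(); });
        lock.lock(); // Reacquire before touching state again.
        ++m_submitted;
    }

    int Done() {
        std::lock_guard<std::mutex> lock(m_state_mutex);
        return m_done;
    }

private:
    void OnReadDone() {
        std::lock_guard<std::mutex> lock(m_state_mutex);
        --m_pending;
        ++m_done;
    }

    BoundedQueue &m_queue;
    std::mutex m_state_mutex;
    int m_pending = 0;
    int m_submitted = 0;
    int m_done = 0;
};

// Submit `count` reads through a queue of capacity 1 and count completions.
int RunReads(int count) {
    BoundedQueue queue(1);
    Cache cache(queue);
    std::thread worker([&] {
        for (int i = 0; i < count; ++i) {
            queue.RunOne();
        }
    });
    for (int i = 0; i < count; ++i) {
        cache.Read();
    }
    worker.join();
    return cache.Done();
}
```

With a capacity of 1, most calls to `Read()` in this sketch reach `Push()` while the queue is full, which is the situation the commit message describes. State that was read before the unlock may have changed by the time the mutex is reacquired, so any code after `lock.lock()` should re-check it.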

@bbockelm (Collaborator, Author) commented Feb 1, 2025

Ok, after this work I'm unable to trigger any crashes in the code -- going to cut a release.

bbockelm merged commit 7fcb352 into PelicanPlatform:main on Feb 1, 2025
3 checks passed