-
Notifications
You must be signed in to change notification settings - Fork 13
WIP: Fix incomplete Minio downloads when downloading many files concurrently #697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
for more information, see https://pre-commit.ci
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #697 +/- ##
==========================================
- Coverage 98.27% 93.92% -4.35%
==========================================
Files 29 27 -2
Lines 2085 2074 -11
==========================================
- Hits 2049 1948 -101
- Misses 36 126 +90
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| # Now just wait until all of our tasks complete | ||
| await asyncio.gather(*download_tasks) | ||
| MAX_INFLIGHT = 100 | ||
| if len(download_tasks) >= MAX_INFLIGHT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we already control this via sempahore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Number of downloads is controlled by a semaphore in the minio_adapter.py, but these new lines control the number of concurrent asyncio tasks . Maybe it's okay to await for thousands of tasks but only downloading small number of tasks. At least this doesn't fix the problem.
| localsize = path.stat().st_size | ||
|
|
||
| # Ensure filesystem flush visibility | ||
| await asyncio.sleep(0.05) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a magic number and might not work in general. Would it be enough to download to the .part file, rename, then stat the new file?
|
Here is an example size difference which leads to a download failure |
Story
MinioAdapterclass (e.g.transferconfigfor aioboto3, concurrency setting (_file_transfer_sem) in thedownload_filefunction) but nothing fixes the issue.servicex transforms download <request-id>works fine.Updates