Skip to content

Issue 100: Vol destroy with outstanding IO#99

Merged
yamingk merged 12 commits intoeBay:mainfrom
yamingk:yk_out_io2
Jun 5, 2025
Merged

Issue 100: Vol destroy with outstanding IO#99
yamingk merged 12 commits intoeBay:mainfrom
yamingk:yk_out_io2

Conversation

@yamingk
Copy link
Collaborator

@yamingk yamingk commented May 31, 2025

  1. When volume destroy has started, reject all IOs to this volume.
  2. When volume destroy is being started, if there is outstanding IO that has not yet completed, destroy will wait for all IOs to complete for this volume before destroying in async mode;

Testing:

100 iterations has passed on my local env.

  1. Add fake IO (just increase outstnding counter, without doing actual write) with delay flip in test_volume that it will delay for customized seconds to complete.
  2. Issue volume destroy and we should see it can not proceed and back off, then wait for volume gc thread to kick in to resume destroy when outstanding IOs are completed.
[ RUN      ] VolumeTest.CreateDestroyVolumeWithOutstandingIO
[06/02/25 15:36:41-07:00] [I] [test_volume] [220681] [volume_mgr.cpp:71:create_volume] [vol=027dfecd-43c2-4c1b-9f22-8112f20cd9d0] is of capacity [1073741824B]
[06/02/25 15:36:41-07:00] [I] [test_volume] [220681] [volume.cpp:68:init] Creating solo repl dev for volume: vol_0, uuid: 027dfecd-43c2-4c1b-9f22-8112f20cd9d0
[06/02/25 15:36:41-07:00] [I] [test_volume] [220681] [log_store_service.cpp:198:create_new_logdev_internal] Created logdev 1
[06/02/25 15:36:41-07:00] [I] [test_volume] [220728] [iomgr_timer.cpp:179:setup_timer_fd] Creating recurring per-thread timer fd 46 and adding it into fd poll list
[06/02/25 15:36:41-07:00] [I] [test_volume] [220681] [volume.cpp:40:init_index_table] Creating index table for volume: vol_0, index_uuid: 45b87b20-a2b7-4bd1-a238-6fc2e9656231, parent_uuid: 027dfecd-43c2-4c1b-9f22-8112f20cd9d0
[06/02/25 15:36:41-07:00] [I] [test_volume] [220681] [volume_mgr.cpp:273:delay_fake_io] Slow down vol fake IO flip is enabled, scheduling to call later.
[06/02/25 15:36:41-07:00] [I] [test_volume] [220681] [volume_mgr.cpp:273:delay_fake_io] Slow down vol fake IO flip is enabled, scheduling to call later.
[06/02/25 15:36:41-07:00] [I] [test_volume] [220681] [test_volume.cpp:87:TestBody] Stats: total_capacity_bytes=8589934592, used_capacity_bytes=0, drive_type: block_hdd
...
[06/02/25 15:36:46-07:00] [I] [test_volume] [220681] [test_volume.cpp:95:TestBody] Volume 027dfecd-43c2-4c1b-9f22-8112f20cd9d0 removed, waiting for 1 seconds for IO to complete
[06/02/25 15:36:46-07:00] [I] [test_volume] [220685] [volume_mgr.cpp:95:operator()] remove_volume with input id: 027dfecd-43c2-4c1b-9f22-8112f20cd9d0
[06/02/25 15:36:47-07:00] [I] [test_volume] [220685] [volume_mgr.cpp:270:operator()] Resuming fake IO delay flip is done. Do nothing
[06/02/25 15:36:47-07:00] [I] [test_volume] [220685] [volume_mgr.cpp:270:operator()] Resuming fake IO delay flip is done. Do nothing

@yamingk yamingk requested review from raakella1 and sanebay May 31, 2025 02:39
@yamingk yamingk linked an issue May 31, 2025 that may be closed by this pull request
@yamingk yamingk changed the title Vol destroy with outstanding IO Issue 100: Vol destroy with outstanding IO May 31, 2025
@codecov-commenter
Copy link

codecov-commenter commented Jun 2, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 78.89908% with 23 lines in your changes missing coverage. Please review.

Project coverage is 80.49%. Comparing base (0b40663) to head (ebad59c).
Report is 26 commits behind head on main.

Files with missing lines Patch % Lines
src/lib/volume_mgr.cpp 51.35% 11 Missing and 7 partials ⚠️
src/lib/homeblks_impl.cpp 86.95% 0 Missing and 3 partials ⚠️
src/lib/volume/tests/test_volume.cpp 96.66% 0 Missing and 1 partial ⚠️
src/lib/volume/volume.hpp 75.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##             main      #99       +/-   ##
===========================================
+ Coverage   61.68%   80.49%   +18.80%     
===========================================
  Files          15       17        +2     
  Lines         462     1102      +640     
  Branches       35      121       +86     
===========================================
+ Hits          285      887      +602     
+ Misses        158      153        -5     
- Partials       19       62       +43     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

}

void HomeBlocksImpl::start_reaper_thread() {
auto const nsecs = gc_timer_secs();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: nsecs/secs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

superblk< vol_sb_t > sb_; // meta data of the volume

sisl::atomic_counter< uint64_t > outstanding_reqs_{0}; // number of outstanding requests
bool destroy_started_{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use enum instead vol_state::destroying ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is actually different states, so when we are rebooting, the state will be vol_state::destroying, but destroy has never been kicked off yet after reboot, that is why I introduced this field.

}
}
// Volume Destructor will be called after vol_ptr goes out of scope;
return folly::Unit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that the caller cant know the status of the remove_volume ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is intentional since it is async and lazy free of the blocks. Do you see any problem with this approach?

}

if (vol_ptr) {
vol_ptr->state_change(vol_state::DESTROYING);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could also move this and check can_remove inside the vol_ptr->destroy too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this actually, then I think it would be better to have them in different APIs for better contract between the callers and volume layers and decided to keep it this way.

@yamingk yamingk merged commit db2d4d7 into eBay:main Jun 5, 2025
34 of 37 checks passed
@yamingk yamingk deleted the yk_out_io2 branch June 5, 2025 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Volume destroy with outsanding IOs

3 participants