[#252]:svarga:feature, async-mode descriptors + FAPL executor scaffold#253
Open
steven-varga wants to merge 4 commits into
Open
[#252]:svarga:feature, async-mode descriptors + FAPL executor scaffold#253steven-varga wants to merge 4 commits into
steven-varga wants to merge 4 commits into
Conversation
…nc_v trait
Refactor the previously-unused `false,false` specialization of
`impl::detail::hid_t<>` so it stands on its own (no private
inheritance) and `= delete`s `operator ::hid_t()` directly. This
gives a far better diagnostic than the old private-inheritance hide:
error: use of deleted function
'h5::impl::detail::hid_t<h5::impl::fd_t, H5Fclose, false, false, 0>
::operator hid_t() const'
The previous hid_t<…,false,false,…> only had an initializer-list
ctor (everything else was inherited privately, so default ctor and
the from-::hid_t ctor were inaccessible). The new spec mirrors the
classic true,true layout: default ctor, ::hid_t-conversion ctor,
init-list ctor, copy/move ctor + assign, dtor. The `handle` field
is public — internal h5cpp code (executor, dispatch lambdas)
reads it without invoking the deleted conversion; user code is
expected to route through h5::write / h5::read / h5::async::*
factories.
Adds parallel `false,false` specs for hdf5::dataset and
hdf5::attribute so async ds_t / at_t / gr_t pick up the same
shape as their classic counterparts (dapl field, [] operator,
attribute = operator).
User-facing aliases land in a new `h5::async` namespace, parallel
to the classic h5:: descriptor surface:
namespace h5::async {
using fd_t = impl::async_hid_t<impl::fd_t, H5Fclose>;
using ds_t = impl::async_did_t<impl::ds_t, H5Dclose>;
using at_t = impl::async_aid_t<impl::at_t, H5Aclose>;
using gr_t = impl::async_aid_t<impl::gr_t, H5Gclose>;
using ob_t = impl::async_hid_t<impl::ob_t, H5Oclose>;
}
Plus `h5::is_async_v<T>` — a trait specialized on the false,false
hid_t shape — for the concept-constrained operation overloads
that land in Phase II PR-B.
Out of scope here:
- h5::async::create / open factories (Phase II 2.2)
- executor FAPL property + executor thread (Phase II 2.2 / 2.3)
- operation overloads that branch on is_async_v<FD> (Phase II PR-B)
Verified locally:
- 50/50 ctest green (classic surface untouched)
- Standalone smoke test: async types are constructible, handle
field accessible, is_async_v<async::*> = true, is_async_v<
classic *> = false
- Standalone fail test: `static_cast<::hid_t>(async_fd)` emits
the deleted-function diagnostic shown above (no other error
spam)
…ies + scaffold tests Builds on the descriptor refactor from 8304dea (Phase II 2.1). Three new headers wire the executor scaffold and one new test file exercises lifecycle + race coverage. # Mechanism h5cpp/H5executor.hpp ───────────────────── `impl::executor_t` — single worker thread per async fd. Same std::mutex + std::queue + doorbell + stoppable_thread_t pattern as #250's worker_pool_t; the only structural differences are (a) one worker instead of N and (b) `submit_and_wait<Fn>(Fn&&)` that blocks the caller on a std::packaged_task future returned by the lambda. Exception inside the submitted callable propagates back to the submitter via `future.get()`. Reentrant safety: if the caller is already on the executor thread, the callable runs inline (the same-thread check is gated on `worker_thread_id_` populated by the loop on entry). h5cpp/H5Pfapl_async.hpp ─────────────────────── Defensive utility — installs an executor slot on a FAPL using the same shared_ptr-in-H5Pinsert2 pattern from #250's worker pool. Kept for any future code path that *can* round-trip the property through a FAPL (HDF5 v2 native async APIs, in-process FAPL inspection, etc.) but not on the primary read path — see below. h5cpp/H5async.hpp ───────────────── User-facing `h5::async::create` and `h5::async::open`. These are the only call sites where the word "async" appears. Each one: 1. resolves or installs a worker_pool_t for compression, 2. constructs an `executor_t` with that pool, 3. calls H5Fcreate / H5Fopen on the user's FAPL, 4. returns `h5::async::fd_t{raw_hid, std::move(exec)}`. Operation overloads (Phase II PR-B) reach the executor via the wrapper's `exec` field, NOT via H5Fget_access_plist on the file id — see the regression test below for the HDF5 1.10.9 reason. # HDF5 limitation that drove the executor-on-wrapper design HDF5 1.10.9's `H5Fget_access_plist` returns a synthetic FAPL reconstructed from standard properties only. Properties installed via `H5Pinsert2` (which is what #250's worker pool slot uses) do NOT survive the round-trip. Verified with both Phase I's `h5cpp_fapl_worker_pool` and Phase II's `h5cpp_fapl_executor` property names — both come back missing. Test `[#252] HDF5 strips user-inserted FAPL properties on H5Fget_access_plist` documents this behavior so future code does not regress to the slot-only pattern. (This also means Phase I's `pt_t::init`, `h5::write`, `h5::read` pool resolution via `H5Fget_access_plist` silently falls back to basic_pipeline_t — the round-trip tests still pass on correctness but the parallel-compression path is never exercised. That's a follow-up issue, not in this PR's scope.) # Refactor on H5Iall.hpp The Phase II 2.1 false,false specialization gains a `std::shared_ptr<h5::impl::executor_t> exec` field. shared_ptr with a type-erased deleter lets the fwd-declared executor_t in H5Iall.hpp be sufficient — the complete type is only needed at make_shared time inside H5async.hpp. Copy/move ctors and assignment operators carry the exec pointer along with the handle, so `h5::async::fd_t fd_b = fd_a;` shares one executor (verified by test). New factory ctor `hid_t(::hid_t, std::shared_ptr<executor_t>)` is the canonical construction path used by h5::async::create / open and by mode-transitive factories in Phase II PR-B. # Tests — test/H5async.cpp 18 cases covering: [#252] traits — is_async_v<> classification, default-construct [#252] static_assert — async types are not is_constructible_v <::hid_t, async_t> while classic types still are [#252 2.2] FAPL executor slot install / resolve / H5Pcopy share [#252 2.3] submit_and_wait — basic, executor-thread-id, void return type, exception propagation, reentry inline [#252 2.4] 8-thread concurrent submission stress (TSAN signal) [#252 2.5] async::create / open round-trip; h5::threads{N} chain; shared_ptr propagation across copy [#252] HDF5 1.10.9 FAPL-strip regression documentation Local C++17 build: 51/51 ctest green. Local TSAN build: test-h5async 18/18 with no races.
…r async scaffold README v1.12 highlights gains a one-line bullet for the async-mode scaffold: `h5::async::fd_t fd = h5::async::create(...)` with the deleted ::hid_t conversion as the safety mechanism. docs/filtering-pipeline-rework-report.md gains a "Status — Phase II PR-A (#252, async descriptors + executor scaffold)" section covering the namespace shape, the mechanism (executor on wrapper, not on FAPL slot — with the HDF5 1.10.9 H5Fget_access_plist limitation called out), and what's deferred to PR-B.
…ait, not worker_loop
CI coverage job (gcov-instrumented, slower than release builds)
hit a benign race in test/H5async.cpp:130 —
CHECK( exec.in_flight() == 0 ) // after exception case
Race timeline:
- submit_and_wait() did in_flight_++ at submission
- worker_loop pulled task, ran it (or caught exception),
then did in_flight_-- AFTER setting the future's value
- submitter's fut.get() unblocked the moment the future was
set, BEFORE the worker reached its fetch_sub
- the in_flight() check on the submitter's thread could see
1 instead of 0 if scheduled between fut.get() returning
and the worker's fetch_sub
Local release builds (and previous CI on TSAN / ASAN / UBSAN)
were fast enough that the worker's fetch_sub almost always
won the race; the coverage job's instrumentation widened the
window enough to expose it deterministically.
Fix: move the decrement from worker_loop to submit_and_wait's
return path via an RAII guard. After the change, in_flight_
is causally tied to the lifetime of `submit_and_wait` on the
submitter's thread, NOT to "task has been dequeued and run by
the worker". Reads of in_flight() immediately after the call
now see 0 deterministically — no spin, no wait_idle, no race.
Mechanism:
template <typename Fn>
auto submit_and_wait(Fn&& fn) {
...
in_flight_.fetch_add(1, ...);
{ lock_guard ... ; tasks_.emplace([task] { (*task)(); }); }
bell_.ring();
struct decrement_on_exit {
std::atomic<int>& counter;
~decrement_on_exit() { counter.fetch_sub(1, ...); }
} guard{in_flight_};
return fut.get(); // blocks; rethrows on exception
}
The RAII guard handles both normal return and exception paths
identically — the future's value (or exception) propagates,
then ~guard runs, then the caller observes both the return /
throw AND the post-decrement state of in_flight_.
worker_loop's fetch_sub is removed; the worker now just runs
the task and continues. wait_idle() semantics are unchanged
because submit_and_wait blocks until fut.get() returns —
in_flight_ stays at the same monotone count it would have had
with the old design.
Verified:
- Local release: 18/18 h5async ctest green
- Local TSAN: 18/18 h5async ctest green, no new races
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
First of two PRs implementing Phase II of the multithreading workplan (
tasks/h5cpp-fapl-multithreading-workplan.md§4). PR-A delivers the descriptor types, FAPL executor property, and executor thread. PR-B (follow-up) wires the concept-constrained operation overloads that branch onis_async_v<FD>.Closes #252 (when merged).
Surface
h5::async::*types haveoperator ::hid_t() = deleteso any stray raw HDF5 C-API call on an async descriptor produces a cleanuse of deleted functiondiagnostic at compile time.Mechanism (three new headers)
h5cpp/H5executor.hpp— single worker thread per async fd;submit_and_wait<Fn>(Fn&&)blocks the caller viastd::packaged_task+std::future. Exception propagation, same-thread re-entry inline, MPSC queue + doorbell, stoppable_thread shim. Same shape as feature, FAPL-scoped worker pool with h5::threads{N} for parallel filter compression #250's worker_pool_t with N=1 + the sync facade.h5cpp/H5Pfapl_async.hpp— defensive FAPL slot using the sameH5Pinsert2+ shared_ptr pattern as feature, FAPL-scoped worker pool with h5::threads{N} for parallel filter compression #250's pool. Not on the primary code path (see below) but kept for any future flow that benefits from FAPL-pinned executor lifecycle.h5cpp/H5async.hpp—h5::async::create / open. Each constructs anexecutor_t, thenH5Fcreate/H5Fopenagainst the user's FAPL, then returnsh5::async::fd_t{raw_hid, exec}.Why the executor lives on the wrapper, not the FAPL
HDF5 1.10.9's
H5Fget_access_plistreturns a synthetic FAPL reconstructed from standard properties only —H5Pinsert2-installed properties do not survive the round-trip. A regression test in this PR documents the behavior.Storing the executor
shared_ptrdirectly on thefalse,falsehid_t<>specialization sidesteps the limitation.std::shared_ptr's type-erased deleter lets the wrapper forward-declareexecutor_tinH5Iall.hppand only need the complete type atmake_sharedtime insideH5async.hpp.(Note: the same HDF5 limitation also affects #250's pool resolution path in
pt_t::init / h5::write / h5::read. Data still round-trips correctly via thebasic_pipeline_tfallback, so #250's tests passed — but the parallel-compression path was never exercised. Tracked as a separate follow-up issue.)Tests —
test/H5async.cpp(18 cases)[#252]is_async_vclassification across async/classic types[#252]static_assertthat async types are notis_constructible_v<::hid_t, T>while classic types still are[#252]H5I_UNINIT[#252 2.2]H5Pcopyshared ownership + idempotence[#252 2.3]submit_and_waitbasic + executor-thread-id + void-return + exception propagation + reentry inline[#252 2.4][#252 2.5]async::create / openround-trip;h5::threads{N}chain; copy semantics share executor[#252]Local validation:
-fsanitize=thread): test-h5async 18/18 clean, no races.CI will run the full matrix (asan/ubsan/tsan/coverage + Ubuntu 22/24 × gcc-13/14/clang-17..20 + macOS-15 + Windows MSVC) once this draft is flipped to ready.
Out of scope (lands in PR-B)
h5::write / read / append / flush / create(fd, "ds", …)that branch viaif constexpr (is_async_v<FD>).h5::pt_tas a class templatetemplate <class DS = h5::ds_t>deduced via CTAD.--enable-threadsafe.Coupling