[#252]:svarga:feature, async-mode descriptors + FAPL executor scaffold by steven-varga · Pull Request #253 · vargalabs/h5cpp

steven-varga · 2026-05-18T14:52:12Z

First of two PRs implementing Phase II of the multithreading workplan (tasks/h5cpp-fapl-multithreading-workplan.md §4). PR-A delivers the descriptor types, FAPL executor property, and executor thread. PR-B (follow-up) wires the concept-constrained operation overloads that branch on is_async_v<FD>.

Closes #252 (when merged).

Surface

namespace h5::async {
    using fd_t   = impl::async_hid_t<impl::fd_t,  H5Fclose>;
    using ds_t   = impl::async_did_t<impl::ds_t,  H5Dclose>;
    using gr_t   = impl::async_aid_t<impl::gr_t,  H5Gclose>;
    using at_t   = impl::async_aid_t<impl::at_t,  H5Aclose>;

    fd_t create(const std::string& path, unsigned flags,
                const h5::fcpl_t& fcpl = h5::default_fcpl,
                const h5::fapl_t& fapl = h5::default_fapl);
    fd_t open  (const std::string& path, unsigned flags,
                const h5::fapl_t& fapl = h5::default_fapl);
}

h5::async::* types have operator ::hid_t() = delete so any stray raw HDF5 C-API call on an async descriptor produces a clean use of deleted function diagnostic at compile time.

Mechanism (three new headers)

h5cpp/H5executor.hpp — single worker thread per async fd; submit_and_wait<Fn>(Fn&&) blocks the caller via std::packaged_task + std::future. Exception propagation, same-thread re-entry inline, MPSC queue + doorbell, stoppable_thread shim. Same shape as feature, FAPL-scoped worker pool with h5::threads{N} for parallel filter compression #250's worker_pool_t with N=1 + the sync facade.
h5cpp/H5Pfapl_async.hpp — defensive FAPL slot using the same H5Pinsert2 + shared_ptr pattern as feature, FAPL-scoped worker pool with h5::threads{N} for parallel filter compression #250's pool. Not on the primary code path (see below) but kept for any future flow that benefits from FAPL-pinned executor lifecycle.
h5cpp/H5async.hpp — h5::async::create / open. Each constructs an executor_t, then H5Fcreate / H5Fopen against the user's FAPL, then returns h5::async::fd_t{raw_hid, exec}.

Why the executor lives on the wrapper, not the FAPL

HDF5 1.10.9's H5Fget_access_plist returns a synthetic FAPL reconstructed from standard properties only — H5Pinsert2-installed properties do not survive the round-trip. A regression test in this PR documents the behavior.

Storing the executor shared_ptr directly on the false,false hid_t<> specialization sidesteps the limitation. std::shared_ptr's type-erased deleter lets the wrapper forward-declare executor_t in H5Iall.hpp and only need the complete type at make_shared time inside H5async.hpp.

(Note: the same HDF5 limitation also affects #250's pool resolution path in pt_t::init / h5::write / h5::read. Data still round-trips correctly via the basic_pipeline_t fallback, so #250's tests passed — but the parallel-compression path was never exercised. Tracked as a separate follow-up issue.)

Tests — `test/H5async.cpp` (18 cases)

Tag	Coverage
`[#252]`	`is_async_v` classification across async/classic types
`[#252]`	`static_assert` that async types are not `is_constructible_v<::hid_t, T>` while classic types still are
`[#252]`	default-construct yields `H5I_UNINIT`
`[#252 2.2]`	FAPL slot install + resolve + `H5Pcopy` shared ownership + idempotence
`[#252 2.3]`	`submit_and_wait` basic + executor-thread-id + void-return + exception propagation + reentry inline
`[#252 2.4]`	8-thread × 32-task concurrent submission stress
`[#252 2.5]`	`async::create / open` round-trip; `h5::threads{N}` chain; copy semantics share executor
`[#252]`	HDF5 1.10.9 FAPL-strip regression documentation

Local validation:

C++17 build: 51/51 ctest green.
TSAN build (-fsanitize=thread): test-h5async 18/18 clean, no races.

CI will run the full matrix (asan/ubsan/tsan/coverage + Ubuntu 22/24 × gcc-13/14/clang-17..20 + macOS-15 + Windows MSVC) once this draft is flipped to ready.

Out of scope (lands in PR-B)

Concept-constrained overloads of h5::write / read / append / flush / create(fd, "ds", …) that branch via if constexpr (is_async_v<FD>).
Mode-transitive factory pattern (async fd → async ds → async at).
h5::pt_t as a class template template <class DS = h5::ds_t> deduced via CTAD.
Performance benchmarks vs. HDF5 --enable-threadsafe.

Coupling

Builds on feature, FAPL-scoped worker pool with h5::threads{N} for parallel filter compression #250 (FAPL pool, slot pattern, worker_pool_t, stoppable_thread_t shim).
Unblocks PR-B (operation overloads).

…nc_v trait Refactor the previously-unused `false,false` specialization of `impl::detail::hid_t<>` so it stands on its own (no private inheritance) and `= delete`s `operator ::hid_t()` directly. This gives a far better diagnostic than the old private-inheritance hide: error: use of deleted function 'h5::impl::detail::hid_t<h5::impl::fd_t, H5Fclose, false, false, 0> ::operator hid_t() const' The previous hid_t<…,false,false,…> only had an initializer-list ctor (everything else was inherited privately, so default ctor and the from-::hid_t ctor were inaccessible). The new spec mirrors the classic true,true layout: default ctor, ::hid_t-conversion ctor, init-list ctor, copy/move ctor + assign, dtor. The `handle` field is public — internal h5cpp code (executor, dispatch lambdas) reads it without invoking the deleted conversion; user code is expected to route through h5::write / h5::read / h5::async::* factories. Adds parallel `false,false` specs for hdf5::dataset and hdf5::attribute so async ds_t / at_t / gr_t pick up the same shape as their classic counterparts (dapl field, [] operator, attribute = operator). User-facing aliases land in a new `h5::async` namespace, parallel to the classic h5:: descriptor surface: namespace h5::async { using fd_t = impl::async_hid_t<impl::fd_t, H5Fclose>; using ds_t = impl::async_did_t<impl::ds_t, H5Dclose>; using at_t = impl::async_aid_t<impl::at_t, H5Aclose>; using gr_t = impl::async_aid_t<impl::gr_t, H5Gclose>; using ob_t = impl::async_hid_t<impl::ob_t, H5Oclose>; } Plus `h5::is_async_v<T>` — a trait specialized on the false,false hid_t shape — for the concept-constrained operation overloads that land in Phase II PR-B. Out of scope here: - h5::async::create / open factories (Phase II 2.2) - executor FAPL property + executor thread (Phase II 2.2 / 2.3) - operation overloads that branch on is_async_v<FD> (Phase II PR-B) Verified locally: - 50/50 ctest green (classic surface untouched) - Standalone smoke test: async types are constructible, handle field accessible, is_async_v<async::*> = true, is_async_v< classic *> = false - Standalone fail test: `static_cast<::hid_t>(async_fd)` emits the deleted-function diagnostic shown above (no other error spam)

…ies + scaffold tests Builds on the descriptor refactor from 8304dea (Phase II 2.1). Three new headers wire the executor scaffold and one new test file exercises lifecycle + race coverage. # Mechanism h5cpp/H5executor.hpp ───────────────────── `impl::executor_t` — single worker thread per async fd. Same std::mutex + std::queue + doorbell + stoppable_thread_t pattern as #250's worker_pool_t; the only structural differences are (a) one worker instead of N and (b) `submit_and_wait<Fn>(Fn&&)` that blocks the caller on a std::packaged_task future returned by the lambda. Exception inside the submitted callable propagates back to the submitter via `future.get()`. Reentrant safety: if the caller is already on the executor thread, the callable runs inline (the same-thread check is gated on `worker_thread_id_` populated by the loop on entry). h5cpp/H5Pfapl_async.hpp ─────────────────────── Defensive utility — installs an executor slot on a FAPL using the same shared_ptr-in-H5Pinsert2 pattern from #250's worker pool. Kept for any future code path that *can* round-trip the property through a FAPL (HDF5 v2 native async APIs, in-process FAPL inspection, etc.) but not on the primary read path — see below. h5cpp/H5async.hpp ───────────────── User-facing `h5::async::create` and `h5::async::open`. These are the only call sites where the word "async" appears. Each one: 1. resolves or installs a worker_pool_t for compression, 2. constructs an `executor_t` with that pool, 3. calls H5Fcreate / H5Fopen on the user's FAPL, 4. returns `h5::async::fd_t{raw_hid, std::move(exec)}`. Operation overloads (Phase II PR-B) reach the executor via the wrapper's `exec` field, NOT via H5Fget_access_plist on the file id — see the regression test below for the HDF5 1.10.9 reason. # HDF5 limitation that drove the executor-on-wrapper design HDF5 1.10.9's `H5Fget_access_plist` returns a synthetic FAPL reconstructed from standard properties only. Properties installed via `H5Pinsert2` (which is what #250's worker pool slot uses) do NOT survive the round-trip. Verified with both Phase I's `h5cpp_fapl_worker_pool` and Phase II's `h5cpp_fapl_executor` property names — both come back missing. Test `[#252] HDF5 strips user-inserted FAPL properties on H5Fget_access_plist` documents this behavior so future code does not regress to the slot-only pattern. (This also means Phase I's `pt_t::init`, `h5::write`, `h5::read` pool resolution via `H5Fget_access_plist` silently falls back to basic_pipeline_t — the round-trip tests still pass on correctness but the parallel-compression path is never exercised. That's a follow-up issue, not in this PR's scope.) # Refactor on H5Iall.hpp The Phase II 2.1 false,false specialization gains a `std::shared_ptr<h5::impl::executor_t> exec` field. shared_ptr with a type-erased deleter lets the fwd-declared executor_t in H5Iall.hpp be sufficient — the complete type is only needed at make_shared time inside H5async.hpp. Copy/move ctors and assignment operators carry the exec pointer along with the handle, so `h5::async::fd_t fd_b = fd_a;` shares one executor (verified by test). New factory ctor `hid_t(::hid_t, std::shared_ptr<executor_t>)` is the canonical construction path used by h5::async::create / open and by mode-transitive factories in Phase II PR-B. # Tests — test/H5async.cpp 18 cases covering: [#252] traits — is_async_v<> classification, default-construct [#252] static_assert — async types are not is_constructible_v <::hid_t, async_t> while classic types still are [#252 2.2] FAPL executor slot install / resolve / H5Pcopy share [#252 2.3] submit_and_wait — basic, executor-thread-id, void return type, exception propagation, reentry inline [#252 2.4] 8-thread concurrent submission stress (TSAN signal) [#252 2.5] async::create / open round-trip; h5::threads{N} chain; shared_ptr propagation across copy [#252] HDF5 1.10.9 FAPL-strip regression documentation Local C++17 build: 51/51 ctest green. Local TSAN build: test-h5async 18/18 with no races.

…r async scaffold README v1.12 highlights gains a one-line bullet for the async-mode scaffold: `h5::async::fd_t fd = h5::async::create(...)` with the deleted ::hid_t conversion as the safety mechanism. docs/filtering-pipeline-rework-report.md gains a "Status — Phase II PR-A (#252, async descriptors + executor scaffold)" section covering the namespace shape, the mechanism (executor on wrapper, not on FAPL slot — with the HDF5 1.10.9 H5Fget_access_plist limitation called out), and what's deferred to PR-B.

…ait, not worker_loop CI coverage job (gcov-instrumented, slower than release builds) hit a benign race in test/H5async.cpp:130 — CHECK( exec.in_flight() == 0 ) // after exception case Race timeline: - submit_and_wait() did in_flight_++ at submission - worker_loop pulled task, ran it (or caught exception), then did in_flight_-- AFTER setting the future's value - submitter's fut.get() unblocked the moment the future was set, BEFORE the worker reached its fetch_sub - the in_flight() check on the submitter's thread could see 1 instead of 0 if scheduled between fut.get() returning and the worker's fetch_sub Local release builds (and previous CI on TSAN / ASAN / UBSAN) were fast enough that the worker's fetch_sub almost always won the race; the coverage job's instrumentation widened the window enough to expose it deterministically. Fix: move the decrement from worker_loop to submit_and_wait's return path via an RAII guard. After the change, in_flight_ is causally tied to the lifetime of `submit_and_wait` on the submitter's thread, NOT to "task has been dequeued and run by the worker". Reads of in_flight() immediately after the call now see 0 deterministically — no spin, no wait_idle, no race. Mechanism: template <typename Fn> auto submit_and_wait(Fn&& fn) { ... in_flight_.fetch_add(1, ...); { lock_guard ... ; tasks_.emplace([task] { (*task)(); }); } bell_.ring(); struct decrement_on_exit { std::atomic<int>& counter; ~decrement_on_exit() { counter.fetch_sub(1, ...); } } guard{in_flight_}; return fut.get(); // blocks; rethrows on exception } The RAII guard handles both normal return and exception paths identically — the future's value (or exception) propagates, then ~guard runs, then the caller observes both the return / throw AND the post-decrement state of in_flight_. worker_loop's fetch_sub is removed; the worker now just runs the task and continues. wait_idle() semantics are unchanged because submit_and_wait blocks until fut.get() returns — in_flight_ stays at the same monotone count it would have had with the old design. Verified: - Local release: 18/18 h5async ctest green - Local TSAN: 18/18 h5async ctest green, no new races

codecov · 2026-05-18T16:47:57Z

Codecov Report

❌ Patch coverage is 96.19048% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
h5cpp/H5Iall.hpp	85.00%	3 Missing ⚠️
h5cpp/H5executor.hpp	97.72%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

steven-varga added 4 commits May 18, 2026 14:20

steven-varga marked this pull request as ready for review May 18, 2026 17:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[#252]:svarga:feature, async-mode descriptors + FAPL executor scaffold#253

[#252]:svarga:feature, async-mode descriptors + FAPL executor scaffold#253
steven-varga wants to merge 4 commits into
stagingfrom
252-feature-async-descriptors-executor-scaffold

steven-varga commented May 18, 2026

Uh oh!

codecov Bot commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

steven-varga commented May 18, 2026

Surface

Mechanism (three new headers)

Why the executor lives on the wrapper, not the FAPL

Tests — test/H5async.cpp (18 cases)

Out of scope (lands in PR-B)

Coupling

Uh oh!

codecov Bot commented May 18, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Tests — `test/H5async.cpp` (18 cases)