Skip to content

[#252]:svarga:feature, async-mode descriptors + FAPL executor scaffold#253

Open
steven-varga wants to merge 4 commits into
stagingfrom
252-feature-async-descriptors-executor-scaffold
Open

[#252]:svarga:feature, async-mode descriptors + FAPL executor scaffold#253
steven-varga wants to merge 4 commits into
stagingfrom
252-feature-async-descriptors-executor-scaffold

Conversation

@steven-varga
Copy link
Copy Markdown
Collaborator

First of two PRs implementing Phase II of the multithreading workplan (tasks/h5cpp-fapl-multithreading-workplan.md §4). PR-A delivers the descriptor types, FAPL executor property, and executor thread. PR-B (follow-up) wires the concept-constrained operation overloads that branch on is_async_v<FD>.

Closes #252 (when merged).

Surface

namespace h5::async {
    using fd_t   = impl::async_hid_t<impl::fd_t,  H5Fclose>;
    using ds_t   = impl::async_did_t<impl::ds_t,  H5Dclose>;
    using gr_t   = impl::async_aid_t<impl::gr_t,  H5Gclose>;
    using at_t   = impl::async_aid_t<impl::at_t,  H5Aclose>;

    fd_t create(const std::string& path, unsigned flags,
                const h5::fcpl_t& fcpl = h5::default_fcpl,
                const h5::fapl_t& fapl = h5::default_fapl);
    fd_t open  (const std::string& path, unsigned flags,
                const h5::fapl_t& fapl = h5::default_fapl);
}

h5::async::* types have operator ::hid_t() = delete so any stray raw HDF5 C-API call on an async descriptor produces a clean use of deleted function diagnostic at compile time.

Mechanism (three new headers)

Why the executor lives on the wrapper, not the FAPL

HDF5 1.10.9's H5Fget_access_plist returns a synthetic FAPL reconstructed from standard properties only — H5Pinsert2-installed properties do not survive the round-trip. A regression test in this PR documents the behavior.

Storing the executor shared_ptr directly on the false,false hid_t<> specialization sidesteps the limitation. std::shared_ptr's type-erased deleter lets the wrapper forward-declare executor_t in H5Iall.hpp and only need the complete type at make_shared time inside H5async.hpp.

(Note: the same HDF5 limitation also affects #250's pool resolution path in pt_t::init / h5::write / h5::read. Data still round-trips correctly via the basic_pipeline_t fallback, so #250's tests passed — but the parallel-compression path was never exercised. Tracked as a separate follow-up issue.)

Tests — test/H5async.cpp (18 cases)

Tag Coverage
[#252] is_async_v classification across async/classic types
[#252] static_assert that async types are not is_constructible_v<::hid_t, T> while classic types still are
[#252] default-construct yields H5I_UNINIT
[#252 2.2] FAPL slot install + resolve + H5Pcopy shared ownership + idempotence
[#252 2.3] submit_and_wait basic + executor-thread-id + void-return + exception propagation + reentry inline
[#252 2.4] 8-thread × 32-task concurrent submission stress
[#252 2.5] async::create / open round-trip; h5::threads{N} chain; copy semantics share executor
[#252] HDF5 1.10.9 FAPL-strip regression documentation

Local validation:

  • C++17 build: 51/51 ctest green.
  • TSAN build (-fsanitize=thread): test-h5async 18/18 clean, no races.

CI will run the full matrix (asan/ubsan/tsan/coverage + Ubuntu 22/24 × gcc-13/14/clang-17..20 + macOS-15 + Windows MSVC) once this draft is flipped to ready.

Out of scope (lands in PR-B)

  • Concept-constrained overloads of h5::write / read / append / flush / create(fd, "ds", …) that branch via if constexpr (is_async_v<FD>).
  • Mode-transitive factory pattern (async fd → async ds → async at).
  • h5::pt_t as a class template template <class DS = h5::ds_t> deduced via CTAD.
  • Performance benchmarks vs. HDF5 --enable-threadsafe.

Coupling

…nc_v trait

Refactor the previously-unused `false,false` specialization of
`impl::detail::hid_t<>` so it stands on its own (no private
inheritance) and `= delete`s `operator ::hid_t()` directly.  This
gives a far better diagnostic than the old private-inheritance hide:

    error: use of deleted function
    'h5::impl::detail::hid_t<h5::impl::fd_t, H5Fclose, false, false, 0>
     ::operator hid_t() const'

The previous hid_t<…,false,false,…> only had an initializer-list
ctor (everything else was inherited privately, so default ctor and
the from-::hid_t ctor were inaccessible).  The new spec mirrors the
classic true,true layout: default ctor, ::hid_t-conversion ctor,
init-list ctor, copy/move ctor + assign, dtor.  The `handle` field
is public — internal h5cpp code (executor, dispatch lambdas)
reads it without invoking the deleted conversion; user code is
expected to route through h5::write / h5::read / h5::async::*
factories.

Adds parallel `false,false` specs for hdf5::dataset and
hdf5::attribute so async ds_t / at_t / gr_t pick up the same
shape as their classic counterparts (dapl field, [] operator,
attribute = operator).

User-facing aliases land in a new `h5::async` namespace, parallel
to the classic h5:: descriptor surface:

    namespace h5::async {
        using fd_t = impl::async_hid_t<impl::fd_t, H5Fclose>;
        using ds_t = impl::async_did_t<impl::ds_t, H5Dclose>;
        using at_t = impl::async_aid_t<impl::at_t, H5Aclose>;
        using gr_t = impl::async_aid_t<impl::gr_t, H5Gclose>;
        using ob_t = impl::async_hid_t<impl::ob_t, H5Oclose>;
    }

Plus `h5::is_async_v<T>` — a trait specialized on the false,false
hid_t shape — for the concept-constrained operation overloads
that land in Phase II PR-B.

Out of scope here:
  - h5::async::create / open factories (Phase II 2.2)
  - executor FAPL property + executor thread (Phase II 2.2 / 2.3)
  - operation overloads that branch on is_async_v<FD>  (Phase II PR-B)

Verified locally:
  - 50/50 ctest green (classic surface untouched)
  - Standalone smoke test: async types are constructible, handle
    field accessible, is_async_v<async::*> = true, is_async_v<
    classic *> = false
  - Standalone fail test: `static_cast<::hid_t>(async_fd)` emits
    the deleted-function diagnostic shown above (no other error
    spam)
…ies + scaffold tests

Builds on the descriptor refactor from 8304dea (Phase II 2.1).
Three new headers wire the executor scaffold and one new test file
exercises lifecycle + race coverage.

# Mechanism

  h5cpp/H5executor.hpp
  ─────────────────────
  `impl::executor_t` — single worker thread per async fd.  Same
  std::mutex + std::queue + doorbell + stoppable_thread_t pattern as
  #250's worker_pool_t; the only structural differences are (a) one
  worker instead of N and (b) `submit_and_wait<Fn>(Fn&&)` that blocks
  the caller on a std::packaged_task future returned by the lambda.

  Exception inside the submitted callable propagates back to the
  submitter via `future.get()`.  Reentrant safety: if the caller is
  already on the executor thread, the callable runs inline (the
  same-thread check is gated on `worker_thread_id_` populated by
  the loop on entry).

  h5cpp/H5Pfapl_async.hpp
  ───────────────────────
  Defensive utility — installs an executor slot on a FAPL using the
  same shared_ptr-in-H5Pinsert2 pattern from #250's worker pool.
  Kept for any future code path that *can* round-trip the property
  through a FAPL (HDF5 v2 native async APIs, in-process FAPL
  inspection, etc.) but not on the primary read path — see below.

  h5cpp/H5async.hpp
  ─────────────────
  User-facing `h5::async::create` and `h5::async::open`.  These are
  the only call sites where the word "async" appears.  Each one:

    1. resolves or installs a worker_pool_t for compression,
    2. constructs an `executor_t` with that pool,
    3. calls H5Fcreate / H5Fopen on the user's FAPL,
    4. returns `h5::async::fd_t{raw_hid, std::move(exec)}`.

  Operation overloads (Phase II PR-B) reach the executor via the
  wrapper's `exec` field, NOT via H5Fget_access_plist on the file
  id — see the regression test below for the HDF5 1.10.9 reason.

# HDF5 limitation that drove the executor-on-wrapper design

  HDF5 1.10.9's `H5Fget_access_plist` returns a synthetic FAPL
  reconstructed from standard properties only.  Properties installed
  via `H5Pinsert2` (which is what #250's worker pool slot uses) do
  NOT survive the round-trip.  Verified with both Phase I's
  `h5cpp_fapl_worker_pool` and Phase II's `h5cpp_fapl_executor`
  property names — both come back missing.

  Test `[#252] HDF5 strips user-inserted FAPL properties on
  H5Fget_access_plist` documents this behavior so future code does
  not regress to the slot-only pattern.

  (This also means Phase I's `pt_t::init`, `h5::write`, `h5::read`
  pool resolution via `H5Fget_access_plist` silently falls back to
  basic_pipeline_t — the round-trip tests still pass on correctness
  but the parallel-compression path is never exercised.  That's a
  follow-up issue, not in this PR's scope.)

# Refactor on H5Iall.hpp

  The Phase II 2.1 false,false specialization gains a
  `std::shared_ptr<h5::impl::executor_t> exec` field.  shared_ptr
  with a type-erased deleter lets the fwd-declared executor_t in
  H5Iall.hpp be sufficient — the complete type is only needed at
  make_shared time inside H5async.hpp.

  Copy/move ctors and assignment operators carry the exec pointer
  along with the handle, so `h5::async::fd_t fd_b = fd_a;` shares
  one executor (verified by test).

  New factory ctor `hid_t(::hid_t, std::shared_ptr<executor_t>)`
  is the canonical construction path used by h5::async::create /
  open and by mode-transitive factories in Phase II PR-B.

# Tests — test/H5async.cpp

  18 cases covering:

    [#252] traits — is_async_v<> classification, default-construct
    [#252] static_assert — async types are not is_constructible_v
           <::hid_t, async_t> while classic types still are
    [#252 2.2] FAPL executor slot install / resolve / H5Pcopy share
    [#252 2.3] submit_and_wait — basic, executor-thread-id, void
           return type, exception propagation, reentry inline
    [#252 2.4] 8-thread concurrent submission stress (TSAN signal)
    [#252 2.5] async::create / open round-trip; h5::threads{N}
           chain; shared_ptr propagation across copy
    [#252]    HDF5 1.10.9 FAPL-strip regression documentation

  Local C++17 build: 51/51 ctest green.
  Local TSAN build:  test-h5async 18/18 with no races.
…r async scaffold

README v1.12 highlights gains a one-line bullet for the async-mode
scaffold: `h5::async::fd_t fd = h5::async::create(...)` with the
deleted ::hid_t conversion as the safety mechanism.

docs/filtering-pipeline-rework-report.md gains a "Status — Phase II
PR-A (#252, async descriptors + executor scaffold)" section covering
the namespace shape, the mechanism (executor on wrapper, not on
FAPL slot — with the HDF5 1.10.9 H5Fget_access_plist limitation
called out), and what's deferred to PR-B.
…ait, not worker_loop

CI coverage job (gcov-instrumented, slower than release builds)
hit a benign race in test/H5async.cpp:130 —

    CHECK( exec.in_flight() == 0 )   // after exception case

Race timeline:
  - submit_and_wait() did in_flight_++ at submission
  - worker_loop pulled task, ran it (or caught exception),
    then did in_flight_-- AFTER setting the future's value
  - submitter's fut.get() unblocked the moment the future was
    set, BEFORE the worker reached its fetch_sub
  - the in_flight() check on the submitter's thread could see
    1 instead of 0 if scheduled between fut.get() returning
    and the worker's fetch_sub

Local release builds (and previous CI on TSAN / ASAN / UBSAN)
were fast enough that the worker's fetch_sub almost always
won the race; the coverage job's instrumentation widened the
window enough to expose it deterministically.

Fix: move the decrement from worker_loop to submit_and_wait's
return path via an RAII guard.  After the change, in_flight_
is causally tied to the lifetime of `submit_and_wait` on the
submitter's thread, NOT to "task has been dequeued and run by
the worker".  Reads of in_flight() immediately after the call
now see 0 deterministically — no spin, no wait_idle, no race.

Mechanism:

  template <typename Fn>
  auto submit_and_wait(Fn&& fn) {
      ...
      in_flight_.fetch_add(1, ...);
      { lock_guard ... ; tasks_.emplace([task] { (*task)(); }); }
      bell_.ring();

      struct decrement_on_exit {
          std::atomic<int>& counter;
          ~decrement_on_exit() { counter.fetch_sub(1, ...); }
      } guard{in_flight_};

      return fut.get();   // blocks; rethrows on exception
  }

The RAII guard handles both normal return and exception paths
identically — the future's value (or exception) propagates,
then ~guard runs, then the caller observes both the return /
throw AND the post-decrement state of in_flight_.

worker_loop's fetch_sub is removed; the worker now just runs
the task and continues.  wait_idle() semantics are unchanged
because submit_and_wait blocks until fut.get() returns —
in_flight_ stays at the same monotone count it would have had
with the old design.

Verified:
  - Local release: 18/18 h5async ctest green
  - Local TSAN:    18/18 h5async ctest green, no new races
@codecov
Copy link
Copy Markdown

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 96.19048% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
h5cpp/H5Iall.hpp 85.00% 3 Missing ⚠️
h5cpp/H5executor.hpp 97.72% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@steven-varga steven-varga marked this pull request as ready for review May 18, 2026 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant