Skip to content

[#254]:svarga:refactor, remove compiler_meta_t and field_descriptor_t from type engine#255

Open
steven-varga wants to merge 7 commits into
releasefrom
254-refactor-remove-compiler-meta
Open

[#254]:svarga:refactor, remove compiler_meta_t and field_descriptor_t from type engine#255
steven-varga wants to merge 7 commits into
releasefrom
254-refactor-remove-compiler-meta

Conversation

@steven-varga
Copy link
Copy Markdown
Collaborator

Removes compiler_meta_t<T>, field_descriptor_t<owner,field>,
is_reflected_compound_t<T>, and metadata_version from H5Tmeta.hpp.

These were plugin-output scaffolding with no independent value: the only
consumer was the h5cpp-compiler plugin (no released users), C++26 reflection
bypasses them entirely in favour of a direct storage_traits_impl_t
partial specialisation, and the old dt_t / h5::create<T>() path is
cleaner for the manual-registration case that will always need an escape hatch.

Zero tests exercised these types. All dependent specialisations
(storage_traits_impl_t, is_transport_contiguous_impl_t,
access_traits_t for is_reflected_compound_t<T>) are removed with them.
The three !is_reflected_compound_t<T>::value no-op guards on the
contiguous-sequence access_traits_t specialisations are also dropped.

What survives unchanged: storage_traits_impl_t backbone, is_transport_contiguous_t,
access_traits_t, resolved_type_t fallback to dt_t<T>.

The C++26 auto-reflection path lands directly into storage_traits_impl_t
in a later PR with no dependency on the removed types.

The GitHub releases download counter is low-signal for a header-only
library — most consumers vendor h5cpp via git submodule or copy the
header tree directly, so the counter measures only the subset who pull
.deb / .rpm / .pkg / .exe assets from Releases. Recent release-asset
issues (v1.12.4 needed manual upload after the Package workflow failed
on macOS and Windows) make any short-term number further unreliable.

Removed:
[![Downloads](https://img.shields.io/github/downloads/vargalabs/h5cpp/total)](https://github.com/vargalabs/h5cpp/releases)

The remaining badges (CI, ASan, UBSan, codecov, MIT, DOI, GitHub
release, Documentation) carry meaningful signals and stay.
…filename

CMake project() was hardcoded to VERSION 1.10.4.6, so CPack-produced
artifacts on the v1.12.4 release carried the stale 1.10.4 string in
their filenames despite the actual release tag. The v1.12.4 release
required manual rename + upload of the Linux artifacts as a result.

This commit:

1. Derives PROJECT_VERSION from the latest git tag matching v[0-9]*
   before the project() call, so CPack picks it up correctly:

     find_package(Git QUIET)
     execute_process(COMMAND git describe --tags --abbrev=0 ...)
     if(H5CPP_GIT_TAG MATCHES "^v([0-9]+(\\.[0-9]+)*)$")
         set(H5CPP_PROJECT_VERSION "${CMAKE_MATCH_1}")
     endif()
     project(libh5cpp-dev VERSION ${H5CPP_PROJECT_VERSION} ...)

   Falls back to "1.12.0" placeholder for tarball builds without git
   history, and for branches where the latest reachable tag doesn't
   match the strict vX.Y.Z[.W] pattern (e.g. staging sees v1.10.4-6
   which is filtered out by the regex).

2. Decouples H5CPP_HDF5_FLOOR from PROJECT_VERSION. The previous check
   compared HDF5_VERSION against PROJECT_VERSION, which coupled package
   versioning to HDF5 minimum requirements purely by coincidence —
   1.10.4.6 happened to be both the h5cpp release tag AND the HDF5
   minimum. After #234 raised the floor to 1.12 and PROJECT_VERSION
   now tracks the git tag, the comparison is no longer meaningful.
   Pinned to "1.12.0" explicitly.

3. Sets CPACK_PACKAGE_FILE_NAME to include CMAKE_SYSTEM_NAME and
   CMAKE_SYSTEM_PROCESSOR (lowercased), so amd64 / arm64 / x86_64 /
   aarch64 / darwin-arm64 / windows-amd64 produce distinct filenames
   when collected into the Publish job's flat dist/ directory:

     h5cpp-dev-v1.12.5-linux-x86_64.{deb,rpm}
     h5cpp-dev-v1.12.5-linux-aarch64.{deb,rpm}
     h5cpp-dev-v1.12.5-darwin-arm64.pkg
     h5cpp-dev-v1.12.5-windows-amd64.exe

   Without this override, both amd64 and arm64 Linux builds emit
   identical "h5cpp-dev-1.10.4-Linux.{deb,rpm}" names and overwrite
   each other on the Release page.
…rom-source

Two platform fixes in the same workflow file, both required for the
v1.12.5 Package run to produce assets across the full matrix.

macOS (#245):
  The Configure step set HDF5_ROOT from brew --prefix but FindHDF5 on
  macos-15 / Apple Silicon needed CMAKE_PREFIX_PATH as well to locate
  the C library, producing:

    Could NOT find HDF5 (missing: HDF5_LIBRARIES HDF5_INCLUDE_DIRS C)

  ci.yml's macOS path sets both; package.yml only set one. Symmetrised.

Windows (#246):
  choco install hdf5 returns "package not found" — there is no hdf5
  package in the Chocolatey community repo. The #236 fix that
  introduced this command was based on an incorrect assumption.

  Replaced with the build-from-source pattern already validated by
  ci.yml's Windows path:

    - Build zlib 1.3.1 from source (~30s, not cached)
    - Restore HDF5 1.12.2 cache if present
    - Build HDF5 1.12.2 from source if cache-miss (~3-5min, cached)
    - Save HDF5 cache on miss

  Configure now consumes both prefixes through CMAKE_PREFIX_PATH.

  The first run after this change will be slow (~5min for HDF5 build);
  subsequent runs reuse the cache.
Phase 1.1 of the FAPL multithreading workplan (tasks/h5cpp-fapl-multi
threading-workplan.md §3).  Lifecycle infrastructure only — the real
pool internals (queues, compress dispatch) come in Phase 1.2, and
consumer-site wiring (pt_t, h5::write, h5::read) in Phase 1.3.

What this commit lands:

  h5cpp/H5worker_pool.hpp
    Minimal worker_pool_t.  Owns N std::jthreads sized by the
    constructor argument (0 → hardware_concurrency()).  Phase 1.1
    workers park on stop_token; Phase 1.2 will replace the placeholder
    loop with the compress dispatch loop reading from bounded MPMC
    queues.  Non-copyable, non-movable (jthread members + future
    atomic in_flight).

  h5cpp/H5Pfapl_threads.hpp
    The hidden-pointer pattern, shared-ownership variant.  Stores a
    pointer to a heap-allocated worker_pool_slot_t in the FAPL skip
    list via H5Pinsert2.  The slot owns a shared_ptr<worker_pool_t>;
    the copy callback allocates a new slot whose shared_ptr aliases
    the same pool — every FAPL copy increments the refcount instead of
    spawning a fresh pool.  Close callback drops one slot; pool dies
    when the last slot is freed.

    Defines h5::threads as a user-facing FAPL property:
        h5::fd_t fd = h5::create("data.h5", H5F_ACC_TRUNC, h5::threads{8});
        h5::fd_t fd = h5::create("data.h5", H5F_ACC_TRUNC, h5::threads{});

    h5::impl::resolve_worker_pool(fapl_id) is the consumer-site helper
    that Phase 1.3 will use in pt_t / h5::write / h5::read to look up
    the pool from the file's FAPL.

  h5cpp/core
    Adds H5worker_pool.hpp and H5Pfapl_threads.hpp to the include
    chain right after H5Pdapl.hpp.

Tests (test/H5Pall.cpp, [#250] cases):

  - Regression scaffold: install the property WITHOUT the copy callback
    (the broken state) and assert the tracking close callback detects
    the double-free.  Guards against silent test breakage where the
    real assertion becomes a no-op.  Same idea as #244's scaffolding.

  - Single FAPL clean lifecycle: alloc=del, no leak.

  - H5Pcopy preserves shared pool ownership: pool pointer is IDENTICAL
    across two FAPLs that share the property; closing one FAPL doesn't
    affect the other; refcounts balance.

  - h5::threads tag installs the property: exercises the real
    fapl_threads_set callback (not the tracking shim) and verifies
    h5::impl::resolve_worker_pool returns the pool with the requested
    worker count.

  - h5::threads{} (no count) uses hardware_concurrency.

Validated locally via a standalone TU compiled against the new headers
with HDF5 1.10.9; the slot pattern only requires H5Pinsert2 (HDF5 ≥ 1.8)
so it works regardless of the v1.12 floor.  CI matrix on this PR will
run the ctest cases against HDF5 1.12.2.

[#250]:svarga:refactor, consolidate worker pool + FAPL property into H5Pthreads.hpp

Renames + merges the two Phase 1.1 headers (h5cpp/H5worker_pool.hpp and
h5cpp/H5Pfapl_threads.hpp) into a single h5cpp/H5Pthreads.hpp.  Mirrors
the H5Pdapl.hpp precedent of having the property type, lifecycle
callbacks, and any helper data structures co-located in one header per
property family.

Contents unchanged — same worker_pool_t, same worker_pool_slot_t, same
fapl_pool_copy_cb / fapl_pool_close_cb / fapl_threads_set, same
resolve_worker_pool, same h5::threads tag.  Only the file layout
changes; no API change for users or for the test suite.

Updates the include chain in h5cpp/core accordingly.

[#250]:svarga:fix, lower H5CPP_HDF5_FLOOR from 1.12.0 to 1.10.4 — CI matrix coverage

#247 introduced H5CPP_HDF5_FLOOR = "1.12.0" alongside decoupling the
floor from PROJECT_VERSION.  Choice of 1.12.0 was too aggressive — the
CI matrix runs Ubuntu 22.04 with system HDF5 1.10.7 (floor coverage
restored by #235), and every Linux matrix entry that consumes the
system package fails at configure with:

    -- !!! H5CPP requires HDF5 v1.12.0 or greater (found 1.10.7) !!!

This change lowers the floor to 1.10.4, matching the prior implicit
floor that the stale project(VERSION 1.10.4.6) line encoded before
#247 removed it.  Result: Ubuntu 22.04's 1.10.7 system package
satisfies the check; all matrix entries that previously passed
continue to pass.

When the project eventually drops 1.10.x coverage, bump this constant
deliberately AND remove the 22.04 matrix entry in the same commit, so
the floor and the matrix stay in sync.

Validated locally with HDF5 1.10.9: cmake configure succeeds, full
ctest sweep (50/50) green including the new [#250] FAPL slot-lifecycle
regression cases.

[#250]:svarga:fix, H5Pthreads.hpp use stoppable_thread_t shim for C++17 compatibility

Examples target compiles with C++17 (matching the project's documented
minimum standard), but H5Pthreads.hpp used std::jthread / std::stop_token
unconditionally — both C++20 features.  Every example TU including
h5cpp/core failed:

    H5Pthreads.hpp:91: error: 'std::jthread' is only available from C++20 onwards
    H5Pthreads.hpp:85: error: 'std::stop_token' has not been declared

H5Zpipeline_threaded.hpp already addresses this — it uses
h5::detail::stoppable_thread_t and h5::detail::stop_token_t from
detail/stoppable_thread.hpp, which are aliases for std::jthread /
std::stop_token under C++20 and hand-rolled shims with the same API
under C++17.

This commit switches H5Pthreads.hpp to the same shim, mirroring the
H5Zpipeline_threaded.hpp pattern.  No API change for users; the
h5::threads tag, worker_pool_t, and slot lifecycle are unaffected.

Local ctest sweep: 50/50 green with system HDF5 1.10.9.

[#250]:svarga:feature, worker_pool_t generic submit() + wait_idle() API

Phase 1.2 of the FAPL multithreading workplan.  Replaces the
Phase 1.1 placeholder worker loop with a real submit-based pool.

Design:

  - Workers pull type-erased tasks (std::function<void()>) off a single
    queue protected by std::mutex + std::queue + h5::detail::doorbell_t.
    Same primitives as H5Zpipeline_threaded.hpp's C++17 fallback path,
    so the pool works uniformly on C++17 (via stoppable_thread_t shim)
    and C++20 (via std::jthread + atomic::wait under the doorbell).

  - submit<F>(callable) returns std::future<invoke_result_t<F>>.
    Internally uses std::packaged_task wrapped in a shared_ptr so the
    move-only task can sit inside a copyable std::function in the queue.

  - wait_idle() blocks until in_flight reaches 0.

  - dtor calls wait_idle() before letting stoppable_thread_t destructors
    request stop + join.  No work is lost on shutdown.

  - Pool is HDF5-agnostic.  Phase 1.3 will wrap HDF5-specific compress
    logic in closures and submit() them.

Tests (test/H5Pall.cpp, [#250 1.2] cases):

  - single submit + future return value
  - many submits resolve in parallel (256 squared-int sums)
  - multi-thread submission is safe (4 producers × 100 tasks)
  - wait_idle blocks until completion (50 tasks)
  - exception in task captured in future
  - dtor drains pending work cleanly (no work lost)

Local sweep: 50/50 ctest green including the new 6 [#250 1.2] cases.

[#250]:svarga:refactor, simplify worker_pool_t worker_loop double-check

The original worker_loop used a double-check pattern:

  1. lock m_, see empty, read bell_.load() → last
  2. unlock m_
  3. re-lock m_, check tasks_ again (for race window)
  4. if still empty, wait(last)

Steps 3-4's re-check is redundant.  submit() rings the bell ONLY after
releasing the lock that the worker is contending for.  The doorbell
sequence increments before notify_one().  So between the worker's
single lock-release in step 2 and its wait(last) call, any submit() that
pushed a task will have rung the bell — bell_.wait(last) sees a
sequence > last and returns immediately.  No missed wakeup possible.

This refactor collapses the pattern to a single critical section that
either pops or snapshots the doorbell, then waits outside the lock if
no task was popped.  Same correctness, easier for sanitizers and
future readers to verify.

Local ctest still 50/50.

[#250]:svarga:fix, coverage CI add -fprofile-update=atomic for thread-safe gcov

The coverage job (ubuntu-24.04 / gcc-14) failed at the "Capture coverage"
step with:

    geninfo: ERROR: Unexpected negative count '-3' for
    thirdparty/libdeflate/v1.25.0/lib/deflate_compress.c:2217.
    Perhaps you need to compile with '-fprofile-update=atomic

Phase 1.2's worker_pool_t spawns std::jthread workers that execute user
tasks (test cases submit 256+ tasks across 4-8 workers).  Existing tests
ALSO exercise multi-threaded libdeflate via #241's threaded_pipeline_t.
The combined concurrency saturates gcov's non-atomic counter updates,
producing the negative-count corruption above.

Fix: add -fprofile-update=atomic to the coverage build's C and C++ flags.
This makes gcov instrumentation increment counters atomically.  Marginal
runtime cost for the coverage build only; production builds are unaffected.

This is a coverage-infrastructure fix, not Phase 1.2 logic — but it sits
on the Phase I branch because Phase 1.2's threading is what surfaced it.

[#250]:svarga:fix, Phase 1.2 tests wait_idle before in_flight() check

Coverage CI (gcc-14 / ubuntu-24.04, Debug -O0 + --coverage instrumentation)
failed in test-h5pall:

    test/H5Pall.cpp:387: ERROR: CHECK( pool.in_flight() == 0 ) is NOT correct!
      values: CHECK( 1 == 0 )

Race in the test, not in the pool:

  - submit() increments in_flight_ before queueing the task.
  - worker_loop runs (*task)(), which calls packaged_task::set_value
    internally — that's when future::wait()/get() unblocks.
  - worker_loop then decrements in_flight_ AFTER task() returns.

There's a small window between "future is ready" and "in_flight is
decremented" — workers can be in the gap.  Optimized builds usually
close it within nanoseconds; coverage's -O0 + atomic gcov makes it
wide enough to observe.

Fix: add pool.wait_idle() before each in_flight() == 0 assertion.
wait_idle() spins until in_flight reaches 0, eliminating the race.

The pool's semantics are unchanged.  in_flight() remains an approximate
introspection metric that can transiently exceed the count of pending
futures.  Tests that need a precise zero should call wait_idle() first.

Local sweep: 50/50 ctest green; the regression test for [#250 1.2] now
holds under Debug + --coverage flags.

[#250]:svarga:feature, Phase 1.3.1 h5::backpressure FAPL property

Adds a separate FAPL property h5::backpressure{N} that bounds the
in-flight chunk deque for any pt_t / h5::write / h5::read that uses
the file's worker pool.  Composes with h5::threads via operator|:

    h5::fd_t fd = h5::create("data.h5", H5F_ACC_TRUNC,
                             h5::threads{8} | h5::backpressure{32});

Storage:

  - New H5CPP_FAPL_BACKPRESSURE property, stores a plain unsigned via
    H5Pinsert2 with null lifecycle callbacks (POD value, memcpy
    semantics are correct for HDF5's internal copy + close).

  - fapl_backpressure_set is idempotent (skips if already installed),
    matching fapl_threads_set behavior.

  - resolve_backpressure(fapl_id, worker_count) returns:
      * user-set value when h5::backpressure{N} was applied,
      * else H5CPP_FAPL_BACKPRESSURE_DEFAULT_FACTOR × worker_count
        (default factor = 8, override via compile-time macro).

h5::backpressure{N} without h5::threads{N} is silently a no-op at the
consumer level — without a pool, there's no queue to bound.  Resolver
still returns the user value if the property is set; pt_t / h5::write
ignore it when no pool is present.

Tests (test/H5Pall.cpp, [#250 1.3] cases):

  - h5::backpressure tag installs the property via property chain
  - Default = 8 × worker_count when unset
  - Property survives H5Pcopy (POD memcpy semantics)
  - backpressure-only FAPL: resolver returns user value; no pool
    is installed (verified separately)

Local: 50/50 ctest green.

Phase 1.3.2 (next): wire pt_t to read both pool + backpressure cap
from the FAPL and enforce the in-flight deque bound.

[#250]:svarga:feature, Phase 1.3.2 pt_t resolves FAPL pool + backpressure cap at init

Wires pt_t into the FAPL worker-pool plumbing without changing dispatch
behavior yet.  After this commit:

  - pt_t has two new private members:
      std::shared_ptr<impl::worker_pool_t> pool_;
      unsigned                             backpressure_cap_;

  - pt_t::init() calls H5Fget_access_plist on the file id (already held
    for re-opening the dataset with zero-chunk-cache DAPL), then
    h5::impl::resolve_worker_pool(fapl) + h5::impl::resolve_backpressure
    to populate the members.  fapl is closed before the existing
    H5Fclose(fid).

  - pt_t move-assignment moves pool_ and copies/resets backpressure_cap_.

  - pt_t copy-constructor (deep copy from another pt_t's ds) leaves both
    members at their default state — copy of pt_t was always documented
    as the synchronous-pipeline path, that behavior is unchanged.

Dispatch (1.3.2 step 2, next commit) will:
  - Branch on pool_ being non-null at write_chunk call sites.
  - Submit compress closures to pool_, push the returned futures into
    an in-flight deque, drain in order, block on full deque using
    backpressure_cap_.

This commit keeps the existing visit_pipeline() dispatch in place so
the change is observable only by the new tests below.  All 50 pre-
existing ctest cases continue to pass bytewise-identically.

Tests (test/H5Dappend.cpp, [#250 1.3.2] cases):

  - pt_t constructed against a file whose FAPL has h5::threads{4} |
    h5::backpressure{16}: the FAPL resolves to a 4-worker pool with
    cap=16 (verified via the same FAPL's resolve_worker_pool and
    resolve_backpressure helpers).
  - pt_t with no FAPL pool falls back cleanly (default FAPL): construct,
    append 32 ints, flush, read back identical bytes.

[#250]:svarga:feature, Phase 1.3.2 step 2 — pt_t pool dispatch + async-pipelined drain

When pt_t resolves a worker_pool_t from the file's FAPL, every chunk
commit now flows through write_chunk_via_pool instead of the variant
pipeline.  This is Option B from the workplan §3 design notes:
async-pipelined producer with FIFO drain on the calling thread.

Mechanism:

  - write_chunk_via_pool snapshots the filter chain config (filter[],
    flags[], cd_size[], cd_values[], tail) from pt_t's basic_pipeline
    member into a POD filter_chain_t for cheap by-value capture.

  - The raw chunk bytes are copied into a unique_ptr<byte[]> the
    worker will own.  Per-chunk allocation; thread-local scratch in
    the pool is a follow-on optimization if benchmarks justify it.

  - Closure submitted to pool: applies the filter chain ping-pong
    style (same logic as basic_pipeline_t::write_chunk_impl), copies
    final bytes into a fresh owned aligned buffer, returns
    pool_result_t{data, nbytes, mask, offset}.

  - The returned future is pushed onto in_flight_ (deque preserves
    submission order).

  - drain_in_flight(false) drains all already-completed futures off
    the front, calling H5Dwrite_chunk on the calling thread for each.
    HDF5 stays single-threaded.

  - When in_flight_.size() reaches backpressure_cap_, write_chunk_via_pool
    blocks on the front future via drain_in_flight(true).  Producer
    memory is bounded by cap × chunk_bytes per pt_t.

All six visit_pipeline → write_chunk call sites in this file now
route through dispatch_chunk(), a tiny private method that picks
pool path when pool_ is non-null and falls back to visit_pipeline
otherwise.  Synchronous behavior is bytewise-unchanged when no
FAPL pool is configured.

flush() now drains in_flight_ to completion before returning.

Tests (test/H5Dappend.cpp, [#250 1.3.2] cases):

  - gzip round-trip equivalence: same data written through synchronous,
    pool-default-cap, and pool-explicit-cap paths produces identical
    readback in all three cases.
  - tight back-pressure (cap=2) round-trip: forces the producer-blocking
    branch every other chunk; data integrity preserved.

Local: 50/50 ctest green.

What this commit does NOT do (deferred):

  - h5::write and h5::read pool integration (Phase 1.3.3 — same approach
    but via h5::get_access_plist(ds) at the call site).
  - Removal of #241's pt_t(ds, h5::filter::threads{N}) constructor
    (Phase 1.4 — once we're sure the FAPL path covers everything).
  - Thread-local scratch buffers in the pool (optimization, post-Phase I).

[#250]:svarga:refactor, Phase 1.3.3 part 1 — pool_pipeline_t as third variant alternative

Implements workplan Option b: the FAPL worker pool becomes a third CRTP
descendant of pipeline_t<Derived>, slotting into pt_t's pipeline variant
alongside basic_pipeline_t and threaded_pipeline_t.

New header h5cpp/H5Zpipeline_pool.hpp:

  - pool_pipeline_t owns std::shared_ptr<worker_pool_t> + back-pressure
    cap + std::deque<std::future<result_t>> in-flight queue.

  - write_chunk_impl snapshots filter chain config into a POD captured
    by value, copies raw chunk into a unique_ptr<byte[]> owned by the
    worker, submits the compress closure to the pool, pushes the
    future onto in_flight_, drains opportunistically, blocks on the
    front future when cap is reached.

  - read_chunk_impl stays synchronous (identical to basic).  Parallel
    decompression is Phase 1.5+ work — requires read-ahead that
    cooperates with pipeline_t<>::split_to_chunk_read.

  - public drain() method walks in_flight_ to completion, called by
    pt_t::flush so users get "data on disk after flush returns".

  - Destructor drains any remaining in-flight in submission order.

pt_t refactor:

  - pt_pipeline_t variant now has three alternatives: basic, threaded
    (legacy #241), pool.
  - init() resolves the FAPL pool; if present, emplaces a pool_pipeline_t
    alternative; otherwise keeps the default basic_pipeline_t.
  - All six write_chunk dispatch sites revert to plain visit_pipeline +
    p.write_chunk — uniform across the three alternatives via the CRTP
    base.
  - dispatch_chunk / write_chunk_via_pool / drain_in_flight / pool_ /
    backpressure_cap_ / in_flight_ all REMOVED from pt_t (the logic
    now lives in pool_pipeline_t).
  - flush() invokes drain() on the pool alternative via std::visit
    type-checked branch.

Why this shape (Option b from workplan §3 design notes):

  - h5::write and h5::read benefit transparently in the next sub-phase
    via the existing pipeline_t<>::write/read → split_to_chunk_write
    → write_chunk_impl machinery.  Just construct a pool_pipeline_t
    in those entry points when the file's FAPL has a pool.
  - The in-flight deque and drain logic live in one place, not
    duplicated across consumers.
  - Phase 1.4 (remove #241's threaded_pipeline_t alternative) becomes
    a single-commit cleanup: the variant goes from three to two
    alternatives.

Local: 50/50 ctest green.  Existing Phase 1.3.2 round-trip tests
continue to pass (the variant routes them through pool_pipeline_t now
instead of pt_t's removed in-flight machinery).

What this commit does NOT do (Phase 1.3.3 part 2, next commit):

  - h5::write / h5::read FAPL pool integration.
  - Will construct a local pool_pipeline_t in those entry points,
    call set_cache + pipeline.write/read, let RAII drain on exit.

[#250]:svarga:feature, Phase 1.3.3 part 2 — h5::write/read consult FAPL pool

When the file's FAPL has h5::threads{N} installed AND the dataset's
DAPL has h5::high_throughput AND the dataset is chunked, h5::write
and h5::read now construct a local pool_pipeline_t and route the
operation through it.  Otherwise the existing paths run unchanged:

  - DAPL high_throughput + no FAPL pool → existing DAPL-stored
    basic_pipeline_t pointer (synchronous filter chain).
  - No DAPL high_throughput → standard H5Dwrite / H5Dread.

Both branches sit inside the existing use_pipeline gate (which also
checks layout == H5D_CHUNKED, the #242 SegFault guard).  The FAPL
pool is the new inner check, taking priority over the DAPL pipeline
pointer when present.

Mechanism (h5::write side):

    hid_t fid  = H5Iget_file_id(static_cast<hid_t>(ds));
    hid_t fapl = H5Fget_access_plist(fid);
    auto pool  = h5::impl::resolve_worker_pool(fapl);
    if (pool) {
        const unsigned cap = h5::impl::resolve_backpressure(
            fapl, pool->worker_count());
        h5::impl::pool_pipeline_t pipe(std::move(pool), cap);
        // set_cache populates the filter chain from the DCPL.
        h5::dcpl_t dcpl{H5Dget_create_plist(static_cast<hid_t>(ds))};
        hid_t type_id  = H5Dget_type(static_cast<hid_t>(ds));
        size_t elem_sz = H5Tget_size(type_id);
        H5Tclose(type_id);
        pipe.set_cache(dcpl, elem_sz);
        pipe.write(ds, offset, stride, block, count, dxpl, ptr);
        // RAII: pipe destructor drains in_flight before pool refcount drop.
    } else {
        // ... existing DAPL pipeline path ...
    }

h5::read uses the same shape; pool_pipeline_t::read_chunk_impl is
currently synchronous (identical to basic_pipeline_t::read_chunk_impl)
so the FAPL-pool read branch is semantically equivalent to the DAPL
read path today.  The structure is in place for the Phase 1.5+
read-ahead optimization to land without touching call sites.

Tests (test/H5Pall.cpp, [#250 1.3.3] cases):

  - h5::write + h5::read round-trip through FAPL pool with
    h5::threads{4} | h5::backpressure{16} + DAPL high_throughput +
    gzip-compressed dataset.  Reader uses default FAPL — verifies
    data is durably on disk regardless of reader-side FAPL choice.

  - h5::write without high_throughput on the DAPL bypasses the pool
    even when the FAPL has one installed.  Confirms per-dataset
    opt-in still gates the pool path.

Local: 50/50 ctest green.

What this commit does NOT do (Phase 1.4, next commit):

  - Remove pt_t(ds, h5::filter::threads{N}) constructor from #241.
  - The threaded_pipeline_t variant alternative goes away with it.

[#250]:svarga:refactor, Phase 1.4 — retire #241 h5::filter::threads + threaded_pipeline_t

Migration option B (chosen earlier in #250): the FAPL-scoped pool from
Phases 1.1–1.3 fully subsumes the per-pt_t worker pool added in #241.
Two parallel paths invite contention bugs and confuse the surface; one
codepath survives.

Removed:

  - h5cpp/H5Zpipeline_threaded.hpp (deleted).  This carried
    threaded_pipeline_t — both the C++20 (jthread + lock-free queue)
    and C++17 (stoppable_thread + std::mutex + std::queue) variants —
    plus the h5::filter::threads tag.

  - h5::pt_t( const h5::ds_t&, h5::filter::threads ).  The remaining
    pt_t( const h5::ds_t& ) ctor now resolves the file's FAPL and
    swaps to pool_pipeline_t when h5::threads{N} is installed.

  - impl::pt_pipeline_t third alternative
    (unique_ptr<threaded_pipeline_t>).  The variant is now
    { basic_pipeline_t, pool_pipeline_t } — both unique_ptr-wrapped
    for move-assignment.

  - core: #include "H5Zpipeline_threaded.hpp" line removed.  Stale
    "threaded_pipeline_t is defined in ..." comment in
    H5Zpipeline.hpp also removed.

Tests:

  - test/H5Zpipeline.cpp: dropped the three direct
    threaded_pipeline_t round-trip cases (no filter / gzip /
    shuffle+gzip).  Pool round-trip is covered in test/H5Pall.cpp
    via the [#250 1.3.3] cases.

  - test/H5Dappend.cpp: dropped the four [#241] cases (tag
    construction, no-filter round-trip, basic-vs-threaded bytewise
    equivalence, default hw_concurrency worker count).  Replaced
    with a one-line comment pointing to the FAPL coverage in
    test/H5Pall.cpp.

User-visible API change (intentional, since v1.12.4 of #241 was the
only release that shipped it):

    // before (#241):
    h5::pt_t pt(ds, h5::filter::threads{4});

    // now (#250):
    h5::fd_t fd = h5::create(path, H5F_ACC_TRUNC,
                             h5::threads{4} | h5::backpressure{16});
    // ... dataset created on fd inherits the pool ...
    h5::pt_t pt(ds);   // resolves fd's pool automatically

Verified locally: 50/50 ctest green.  CI will validate across the
asan / ubsan / coverage / macos / windows matrix on push.

[#250]:svarga:test, Phase 1.5 — TSAN job + [#250 1.5] FAPL pool race coverage

CI: add a `tsan / ubuntu-24.04 / clang-20` job mirroring the existing
asan / ubsan jobs.  It configures clang-20 with
`-fsanitize=thread -fno-omit-frame-pointer -O1 -g`, builds the whole
tree, and runs the full ctest suite under
`TSAN_OPTIONS=halt_on_error=1:second_deadlock_stack=1`.  Failing
races now break CI loud the same way ASan/UBSan errors do.  Badge
generator's `needs:` list extended to include the new job so the
TSan status SVG renders alongside the others.

Tests: four new cases tagged `[#250 1.5]` in test/H5Pall.cpp, sized
to exercise the FAPL pool path the way race-prone bugs would
surface:

  - "two FAPLs with independent pools — bytewise equivalence to
    basic": same payload written via basic pipeline + 4-thread pool +
    2-thread pool; all three readbacks must equal the input.
    Catches any state leakage between pool instances or between the
    pool and basic paths.

  - "interleaved writes through two FAPLs do not cross-contaminate":
    two open fds with independent pools alive at the same time,
    distinct payloads written sequentially (HDF5 itself isn't thread-
    safe in default builds, so concurrent H5 C-API calls are out of
    scope; the pool only parallelizes compression).  Chunks routed
    through one pool must not appear in the other file.

  - "pt_t destructor drains pool before fd close — data on disk":
    no explicit h5::flush; lets ~pt_t fire the pool drain.  Reads
    back from a fresh open verify the destructor honored the "data
    on disk after scope exit" contract.

  - "fd close after FAPL pool flush releases workers": opens / writes
    / closes the file four times back-to-back; each iteration drops
    its own shared_ptr<worker_pool_t> via the FAPL slot's close-cb.
    Catches use-after-free in the slot lifecycle.

Local validation (Ubuntu 22.04 host, g++ 11 with `-fsanitize=thread`):

    cmake -B build-tsan -DCMAKE_BUILD_TYPE=Debug \
        -DCMAKE_CXX_FLAGS="-fsanitize=thread -fno-omit-frame-pointer -O1 -g" \
        -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=thread" \
        -DH5CPP_BUILD_TESTS=ON
    cmake --build build-tsan -j
    TSAN_OPTIONS=halt_on_error=1 ./build-tsan/test-h5pall
    TSAN_OPTIONS=halt_on_error=1 ./build-tsan/test-h5dappend
    TSAN_OPTIONS=halt_on_error=1 ./build-tsan/test-h5zpipeline

  → h5pall 30/30, h5dappend 18/18, h5zpipeline 10/10.  No data races.

CI clang-20 will validate the same on the matrix.

[#250]:svarga:docs, Phase 1.6 — README + pipeline rework report updated for FAPL pool

- README v1.12 highlights: replace "Threaded I/O pipeline" bullet
  with the user-visible FAPL pool API
  (`h5::create(..., h5::threads{N} | h5::backpressure{M})`) and
  bump the sanitizer line from "ASan + UBSan" to "ASan + UBSan +
  TSan" now that the TSan job is wired into CI.

- docs/filtering-pipeline-rework-report.md:
  - Update the Threading row of the gap table to mark Phase I
    delivered as `pool_pipeline_t` in #250.
  - Append a "Status — Phase I (#250, FAPL worker pool)" section
    describing the surface, the mechanism (shared_ptr slot in
    FAPL, pool_pipeline_t in h5::write / h5::read / pt_t),
    bounded back-pressure, and the removal of the #241 per-pt_t
    `h5::filter::threads{N}` constructor.

Phase II tracking is deferred — that lives in
tasks/h5cpp-fapl-multithreading-workplan.md.

[#250]:svarga:fix, Windows MSVC h5pall — distinct paths per iteration of fd close test

The Phase 1.5 case "fd close after FAPL pool flush releases workers"
re-created the same file four times in a loop.  On Windows / HDF5
1.12.2 the C runtime briefly holds the previous handle past the
HDF5-level close, so the second `H5Fcreate` raced the deletion of
the first file and threw "couldn't create file" at H5Fcreate.hpp:58.

Replaces the in-place re-create loop with a list of four distinct
paths.  The slot-lifecycle behaviour being asserted — FAPL property
close-cb drops the pool ref when the last fd dies — still fires
once per iteration; only the filesystem reuse is gone.

Linux/macOS handle in-place re-creation fine (POSIX delete is
atomic), so the test ran green on every other CI matrix entry plus
local h5pall (30/30).  This is a Windows-only fragility fix, not a
correctness change in the FAPL pool itself.
@steven-varga steven-varga force-pushed the 254-refactor-remove-compiler-meta branch from c2ad70e to 292b21d Compare May 19, 2026 21:31
@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 45.02165% with 127 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
h5cpp/H5Zpipeline_pool.hpp 0.00% 97 Missing ⚠️
h5cpp/H5Dread.hpp 0.00% 17 Missing ⚠️
h5cpp/H5Dwrite.hpp 47.05% 9 Missing ⚠️
h5cpp/H5Dappend.hpp 85.18% 4 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant