TrackedVector by ax3l · Pull Request #5253 · AMReX-Codes/amrex

ax3l · 2026-03-28T23:27:58Z

Summary

This adds a helper class for synchronizing a pair of host & device vectors (w/o using managed memory).

In BLAST codes, I find that keeping such pairs in sync by hand causes significant boilerplate in physics-focused parts of the code; this little helper class cuts down on lines and mental bookkeeping.

Additionally, especially in interactive pyAMReX and ImpactX workflows, this construct is a building block for user-friendly user data that can even span multiple simulations (e.g., ImpactX element data). Users can define their inputs before AMReX is initialized and reuse their input classes across many thousands of simulations, e.g., in optimization loops, even when AMReX device arenas are shut down and recreated in between.
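To make the bookkeeping concrete, here is a minimal, AMReX-free sketch of the idea. It is not the class from this PR: the "device" side is emulated with a second std::vector, and the names TrackedPairSketch, host_write, host_read, device_read and copies are hypothetical. Only the three status names and to_device(force) are taken from the review discussion further down.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in; the real class lives in AMReX_TrackedVector.H.
enum class Status { up_to_date, host_dirty, device_dirty };

template <typename T>
class TrackedPairSketch {
public:
    // Mutable host access marks the host side as newer than the device side.
    std::vector<T>& host_write () {
        m_status = Status::host_dirty;
        return m_host;
    }

    // Read-only host access does not change the status.
    std::vector<T> const& host_read () const { return m_host; }

    // Copy host -> "device" if the two sides differ, or if forced.
    void to_device (bool force = false) {
        if (m_status != Status::up_to_date || force) {
            m_device = m_host;  // stands in for a host-to-device memcpy
            ++m_copies;
            m_status = Status::up_to_date;
        }
    }

    std::vector<T> const& device_read () const { return m_device; }
    Status status () const { return m_status; }
    std::size_t copies () const { return m_copies; }

private:
    std::vector<T> m_host;
    std::vector<T> m_device;
    Status m_status = Status::up_to_date;
    std::size_t m_copies = 0;
};

// Usage: fill on the host, sync once, and observe that a second sync is a no-op.
inline std::size_t demo_sync_count () {
    TrackedPairSketch<double> v;
    v.host_write() = {1.0, 2.0, 3.0};
    v.to_device();   // copies, because the host side is dirty
    v.to_device();   // no copy, already up to date
    assert(v.device_read() == v.host_read());
    return v.copies();
}
```

The point of the pattern is that physics code only calls to_device() where it needs device data, and the status flag decides whether a copy actually happens.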

Additional background

The unit test shows the most common usage patterns & needs.

BLAST-ImpactX/impactx#1368

Checklist

The proposed changes:

add new capabilities to AMReX
CPU build: write the optimized fallback path
GPU build: use AMReX-session agnostic pinned memory for host OR do synchronous copies
include documentation in the code and/or rst files, if appropriate
is covered by unit tests

WeiqunZhang · 2026-03-28T23:33:53Z

Note that we already have a similar data container called Gpu::Buffer.

Src/Base/AMReX_HostDeviceVector.H

ax3l · 2026-03-28T23:43:00Z

> Note that we already have a similar data container called Gpu::Buffer.

Ah that is cool, I forgot about that one.

After checking Gpu::Buffer again, my implementation addresses subtly different needs. In particular, it does not require AMReX to be initialized before one starts writing (host) data into it. That is the driving need I have in pyAMReX/ImpactX: more user-friendly initialization and lifetimes for complex simulation data.
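A rough sketch of that lifetime requirement, under loud assumptions: SessionSketch stands in for an initialize/finalize cycle and is not an AMReX API, the "device" mirror is again just a std::vector, and release_device() is a hypothetical hook. How the class in this PR actually handles arena shutdown is not shown here.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a device runtime session (the time between an
// initialize and a finalize call). Not an AMReX API.
struct SessionSketch {
    static bool& active () { static bool a = false; return a; }
};

template <typename T>
class SessionAgnosticVectorSketch {
public:
    // Host data is a plain std::vector, so it is usable with no session at all.
    // Handing out mutable access invalidates the device mirror.
    std::vector<T>& host () { m_device_valid = false; return m_host; }

    // The "device" mirror is only created or refreshed inside a session.
    std::vector<T> const& device () {
        if (!SessionSketch::active()) {
            throw std::runtime_error("no active device session");
        }
        if (!m_device || !m_device_valid) {
            m_device = std::make_unique<std::vector<T>>(m_host);
            m_device_valid = true;
        }
        return *m_device;
    }

    // Assumed to be called when a session ends: drop the mirror, keep the host data.
    void release_device () { m_device.reset(); m_device_valid = false; }

private:
    std::vector<T> m_host;
    std::unique_ptr<std::vector<T>> m_device;
    bool m_device_valid = false;
};

// Usage: write once before any session, then reuse across two sessions.
inline bool demo_two_sessions () {
    SessionAgnosticVectorSketch<int> v;
    v.host() = {4, 5, 6};                  // written before any session exists

    bool ok = true;
    for (int run = 0; run < 2; ++run) {
        SessionSketch::active() = true;    // "initialize"
        ok = ok && (v.device() == std::vector<int>{4, 5, 6});
        v.release_device();
        SessionSketch::active() = false;   // "finalize"
    }
    return ok;
}
```

The design choice illustrated is that the host copy is the long-lived source of truth, while the device copy is disposable and can be rebuilt in each session.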

We could extend/rewrite Gpu::Buffer or keep this separate with a distinct enough name & docs.

WeiqunZhang · 2026-03-28T23:46:36Z

We could make Gpu::Buffer a template on the container type.

Src/Base/AMReX_HostDeviceVector.H

ax3l · 2026-03-29T00:13:10Z

Yes, good idea. Let's finish the tests and impl. for this one and then we can investigate if/how we merge them?

Src/Base/AMReX_HostDeviceVector.H

Tests/Base/HostDeviceVector/main.cpp

ax3l · 2026-03-29T05:55:20Z

/run-hpsf-gitlab-ci

github-actions · 2026-03-29T05:55:30Z

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1498416.

amrex-gitlab-ci-reporter · 2026-03-29T06:20:21Z

GitLab CI 1498416 finished with status: failed. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1498416.

ax3l · 2026-03-29T06:32:11Z

The following tests FAILED:
93 - Particles_CheckpointRestartDualGridHDF5SOA_2d (Failed)
94 - Particles_CheckpointRestartDualGridHDF5SOA_3d (Failed)

These fail only on H100 and are unrelated to this PR.

ax3l · 2026-03-29T06:32:19Z

/run-hpsf-gitlab-ci

github-actions · 2026-03-29T06:32:30Z

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1498432.

amrex-gitlab-ci-reporter · 2026-03-29T06:57:18Z

GitLab CI 1498432 finished with status: failed. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1498432.

Src/Base/AMReX_StdPinnedAllocator.H

Src/Base/AMReX_TrackedVector.H

Tests/Base/TrackedVector/main.cpp


ax3l · 2026-03-30T19:30:03Z

/run-hpsf-gitlab-ci

github-actions · 2026-03-30T19:30:15Z

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1500854.

ax3l · 2026-03-30T20:35:27Z

/run-hpsf-gitlab-ci

github-actions · 2026-03-30T20:35:39Z

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1501106.

ax3l · 2026-03-30T20:46:32Z

@WeiqunZhang @atmyers @AlexanderSinn I am very happy with this design now, thank you for your feedback!

The easiest way to review this is to first look at the user-facing docs, then the test cases I added, then the actual class.

Let me know what you think :)

AlexanderSinn · 2026-03-30T22:19:49Z

Src/Base/AMReX_TrackedVector.H

+         */
+        void to_device (bool force=false) {
+#ifdef AMREX_USE_GPU
+            if (status() != Status::up_to_date || force) {


Suggested change

-            if (status() != Status::up_to_date || force) {
+            if (status() == Status::host_dirty || force) {

If the status is device_dirty, then calling to_device() would just overwrite the recent device-side changes in a GPU build, but not in a CPU build, where the vectors are shared.

I chose that intentionally: if you call to_device() and the device does not match what the host has (for any reason), the device gets overwritten.

When compiling for CPU, host and device are in sync by definition/API contract because they use the exact same memory. In a CPU build, the status is always Status::up_to_date.
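The disagreement above comes down to a single case. As a small self-contained sketch (the two helper function names are hypothetical; the conditions are copied from the diff and from the suggested change), the two predicates can be compared directly:

```cpp
#include <cassert>

enum class Status { up_to_date, host_dirty, device_dirty };

// Condition as written in the PR: copy whenever the two sides differ.
constexpr bool copies_as_in_pr (Status s, bool force) {
    return s != Status::up_to_date || force;
}

// Condition from the suggested change: copy only when the host is newer.
constexpr bool copies_as_suggested (Status s, bool force) {
    return s == Status::host_dirty || force;
}

// Both variants copy when the host is dirty ...
static_assert(copies_as_in_pr(Status::host_dirty, false) &&
              copies_as_suggested(Status::host_dirty, false), "host_dirty");
// ... and both skip the copy when nothing changed.
static_assert(!copies_as_in_pr(Status::up_to_date, false) &&
              !copies_as_suggested(Status::up_to_date, false), "up_to_date");
// They differ only for device_dirty without force.
static_assert(copies_as_in_pr(Status::device_dirty, false), "PR overwrites device");
static_assert(!copies_as_suggested(Status::device_dirty, false), "suggestion keeps device");
// With force, both always copy.
static_assert(copies_as_in_pr(Status::device_dirty, true) &&
              copies_as_suggested(Status::device_dirty, true), "forced");
```

With the condition as written, to_device() treats the host as the source of truth; with the suggested one, a pending device-side change would survive an unforced to_device().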

amrex-gitlab-ci-reporter · 2026-03-30T22:24:48Z

GitLab CI 1501106 finished with status: success. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1501106.

Src/Base/AMReX_TrackedVector.H

ax3l · 2026-03-31T00:39:52Z

/run-hpsf-gitlab-ci

github-actions · 2026-03-31T00:40:05Z

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1501542.

amrex-gitlab-ci-reporter · 2026-03-31T01:23:55Z

GitLab CI 1501542 finished with status: failed. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1501542.

ax3l · 2026-03-31T02:25:21Z

Frank issue, CUDA GPU was busy:

builds/amrex/amrex/Src/Base/AMReX_GpuDevice.cpp line 331: CUDA-capable device(s) is/are busy or unavailable !!!
SIGABRT

The other GPUs and the previous run passed, so this is good (CUDA also passed on the local GPU in my laptop).

ax3l requested review from AlexanderSinn, WeiqunZhang and atmyers March 28, 2026 23:27

ax3l added the enhancement label Mar 28, 2026

ax3l force-pushed the topic-hostdevicevector branch from 4c5ad16 to 28ff6ce Compare March 28, 2026 23:34

ax3l commented Mar 28, 2026

Src/Base/AMReX_HostDeviceVector.H (outdated)

ax3l commented Mar 28, 2026

Src/Base/AMReX_HostDeviceVector.H (outdated)

ax3l changed the title ~~HostDeviceVector~~ [WIp] HostDeviceVector Mar 29, 2026

ax3l changed the title ~~[WIp] HostDeviceVector~~ [WIP] HostDeviceVector Mar 29, 2026

ax3l marked this pull request as draft March 29, 2026 00:19

ax3l force-pushed the topic-hostdevicevector branch from 59a2f0e to 069d84d Compare March 29, 2026 00:38

AlexanderSinn reviewed Mar 29, 2026

Src/Base/AMReX_HostDeviceVector.H (outdated)