
TrackedVector#5253

Open
ax3l wants to merge 4 commits into AMReX-Codes:development from ax3l:topic-hostdevicevector

Conversation

@ax3l
Member

@ax3l ax3l commented Mar 28, 2026

Summary

This adds a helper class for synchronizing a pair of host & device vectors (w/o using managed memory).

In BLAST codes, I find that keeping such host/device pairs in sync by hand causes significant boilerplate in physics-focused parts of the code. This little helper class cuts down on lines and mental bookkeeping.

Additionally, especially in interactive pyAMReX and ImpactX workflows, this construct is a building block for user-friendly user data that can even span multiple simulations (e.g., ImpactX element data): users can define their inputs before AMReX is initialized and reuse their input classes across many thousands of simulations, e.g., in optimization loops, even when AMReX device arenas are shut down and recreated in between.
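
To make the idea concrete, here is a minimal standalone sketch of a tracked host/device pair. It is not the class added in this PR: `TrackedPairSketch`, `push_back`, `to_device`, and `copies` are hypothetical names chosen for illustration, and a second `std::vector` stands in for device memory so that the example builds without AMReX or a GPU.

```cpp
#include <cassert>
#include <vector>

// Hypothetical illustration only; NOT the class added in this PR.
// A second std::vector stands in for device memory.
template <typename T>
class TrackedPairSketch {
public:
    // Host writes need no device (or initialized runtime) in this sketch.
    void push_back (T const& v) {
        m_host.push_back(v);
        m_host_dirty = true;
    }

    // Copy host -> "device" only if the host changed since the last sync.
    void to_device () {
        if (m_host_dirty) {
            m_device = m_host;   // stand-in for a host-to-device copy
            m_host_dirty = false;
            ++m_copies;
        }
    }

    std::vector<T> const& device () const { return m_device; }

    // Number of host -> "device" copies performed so far.
    int copies () const { return m_copies; }

private:
    std::vector<T> m_host;
    std::vector<T> m_device;
    bool m_host_dirty = false;
    int m_copies = 0;
};
```

In this sketch, repeated `to_device()` calls without an intervening host write perform no additional copy, which is the bookkeeping that otherwise has to be tracked by hand at every call site.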

Additional background

The unit test shows the most common usage patterns & needs.

BLAST-ImpactX/impactx#1368

Checklist

The proposed changes:

  • add new capabilities to AMReX
  • CPU build: write the optimized fallback path
  • GPU build: use AMReX-session agnostic pinned memory for host OR do synchronous copies
  • include documentation in the code and/or rst files, if appropriate
  • is covered by unit tests

@WeiqunZhang
Member

Note that we already have a similar data container called Gpu::Buffer.

@ax3l ax3l force-pushed the topic-hostdevicevector branch from 4c5ad16 to 28ff6ce Compare March 28, 2026 23:34
@ax3l
Member Author

ax3l commented Mar 28, 2026

> Note that we already have a similar data container called Gpu::Buffer.

Ah that is cool, I forgot about that one.

After checking Gpu::Buffer again, my implementation has subtly different needs. In particular, it does not require AMReX to be initialized before I start writing (host) data into it, which is the driving need I have in pyAMReX/ImpactX for more user-friendly initialization and lifetimes of complex simulation data.

We could extend/rewrite Gpu::Buffer or keep this separate with a distinct enough name & docs.

@WeiqunZhang
Member

We could template Gpu::Buffer on the container type.

@ax3l
Member Author

ax3l commented Mar 29, 2026

Yes, good idea. Let's finish the tests and impl. for this one and then we can investigate if/how we merge them?

@ax3l ax3l changed the title from HostDeviceVector to [WIp] HostDeviceVector Mar 29, 2026
@ax3l ax3l changed the title from [WIp] HostDeviceVector to [WIP] HostDeviceVector Mar 29, 2026
@ax3l ax3l marked this pull request as draft March 29, 2026 00:19
@ax3l ax3l force-pushed the topic-hostdevicevector branch from 59a2f0e to 069d84d Compare March 29, 2026 00:38
@ax3l ax3l force-pushed the topic-hostdevicevector branch from 32112df to 0b75b94 Compare March 29, 2026 05:44
@ax3l
Member Author

ax3l commented Mar 29, 2026

/run-hpsf-gitlab-ci

@github-actions

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1498416.

@ax3l ax3l force-pushed the topic-hostdevicevector branch 2 times, most recently from 5bacf0d to fc6dbb3 Compare March 29, 2026 06:09
@amrex-gitlab-ci-reporter

GitLab CI 1498416 finished with status: failed. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1498416.

@ax3l
Member Author

ax3l commented Mar 29, 2026

The following tests FAILED:
93 - Particles_CheckpointRestartDualGridHDF5SOA_2d (Failed)
94 - Particles_CheckpointRestartDualGridHDF5SOA_3d (Failed)

Only on H100. Unrelated to this PR.

@ax3l
Member Author

ax3l commented Mar 29, 2026

/run-hpsf-gitlab-ci

@github-actions

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1498432.

@amrex-gitlab-ci-reporter

GitLab CI 1498432 finished with status: failed. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1498432.

@ax3l ax3l force-pushed the topic-hostdevicevector branch from 1e81608 to c01f940 Compare March 30, 2026 16:29
@ax3l ax3l force-pushed the topic-hostdevicevector branch from c01f940 to 6854723 Compare March 30, 2026 19:23
ax3l added 2 commits March 30, 2026 12:29
This adds a helper class for synchronizing a pair of host & device
vectors (w/o using managed memory).

In BLAST codes, I find this to cause significant boilerplate in
physics-focused parts of the code and this little helper class
cuts down on lines and mental book-keeping.

Additionally, especially in interactive pyAMReX and ImpactX
workflows, this construct is a building block for enabling
user-friendly and even multi-simulation-spanning user data
(e.g., ImpactX element data) by enabling users to define their
inputs even before AMReX was initialized and being able to reuse
their input classes across many thousands of simulations, e.g.,
in optimization loops, even with AMReX device arenas being
shut down/recreated in between.
@ax3l ax3l force-pushed the topic-hostdevicevector branch from 6854723 to 9f5fa96 Compare March 30, 2026 19:29
@ax3l
Member Author

ax3l commented Mar 30, 2026

/run-hpsf-gitlab-ci

@github-actions

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1500854.

@ax3l ax3l changed the title from [WIP] TrackedVector to TrackedVector Mar 30, 2026
@ax3l ax3l force-pushed the topic-hostdevicevector branch from 9f5fa96 to 956baac Compare March 30, 2026 20:32
@ax3l ax3l force-pushed the topic-hostdevicevector branch from 956baac to 2e8d395 Compare March 30, 2026 20:33
@ax3l ax3l requested a review from AlexanderSinn March 30, 2026 20:33
@ax3l
Member Author

ax3l commented Mar 30, 2026

/run-hpsf-gitlab-ci

@github-actions

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1501106.

@ax3l
Member Author

ax3l commented Mar 30, 2026

@WeiqunZhang @atmyers @AlexanderSinn I am very happy with this design now, thank you for your feedback!

The easiest way to review this is to look first at the user-facing docs, then at the test cases I added, and then at the actual class.

Let me know what you think :)

*/
void to_device (bool force=false) {
#ifdef AMREX_USE_GPU
if (status() != Status::up_to_date || force) {
Member

@AlexanderSinn AlexanderSinn Mar 30, 2026

Suggested change
- if (status() != Status::up_to_date || force) {
+ if (status() == Status::host_dirty || force) {

If the status is device_dirty, then calling to_device would just overwrite the recent changes on the device side. That would not happen when compiling for CPU, where the vectors are shared.

Member Author

@ax3l ax3l Mar 30, 2026

I chose that intentionally: if you call to_device() and the device does not hold what the host has (for any reason), the device copy gets overwritten.

When compiling for CPU, host and device are in sync by definition/API contract, because they use the exact same memory. In a CPU build, the status is always Status::up_to_date.
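
The two guard conditions discussed in this thread can be compared with a small standalone sketch. This is not the PR's class: `SyncSketch`, `Guard`, `write_host`, and `write_device` are hypothetical names, plain `std::vector` stands in for both memories, and only the `Status` values named in the thread (`up_to_date`, `host_dirty`, `device_dirty`) are taken from the discussion.

```cpp
#include <cassert>
#include <vector>

enum class Status { up_to_date, host_dirty, device_dirty };

// Hypothetical switch that selects which guard to_device() applies.
enum class Guard { not_up_to_date, host_dirty_only };

// Hypothetical illustration only; NOT the class added in this PR.
struct SyncSketch {
    std::vector<int> host;
    std::vector<int> device;  // stand-in for device memory
    Status status = Status::up_to_date;
    Guard guard;

    explicit SyncSketch (Guard g) : guard(g) {}

    void write_host (int v)   { host.push_back(v);   status = Status::host_dirty; }
    void write_device (int v) { device.push_back(v); status = Status::device_dirty; }

    void to_device (bool force = false) {
        // Guard as in the diff:        status != up_to_date
        // Guard as in the suggestion:  status == host_dirty
        bool const needs_copy = (guard == Guard::not_up_to_date)
            ? (status != Status::up_to_date)
            : (status == Status::host_dirty);
        if (needs_copy || force) {
            device = host;  // the host copy wins
            status = Status::up_to_date;
        }
    }
};
```

With `Guard::not_up_to_date`, a device-side edit followed by `to_device()` is discarded in favor of the host copy. With `Guard::host_dirty_only`, the device-side edit survives unless `force` is set.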

@amrex-gitlab-ci-reporter

GitLab CI 1501106 finished with status: success. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1501106.

@ax3l ax3l force-pushed the topic-hostdevicevector branch from 727bb23 to d6c97fe Compare March 31, 2026 00:10
@ax3l ax3l force-pushed the topic-hostdevicevector branch from d6c97fe to 8d0d62a Compare March 31, 2026 00:19
@ax3l
Member Author

ax3l commented Mar 31, 2026

/run-hpsf-gitlab-ci

@github-actions

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1501542.

@amrex-gitlab-ci-reporter

GitLab CI 1501542 finished with status: failed. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1501542.

@ax3l
Member Author

ax3l commented Mar 31, 2026

Frank issue, CUDA GPU was busy:

builds/amrex/amrex/Src/Base/AMReX_GpuDevice.cpp line 331: CUDA-capable device(s) is/are busy or unavailable !!!
SIGABRT

The other GPUs and the previous run passed, so this is good (the CUDA run on my laptop's local GPU passed as well).

4 participants