Implement HostVector for more robust CPU <-> GPU data transfers by jipolanco · Pull Request #93 · jipolanco/VortexPasta.jl

jipolanco · 2026-02-02T11:24:33Z

`HostVector` is used as a staging area in CPU memory during host-device transfers. On CUDA and AMDGPU this memory is page-locked for faster transfers. We also avoid reallocating this CPU memory as much as possible (reallocation typically happens when the number of filament points changes), since it would require redoing the page-lock. Redoing the page-lock seemed to cause crashes on AMDGPU in particular, which seem to be fixed now. Moreover, multi-GPU behaviour is now more robust (on CUDA in particular, where there were some issues).
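
To illustrate the "avoid reallocating" idea, here is a minimal CPU-only sketch. It is not the `HostVector` implementation from this PR: the names `StagingBuffer`, `resize_staging!` and `active` are hypothetical, the doubling growth policy is an assumption, and no page-locking is performed (a comment marks where a GPU backend would presumably register the memory).

```julia
# Hypothetical sketch of a staging buffer that keeps its backing storage
# when the requested length shrinks or still fits the current capacity.
mutable struct StagingBuffer{T}
    data::Vector{T}  # backing storage (would be page-locked on a GPU backend)
    n::Int           # logical length, always <= length(data)
    nalloc::Int      # how many times the backing storage was reallocated
end

StagingBuffer{T}() where {T} = StagingBuffer{T}(Vector{T}(undef, 0), 0, 0)

# Change the logical length without preserving the contents.
# The storage is only reallocated when `n` exceeds the current capacity.
function resize_staging!(v::StagingBuffer{T}, n::Integer) where {T}
    if n > length(v.data)
        capacity = max(n, 2 * length(v.data))  # assumed growth policy
        v.data = Vector{T}(undef, capacity)
        # A real implementation would redo the page-lock here, which is
        # the expensive step that one wants to trigger as rarely as possible.
        v.nalloc += 1
    end
    v.n = n
    return v
end

# View of the part of the buffer that is currently in use.
active(v::StagingBuffer) = view(v.data, 1:v.n)

# Mimic a number of points that changes from one step to the next.
buf = StagingBuffer{Float64}()
for n in (100, 80, 120, 150, 90)
    resize_staging!(buf, n)
    active(buf) .= 1.0  # stand-in for filling the buffer before a transfer
end
```

In this sketch the five resizes trigger only two reallocations (at lengths 100 and 120), so a page-lock step placed at the marked comment would run twice rather than on every change in the number of points.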

github-actions · 2026-02-02T12:29:02Z

Benchmark Results (Julia v1)

Time benchmarks

Benchmark	master	`80bea42`...	master / `80bea42`...
BiotSavart/add_local_integrals!	13.5 ± 0.99 ms	13.4 ± 0.86 ms	1.01 ± 0.099
BiotSavart/add_point_charges!	12.8 ± 0.44 ms	12.8 ± 0.27 ms	1 ± 0.04
BiotSavart/velocity	0.593 ± 0.008 s	0.611 ± 0.01 s	0.97 ± 0.021
BiotSavart/velocity + streamfunction	0.721 ± 0.011 s	0.729 ± 0.0037 s	0.988 ± 0.015
CellLists/CPU/nsubdiv = 1/foreach_pair (SIMD/sorted)	0.0824 ± 0.0037 s	0.0816 ± 0.0032 s	1.01 ± 0.061
CellLists/CPU/nsubdiv = 1/foreach_pair (SIMD/unsorted)	0.106 ± 0.011 s	0.0975 ± 0.0066 s	1.09 ± 0.13
CellLists/CPU/nsubdiv = 1/foreach_pair (sorted)	0.0602 ± 0.0021 s	0.0599 ± 0.0022 s	1.01 ± 0.051
CellLists/CPU/nsubdiv = 1/foreach_pair (unsorted)	0.0834 ± 0.024 s	0.0766 ± 0.0064 s	1.09 ± 0.33
CellLists/CPU/nsubdiv = 1/foreach_source	0.0745 ± 0.0077 s	0.0712 ± 0.0046 s	1.05 ± 0.13
CellLists/CPU/nsubdiv = 1/iterator_interface	0.0936 ± 0.0062 s	0.122 ± 0.024 s	0.77 ± 0.16
CellLists/CPU/nsubdiv = 1/set_elements!	1.98 ± 0.082 ms	1.99 ± 0.081 ms	0.996 ± 0.058
CellLists/CPU/nsubdiv = 2/foreach_pair (SIMD/sorted)	0.112 ± 0.0066 s	0.107 ± 0.0024 s	1.05 ± 0.065
CellLists/CPU/nsubdiv = 2/foreach_pair (SIMD/unsorted)	0.276 ± 0.029 s	0.287 ± 0.012 s	0.963 ± 0.11
CellLists/CPU/nsubdiv = 2/foreach_pair (sorted)	0.0993 ± 0.0093 s	0.105 ± 0.0045 s	0.948 ± 0.098
CellLists/CPU/nsubdiv = 2/foreach_pair (unsorted)	0.276 ± 0.025 s	0.259 ± 0.0078 s	1.06 ± 0.1
CellLists/CPU/nsubdiv = 2/foreach_source	0.283 ± 0.0097 s	0.273 ± 0.024 s	1.04 ± 0.097
CellLists/CPU/nsubdiv = 2/iterator_interface	0.34 ± 0.0081 s	0.378 ± 0.014 s	0.898 ± 0.04
CellLists/CPU/nsubdiv = 2/set_elements!	5.88 ± 0.73 ms	7.07 ± 0.68 ms	0.831 ± 0.13
CellLists/OpenCLBackend/nsubdiv = 1/foreach_pair (sorted)	0.0648 ± 0.0015 s	0.0672 ± 0.0032 s	0.965 ± 0.051
CellLists/OpenCLBackend/nsubdiv = 1/foreach_pair (unsorted)	0.0666 ± 0.0037 s	0.0669 ± 0.0014 s	0.996 ± 0.059
CellLists/OpenCLBackend/nsubdiv = 1/set_elements!	2.32 ± 0.06 ms	2.33 ± 0.061 ms	1 ± 0.037
CellLists/OpenCLBackend/nsubdiv = 2/foreach_pair (sorted)	0.183 ± 0.0018 s	0.185 ± 0.0035 s	0.991 ± 0.021
CellLists/OpenCLBackend/nsubdiv = 2/foreach_pair (unsorted)	0.215 ± 0.0079 s	0.212 ± 0.0044 s	1.02 ± 0.043
CellLists/OpenCLBackend/nsubdiv = 2/set_elements!	7.51 ± 0.54 ms	7.49 ± 0.37 ms	1 ± 0.087
Diagnostics/energy_flux	1.15 ± 0.0023 s	1.14 ± 0.009 s	1 ± 0.0082
Diagnostics/energy_injection_rate	8.65 ± 0.15 ms	8.65 ± 0.097 ms	1 ± 0.02
Diagnostics/energy_spectrum	2.26 ± 0.095 ms	2.26 ± 0.11 ms	1 ± 0.064
Diagnostics/energy_transfer_matrix	1.56 ± 0.1 s	1.44 ± 0.073 s	1.09 ± 0.091
Diagnostics/helicity	5.51 ± 0.087 ms	5.5 ± 0.041 ms	1 ± 0.018
Diagnostics/kinetic_energy	0.875 ± 0.015 ms	0.879 ± 0.014 ms	0.996 ± 0.023
Reconnections/ReconnectBasedOnDistance	3.26 ± 0.074 s	3.27 ± 0.087 s	0.997 ± 0.035
Reconnections/ReconnectFast	0.266 ± 0.019 s	0.26 ± 0.011 s	1.02 ± 0.085
Refinement/RefineBasedOnSegmentLength	7.21 ± 2.3 ms	6.91 ± 1.8 ms	1.04 ± 0.44
Timestepping/forcing	28.2 ± 3.9 ms	22.3 ± 3.1 ms	1.27 ± 0.25
Timestepping/step!	2.8 ± 0.072 s	2.8 ± 0.06 s	1 ± 0.033
time_to_load	1.63 ± 0.00093 s	1.64 ± 0.0054 s	0.995 ± 0.0033

Memory benchmarks

Benchmark	master	`80bea42`...	master / `80bea42`...
BiotSavart/add_local_integrals!	7.04 k allocs: 4.17 MB	7.04 k allocs: 4.16 MB	1
BiotSavart/add_point_charges!	6.04 k allocs: 0.693 MB	6.04 k allocs: 0.677 MB	1.02
BiotSavart/velocity	15.6 k allocs: 4.97 MB	14.1 k allocs: 4.88 MB	1.02
BiotSavart/velocity + streamfunction	16 k allocs: 5.07 MB	14.6 k allocs: 4.98 MB	1.02
CellLists/CPU/nsubdiv = 1/foreach_pair (SIMD/sorted)	0.066 k allocs: 5.97 kB	0.066 k allocs: 5.97 kB	1
CellLists/CPU/nsubdiv = 1/foreach_pair (SIMD/unsorted)	22 allocs: 2.22 kB	22 allocs: 2.22 kB	1
CellLists/CPU/nsubdiv = 1/foreach_pair (sorted)	0.066 k allocs: 5.97 kB	0.066 k allocs: 5.97 kB	1
CellLists/CPU/nsubdiv = 1/foreach_pair (unsorted)	22 allocs: 2.22 kB	22 allocs: 2.22 kB	1
CellLists/CPU/nsubdiv = 1/foreach_source	22 allocs: 2.16 kB	22 allocs: 2.16 kB	1
CellLists/CPU/nsubdiv = 1/iterator_interface	22 allocs: 2.16 kB	1.2 M allocs: 0.0834 GB	2.46e-05
CellLists/CPU/nsubdiv = 1/set_elements!	0.044 k allocs: 3.69 kB	0.044 k allocs: 3.69 kB	1
CellLists/CPU/nsubdiv = 2/foreach_pair (SIMD/sorted)	0.066 k allocs: 5.97 kB	0.066 k allocs: 5.97 kB	1
CellLists/CPU/nsubdiv = 2/foreach_pair (SIMD/unsorted)	22 allocs: 2.22 kB	22 allocs: 2.22 kB	1
CellLists/CPU/nsubdiv = 2/foreach_pair (sorted)	0.066 k allocs: 5.97 kB	0.066 k allocs: 5.97 kB	1
CellLists/CPU/nsubdiv = 2/foreach_pair (unsorted)	22 allocs: 2.22 kB	22 allocs: 2.22 kB	1
CellLists/CPU/nsubdiv = 2/foreach_source	22 allocs: 2.16 kB	22 allocs: 2.16 kB	1
CellLists/CPU/nsubdiv = 2/iterator_interface	22 allocs: 2.16 kB	22 allocs: 2.16 kB	1
CellLists/CPU/nsubdiv = 2/set_elements!	0.044 k allocs: 3.69 kB	0.044 k allocs: 3.69 kB	1
CellLists/OpenCLBackend/nsubdiv = 1/foreach_pair (sorted)	0.641 k allocs: 0.0398 MB	0.641 k allocs: 0.0398 MB	1
CellLists/OpenCLBackend/nsubdiv = 1/foreach_pair (unsorted)	0.122 k allocs: 9.11 kB	0.122 k allocs: 9.11 kB	1
CellLists/OpenCLBackend/nsubdiv = 1/set_elements!	0.731 k allocs: 0.0433 MB	0.731 k allocs: 0.0433 MB	1
CellLists/OpenCLBackend/nsubdiv = 2/foreach_pair (sorted)	0.641 k allocs: 0.0398 MB	0.641 k allocs: 0.0398 MB	1
CellLists/OpenCLBackend/nsubdiv = 2/foreach_pair (unsorted)	0.122 k allocs: 9.11 kB	0.122 k allocs: 9.11 kB	1
CellLists/OpenCLBackend/nsubdiv = 2/set_elements!	0.731 k allocs: 0.0433 MB	0.731 k allocs: 0.0433 MB	1
Diagnostics/energy_flux	0.0556 M allocs: 8.59 MB	0.0556 M allocs: 8.59 MB	1
Diagnostics/energy_injection_rate	2.06 k allocs: 0.203 MB	2.06 k allocs: 0.203 MB	1
Diagnostics/energy_spectrum	26 allocs: 3.22 kB	26 allocs: 3.22 kB	1
Diagnostics/energy_transfer_matrix	25 k allocs: 0.0392 GB	25 k allocs: 0.0392 GB	1
Diagnostics/helicity	1.71 k allocs: 0.0703 MB	1.71 k allocs: 0.0703 MB	1
Diagnostics/kinetic_energy	2.05 k allocs: 0.0805 MB	2.05 k allocs: 0.0805 MB	1
Reconnections/ReconnectBasedOnDistance	4.18 M allocs: 0.312 GB	4.18 M allocs: 0.312 GB	1
Reconnections/ReconnectFast	7.49 M allocs: 0.196 GB	7.49 M allocs: 0.196 GB	1
Refinement/RefineBasedOnSegmentLength	5.59 k allocs: 0.263 MB	5.59 k allocs: 0.263 MB	1
Timestepping/forcing	2.55 k allocs: 0.126 MB	2.04 k allocs: 0.102 MB	1.24
Timestepping/step!	2.25 M allocs: 0.066 GB	2.24 M allocs: 0.0655 GB	1.01
time_to_load	0.149 k allocs: 11.1 kB	0.149 k allocs: 11.1 kB	1

codecov-commenter · 2026-02-02T15:09:56Z

Codecov Report

❌ Patch coverage is 78.45304% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.98%. Comparing base (fe398de) to head (80bea42).

Files with missing lines	Patch %	Lines
ext/VortexPastaAMDGPUExt.jl	0.00%	14 Missing ⚠️
ext/VortexPastaCUDAExt.jl	0.00%	14 Missing ⚠️
src/BiotSavart/host_device_transfers.jl	88.15%	9 Missing ⚠️
src/BiotSavart/BiotSavart.jl	96.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #93      +/-   ##
==========================================
- Coverage   93.32%   92.98%   -0.34%     
==========================================
  Files         122      126       +4     
  Lines        8280     8407     +127     
==========================================
+ Hits         7727     7817      +90     
- Misses        553      590      +37


jipolanco added 7 commits January 30, 2026 17:09

Define copy_device_to_host!

c112a37

Implement HostVector + package extensions

3d7446a

Use HostVector in host-device copies (untested)

6e73bda

Update Project.toml

063c440

Fix HostVector constructor

f07b936

Fix resize_no_copy!

2840a8c

Make things work

45bb672

jipolanco added 2 commits February 2, 2026 15:04

Possible improvements

5a12812

Fix aliasing issue

d55d104

jipolanco added 3 commits February 3, 2026 10:03

Make things work on CUDA _and_ OpenCL

ef0fc8b

Add some @debug output

319eba9

ring_collision test: avoid scalar indexing on GPU

80bea42

jipolanco merged commit 42f5142 into master Feb 3, 2026
5 checks passed

jipolanco deleted the host-array branch February 3, 2026 09:54
