
Parallelise some CPU -> CPU copies #94

Merged
jipolanco merged 2 commits into master from cpu-cpu-threads on Feb 3, 2026

@jipolanco (Owner)

No description provided.

github-actions bot (Contributor) commented Feb 3, 2026

Benchmark Results (Julia v1)

Time benchmarks

| Benchmark | master | 695096f... | master / 695096f... |
|---|---|---|---|
| BiotSavart/add_local_integrals! | 13.3 ± 0.18 ms | 13.3 ± 0.88 ms | 0.998 ± 0.067 |
| BiotSavart/add_point_charges! | 12.7 ± 0.45 ms | 12.7 ± 0.36 ms | 1 ± 0.045 |
| BiotSavart/velocity | 0.572 ± 0.0046 s | 0.578 ± 0.0047 s | 0.989 ± 0.011 |
| BiotSavart/velocity + streamfunction | 0.691 ± 0.0078 s | 0.692 ± 0.0078 s | 0.999 ± 0.016 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (SIMD/sorted) | 0.0789 ± 0.0015 s | 0.077 ± 0.0021 s | 1.03 ± 0.034 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (SIMD/unsorted) | 0.0968 ± 0.01 s | 0.0889 ± 0.0011 s | 1.09 ± 0.11 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (sorted) | 0.0566 ± 0.00061 s | 0.055 ± 0.00064 s | 1.03 ± 0.016 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (unsorted) | 0.0672 ± 0.00081 s | 0.0648 ± 0.005 s | 1.04 ± 0.08 |
| CellLists/CPU/nsubdiv = 1/foreach_source | 0.0675 ± 0.00068 s | 0.0639 ± 0.00082 s | 1.06 ± 0.017 |
| CellLists/CPU/nsubdiv = 1/iterator_interface | 0.106 ± 0.025 s | 0.0766 ± 0.00091 s | 1.39 ± 0.33 |
| CellLists/CPU/nsubdiv = 1/set_elements! | 1.98 ± 0.066 ms | 1.97 ± 0.073 ms | 1 ± 0.05 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (SIMD/sorted) | 0.107 ± 0.0018 s | 0.108 ± 0.0011 s | 0.996 ± 0.019 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (SIMD/unsorted) | 0.255 ± 0.0025 s | 0.263 ± 0.011 s | 0.969 ± 0.042 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (sorted) | 0.096 ± 0.001 s | 0.0989 ± 0.0052 s | 0.97 ± 0.052 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (unsorted) | 0.253 ± 0.0097 s | 0.252 ± 0.0039 s | 1.01 ± 0.041 |
| CellLists/CPU/nsubdiv = 2/foreach_source | 0.25 ± 0.0028 s | 0.248 ± 0.0021 s | 1.01 ± 0.014 |
| CellLists/CPU/nsubdiv = 2/iterator_interface | 0.328 ± 0.0017 s | 0.328 ± 0.0016 s | 0.999 ± 0.0072 |
| CellLists/CPU/nsubdiv = 2/set_elements! | 5.59 ± 0.22 ms | 6.04 ± 1.3 ms | 0.926 ± 0.21 |
| CellLists/OpenCLBackend/nsubdiv = 1/foreach_pair (sorted) | 0.0633 ± 0.00058 s | 0.064 ± 0.00055 s | 0.989 ± 0.012 |
| CellLists/OpenCLBackend/nsubdiv = 1/foreach_pair (unsorted) | 0.0651 ± 0.00044 s | 0.067 ± 0.00055 s | 0.973 ± 0.01 |
| CellLists/OpenCLBackend/nsubdiv = 1/set_elements! | 2.32 ± 0.055 ms | 2.33 ± 0.059 ms | 0.995 ± 0.035 |
| CellLists/OpenCLBackend/nsubdiv = 2/foreach_pair (sorted) | 0.185 ± 0.00061 s | 0.183 ± 0.0034 s | 1.01 ± 0.019 |
| CellLists/OpenCLBackend/nsubdiv = 2/foreach_pair (unsorted) | 0.208 ± 0.00091 s | 0.207 ± 0.0023 s | 1.01 ± 0.012 |
| CellLists/OpenCLBackend/nsubdiv = 2/set_elements! | 7.16 ± 0.18 ms | 7.22 ± 0.21 ms | 0.992 ± 0.038 |
| Diagnostics/energy_flux | 1.13 ± 0.0053 s | 1.18 ± 0.0089 s | 0.955 ± 0.0085 |
| Diagnostics/energy_injection_rate | 8.65 ± 0.14 ms | 8.67 ± 0.17 ms | 0.999 ± 0.025 |
| Diagnostics/energy_spectrum | 2.25 ± 0.093 ms | 2.25 ± 0.1 ms | 1 ± 0.062 |
| Diagnostics/energy_transfer_matrix | 1.47 ± 0.22 s | 1.55 ± 0.17 s | 0.952 ± 0.18 |
| Diagnostics/helicity | 5.51 ± 0.071 ms | 5.5 ± 0.053 ms | 1 ± 0.016 |
| Diagnostics/kinetic_energy | 0.874 ± 0.015 ms | 0.873 ± 0.016 ms | 1 ± 0.025 |
| Reconnections/ReconnectBasedOnDistance | 3.14 ± 0.064 s | 3.13 ± 0.11 s | 1 ± 0.042 |
| Reconnections/ReconnectFast | 0.259 ± 0.02 s | 0.265 ± 0.02 s | 0.979 ± 0.1 |
| Refinement/RefineBasedOnSegmentLength | 7.17 ± 2.3 ms | 6.86 ± 2 ms | 1.05 ± 0.45 |
| Timestepping/forcing | 22.1 ± 3.1 ms | 22.1 ± 3 ms | 0.999 ± 0.2 |
| Timestepping/step! | 2.66 ± 0.064 s | 2.72 ± 0.072 s | 0.978 ± 0.035 |
| time_to_load | 1.6 ± 0.0064 s | 1.6 ± 0.0071 s | 0.998 ± 0.006 |

Memory benchmarks

| Benchmark | master | 695096f... | master / 695096f... |
|---|---|---|---|
| BiotSavart/add_local_integrals! | 7.04 k allocs: 4.16 MB | 7.04 k allocs: 4.16 MB | 1 |
| BiotSavart/add_point_charges! | 6.04 k allocs: 0.677 MB | 6.04 k allocs: 0.677 MB | 1 |
| BiotSavart/velocity | 14.1 k allocs: 4.89 MB | 14.9 k allocs: 4.94 MB | 0.99 |
| BiotSavart/velocity + streamfunction | 14.6 k allocs: 4.99 MB | 15.5 k allocs: 5.05 MB | 0.988 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (SIMD/sorted) | 0.066 k allocs: 5.97 kB | 0.066 k allocs: 5.97 kB | 1 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (SIMD/unsorted) | 22 allocs: 2.22 kB | 22 allocs: 2.22 kB | 1 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (sorted) | 0.066 k allocs: 5.97 kB | 0.066 k allocs: 5.97 kB | 1 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (unsorted) | 22 allocs: 2.22 kB | 22 allocs: 2.22 kB | 1 |
| CellLists/CPU/nsubdiv = 1/foreach_source | 22 allocs: 2.16 kB | 22 allocs: 2.16 kB | 1 |
| CellLists/CPU/nsubdiv = 1/iterator_interface | 1.2 M allocs: 0.0834 GB | 22 allocs: 2.16 kB | 4.06e+04 |
| CellLists/CPU/nsubdiv = 1/set_elements! | 0.044 k allocs: 3.69 kB | 0.044 k allocs: 3.69 kB | 1 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (SIMD/sorted) | 0.066 k allocs: 5.97 kB | 0.066 k allocs: 5.97 kB | 1 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (SIMD/unsorted) | 22 allocs: 2.22 kB | 22 allocs: 2.22 kB | 1 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (sorted) | 0.066 k allocs: 5.97 kB | 0.066 k allocs: 5.97 kB | 1 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (unsorted) | 22 allocs: 2.22 kB | 22 allocs: 2.22 kB | 1 |
| CellLists/CPU/nsubdiv = 2/foreach_source | 22 allocs: 2.16 kB | 22 allocs: 2.16 kB | 1 |
| CellLists/CPU/nsubdiv = 2/iterator_interface | 22 allocs: 2.16 kB | 22 allocs: 2.16 kB | 1 |
| CellLists/CPU/nsubdiv = 2/set_elements! | 0.044 k allocs: 3.69 kB | 0.044 k allocs: 3.69 kB | 1 |
| CellLists/OpenCLBackend/nsubdiv = 1/foreach_pair (sorted) | 0.641 k allocs: 0.0398 MB | 0.641 k allocs: 0.0398 MB | 1 |
| CellLists/OpenCLBackend/nsubdiv = 1/foreach_pair (unsorted) | 0.122 k allocs: 9.11 kB | 0.122 k allocs: 9.11 kB | 1 |
| CellLists/OpenCLBackend/nsubdiv = 1/set_elements! | 0.731 k allocs: 0.0433 MB | 0.731 k allocs: 0.0433 MB | 1 |
| CellLists/OpenCLBackend/nsubdiv = 2/foreach_pair (sorted) | 0.641 k allocs: 0.0398 MB | 0.641 k allocs: 0.0398 MB | 1 |
| CellLists/OpenCLBackend/nsubdiv = 2/foreach_pair (unsorted) | 0.122 k allocs: 9.11 kB | 0.122 k allocs: 9.11 kB | 1 |
| CellLists/OpenCLBackend/nsubdiv = 2/set_elements! | 0.731 k allocs: 0.0433 MB | 0.731 k allocs: 0.0433 MB | 1 |
| Diagnostics/energy_flux | 0.0556 M allocs: 8.59 MB | 0.0568 M allocs: 8.66 MB | 0.992 |
| Diagnostics/energy_injection_rate | 2.06 k allocs: 0.203 MB | 2.06 k allocs: 0.203 MB | 1 |
| Diagnostics/energy_spectrum | 26 allocs: 3.22 kB | 26 allocs: 3.22 kB | 1 |
| Diagnostics/energy_transfer_matrix | 25 k allocs: 0.0392 GB | 26.3 k allocs: 0.0393 GB | 0.998 |
| Diagnostics/helicity | 1.71 k allocs: 0.0703 MB | 1.71 k allocs: 0.0703 MB | 1 |
| Diagnostics/kinetic_energy | 2.05 k allocs: 0.0805 MB | 2.05 k allocs: 0.0805 MB | 1 |
| Reconnections/ReconnectBasedOnDistance | 4.18 M allocs: 0.312 GB | 4.18 M allocs: 0.312 GB | 1 |
| Reconnections/ReconnectFast | 7.49 M allocs: 0.196 GB | 7.49 M allocs: 0.196 GB | 1 |
| Refinement/RefineBasedOnSegmentLength | 5.59 k allocs: 0.263 MB | 5.59 k allocs: 0.263 MB | 1 |
| Timestepping/forcing | 2.04 k allocs: 0.102 MB | 2.29 k allocs: 0.119 MB | 0.855 |
| Timestepping/step! | 2.24 M allocs: 0.0656 GB | 2.24 M allocs: 0.0658 GB | 0.996 |
| time_to_load | 0.149 k allocs: 11.1 kB | 0.149 k allocs: 11.1 kB | 1 |

codecov-commenter commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 72.97297% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.90%. Comparing base (e27b3a4) to head (695096f).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/BiotSavart/host_device_transfers.jl | 47.36% | 10 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #94      +/-   ##
==========================================
- Coverage   93.01%   92.90%   -0.11%     
==========================================
  Files         126      126              
  Lines        8407     8433      +26     
==========================================
+ Hits         7820     7835      +15     
- Misses        587      598      +11     


@jipolanco merged commit 7e8beb9 into master on Feb 3, 2026 (5 checks passed).
@jipolanco deleted the cpu-cpu-threads branch on February 3, 2026 at 12:50.