Commit 1732252

Merge pull request #34 from jameskermode/gpu-support
Multi-threading and GPU support
2 parents 954ec37 + eb05b09 commit 1732252

24 files changed (+3333, -196 lines)

.github/workflows/CI.yml

Lines changed: 18 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -5,6 +5,7 @@ on:
55
- master
66
tags: ['*']
77
pull_request:
8+
workflow_dispatch:
89
concurrency:
910
# Skip intermediate builds: always.
1011
# Cancel intermediate builds: only if it is a pull request build.
@@ -18,20 +19,29 @@ jobs:
1819
fail-fast: false
1920
matrix:
2021
version:
21-
- '1.9'
22-
- '1'
23-
- 'nightly'
22+
- '1.11'
23+
- '1.12'
2424
os:
2525
- ubuntu-latest
2626
arch:
2727
- x64
2828
steps:
29-
- uses: actions/checkout@v2
30-
- uses: julia-actions/setup-julia@v1
29+
- uses: actions/checkout@v4
30+
- uses: julia-actions/setup-julia@v2
3131
with:
3232
version: ${{ matrix.version }}
3333
arch: ${{ matrix.arch }}
34-
- uses: julia-actions/cache@v1
35-
# - run: julia -e 'using Pkg; pkg"registry add https://github.com/ACEsuit/ACEregistry.git"'
34+
- uses: julia-actions/cache@v2
3635
- uses: julia-actions/julia-buildpkg@v1
37-
- uses: julia-actions/julia-runtest@v1
36+
- name: Run tests
37+
shell: bash
38+
run: |
39+
julia --project=test --color=yes -e '
40+
using Pkg
41+
Pkg.instantiate()
42+
# Resolve CondaPkg Python environment (installs matscipy)
43+
using CondaPkg
44+
CondaPkg.resolve()
45+
# Run tests
46+
include("test/runtests.jl")
47+
'

CondaPkg.toml

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
[deps]
2+
matscipy = ""
3+
numpy = ""

DEPRECATIONS.md

Lines changed: 119 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,119 @@
1+
# API Migration Guide
2+
3+
This document outlines the transition from the legacy linked-list algorithm to the unified sort-based implementation in NeighbourLists.jl.
4+
5+
## Version 0.6.x (Current)
6+
7+
The sort-based implementation is now the recommended API. The legacy linked-list implementation remains available as a reference implementation used internally for testing correctness.
8+
9+
### Legacy vs New API
10+
11+
| Legacy API | New API | Notes |
12+
|------------|---------|-------|
13+
| `PairList(X::Vector{SVec}, cutoff, cell, pbc)` | `neighbour_list(X, cutoff, cell, pbc)` | Sort-based, parallelizable |
14+
| `CellList` struct | `SortedCellList` | Used internally by new API |
15+
| `_celllist_` | `build_cell_list` | Internal function |
16+
| `_pairlist_` | `materialize_pairlist` | Internal function |
17+
18+
> **Note:** The legacy implementation will be retained indefinitely as a reference implementation for validating correctness in tests. However, new code should use the unified `neighbour_list()` API.
19+
20+
### New Unified API
21+
22+
```julia
23+
# High-level entry point (recommended)
24+
nlist = neighbour_list(X, cutoff, cell, pbc)
25+
26+
# Lazy iteration (memory efficient)
27+
clist = neighbour_list(X, cutoff, cell, pbc; lazy=true)
28+
for_each_neighbour(clist, i) do j, R, S
29+
# process neighbour
30+
end
31+
32+
# Unified accessors (work with both PairList and SortedCellList)
33+
js, Rs, Ss = neighbours(nlist, i)
34+
n = num_neighbours(nlist, i)
35+
```
36+
37+
### AtomsBase Support
38+
39+
AtomsBase integration has been moved to a package extension. To use it:
40+
41+
```julia
42+
using NeighbourLists
43+
using AtomsBase, Unitful
44+
45+
# Extension loads automatically
46+
nlist = PairList(system, 5.0u"Å")
47+
```
48+
49+
## Why Use the New API?
50+
51+
### Benefits of Sort-Based Algorithm
52+
53+
1. **GPU support**: Works on CUDA, ROCm, Metal, and oneAPI via KernelAbstractions.jl
54+
2. **Multi-threaded CPU**: Parallel construction and pair enumeration
55+
3. **Memory efficiency**: Option for lazy iteration without materializing all pairs
56+
4. **Consistent API**: Same code works on CPU and GPU
57+
58+
## Migration Guide
59+
60+
### Before (v0.5.x and earlier)
61+
62+
```julia
63+
using NeighbourLists
64+
65+
# Legacy linked-list constructor
66+
nlist = PairList(X, cutoff, cell, pbc)
67+
68+
# Access neighbours
69+
j, R = neigs(nlist, i)
70+
```
71+
72+
### After (v0.6.x+)
73+
74+
```julia
75+
using NeighbourLists
76+
77+
# New unified API (recommended)
78+
nlist = neighbour_list(X, cutoff, cell, pbc)
79+
80+
# Or explicitly with backend
81+
nlist = neighbour_list(X, cutoff, cell, pbc;
82+
backend=NeighbourLists.CPU())
83+
84+
# GPU support
85+
using CUDA
86+
X_gpu = CuArray(X)
87+
nlist_gpu = neighbour_list(X_gpu, cutoff, cell, pbc)
88+
89+
# Access neighbours (unchanged)
90+
j, R = neigs(nlist, i)
91+
# Or using unified accessor
92+
j, R, S = neighbours(nlist, i)
93+
94+
# Lazy iteration (new, memory efficient)
95+
clist = neighbour_list(X, cutoff, cell, pbc; lazy=true)
96+
for_each_neighbour(clist, i) do j, R, S
97+
# process each neighbour
98+
end
99+
```
100+
101+
### AtomsBase Users
102+
103+
```julia
104+
# Before: AtomsBase was always loaded
105+
using NeighbourLists
106+
107+
# After: Load AtomsBase explicitly to enable extension
108+
using NeighbourLists
109+
using AtomsBase, Unitful
110+
111+
# Then use as before
112+
nlist = PairList(system, 5.0u"Å")
113+
clist = build_cell_list(system, 5.0u"Å")
114+
```
115+
116+
## Questions?
117+
118+
If you have questions about migrating to the new API, please open an issue at:
119+
https://github.com/JuliaMolSim/NeighbourLists.jl/issues

Project.toml

Lines changed: 31 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,26 +1,52 @@
11
name = "NeighbourLists"
22
uuid = "2fcf5ba9-9ed4-57cf-b73f-ff513e316b9c"
3-
version = "0.5.10"
3+
version = "0.6.0"
44

55
[deps]
6-
AtomsBase = "a963bdd2-2df7-4f54-a1ee-49d51e6be12a"
6+
AcceleratedKernels = "6a4ca0a5-0e36-4168-a932-d9be78d558f1"
7+
Atomix = "a9b6321e-bd34-4604-b9c9-b65b8de01458"
8+
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
9+
KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
710
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
811
StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
12+
13+
[weakdeps]
14+
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
15+
AtomsBase = "a963bdd2-2df7-4f54-a1ee-49d51e6be12a"
916
Unitful = "1986cc42-f94f-5a68-af5c-568840ba703d"
1017

18+
[extensions]
19+
NeighbourListsAtomsBaseExt = ["AtomsBase", "Unitful"]
20+
NeighbourListsCUDAExt = "CUDA"
21+
1122
[compat]
12-
julia = "1"
13-
StaticArrays = "1"
23+
AcceleratedKernels = "0.4"
24+
Atomix = "0.1, 1"
1425
AtomsBase = "0.5"
26+
AtomsBuilder = "0.2.2"
27+
BenchmarkTools = "1.6.3"
28+
CUDA = "5"
29+
CondaPkg = "0.2"
30+
KernelAbstractions = "0.9"
1531
LinearAlgebra = "1"
32+
PythonCall = "0.9"
33+
StaticArrays = "1"
1634
Unitful = "1"
35+
julia = "1.11"
1736

1837
[extras]
38+
AtomsBase = "a963bdd2-2df7-4f54-a1ee-49d51e6be12a"
39+
AtomsBuilder = "f5cc8831-eeb7-4288-8d9f-d6c1ddb77004"
40+
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
41+
CondaPkg = "992eb4ea-22a4-4c89-a5bb-47a3300528ab"
1942
Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
2043
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
2144
NearestNeighbors = "b8a86587-4115-5ab1-83bc-aa920d37bbce"
45+
PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
2246
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
47+
PythonCall = "6099a3de-0909-46bc-b1f4-468b9a2dfc0d"
2348
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
49+
Unitful = "1986cc42-f94f-5a68-af5c-568840ba703d"
2450

2551
[targets]
26-
test = ["Test", "Distances", "ForwardDiff", "NearestNeighbors", "Printf"]
52+
test = ["Test", "Distances", "ForwardDiff", "NearestNeighbors", "Printf", "AtomsBase", "AtomsBuilder", "PythonCall", "CondaPkg", "CUDA", "Unitful"]

README.md

Lines changed: 126 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,40 +1,140 @@
11
# NeighbourLists.jl
22

3-
A Julia port and restructuring of the neighbourlist implemented in
4-
[matscipy](https://github.com/libAtoms/matscipy) (with the authors' permission).
5-
Single-threaded, the Julia version is faster than matscipy for small systems,
6-
probably due to the overhead of dealing with Python, but on large systems it is
7-
tends to be slower (up to around a factor 2 for 100,000 atoms).
3+
A Julia package for computing neighbour lists in molecular simulations. Originally a port of the neighbourlist from [matscipy](https://github.com/libAtoms/matscipy), now extended with multi-threaded CPU and portable GPU support.
84

9-
The package is can be used stand-alone, with
10-
[JuLIP.jl](https://github.com/libAtoms/JuLIP.jl), or with [AtomsBase.jl](https://github.com/JuliaMolSim/AtomsBase.jl).
5+
The package can be used stand-alone or with [AtomsBase.jl](https://github.com/JuliaMolSim/AtomsBase.jl).
116

12-
## Getting Started
7+
## Installation
138

14-
```Julia
9+
```julia
10+
using Pkg
1511
Pkg.add("NeighbourLists")
16-
using NeighbourLists
17-
?PairList
1812
```
1913

20-
### Usage via `AtomsBase.jl`
14+
## Unified API (Recommended)
15+
16+
The `neighbour_list()` function provides a unified entry point that works on both CPU and GPU with the same API. The backend is automatically detected from the array type.
17+
18+
> **Note:** The legacy `PairList` constructor, which uses the linked-list algorithm, is retained as a reference implementation for testing. New code should use `neighbour_list()` instead. See [DEPRECATIONS.md](DEPRECATIONS.md) for migration details.
19+
20+
### CPU Example (Multi-threaded)
21+
22+
```julia
23+
using NeighbourLists, StaticArrays, LinearAlgebra
24+
25+
# Create positions (CPU Vector)
26+
L = 10.0
27+
X = [SVector{3,Float64}(L*rand(), L*rand(), L*rand()) for _ in 1:10000]
28+
cell = SMatrix{3,3,Float64}(L*I)
29+
pbc = SVector{3,Bool}(true, true, true)
30+
31+
# Build neighbour list (uses sort-based algorithm with multi-threading)
32+
nlist = neighbour_list(X, 3.0, cell, pbc)
33+
34+
# Access neighbours of atom 1
35+
j, R, S = neighbours(nlist, 1)
36+
```
37+
38+
### GPU Example (CUDA, ROCm, Metal, oneAPI)
2139

2240
```julia
23-
using ASEconvert, NeighbourLists, Unitful
24-
cu = ase.build.bulk("Cu") * pytuple((4, 2, 3))
25-
sys = pyconvert(AbstractSystem, cu)
26-
nlist = PairList(sys, 3.5u"Å")
27-
neigs_1, Rs_1 = neigs(nlist, 1)
41+
using NeighbourLists, StaticArrays, LinearAlgebra
42+
using CUDA # or AMDGPU, Metal, oneAPI
43+
44+
# Create positions on GPU (only difference: use CuArray)
45+
L = 10.0
46+
X = CuArray([SVector{3,Float64}(L*rand(), L*rand(), L*rand()) for _ in 1:10000])
47+
cell = SMatrix{3,3,Float64}(L*I)
48+
pbc = SVector{3,Bool}(true, true, true)
49+
50+
# Same API - backend auto-detected from array type
51+
nlist = neighbour_list(X, 3.0, cell, pbc)
52+
```
53+
54+
**What's the same:** The `neighbour_list()` API is identical on CPU and GPU. Cell matrix, cutoff, and boundary conditions work the same way.
55+
56+
**What's different:** Only the array type changes (`Vector` vs `CuArray`/`ROCArray`/etc.). The backend is automatically detected - no need to specify it manually.
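
The README says the backend is inferred from the array type. The sketch below shows how that kind of detection can be done with KernelAbstractions.jl, which `Project.toml` lists as a dependency. Whether `neighbour_list()` calls `get_backend` internally is an assumption; this diff does not show the implementation.

```julia
using KernelAbstractions

# A plain Array lives in host memory, so KernelAbstractions reports the CPU backend.
X = rand(Float32, 3, 100)
backend = KernelAbstractions.get_backend(X)

# For a CuArray the same call would report CUDA.jl's CUDABackend() instead.
println(backend isa KernelAbstractions.CPU)
```

Dispatching on the returned backend object is what lets one code path serve `Vector`, `CuArray` and other array types without a user-supplied flag.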
57+
58+
### Lazy Mode (Memory Efficient)
59+
60+
For large systems where materializing all pairs is memory-intensive, use lazy mode:
61+
62+
```julia
63+
# Returns a SortedCellList instead of materializing all pairs
64+
clist = neighbour_list(X, 3.0, cell, pbc; lazy=true)
65+
66+
# Iterate without storing all pairs in memory
67+
for i in 1:nsites(clist)
68+
for_each_neighbour(clist, i) do j, R, S
69+
# process neighbour j with distance vector R and shift S
70+
end
71+
end
2872
```
2973

30-
### Usage via `JuLIP.jl`
74+
### AtomsBase.jl Integration
3175

3276
```julia
33-
using JuLIP
34-
at = bulk(:Cu) * (4, 2, 3)
35-
nlist = neighbourlist(at, 3.5)
36-
neigs_1, Rs_1 = neigs(nlist, 1)
37-
```
38-
39-
Please also look at the tests on how to use this package. Or just open an issue and
40-
ask.
77+
using AtomsBuilder, NeighbourLists, Unitful
78+
79+
sys = bulk(:Cu, cubic=true) * (4, 4, 4)
80+
nlist = neighbour_list(sys, 5.0u"Å")
81+
j, R, S = neighbours(nlist, 1) # neighbours of atom 1
82+
83+
# Lazy mode also works with AtomsBase systems
84+
clist = neighbour_list(sys, 5.0u"Å"; lazy=true)
85+
for_each_neighbour(clist, 1) do j, R, S
86+
# process neighbour
87+
end
88+
```
89+
90+
The implementation uses [KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl) for portable parallelism and [AcceleratedKernels.jl](https://github.com/JuliaGPU/AcceleratedKernels.jl) for portable sorting. On CPU this enables multi-threading; on GPU it runs native parallel kernels.
91+
92+
## Two Implementations
93+
94+
The package provides two cell list implementations:
95+
96+
| Implementation | Algorithm | Parallelism | Status |
97+
|---------------|-----------|-------------|--------|
98+
| **Sort-based** | Sort by cell ID | Multi-threaded CPU, GPU | Recommended |
99+
| **Legacy** | Linked-list | Single-threaded | Reference implementation for testing |
100+
101+
Both produce identical results (validated in tests).
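
To make the "sort by cell ID" idea in the table concrete, here is a minimal pure-Julia sketch for a cubic periodic box. It is illustrative only: the function names `sorted_cells` and `pairs_within` are hypothetical, the code is serial, and it is not the package's `SortedCellList` implementation. It assumes the box spans at least three cells per dimension, so the 27 neighbouring cells are distinct and the minimum-image convention applies.

```julia
# Assign each atom a linear cell index and sort atoms by that index.
# Hypothetical helper, not part of NeighbourLists.jl.
function sorted_cells(X, L::Float64, cutoff::Float64)
    nc = floor(Int, L / cutoff)          # cells per dimension, each at least `cutoff` wide
    @assert nc >= 3 "box must span at least 3 cells per dimension"
    cellof(x) = clamp(floor(Int, mod(x, L) / L * nc), 0, nc - 1)
    ids = [1 + cellof(x[1]) + nc * (cellof(x[2]) + nc * cellof(x[3])) for x in X]
    perm = sortperm(ids)                 # atom indices ordered by cell ID
    sorted = ids[perm]
    # offsets[c]:offsets[c+1]-1 is the slice of `perm` holding the atoms of cell c
    offsets = [searchsortedfirst(sorted, c) for c in 1:nc^3+1]
    return perm, offsets, nc
end

# Enumerate all ordered pairs (i, j), i != j, closer than `cutoff`.
function pairs_within(X, L::Float64, cutoff::Float64)
    perm, offsets, nc = sorted_cells(X, L, cutoff)
    minimg(d) = d - L * round(d / L)     # minimum-image convention
    out = Tuple{Int,Int}[]
    for cz in 0:nc-1, cy in 0:nc-1, cx in 0:nc-1
        c = 1 + cx + nc * (cy + nc * cz)
        for dz in -1:1, dy in -1:1, dx in -1:1
            # periodic wrap of the neighbouring cell coordinates
            n = 1 + mod(cx + dx, nc) + nc * (mod(cy + dy, nc) + nc * mod(cz + dz, nc))
            for a in offsets[c]:offsets[c+1]-1, b in offsets[n]:offsets[n+1]-1
                i, j = perm[a], perm[b]
                i == j && continue
                r2 = sum(minimg(X[j][k] - X[i][k])^2 for k in 1:3)
                r2 < cutoff^2 && push!(out, (i, j))
            end
        end
    end
    return out
end

X = [(10 * rand(), 10 * rand(), 10 * rand()) for _ in 1:1000]
println(length(pairs_within(X, 10.0, 3.0)), " ordered pairs within the cutoff")
```

Because atoms of the same cell end up contiguous after the sort, each cell's contents can be read as an array slice rather than by following linked-list pointers. This is what makes the approach amenable to parallel sorting and per-atom parallel loops.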
102+
103+
**API Selection:**
104+
- `neighbour_list()` always uses the sort-based implementation (recommended)
105+
- `PairList(system::AbstractSystem, cutoff)` uses sort-based (for AtomsBase)
106+
- `PairList(X::Vector{SVec}, cutoff, cell, pbc)` uses legacy linked-list (reference implementation)
107+
108+
## Migration Guide
109+
110+
The legacy linked-list implementation (`CellList`, `_celllist_`, `_pairlist_`) is retained as a reference for testing correctness, but new code should use the unified API.
111+
112+
**Recommended changes:**
113+
- Use `neighbour_list(X, cutoff, cell, pbc)` instead of `PairList(X, cutoff, cell, pbc)`
114+
- Use `neighbours(nlist, i)` instead of `neigs(nlist, i)` (both still work)
115+
- For memory-efficient iteration, use `neighbour_list(...; lazy=true)` with `for_each_neighbour`
116+
117+
See [DEPRECATIONS.md](DEPRECATIONS.md) for the complete migration guide.
118+
119+
## Benchmarks
120+
121+
Benchmarks on NVIDIA RTX A4500 (cutoff = 5.0 Å, density = 0.05 atoms/ų):
122+
123+
| Atoms | Pairs | Legacy | CPU (1T) | CPU (8T) | GPU | Speedup |
124+
|------:|------:|-------:|---------:|---------:|--------:|--------:|
125+
| 1,000 | 26k | 8 ms | 3.6 ms | 3.4 ms | 2.3 ms | 3.5x |
126+
| 5,000 | 131k | 38 ms | 17 ms | 3.9 ms | 2.2 ms | 17x |
127+
| 10,000 | 262k | 84 ms | 35 ms | 7.8 ms | 2.4 ms | 36x |
128+
| 50,000 | 1.3M | 516 ms | 201 ms | 31 ms | 4.2 ms | 124x |
129+
| 100,000 | 2.6M | 1.1 s | 400 ms | 62 ms | 6.9 ms | 160x |
130+
131+
GPU throughput: ~370 million pairs/second for large systems.
132+
133+
*Note: Speedup is GPU vs Legacy. Run `julia --project -t N scripts/benchmark.jl` to reproduce.*
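
As a quick consistency check, the throughput and speedup figures can be recomputed from the 100,000-atom row of the table. The variable names are only for illustration, and the inputs are the rounded values printed in the table, so the results are approximate.

```julia
# Values copied from the 100,000-atom row of the benchmark table.
npairs   = 2.6e6      # pairs found
t_gpu    = 6.9e-3     # GPU time in seconds
t_legacy = 1.1        # legacy linked-list time in seconds

throughput = npairs / t_gpu      # pairs per second
speedup    = t_legacy / t_gpu    # GPU vs legacy

println("throughput ≈ ", round(throughput / 1e6), " million pairs/s")
println("speedup ≈ ", round(speedup), "x")
```

With these rounded inputs the throughput comes out near 377 million pairs per second and the speedup near 159x, in line with the quoted "~370 million" and "160x".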
134+
135+
136+
## Acknowledgements
137+
138+
- Original inspiration from [matscipy](https://github.com/libAtoms/matscipy) neighbourlist written by Lars Pastewka
139+
- Linked-list approach was implemented by Christoph Ortner
140+
- Sort-based approach idea proposed by Teemu Järvinen and Timon Gutleb, and implemented by James Kermode
