|
1 | 1 | # NeighbourLists.jl |
2 | 2 |
|
3 | | -A Julia port and restructuring of the neighbourlist implemented in |
4 | | -[matscipy](https://github.com/libAtoms/matscipy) (with the authors' permission). |
5 | | -Single-threaded, the Julia version is faster than matscipy for small systems, |
6 | | -probably due to the overhead of dealing with Python, but on large systems it is |
7 | | -tends to be slower (up to around a factor 2 for 100,000 atoms). |
| 3 | +A Julia package for computing neighbour lists in molecular simulations. Originally a port of the neighbourlist from [matscipy](https://github.com/libAtoms/matscipy), now extended with multi-threaded CPU and portable GPU support. |
8 | 4 |
|
9 | | -The package is can be used stand-alone, with |
10 | | -[JuLIP.jl](https://github.com/libAtoms/JuLIP.jl), or with [AtomsBase.jl](https://github.com/JuliaMolSim/AtomsBase.jl). |
| 5 | +The package can be used stand-alone or with [AtomsBase.jl](https://github.com/JuliaMolSim/AtomsBase.jl). |
11 | 6 |
|
12 | | -## Getting Started |
| 7 | +## Installation |
13 | 8 |
|
14 | | -```Julia |
| 9 | +```julia |
| 10 | +using Pkg |
15 | 11 | Pkg.add("NeighbourLists") |
16 | | -using NeighbourLists |
17 | | -?PairList |
18 | 12 | ``` |
19 | 13 |
|
20 | | -### Usage via `AtomsBase.jl` |
| 14 | +## Unified API (Recommended) |
| 15 | + |
| 16 | +The `neighbour_list()` function provides a unified entry point that works on both CPU and GPU with the same API. The backend is automatically detected from the array type. |
| 17 | + |
| 18 | +> **Note:** The legacy `PairList` constructor using linked-list algorithm is retained as a reference implementation for testing. New code should use `neighbour_list()` instead. See [DEPRECATIONS.md](DEPRECATIONS.md) for migration details. |
| 19 | +
|
| 20 | +### CPU Example (Multi-threaded) |
| 21 | + |
| 22 | +```julia |
| 23 | +using NeighbourLists, StaticArrays, LinearAlgebra |
| 24 | + |
| 25 | +# Create positions (CPU Vector) |
| 26 | +L = 10.0 |
| 27 | +X = [SVector{3,Float64}(L*rand(), L*rand(), L*rand()) for _ in 1:10000] |
| 28 | +cell = SMatrix{3,3,Float64}(L*I) |
| 29 | +pbc = SVector{3,Bool}(true, true, true) |
| 30 | + |
| 31 | +# Build neighbour list (uses sort-based algorithm with multi-threading) |
| 32 | +nlist = neighbour_list(X, 3.0, cell, pbc) |
| 33 | + |
| 34 | +# Access neighbours of atom 1 |
| 35 | +j, R, S = neighbours(nlist, 1) |
| 36 | +``` |
| 37 | + |
| 38 | +### GPU Example (CUDA, ROCm, Metal, oneAPI) |
21 | 39 |
|
22 | 40 | ```julia |
23 | | -using ASEconvert, NeighbourLists, Unitful |
24 | | -cu = ase.build.bulk("Cu") * pytuple((4, 2, 3)) |
25 | | -sys = pyconvert(AbstractSystem, cu) |
26 | | -nlist = PairList(sys, 3.5u"Å") |
27 | | -neigs_1, Rs_1 = neigs(nlist, 1) |
| 41 | +using NeighbourLists, StaticArrays, LinearAlgebra |
| 42 | +using CUDA # or AMDGPU, Metal, oneAPI |
| 43 | + |
| 44 | +# Create positions on GPU (only difference: use CuArray) |
| 45 | +L = 10.0 |
| 46 | +X = CuArray([SVector{3,Float64}(L*rand(), L*rand(), L*rand()) for _ in 1:10000]) |
| 47 | +cell = SMatrix{3,3,Float64}(L*I) |
| 48 | +pbc = SVector{3,Bool}(true, true, true) |
| 49 | + |
| 50 | +# Same API - backend auto-detected from array type |
| 51 | +nlist = neighbour_list(X, 3.0, cell, pbc) |
| 52 | +``` |
| 53 | + |
| 54 | +**What's the same:** The `neighbour_list()` API is identical on CPU and GPU. Cell matrix, cutoff, and boundary conditions work the same way. |
| 55 | + |
| 56 | +**What's different:** Only the array type changes (`Vector` vs `CuArray`/`ROCArray`/etc.). The backend is automatically detected - no need to specify it manually. |
| 57 | + |
| 58 | +### Lazy Mode (Memory Efficient) |
| 59 | + |
| 60 | +For large systems where materializing all pairs is memory-intensive, use lazy mode: |
| 61 | + |
| 62 | +```julia |
| 63 | +# Returns a SortedCellList instead of materializing all pairs |
| 64 | +clist = neighbour_list(X, 3.0, cell, pbc; lazy=true) |
| 65 | + |
| 66 | +# Iterate without storing all pairs in memory |
| 67 | +for i in 1:nsites(clist) |
| 68 | + for_each_neighbour(clist, i) do j, R, S |
| 69 | + # process neighbour j with distance vector R and shift S |
| 70 | + end |
| 71 | +end |
28 | 72 | ``` |
29 | 73 |
|
30 | | -### Usage via `JuLIP.jl` |
| 74 | +### AtomsBase.jl Integration |
31 | 75 |
|
32 | 76 | ```julia |
33 | | -using JuLIP |
34 | | -at = bulk(:Cu) * (4, 2, 3) |
35 | | -nlist = neighbourlist(at, 3.5) |
36 | | -neigs_1, Rs_1 = neigs(nlist, 1) |
37 | | -``` |
38 | | - |
39 | | -Please also look at the tests on how to use this package. Or just open an issue and |
40 | | -ask. |
| 77 | +using AtomsBuilder, NeighbourLists, Unitful |
| 78 | + |
| 79 | +sys = bulk(:Cu, cubic=true) * (4, 4, 4) |
| 80 | +nlist = neighbour_list(sys, 5.0u"Å") |
| 81 | +j, R, S = neighbours(nlist, 1) # neighbours of atom 1 |
| 82 | + |
| 83 | +# Lazy mode also works with AtomsBase systems |
| 84 | +clist = neighbour_list(sys, 5.0u"Å"; lazy=true) |
| 85 | +for_each_neighbour(clist, 1) do j, R, S |
| 86 | + # process neighbour |
| 87 | +end |
| 88 | +``` |
| 89 | + |
| 90 | +The implementation uses [KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl) for portable parallelism and [AcceleratedKernels.jl](https://github.com/JuliaGPU/AcceleratedKernels.jl) for portable sorting. On CPU this enables multi-threading; on GPU it runs native parallel kernels. |
| 91 | + |
| 92 | +## Two Implementations |
| 93 | + |
| 94 | +The package provides two cell list implementations: |
| 95 | + |
| 96 | +| Implementation | Algorithm | Parallelism | Status | |
| 97 | +|---------------|-----------|-------------|--------| |
| 98 | +| **Sort-based** | Sort by cell ID | Multi-threaded CPU, GPU | Recommended | |
| 99 | +| **Legacy** | Linked-list | Single-threaded | Reference implementation for testing | |
| 100 | + |
| 101 | +Both produce identical results (validated in tests). |
| 102 | + |
| 103 | +**API Selection:** |
| 104 | +- `neighbour_list()` always uses the sort-based implementation (recommended) |
| 105 | +- `PairList(system::AbstractSystem, cutoff)` uses sort-based (for AtomsBase) |
| 106 | +- `PairList(X::Vector{SVec}, cutoff, cell, pbc)` uses legacy linked-list (reference implementation) |
| 107 | + |
| 108 | +## Migration Guide |
| 109 | + |
| 110 | +The legacy linked-list implementation (`CellList`, `_celllist_`, `_pairlist_`) is retained as a reference for testing correctness, but new code should use the unified API. |
| 111 | + |
| 112 | +**Recommended changes:** |
| 113 | +- Use `neighbour_list(X, cutoff, cell, pbc)` instead of `PairList(X, cutoff, cell, pbc)` |
| 114 | +- Use `neighbours(nlist, i)` instead of `neigss(nlist, i)` (both still work) |
| 115 | +- For memory-efficient iteration, use `neighbour_list(...; lazy=true)` with `for_each_neighbour` |
| 116 | + |
| 117 | +See [DEPRECATIONS.md](DEPRECATIONS.md) for the complete migration guide. |
| 118 | + |
| 119 | +## Benchmarks |
| 120 | + |
| 121 | +Benchmarks on NVIDIA RTX A4500 (cutoff = 5.0 Å, density = 0.05 atoms/ų): |
| 122 | + |
| 123 | +| Atoms | Pairs | Legacy | CPU (1T) | CPU (8T) | GPU | Speedup | |
| 124 | +|------:|------:|-------:|---------:|---------:|--------:|--------:| |
| 125 | +| 1,000 | 26k | 8 ms | 3.6 ms | 3.4 ms | 2.3 ms | 3.5x | |
| 126 | +| 5,000 | 131k | 38 ms | 17 ms | 3.9 ms | 2.2 ms | 17x | |
| 127 | +| 10,000 | 262k | 84 ms | 35 ms | 7.8 ms | 2.4 ms | 36x | |
| 128 | +| 50,000 | 1.3M | 516 ms | 201 ms | 31 ms | 4.2 ms | 124x | |
| 129 | +| 100,000 | 2.6M | 1.1 s | 400 ms | 62 ms | 6.9 ms | 160x | |
| 130 | + |
| 131 | +GPU throughput: ~370 million pairs/second for large systems. |
| 132 | + |
| 133 | +*Note: Speedup is GPU vs Legacy. Run `julia --project -t N scripts/benchmark.jl` to reproduce.* |
| 134 | + |
| 135 | + |
| 136 | +## Acknowledgements |
| 137 | + |
| 138 | +- Original inspiration from [matscipy](https://github.com/libAtoms/matscipy) neighbourlist written by Lars Pastewka |
| 139 | +- Linked-list approach was implemented by Christoph Ortner |
| 140 | +- Sort-based approach idea proposed by Teemu Järvinen and Timon Gutleb, and implemented by James Kermode |
0 commit comments