src/plugins/libfabric/README.md: 18 additions & 15 deletions
@@ -1,4 +1,4 @@
-#NIXL Libfabric Plugin
+#NIXL Libfabric Plugin
 
 This plugin provides a high-performance RDMA backend for NIXL using the OpenFabrics Interfaces (OFI) Libfabric library.
 
@@ -7,43 +7,44 @@ This plugin provides a high-performance RDMA backend for NIXL using the OpenFabr
 The Libfabric plugin provides a high-performance RDMA communication backend with the following key capabilities:
 
 -**Multi-Rail RDMA**: Automatic discovery and utilization of multiple network devices for increased bandwidth
--**GPU Direct Support**: Zero-copy transfers between GPU memory (VRAM) and remote systems with CUDA integration. And GDR support is currently mandated
+-**GPU Direct Support**: Zero-copy transfers between GPU memory (VRAM) and remote systems with CUDA integration. GDR (GPU Direct RDMA) support is currently required.
 -**Scalable Connection Management**: Efficient multi-agent connectivity with robust state tracking and automatic reconnection
 -**Asynchronous Processing**: Non-blocking RDMA operations with pre-allocated request pools and completion processing
 -**Thread-Safe Concurrency**: Background progress threads with lock-free data structures and configurable threading patterns
-
-EFA Specific **Topology-Aware Optimization**: Hardware-aware GPU-to-EFA and NUMA-to-EFA mapping using hwloc for optimal performance
+-**Topology-Aware Optimization**: Hardware-aware GPU-to-EFA and NUMA-to-EFA mapping using hwloc for optimal performance (EFA-specific)
 
 ## Dependencies
 
 ### Required Dependencies
 
 -**Libfabric**
-- Many system will have installed libfabric already. If not, custom libfabric installation is available via https://ofiwg.github.io/libfabric/ - Minimum required version: v1.21.0
-- For EFA enabled AWS instances, it is recommanded to install through AWS EFA installer: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html - Recommend to use the latest version
+- Many systems will have libfabric already installed. If not, custom libfabric installation is available via https://ofiwg.github.io/libfabric/ - Minimum required version: `v1.21.0`
+- For EFA enabled AWS instances, it is recommended to install through AWS EFA installer: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html - Recommend to use the latest version
 
 -**hwloc**
 - hwloc is used to understand the underlying architecture to optimize application performance. Suggested version: 2.10.0 or newer
 
 ### Network Hardware Requirements
 
-Validated compatiblity with:
+Validated compatibility with:
+
 -**AWS EFA** (Elastic Fabric Adapter)
 
 Any other Libfabric providers should also work but have not been validated in production environments. Community validation and feedback are highly appreciated!
 
 ## Build Instructions
 
 ```bash
-#Basic build setup with default options
+#Basic build setup with default options
 $ meson setup <name_of_build_dir>
 
-#Setup with custom options(example)
+#Setup with custom options(example)
 $ meson setup <name_of_build_dir> \
 -Dlibfabric_path=/path/to/libfabric
 
-#Build and install
-ninja && ninja install
+# Build and install
+$ cd build
+$ ninja && ninja install
 ```
 
 ## API Reference
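The minimum Libfabric version stated in the dependency list above (`v1.21.0`) can be checked before running `meson setup`. The following is a minimal sketch, not part of the README or of NIXL: it assumes `fi_info --version` prints a line of the form `libfabric: X.Y.Z` and that `sort -V` is available (GNU coreutils). The names `version_ge` and `MIN_LIBFABRIC` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of NIXL): succeeds when version $1 >= version $2.
# Assumes a sort(1) that supports -V (version sort), as in GNU coreutils.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

MIN_LIBFABRIC="1.21.0"  # minimum version stated in the README (v1.21.0)

if command -v fi_info >/dev/null 2>&1; then
    # Assumption: `fi_info --version` prints a line such as "libfabric: 1.21.0".
    found=$(fi_info --version 2>/dev/null | sed -n 's/^libfabric: *//p' | head -n 1)
    if [ -n "$found" ] && version_ge "$found" "$MIN_LIBFABRIC"; then
        echo "libfabric $found satisfies the minimum $MIN_LIBFABRIC"
    else
        echo "libfabric ${found:-unknown} is older than $MIN_LIBFABRIC or could not be parsed"
    fi
else
    echo "fi_info not found; libfabric may not be installed"
fi
```

If the installed output format differs, only the `sed` expression needs adjusting; the comparison itself works on any dotted version strings.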
@@ -62,23 +63,25 @@ ninja && ninja install
 ### Debug Information
 
 Enable debug logging by setting environment variables:
+
 ```bash
-#Libfabric debug logging
+#Libfabric debug logging
 export FI_LOG_LEVEL=debug
 export FI_LOG_PROV=efa # or verbs, tcp, etc.
 
-#NIXL debug logging
+#NIXL debug logging
 export NIXL_LOG_LEVEL=debug
 ```
 
 ### Common Issues
 
 **No network devices detected:**
+
 ```bash
-#Check available fabric interfaces
+#Check available fabric interfaces
 fi_info -l
 
-#For checking specific devices(e.g.EFA as an example)
+#For checking specific devices(e.g.EFA as an example)