Commit 1849f7a

docs(libfabric): improve README clarity and fix grammar issues (#1007)
* docs(libfabric): improve README clarity and fix grammar issues

  - Fix grammar: "Many system" -> "Many systems"
  - Clarify GDR requirement and improve sentence structure
  - Reorganize topology optimization into bullet list format
  - Enhance build instructions with concrete examples
  - Add missing punctuation and formatting consistency

  Signed-off-by: Nathan Na <[email protected]>

* fixup! docs(libfabric): improve README clarity and fix grammar issues

---------

Signed-off-by: Nathan Na <[email protected]>
1 parent 2860e56 commit 1849f7a

1 file changed: +18 -15 lines changed

src/plugins/libfabric/README.md

Lines changed: 18 additions & 15 deletions
@@ -1,4 +1,4 @@
-#NIXL Libfabric Plugin
+# NIXL Libfabric Plugin

 This plugin provides a high-performance RDMA backend for NIXL using the OpenFabrics Interfaces (OFI) Libfabric library.

@@ -7,43 +7,44 @@ This plugin provides a high-performance RDMA backend for NIXL using the OpenFabr
 The Libfabric plugin provides a high-performance RDMA communication backend with the following key capabilities:

 - **Multi-Rail RDMA**: Automatic discovery and utilization of multiple network devices for increased bandwidth
-- **GPU Direct Support**: Zero-copy transfers between GPU memory (VRAM) and remote systems with CUDA integration. And GDR support is currently mandated
+- **GPU Direct Support**: Zero-copy transfers between GPU memory (VRAM) and remote systems with CUDA integration. GDR (GPU Direct RDMA) support is currently required.
 - **Scalable Connection Management**: Efficient multi-agent connectivity with robust state tracking and automatic reconnection
 - **Asynchronous Processing**: Non-blocking RDMA operations with pre-allocated request pools and completion processing
 - **Thread-Safe Concurrency**: Background progress threads with lock-free data structures and configurable threading patterns
-
-EFA Specific **Topology-Aware Optimization**: Hardware-aware GPU-to-EFA and NUMA-to-EFA mapping using hwloc for optimal performance
+- **Topology-Aware Optimization**: Hardware-aware GPU-to-EFA and NUMA-to-EFA mapping using hwloc for optimal performance (EFA-specific)

 ## Dependencies

 ### Required Dependencies

 - **Libfabric**
-- Many system will have installed libfabric already. If not, custom libfabric installation is available via https://ofiwg.github.io/libfabric/ - Minimum required version: v1.21.0
-- For EFA enabled AWS instances, it is recommanded to install through AWS EFA installer: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html - Recommend to use the latest version
+- Many systems will have libfabric already installed. If not, custom libfabric installation is available via https://ofiwg.github.io/libfabric/ - Minimum required version: `v1.21.0`
+- For EFA enabled AWS instances, it is recommended to install through AWS EFA installer: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html - Recommend to use the latest version

 - **hwloc**
 - hwloc is used to understand the underlying architecture to optimize application performance. Suggested version: 2.10.0 or newer

 ### Network Hardware Requirements

-Validated compatiblity with:
+Validated compatibility with:
+
 - **AWS EFA** (Elastic Fabric Adapter)

 Any other Libfabric providers should also work but have not been validated in production environments. Community validation and feedback are highly appreciated!

 ## Build Instructions

 ```bash
-#Basic build setup with default options
+# Basic build setup with default options
 $ meson setup <name_of_build_dir>

-#Setup with custom options(example)
+# Setup with custom options (example)
 $ meson setup <name_of_build_dir> \
 -Dlibfabric_path=/path/to/libfabric

-#Build and install
-ninja && ninja install
+# Build and install
+$ cd build
+$ ninja && ninja install
 ```

 ## API Reference
@@ -62,23 +63,25 @@ ninja && ninja install
 ### Debug Information

 Enable debug logging by setting environment variables:
+
 ```bash
-#Libfabric debug logging
+# Libfabric debug logging
 export FI_LOG_LEVEL=debug
 export FI_LOG_PROV=efa # or verbs, tcp, etc.

-#NIXL debug logging
+# NIXL debug logging
 export NIXL_LOG_LEVEL=debug
 ```

 ### Common Issues

 **No network devices detected:**
+
 ```bash
-#Check available fabric interfaces
+# Check available fabric interfaces
 fi_info -l

-#For checking specific devices(e.g.EFA as an example)
+# For checking specific devices (e.g. EFA as an example)
 fi_info -p efa
 ```

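For anyone trying the updated README locally, the stated libfabric minimum (`v1.21.0`) can be checked with a small script before running `meson setup`. This is only a sketch and not part of the commit: `version_ge` and `MIN_LIBFABRIC` are hypothetical names introduced here, it assumes a `sort` that supports `-V` (GNU coreutils or similar), and it assumes `fi_info --version` prints a line of the form `libfabric: <version>`.

```shell
#!/bin/sh
# Hypothetical helper (not part of NIXL): succeeds when version $1 >= version $2.
version_ge() {
    # sort -V orders version strings; if $2 sorts first (or equal), then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

MIN_LIBFABRIC="1.21.0"  # minimum version stated in the README

if command -v fi_info >/dev/null 2>&1; then
    # Assumption: the output contains a line such as "libfabric: 1.21.0".
    found="$(fi_info --version 2>/dev/null | awk '/^libfabric:/ {print $2}')"
    if [ -n "$found" ] && version_ge "$found" "$MIN_LIBFABRIC"; then
        echo "libfabric $found satisfies >= $MIN_LIBFABRIC"
    else
        echo "libfabric ${found:-unknown} may be older than $MIN_LIBFABRIC"
    fi
else
    echo "fi_info not found; libfabric utilities may not be installed"
fi
```

If the version check passes but no devices show up, the `fi_info -l` and `fi_info -p efa` commands from the Common Issues section are the next thing to try.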