Commit ddfdc9a

Merge remote-tracking branch 'origin/main' into cuda_update-gpu_test
2 parents a759810 + 1849f7a commit ddfdc9a

8 files changed

+25
-22
lines changed

.ci/jenkins/pipeline/proj-jjb.yaml

Lines changed: 1 addition & 1 deletion
@@ -269,7 +269,7 @@
269269
- string:
270270
name: "NIXL_VERSION"
271271
default: "{jjb_branch}"
272-
description: "NIXL version to use (tag like 0.7.1, branch name, or commit hash)"
272+
description: "NIXL version to use (tag like 0.8.0, branch name, or commit hash)"
273273
- string:
274274
name: "UCX_VERSION"
275275
default: "v1.20.x"

Cargo.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ members = [
2020
resolver = "3"
2121

2222
[workspace.package]
23-
version = "0.7.1"
23+
version = "0.8.0"
2424
edition = "2021"
2525
description = "Low-level bindings to NIXL - NVIDIA Inference Xfer Library"
2626
authors = ["NIXL Developers <[email protected]>"]

benchmark/nixlbench/meson.build

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
1313
# See the License for the specific language governing permissions and
1414
# limitations under the License.
1515

16-
project('nixlbench', 'CPP', version: '0.7.1',
16+
project('nixlbench', 'CPP', version: '0.8.0',
1717
default_options: ['buildtype=release',
1818
'werror=true',
1919
'cpp_std=c++17',

examples/rust/Cargo.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

meson.build

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
1313
# See the License for the specific language governing permissions and
1414
# limitations under the License.
1515

16-
project('nixl', 'CPP', version: '0.7.1',
16+
project('nixl', 'CPP', version: '0.8.0',
1717
default_options: ['buildtype=release',
1818
'werror=true',
1919
'cpp_std=c++17',

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ build-backend = "mesonpy"
1919

2020
[project]
2121
name = 'nixl-cu12'
22-
version = '0.7.1'
22+
version = '0.8.0'
2323
description = 'NIXL Python API'
2424
readme = 'README.md'
2525
license = {file = 'LICENSE'}
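
Note: the version bump from 0.7.1 to 0.8.0 is repeated by hand in several manifests (Cargo.toml, both meson.build files, pyproject.toml), so one file can easily be missed. The following is a minimal sketch of a consistency check, not part of this commit. The helper names (`extract_version`, `versions_agree`) and the regular expressions are assumptions for illustration, and the sample snippets are abbreviated from the hunks above.

```python
import re

# Hypothetical patterns: one regex per manifest type, each capturing the version string.
PATTERNS = {
    "Cargo.toml": r'^version\s*=\s*"([^"]+)"',
    "meson.build": r"project\([^)]*?version:\s*'([^']+)'",
    "pyproject.toml": r"^version\s*=\s*'([^']+)'",
}


def extract_version(kind, text):
    """Return the first version string found in a manifest, or None."""
    match = re.search(PATTERNS[kind], text, flags=re.MULTILINE)
    return match.group(1) if match else None


def versions_agree(snippets):
    """Return (all_equal, versions_by_file) for a mapping of file kind -> contents."""
    found = {name: extract_version(name, text) for name, text in snippets.items()}
    values = set(found.values())
    return len(values) == 1 and None not in values, found


# Abbreviated manifest contents, taken from the "+" side of the hunks above.
SNIPPETS = {
    "Cargo.toml": '[workspace.package]\nversion = "0.8.0"\nedition = "2021"\n',
    "meson.build": "project('nixl', 'CPP', version: '0.8.0',\n"
                   "  default_options: ['buildtype=release'])\n",
    "pyproject.toml": "[project]\nname = 'nixl-cu12'\nversion = '0.8.0'\n",
}

if __name__ == "__main__":
    ok, found = versions_agree(SNIPPETS)
    print(found, "consistent" if ok else "MISMATCH")
```

In a real checkout the snippets would be read from disk instead of being inlined. The lock files and the Jenkins parameter description are left out here because their formats differ.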

src/plugins/libfabric/README.md

Lines changed: 18 additions & 15 deletions
@@ -1,4 +1,4 @@
1-
#NIXL Libfabric Plugin
1+
# NIXL Libfabric Plugin
22

33
This plugin provides a high-performance RDMA backend for NIXL using the OpenFabrics Interfaces (OFI) Libfabric library.
44

@@ -7,43 +7,44 @@ This plugin provides a high-performance RDMA backend for NIXL using the OpenFabr
77
The Libfabric plugin provides a high-performance RDMA communication backend with the following key capabilities:
88

99
- **Multi-Rail RDMA**: Automatic discovery and utilization of multiple network devices for increased bandwidth
10-
- **GPU Direct Support**: Zero-copy transfers between GPU memory (VRAM) and remote systems with CUDA integration. And GDR support is currently mandated
10+
- **GPU Direct Support**: Zero-copy transfers between GPU memory (VRAM) and remote systems with CUDA integration. GDR (GPU Direct RDMA) support is currently required.
1111
- **Scalable Connection Management**: Efficient multi-agent connectivity with robust state tracking and automatic reconnection
1212
- **Asynchronous Processing**: Non-blocking RDMA operations with pre-allocated request pools and completion processing
1313
- **Thread-Safe Concurrency**: Background progress threads with lock-free data structures and configurable threading patterns
14-
15-
EFA Specific **Topology-Aware Optimization**: Hardware-aware GPU-to-EFA and NUMA-to-EFA mapping using hwloc for optimal performance
14+
- **Topology-Aware Optimization**: Hardware-aware GPU-to-EFA and NUMA-to-EFA mapping using hwloc for optimal performance (EFA-specific)
1615

1716
## Dependencies
1817

1918
### Required Dependencies
2019

2120
- **Libfabric**
22-
- Many system will have installed libfabric already. If not, custom libfabric installation is available via https://ofiwg.github.io/libfabric/ - Minimum required version: v1.21.0
23-
- For EFA enabled AWS instances, it is recommanded to install through AWS EFA installer: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html - Recommend to use the latest version
21+
- Many systems will have libfabric already installed. If not, custom libfabric installation is available via https://ofiwg.github.io/libfabric/ - Minimum required version: `v1.21.0`
22+
- For EFA enabled AWS instances, it is recommended to install through AWS EFA installer: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html - Recommend to use the latest version
2423

2524
- **hwloc**
2625
- hwloc is used to understand the underlying architecture to optimize application performance. Suggested version: 2.10.0 or newer
2726

2827
### Network Hardware Requirements
2928

30-
Validated compatiblity with:
29+
Validated compatibility with:
30+
3131
- **AWS EFA** (Elastic Fabric Adapter)
3232

3333
Any other Libfabric providers should also work but have not been validated in production environments. Community validation and feedback are highly appreciated!
3434

3535
## Build Instructions
3636

3737
```bash
38-
#Basic build setup with default options
38+
# Basic build setup with default options
3939
$ meson setup <name_of_build_dir>
4040

41-
#Setup with custom options(example)
41+
# Setup with custom options (example)
4242
$ meson setup <name_of_build_dir> \
4343
-Dlibfabric_path=/path/to/libfabric
4444

45-
#Build and install
46-
ninja && ninja install
45+
# Build and install
46+
$ cd build
47+
$ ninja && ninja install
4748
```
4849

4950
## API Reference
@@ -62,23 +63,25 @@ ninja && ninja install
6263
### Debug Information
6364

6465
Enable debug logging by setting environment variables:
66+
6567
```bash
66-
#Libfabric debug logging
68+
# Libfabric debug logging
6769
export FI_LOG_LEVEL=debug
6870
export FI_LOG_PROV=efa # or verbs, tcp, etc.
6971

70-
#NIXL debug logging
72+
# NIXL debug logging
7173
export NIXL_LOG_LEVEL=debug
7274
```
7375

7476
### Common Issues
7577

7678
**No network devices detected:**
79+
7780
```bash
78-
#Check available fabric interfaces
81+
# Check available fabric interfaces
7982
fi_info -l
8083

81-
#For checking specific devices(e.g.EFA as an example)
84+
# For checking specific devices (e.g. EFA as an example)
8285
fi_info -p efa
8386
```
8487
