NVSHMEM4Py Integration #33

Willy-Chan · 2025-08-06T22:24:17Z

This PR replaces the existing custom Python bindings with NVSHMEM4Py, the official Python language binding for NVSHMEM.

The scope of changes includes:

Migrate host-side initialization from static linking (libnvshmem.a) to dynamic linking (libnvshmem_host.so)
Remove the existing bindings and replace them with their respective NVSHMEM4Py core APIs
Add a helper function for NVSHMEM initialization
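For readers unfamiliar with NVSHMEM4Py, an initialization helper of the kind listed above might look roughly like the sketch below. It is not the helper added in this PR. The names `RankInfo`, `rank_info_from_env`, and `init_nvshmem` are hypothetical, and the sketch assumes a torchrun-style environment (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`), an already initialized `torch.distributed` process group, and the UID-based bootstrap as I understand it from the NVSHMEM4Py documentation (`nvshmem.core.get_unique_id`, `nvshmem.core.init`). Check the calls against the installed NVSHMEM4Py and cuda.core versions before relying on them.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RankInfo:
    """Hypothetical container for the ranks needed to bootstrap NVSHMEM."""
    rank: int
    local_rank: int
    world_size: int


def rank_info_from_env(env: Mapping[str, str] = os.environ) -> RankInfo:
    # Assumption: the launcher (e.g. torchrun) exports these three variables.
    return RankInfo(
        rank=int(env["RANK"]),
        local_rank=int(env["LOCAL_RANK"]),
        world_size=int(env["WORLD_SIZE"]),
    )


def init_nvshmem(info: RankInfo) -> None:
    # Imports are deferred so this module can be imported on machines
    # without a GPU stack installed.
    import torch.distributed as dist
    import nvshmem.core
    from cuda.core.experimental import Device  # assumed import path

    # Bind this process to its local GPU before initializing NVSHMEM.
    dev = Device(info.local_rank)
    dev.set_current()

    # Rank 0 creates the unique ID; every other rank receives it through
    # the existing torch.distributed process group.
    holder = [nvshmem.core.get_unique_id() if info.rank == 0 else None]
    dist.broadcast_object_list(holder, src=0)

    nvshmem.core.init(
        device=dev,
        uid=holder[0],
        rank=info.rank,
        nranks=info.world_size,
        initializer_method="uid",
    )
```

Keeping the environment parsing separate from the GPU-dependent call makes the rank logic easy to check without hardware; a matching `nvshmem.core.finalize()` at shutdown is assumed.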

Changes include dynamic linking with host-side initialization, deletion of existing bindings, addition of nvshmem4py, and addition of helper functions.

Willy-Chan · 2025-08-07T00:29:22Z

Confirmed functionality on the following system configurations:

DGX-H100 with InfiniBand
DGX-H100 with AWS-EFA
B200 with MNNVL

Benchmark performance differences for dispatch and combine using MoEConfig(128, 8, 7168, 128):

System	Experts	E/Tok	Hidden Dim	Tokens	Dispatch Latency Perf Diff (lower is better)	Combine Latency Perf Diff (lower is better)
H100	128	8	7168	128	2.15%	2.96%
H100 (IB, 2 Node)	128	8	7168	128	0.84%	-2.94%
H100 (IB, 4 Node)	128	8	7168	128	0.59%	1.39%
H100 (IB, 8 Node)	128	8	7168	128	1.74%	-1.11%
H100 (EFA, 2 Node)	128	8	7168	128	-2.24%	1.27%
H100 (EFA, 4 Node)	128	8	7168	128	2.27%	-2.11%
H100 (EFA, 8 Node)	128	8	7168	128	-1.41%	-3.67%
B200	128	8	7168	128	1.08%	0.60%
B200 (NVLink, 2 Node)	128	8	7168	128	-2.23%	0.81%
B200 (NVLink, 4 Node)	128	8	7168	128	1.94%	-3.09%
B200 (NVLink, 8 Node)	128	8	7168	128	-5.60%	-2.90%

Percentages are computed from kernel latencies measured with the provided benchmark; a negative percentage indicates that NVSHMEM4Py is faster.
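The exact formula is not stated here. One common way to obtain such a figure is the relative change of the new latency against the old one, sketched below. The function name `perf_diff_percent` and the sample latencies are hypothetical and are not taken from the benchmark.

```python
def perf_diff_percent(baseline_us: float, candidate_us: float) -> float:
    """Relative latency change of the candidate versus the baseline, in percent.

    Assumption: baseline = existing custom bindings, candidate = NVSHMEM4Py.
    A negative result means the candidate is faster.
    """
    if baseline_us <= 0:
        raise ValueError("baseline latency must be positive")
    return (candidate_us - baseline_us) / baseline_us * 100.0


# Hypothetical latencies in microseconds, for illustration only.
dispatch_diff = perf_diff_percent(baseline_us=100.0, candidate_us=98.0)
print(f"dispatch latency diff: {dispatch_diff:+.2f}%")
```

With this convention, values within a few percent of zero in either direction, as in most rows of the table, are small relative changes.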

Willy-Chan force-pushed the nvshmem4py_bindings_integration branch from 428a067 to d258021 on August 7, 2025 00:04

Swap out existing NVSHMEM python bindings for official NVIDIA variant.

c13a7fe

Changes include dynamic linking with host-side initialization, deletion of existing bindings, addition of nvshmem4py, and addition of helper functions.

Willy-Chan force-pushed the nvshmem4py_bindings_integration branch from 0477c4c to c13a7fe on August 7, 2025 00:26

seth-howell reviewed Aug 8, 2025

csrc/CMakeLists.txt (outdated, resolved)

Removed bootstrap link requirement and confirmed single and multi-node correctness.

22a63a0
