Skip to content

Commit 0e6365f

Browse files
committed
Add Python examples: two-process and queue-based flow control
- nixl_api_2proc.py: Basic two-process example showing both initialize_xfer and make_prepped_xfer modes - nixl_sender_receiver.py: Advanced queue-based flow control with head/tail pointers - Utility functions for metadata exchange and memory operations - Comprehensive documentation for both examples
1 parent 4211b39 commit 0e6365f

File tree

7 files changed

+1733
-0
lines changed

7 files changed

+1733
-0
lines changed
Lines changed: 320 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,320 @@
1+
# NIXL Two-Process Example
2+
3+
## Overview
4+
5+
A **basic two-process communication pattern** using NIXL demonstrating fundamental data transfer operations. This example shows both `initialize_xfer` and `make_prepped_xfer` transfer modes in a simple target-initiator pattern.
6+
7+
**Key Features:**
8+
- Two transfer modes (initialize vs. prepared)
9+
- Target-initiator pattern
10+
- Synchronous transfer completion
11+
- Reusable utility functions
12+
- Simple TCP-based metadata exchange
13+
14+
---
15+
16+
## Quick Start
17+
18+
### Usage
19+
20+
```bash
21+
# Run the example (assumes NIXL is properly installed)
22+
python3 nixl_api_2proc.py
23+
```
24+
25+
**Expected Output:**
26+
```
27+
[main] Starting TCP metadata server...
28+
[main] Starting target and initiator processes...
29+
[target] Starting target process
30+
[initiator] Starting initiator process
31+
[target] Memory registered
32+
[target] Published metadata and xfer descriptors to TCP server
33+
[target] Waiting for transfers...
34+
[initiator] Memory registered
35+
[initiator] Waiting for target metadata...
36+
[initiator] Loaded remote agent: target
37+
[initiator] Successfully retrieved target descriptors
38+
[initiator] Starting transfer 1 (READ)...
39+
[initiator] Initial transfer state: PROC
40+
[initiator] Transfer 1 done
41+
[target] Transfer 1 done
42+
[initiator] Starting transfer 2 (WRITE)...
43+
[initiator] Transfer 2 done
44+
[target] Transfer 2 done
45+
[target] Target process complete
46+
[initiator] Initiator process complete
47+
[main] βœ“ Test Complete - Both processes succeeded!
48+
```
49+
50+
---
51+
52+
## Architecture Summary
53+
54+
### Processes
55+
56+
**Target Process (lines 37-80):**
57+
- Allocates and registers 2 buffers (256 bytes each)
58+
- Publishes metadata and descriptors to TCP server
59+
- Waits for transfers to complete (polling `check_remote_xfer_done`)
60+
- Passive role - data is written to/read from its buffers
61+
62+
**Initiator Process (lines 83-183):**
63+
- Allocates and registers 2 buffers (256 bytes each)
64+
- Retrieves target's metadata and descriptors
65+
- Performs Transfer 1: READ (using `initialize_xfer`)
66+
- Performs Transfer 2: WRITE (using `make_prepped_xfer`)
67+
- Active role - initiates all transfers
68+
69+
### Transfer Modes
70+
71+
**Transfer 1 - Initialize Mode (lines 111-136):**
72+
```python
73+
xfer_handle_1 = nixl_agent2.initialize_xfer(
74+
"READ", agent2_xfer_descs, agent1_xfer_descs, remote_name, b"UUID1"
75+
)
76+
state = nixl_agent2.transfer(xfer_handle_1)
77+
# Poll for completion
78+
while nixl_agent2.check_xfer_state(xfer_handle_1) != "DONE":
79+
time.sleep(0.001)
80+
```
81+
82+
**Transfer 2 - Prepared Mode (lines 138-172):**
83+
```python
84+
local_prep_handle = nixl_agent2.prep_xfer_dlist(
85+
"NIXL_INIT_AGENT", [(addr3, buf_size, 0), (addr4, buf_size, 0)], "DRAM"
86+
)
87+
remote_prep_handle = nixl_agent2.prep_xfer_dlist(
88+
remote_name, agent1_xfer_descs, "DRAM"
89+
)
90+
xfer_handle_2 = nixl_agent2.make_prepped_xfer(
91+
"WRITE", local_prep_handle, [0, 1], remote_prep_handle, [1, 0], b"UUID2"
92+
)
93+
```
94+
95+
---
96+
97+
## Code Structure
98+
99+
### Phase 1: Setup (lines 37-57, 83-101)
100+
101+
**Target:**
102+
```python
103+
# Create agent with UCX backend
104+
agent_config = nixl_agent_config(backends=["UCX"])
105+
nixl_agent1 = nixl_agent("target", agent_config)
106+
107+
# Allocate memory (2 buffers, 256 bytes each)
108+
addr1 = nixl_utils.malloc_passthru(buf_size * 2)
109+
addr2 = addr1 + buf_size
110+
111+
# Create descriptors (4-tuple for registration, 3-tuple for transfer)
112+
agent1_reg_descs = nixl_agent1.get_reg_descs(agent1_strings, "DRAM")
113+
agent1_xfer_descs = nixl_agent1.get_xfer_descs(agent1_addrs, "DRAM")
114+
115+
# Register with NIXL
116+
nixl_agent1.register_memory(agent1_reg_descs)
117+
```
118+
119+
**Initiator:**
120+
```python
121+
# Create agent (uses default config)
122+
nixl_agent2 = nixl_agent("initiator", None)
123+
# Similar allocation and registration...
124+
```
125+
126+
### Phase 2: Metadata Exchange (lines 59-62, 103-109)
127+
128+
```python
129+
# Target: Publish
130+
publish_agent_metadata(nixl_agent1, "target_meta")
131+
publish_descriptors(nixl_agent1, agent1_xfer_descs, "target_descs")
132+
133+
# Initiator: Retrieve
134+
remote_name = retrieve_agent_metadata(nixl_agent2, "target_meta",
135+
timeout=10.0, role_name="initiator")
136+
agent1_xfer_descs = retrieve_descriptors(nixl_agent2, "target_descs")
137+
```
138+
139+
### Phase 3: Transfers (lines 111-172)
140+
141+
**Transfer 1 - Simple approach:**
142+
- Use `initialize_xfer()` for one-time transfers
143+
- Simpler API, creates transfer on-the-fly
144+
- Good for occasional transfers
145+
146+
**Transfer 2 - Optimized approach:**
147+
- Use `prep_xfer_dlist()` + `make_prepped_xfer()`
148+
- Pre-creates reusable transfer handles
149+
- Better for repeated transfers
150+
151+
### Phase 4: Synchronization (lines 64-75)
152+
153+
**Target waits for completion:**
154+
```python
155+
while not nixl_agent1.check_remote_xfer_done("initiator", b"UUID1"):
156+
time.sleep(0.001)
157+
```
158+
159+
**Initiator polls transfer state:**
160+
```python
161+
while nixl_agent2.check_xfer_state(xfer_handle_1) != "DONE":
162+
time.sleep(0.001)
163+
```
164+
165+
---
166+
167+
## Utility Functions
168+
169+
### From `nixl_metadata_utils.py`
170+
171+
- **`publish_agent_metadata(agent, key)`** - Publish agent metadata to TCP server
172+
- **`retrieve_agent_metadata(agent, key, timeout=10.0, role_name)`** - Retrieve remote agent (customizable timeout)
173+
- **`publish_descriptors(agent, xfer_descs, key)`** - Publish serialized descriptors
174+
- **`retrieve_descriptors(agent, key)`** - Retrieve and deserialize descriptors
175+
176+
### From `tcp_server.py`
177+
178+
- Simple key-value store for metadata exchange
179+
- Only used during setup phase
180+
- Not involved in actual data transfers
181+
182+
---
183+
184+
## Key NIXL Concepts
185+
186+
1. **Memory Registration**: `agent.register_memory(reg_descs)` before transfers
187+
2. **Agent Metadata**: Exchange via `get_agent_metadata()` and `add_remote_agent()`
188+
3. **Descriptor Types**:
189+
- **Registration descriptors**: 4-tuple `(addr, size, dev_id, name)` for memory registration
190+
- **Transfer descriptors**: 3-tuple `(addr, size, dev_id)` for actual transfers
191+
4. **Transfer Modes**:
192+
- **Initialize mode**: `initialize_xfer()` - simple, one-shot
193+
- **Prepared mode**: `prep_xfer_dlist()` + `make_prepped_xfer()` - optimized, reusable
194+
5. **Transfer Operations**:
195+
- **READ**: Initiator reads from target's memory
196+
- **WRITE**: Initiator writes to target's memory
197+
6. **Synchronization**:
198+
- **Local**: `check_xfer_state()` - check local transfer status
199+
- **Remote**: `check_remote_xfer_done()` - check if remote agent completed transfer
200+
201+
---
202+
203+
## Comparison: Initialize vs. Prepared Transfer
204+
205+
| Aspect | Initialize Mode | Prepared Mode |
206+
|--------|----------------|---------------|
207+
| **API** | `initialize_xfer()` | `prep_xfer_dlist()` + `make_prepped_xfer()` |
208+
| **Setup** | Simple, one call | More complex, two-step |
209+
| **Reusability** | One-time use | Reusable handles |
210+
| **Performance** | Good for occasional | Better for repeated |
211+
| **Use Case** | Simple transfers | High-frequency transfers |
212+
| **Example** | Transfer 1 (line 113) | Transfer 2 (line 150) |
213+
214+
---
215+
216+
## References
217+
218+
- **Advanced Example**: `nixl_sender_receiver.py` - Queue-based flow control
219+
- **Utility Functions**: `nixl_metadata_utils.py`, `nixl_memory_utils.py`
220+
- **NIXL Examples**: `nixl_api_example.py`
221+
222+
---
223+
224+
## Detailed Architecture Diagrams
225+
226+
### Setup and Metadata Exchange
227+
228+
```
229+
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
230+
β”‚ SETUP PHASE β”‚
231+
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
232+
233+
TCP Metadata Server Target Process Initiator Process
234+
(Port 9998) β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
235+
β”‚ β”‚ β”‚ β”‚ β”‚
236+
β”‚ β”‚ 1. Create β”‚ β”‚ 1. Create β”‚
237+
β”‚ β”‚ Agent β”‚ β”‚ Agent β”‚
238+
β”‚ β”‚ (UCX) β”‚ β”‚ (default) β”‚
239+
β”‚ β”‚ β”‚ β”‚ β”‚
240+
β”‚ β”‚ 2. Allocate β”‚ β”‚ 2. Allocate β”‚
241+
β”‚ β”‚ 2 buffers β”‚ β”‚ 2 buffers β”‚
242+
β”‚ β”‚ (256B ea) β”‚ β”‚ (256B ea) β”‚
243+
β”‚ β”‚ β”‚ β”‚ β”‚
244+
β”‚ β”‚ 3. Register β”‚ β”‚ 3. Register β”‚
245+
β”‚ β”‚ Memory β”‚ β”‚ Memory β”‚
246+
β”‚ β”‚ β”‚ β”‚ β”‚
247+
│◄────(publish)──── β”‚ β”‚ β”‚
248+
β”‚ "target_meta" β”‚ β”‚ β”‚ β”‚
249+
β”‚ + descriptors β”‚ β”‚ β”‚ β”‚
250+
β”‚ β”‚ β”‚ β”‚ β”‚
251+
β”‚ β”‚ 4. Wait for β”‚ β”‚ β”‚
252+
β”‚ β”‚ transfers β”‚ β”‚ β”‚
253+
β”‚ β”‚ β”‚ β”‚ β”‚
254+
│─(retrieve)──────────────────────────────────►│ 4. Retrieve β”‚
255+
β”‚ "target_meta" + descriptors β”‚ metadata β”‚
256+
β”‚ β”‚ β”‚ β”‚ β”‚
257+
β”‚ β”‚ β”‚ β”‚ 5. Add β”‚
258+
β”‚ β”‚ β”‚ β”‚ remote β”‚
259+
β”‚ β”‚ β”‚ β”‚ agent β”‚
260+
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
261+
β”‚ β”‚
262+
β”‚ Connection Established β”‚
263+
β”‚ β”‚
264+
(TCP server only used for metadata exchange, not for data transfers)
265+
```
266+
267+
### Transfer Flow
268+
269+
```
270+
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
271+
β”‚ TRANSFER PHASE β”‚
272+
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
273+
274+
Target Initiator
275+
(Passive) (Active)
276+
β”‚ β”‚
277+
β”‚ Waiting for transfers... β”‚
278+
β”‚ β”‚
279+
β”‚ β”‚
280+
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
281+
β”‚ β”‚ Transfer 1: READ β”‚
282+
β”‚ β”‚ Mode: initialize_xfer β”‚
283+
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
284+
β”‚ β”‚
285+
│◄──────────────────────RDMA READ (256B)──────────────────────
286+
β”‚ Initiator reads from target's buffer β”‚
287+
β”‚ β”‚
288+
β”‚ check_remote_xfer_done("initiator", "UUID1") β”‚
289+
β”‚ β†’ TRUE β”‚
290+
β”‚ β”‚
291+
β”‚ Transfer 1 done ───────────────────────────────────────► check_xfer_state()
292+
β”‚ β†’ DONE
293+
β”‚ β”‚
294+
β”‚ β”‚
295+
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
296+
β”‚ β”‚ Transfer 2: WRITE β”‚
297+
β”‚ β”‚ Mode: make_prepped_xfer β”‚
298+
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
299+
β”‚ β”‚
300+
│◄──────────────────────RDMA WRITE (512B)─────────────────────
301+
β”‚ Initiator writes to target's buffer β”‚
302+
β”‚ (crossing: writes buf2β†’buf1, buf1β†’buf2) β”‚
303+
β”‚ β”‚
304+
β”‚ check_remote_xfer_done("initiator", "UUID2") β”‚
305+
β”‚ β†’ TRUE β”‚
306+
β”‚ β”‚
307+
β”‚ Transfer 2 done ───────────────────────────────────────► check_xfer_state()
308+
β”‚ β†’ DONE
309+
β”‚ β”‚
310+
β”‚ Cleanup & exit ◄───────────────────────────────────────► Cleanup & exit
311+
β”‚ β”‚
312+
```
313+
314+
---
315+
316+
## License
317+
318+
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
319+
SPDX-License-Identifier: Apache-2.0
320+

0 commit comments

Comments
Β (0)