-
Notifications
You must be signed in to change notification settings - Fork 676
feat: decoded media via NIXL #3988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
cbe3353 to
5be5da0
Compare
6895b84 to
eb75d6f
Compare
|
@milesial please fix the merge conflicts |
6a0920b to
f19f157
Compare
Yes just rebased, diff looks nicer now |
| description = "Dynamo LLM Library" | ||
|
|
||
| [features] | ||
| default = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change making block manager a default build feature intended for this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes because the frontend now uses their abstractions. I've heard there is some modularization coming up for that part but until then I hope it's ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I think that'll break things for most people not building inside the container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I wait for #4012 which is enabled by default if I understand correctly?
Who is building outside the container? At some point we'll need to have things like NIXL available during build anyway if we want to have NIXL features in the frontend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, with that MR, that'll remove the kvbm dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use that v2 memory crate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like CI fails because it can't find NIXL :/ But locally it works
nixl.h: No such file or directory
4dbec11 to
b888111
Compare
f19f157 to
ef274b4
Compare
b888111 to
bed5bd0
Compare
ef274b4 to
f09816e
Compare
a622756 to
bb40dae
Compare
bb40dae to
1b1882c
Compare
WalkthroughThis pull request introduces RDMA and NIXL integration for media data handling. A new Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Poem
Pre-merge checks❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/llm/src/preprocessor/media/loader.rs (1)
187-211: Cleanup: Remove duplicate test assertions.Lines 187-211 duplicate the exact same assertions from lines 169-183 (shape verification). This appears to be unintentional code duplication.
Apply this diff to remove the duplicate assertions:
mock.assert_async().await; - - assert!( - !descriptor.tensor_info.shape.is_empty(), - "Shape should not be empty" - ); - assert_eq!( - descriptor.tensor_info.shape[0], 1125, - "Height should be 1125" - ); - assert_eq!( - descriptor.tensor_info.shape[1], 1999, - "Width should be 1999" - ); - assert_eq!( - descriptor.tensor_info.shape[2], 4, - "RGBA channels should be 4" - ); - + assert!( descriptor.source_storage.is_some(), "Source storage should be present"lib/llm/src/preprocessor/media/rdma.rs (2)
54-72: Consider preserving error context in NIXL registration.The
map_erron line 58-59 replaces the original error with a generic message. Consider including more context about the storage size or type to aid debugging.let registered = nixl::register_with_nixl(source_storage, nixl_agent, None) - .map_err(|_| anyhow::anyhow!("Failed to register storage with NIXL"))?; + .map_err(|s| anyhow::anyhow!("Failed to register {} bytes of storage with NIXL", s.len()))?;
104-113: Track the NIXL metadata workaround.The function currently uses
get_local_md()as a workaround until the referenced PR #970 is merged. The_storageparameter is unused, and the commented-out code shows the intended partial metadata implementation.Consider:
- Tracking this workaround in an issue linked to NIXL PR #970
- Removing the commented-out code if it will be replaced once the PR merges
- Adding a comment about the performance impact of sending full metadata vs. partial
Do you want me to create a tracking issue for this workaround?
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
Cargo.toml(1 hunks)lib/llm/Cargo.toml(3 hunks)lib/llm/src/preprocessor.rs(1 hunks)lib/llm/src/preprocessor/media.rs(1 hunks)lib/llm/src/preprocessor/media/decoders.rs(2 hunks)lib/llm/src/preprocessor/media/decoders/image.rs(7 hunks)lib/llm/src/preprocessor/media/loader.rs(6 hunks)lib/llm/src/preprocessor/media/rdma.rs(1 hunks)lib/llm/src/protocols/common/preprocessor.rs(2 hunks)lib/memory/src/nixl.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-09-29T19:11:14.161Z
Learnt from: nv-anants
Repo: ai-dynamo/dynamo PR: 3290
File: docs/_includes/quick_start_local.rst:13-14
Timestamp: 2025-09-29T19:11:14.161Z
Learning: ai-dynamo package requires --prerelease=allow flag for installation due to sglang dependency requiring RC versions (e.g., sglang[all]==0.5.0rc2).
Applied to files:
Cargo.tomllib/llm/Cargo.toml
📚 Learning: 2025-10-17T22:57:12.526Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 3700
File: lib/llm/src/block_manager.rs:19-19
Timestamp: 2025-10-17T22:57:12.526Z
Learning: The dynamo library (ai-dynamo/dynamo repository) is currently developed to run on Linux only, so Linux-specific code and syscalls do not require platform guards.
Applied to files:
Cargo.tomllib/llm/Cargo.toml
📚 Learning: 2025-10-17T22:57:25.021Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 3700
File: lib/llm/src/block_manager/numa_allocator.rs:46-64
Timestamp: 2025-10-17T22:57:25.021Z
Learning: The dynamo library (specifically the llm/block_manager module) is developed to run on Linux only, so Linux-specific system calls and APIs can be used without platform guards.
Applied to files:
Cargo.toml
📚 Learning: 2025-10-17T22:57:18.773Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 3700
File: lib/llm/src/block_manager/numa_allocator.rs:4-6
Timestamp: 2025-10-17T22:57:18.773Z
Learning: The Dynamo library (lib/llm) is developed to run on Linux only, so platform-specific cfg guards for Linux-only features are not required.
Applied to files:
Cargo.toml
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
Applied to files:
lib/llm/src/preprocessor.rslib/llm/src/preprocessor/media/loader.rs
📚 Learning: 2025-08-18T20:51:51.324Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 2465
File: lib/runtime/src/pipeline/network/egress/push_router.rs:0-0
Timestamp: 2025-08-18T20:51:51.324Z
Learning: The runtime crate cannot depend on the llm crate due to architectural dependency constraints, preventing imports from lib/llm into lib/runtime.
Applied to files:
lib/llm/Cargo.toml
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Applied to files:
lib/llm/Cargo.toml
🧬 Code graph analysis (3)
lib/llm/src/preprocessor/media.rs (1)
lib/llm/src/preprocessor/media/rdma.rs (2)
get_nixl_agent(115-119)get_nixl_metadata(104-113)
lib/llm/src/preprocessor/media/rdma.rs (3)
lib/memory/src/nixl.rs (3)
register_with_nixl(176-203)nixl_descriptor(111-113)storage(134-136)lib/llm/src/preprocessor/media/loader.rs (1)
new(48-66)lib/memory/src/nixl/agent.rs (1)
with_backends(82-112)
lib/llm/src/preprocessor/media/loader.rs (3)
lib/llm/src/preprocessor/media/rdma.rs (1)
get_nixl_agent(115-119)lib/memory/src/nixl.rs (3)
storage(134-136)descriptor(62-62)descriptor(121-129)lib/llm/src/block_manager/storage/nixl.rs (1)
is_nixl_registered(181-183)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: trtllm (amd64)
- GitHub Check: sglang (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: trtllm (arm64)
🔇 Additional comments (29)
Cargo.toml (1)
50-50: LGTM! Clean workspace dependency addition.The
dynamo-memorycrate is correctly added to workspace dependencies with matching version and proper path.lib/memory/src/nixl.rs (2)
17-17: LGTM! Enables serialization support for NIXL descriptors.The public export of serde traits is appropriate for enabling serialization of
NixlDescriptoracross module boundaries.
28-28: LGTM! Serialization derivations are safe and appropriate.Adding
SerializeandDeserializetoNixlDescriptoris safe since it contains only plain data types (u64, usize, MemType, u64), and this enables passing RDMA descriptors across process or network boundaries.lib/llm/Cargo.toml (2)
45-45: LGTM! Adds required dependency for NIXL integration.The
dynamo-memoryworkspace dependency is correctly added to support the new RDMA/NIXL media handling flow.
147-147: LGTM! Enables image metadata serialization.Adding the
serdefeature to theimagecrate is appropriate for serializing image metadata in the RDMA descriptor flow.lib/llm/src/protocols/common/preprocessor.rs (2)
9-9: LGTM! Imports the RDMA descriptor type.The import correctly brings in
RdmaMediaDataDescriptorfor use in theMultimodalDataenum.
15-15: LGTM! Replaces placeholder with concrete RDMA descriptor.The
Decodedvariant now properly carriesRdmaMediaDataDescriptor, completing the RDMA-based media handling integration. This allows multimodal data to be passed with NIXL registration metadata.lib/llm/src/preprocessor/media.rs (2)
7-7: LGTM! Adds RDMA module.The new
rdmamodule cleanly separates RDMA-related functionality from the existing media handling code.
12-12: LGTM! Exposes RDMA-related public API.The public re-exports properly expose the RDMA descriptor types and NIXL helper functions for use by the preprocessor and other modules.
lib/llm/src/preprocessor/media/decoders.rs (3)
5-5: LGTM! Adds serialization support.The serde import enables serialization for the updated
DecodedMediaMetadatatype.
8-8: LGTM! Relocates DecodedMediaData to RDMA module.The import correctly references
DecodedMediaDatafrom the newrdmamodule, maintaining clean separation between decoding and RDMA registration concerns.
33-36: LGTM! Adds serialization to metadata.Making
DecodedMediaMetadatapublic withSerializeandDeserializederives enables passing metadata alongside RDMA descriptors. TheCopytrait is appropriate given the small size ofImageMetadata.lib/llm/src/preprocessor/media/loader.rs (4)
12-14: LGTM! Updated imports for RDMA integration.The imports correctly bring in RDMA-related types and the NIXL agent for media descriptor creation.
58-64: LGTM! Initializes NIXL agent for media registration.The
NixlAgentis properly initialized duringMediaLoaderconstruction and will be used to register decoded media with NIXL. Error propagation is appropriate if agent creation fails.
98-120: LGTM! Converts decoded media to RDMA descriptor.The updated flow correctly:
- Decodes the media (line 105)
- Converts to RDMA descriptor with NIXL registration (line 119)
- Returns the descriptor for downstream use
The NIXL registration ensures the decoded data can be accessed via RDMA.
44-44: Verify NixlAgent thread-safety for concurrent operations - cannot confirm from codebase alone.NixlAgent is stored directly (not Arc-wrapped) in MediaLoader, but relies on
nixl_sys::Agentfrom an external crate. The struct derivesCloneandDebugbut has no explicitSend/Syncbounds. WhileHashSet<String>itself is thread-safe, thread-safety depends entirely on whethernixl_sys::AgentimplementsSend + Sync. This cannot be verified from the repository. Please confirm:
- That
nixl_sys::Agentis documented asSend + Sync- How MediaLoader instances are shared across concurrent requests in the preprocessor service
- Whether the concurrent usage pattern matches the thread-safety guarantees
lib/llm/src/preprocessor.rs (1)
330-344: NIXL registration cleanup is properly handled via Arc-based resource management.Verified:
RdmaMediaDataDescriptorcorrectly keeps the registration alive throughArc<nixl::NixlRegistered<SystemStorage>>stored in thesource_storagefield. On error, unprocessed fetch tasks and any partial descriptors drop naturally, triggering the Arc drop chain and cleanup of the NIXL registration. The design follows proper Rust resource management patterns—no additional changes needed.lib/llm/src/preprocessor/media/decoders/image.rs (6)
9-13: LGTM! Imports correctly updated for RDMA integration.The imports now reference
DecodedMediaDatafrom the RDMA module and add Serde support for serialization.
17-27: LGTM! Serde derives enable configuration serialization.Adding
SerializeandDeserializederives allows the decoder configuration to be serialized for transport or storage.
40-53: LGTM! Metadata types correctly derive serialization traits.The added
SerializeandDeserializederives enable metadata to be transported with RDMA descriptors.
82-88: LGTM! Conversion and metadata placement correctly updated.The change from
into()totry_into()matches the newTryFromimplementation in the RDMA module, and metadata placement correctly follows the new nested structure.
94-164: LGTM! Tests correctly updated for new structure.Test assertions properly reference the nested
tensor_infostructure to verify shape and data type.
199-257: LGTM! Remaining tests consistently updated.All test assertions correctly reference the new nested structure for shape and data type verification.
lib/llm/src/preprocessor/media/rdma.rs (6)
17-20: Limited data type support is acceptable for image-focused PR.Currently only
UINT8is supported. The TODO at line 75 indicates audio support (f32) is planned for future work.
23-28: LGTM! Clean tensor metadata structure.The structure appropriately groups shape, data type, and optional metadata together.
31-35: LGTM! Appropriate local representation.
DecodedMediaDatacorrectly holds the decoded storage locally before conversion to the serializableRdmaMediaDataDescriptor.
38-52: LGTM! Well-designed RDMA descriptor with proper lifetime management.The
Arcwrapper onsource_storageensures the registered memory remains valid while the descriptor is alive, which is critical for RDMA safety.
76-99: The unsafe copy operation is memory-safe and all verification concerns are addressed.SystemStorage uses
posix_memalignwith 4KB alignment, ensuring proper alignment for the copy. The pointer is valid because the allocation is checked for success, and the storage lifetime is guaranteed since it's owned by the returnedDecodedMediaData. Thecopy_nonoverlappingcall is safe: source and destination are from separate allocations (Vec and malloc), the lengths match exactly, and no overlap exists. No changes required.
115-119: Review comment is based on incorrect assumptions about agent creation frequency.The function creates one agent per
MediaLoaderinstance, not per call.get_nixl_agent()is invoked once duringMediaLoader::new(), and the agent is stored and reused throughout the loader's lifetime. The design follows a sound pattern: single agent instance per loader, stored as a struct field, with automatic cleanup when the loader is dropped. The underlyingAgenttype (from FFI bindings) handles resource cleanup automatically. No evidence of resource leaks or inefficiency was found.
1b1882c to
07fb85a
Compare
07fb85a to
b544039
Compare
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
b544039 to
7e071ea
Compare
|
Pulled in #4198 |
Overview:
See #3630 for context, this MR is a subset of it, focusing on NIXL metadata passing, building on top of #3971.
Summary by CodeRabbit
New Features
Improvements