Open
Labels
AutoDeploy (<NV> AutoDeploy Backend)
Description
🚀 The feature, motivation and pitch
Right now, we rely on the device attribute in the CachedSequenceInterface to describe the "desired" device. This is somewhat circular and confusing: in other places, like SequenceInfo, device denotes the actual device.
We rely on the attribute here to read the desired device across the inference optimizer pipeline. We should ideally think about a better way to handle this.
Right now, this leads to some confusion like here:
https://github.com/nv-auto-deploy/TensorRT-LLM/blob/8eb10bf1102eb15ad1e0473ddd0694c6d8d6303c/tensorrt_llm/_torch/auto_deploy/transform/library/load_weights.py#L74-L76
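One possible direction, sketched here only to make the distinction concrete: keep the desired device in an explicit pipeline-level setting and let `device` on stateful objects always report where their buffers actually live. All names below (`PipelineConfig`, `target_device`, `SequenceState`, `load_weights_step`) are hypothetical and are not part of the existing AutoDeploy code; devices are modeled as plain strings so the sketch runs without PyTorch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical config object: the single source of truth for the *desired* device."""

    target_device: str


@dataclass
class SequenceState:
    """Hypothetical stand-in for an object that owns buffers (not the real interface)."""

    # Maps buffer name -> device the buffer currently lives on.
    buffers: Dict[str, str] = field(default_factory=dict)

    @property
    def device(self) -> str:
        """Actual device, derived from the buffers instead of being stored separately."""
        devices = set(self.buffers.values())
        if len(devices) != 1:
            raise RuntimeError(f"expected exactly one device, found {sorted(devices)}")
        return devices.pop()

    def to(self, device: str) -> None:
        """Move every buffer to the given device."""
        self.buffers = {name: device for name in self.buffers}


def load_weights_step(state: SequenceState, config: PipelineConfig) -> None:
    """A transform reads the desired device from the config, never from the state."""
    state.to(config.target_device)


state = SequenceState({"input_ids": "meta", "position_ids": "meta"})
config = PipelineConfig(target_device="cuda:0")
load_weights_step(state, config)
```

With this split, a transform never has to guess whether `device` means "where things are" or "where things should go": the former comes from the state, the latter from the config.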
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.
Projects
Status
Backlog