[Feature]: AutoDeploy: better device configuration #8371

@lucaslie

Description

🚀 The feature, motivation and pitch

Right now, we rely on the device attribute in the CachedSequenceInterface to describe the "desired" device. This is somewhat circular/confusing. In other places, like SequenceInfo, device denotes the actual device.

We rely on this attribute to read the desired device across the inference optimizer pipeline. We should think about a better way to handle this.

Right now, this leads to some confusion like here:
https://github.com/nv-auto-deploy/TensorRT-LLM/blob/8eb10bf1102eb15ad1e0473ddd0694c6d8d6303c/tensorrt_llm/_torch/auto_deploy/transform/library/load_weights.py#L74-L76
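One possible direction is to separate the "desired" placement from the device tensors actually live on, so transforms no longer have to infer intent from a mutable attribute. Below is a minimal sketch of that idea; the names `DeviceConfig`, `desired`, and `needs_move` are hypothetical and not part of the current TensorRT-LLM API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceConfig:
    """Hypothetical immutable record of where the optimizer pipeline
    should place tensors, kept separate from the device they are
    currently on (which SequenceInfo-style objects would report)."""

    desired: str  # e.g. "cuda:0"

    def needs_move(self, current: str) -> bool:
        # A transform (e.g. load_weights) consults the config instead of
        # reading a "desired" device off the cached sequence interface.
        return current != self.desired


# Usage: weights loaded on CPU are moved only if the desired device differs.
cfg = DeviceConfig(desired="cuda:0")
print(cfg.needs_move("cpu"))     # True: move weights to cuda:0
print(cfg.needs_move("cuda:0"))  # False: already in place
```

Passing an explicit, immutable config through the pipeline would make the data flow one-directional and avoid the circularity described above.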

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees

Labels

AutoDeploy<NV> AutoDeploy Backend

Type

Projects

Status

Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions