[Feature]: AutoDeploy: better device configuration

### 🚀 The feature, motivation and pitch

Right now, we rely on the `device` attribute of `CachedSequenceInterface` to describe the "desired" device. This is somewhat circular and confusing: in other places, such as `SequenceInfo`, `device` denotes the actual device.

We read this attribute throughout the inference optimizer pipeline to determine the desired device. Ideally, we should find a cleaner way to handle this.
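One possible direction, shown as a minimal plain-Python sketch under stated assumptions: keep the desired device in an explicit, read-only configuration object that is passed to every transform, and let runtime objects only report where their buffers actually live. `DeviceConfig`, `SequenceState`, `target_device`, and `load_weights_step` are hypothetical names for illustration, not existing AutoDeploy APIs, and devices are modeled as strings so the sketch runs without PyTorch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DeviceConfig:
    """Hypothetical: the *desired* device, set once and never mutated."""
    target_device: str = "cuda"


@dataclass
class SequenceState:
    """Hypothetical stand-in for a runtime object; `device` is the *actual* device."""
    device: str = "cpu"
    buffers: Dict[str, List[float]] = field(default_factory=dict)

    def to(self, device: str) -> None:
        # A real implementation would move tensors; this sketch only records the move.
        self.device = device


def load_weights_step(state: SequenceState, cfg: DeviceConfig) -> str:
    """Hypothetical transform: reads the desired device from `cfg`, never from `state`."""
    if state.device != cfg.target_device:
        state.to(cfg.target_device)
    return state.device


# Usage: the two meanings of "device" now live on different objects.
cfg = DeviceConfig(target_device="cuda:0")
state = SequenceState()
print(state.device)  # cpu (actual device before the transform)
load_weights_step(state, cfg)
print(state.device)  # cuda:0 (actual device now matches the desired one)
```

Because the config is frozen, a transform cannot accidentally overwrite the desired device, and the actual device can always be compared against it.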

This currently leads to some confusion, for example here:
https://github.com/nv-auto-deploy/TensorRT-LLM/blob/8eb10bf1102eb15ad1e0473ddd0694c6d8d6303c/tensorrt_llm/_torch/auto_deploy/transform/library/load_weights.py#L74-L76

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.

[Feature]: AutoDeploy: better device configuration #8371

🚀 The feature, motivation and pitch

Alternatives

Additional context

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Feature]: AutoDeploy: better device configuration #8371

Description

🚀 The feature, motivation and pitch

Alternatives

Additional context

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions