[FEA] Balanced jagged compute&memory among DP ranks #207

@JacoCheung

Description

HSTU inputs are inherently variable-length (jagged) sequence tensors, so the batches on different data-parallel (DP) ranks can differ in both memory and compute demand. We need a way to balance the workload across DP ranks that takes both compute and memory into account.
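
To make the request concrete, here is a minimal sketch of one possible approach, not a proposed implementation. It assumes a per-sequence cost of `alpha * L**2 + beta * L`, where the quadratic term stands in for attention compute and the linear term for activation memory, and it assigns sequences to ranks greedily (largest cost first, always to the least-loaded rank). The function name `balance_sequences`, the cost model, and the weights `alpha` and `beta` are all hypothetical and do not come from the HSTU code base.

```python
import heapq


def balance_sequences(seq_lens, num_ranks, alpha=1.0, beta=1.0):
    """Greedy (largest-cost-first) assignment of sequences to DP ranks.

    Hypothetical cost model: alpha * L**2 approximates compute,
    beta * L approximates memory. Returns one list of sequence
    indices per rank.
    """

    def cost(length):
        return alpha * length * length + beta * length

    # Min-heap of (accumulated_cost, rank) so the least-loaded rank pops first.
    heap = [(0.0, rank) for rank in range(num_ranks)]
    heapq.heapify(heap)

    assignment = [[] for _ in range(num_ranks)]

    # Visit sequences from most to least expensive.
    order = sorted(range(len(seq_lens)), key=lambda i: cost(seq_lens[i]), reverse=True)
    for idx in order:
        load, rank = heapq.heappop(heap)
        assignment[rank].append(idx)
        heapq.heappush(heap, (load + cost(seq_lens[idx]), rank))

    return assignment


if __name__ == "__main__":
    lengths = [8, 1, 1, 1, 1, 6, 2, 2]
    print(balance_sequences(lengths, num_ranks=2))
```

A single scalar cost is the simplest way to fold both resources into one objective. A real solution would likely need a hard per-rank memory cap in addition to the compute balance, and would have to account for the communication needed to move sequences between ranks.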

Metadata

Labels

enhancement: Improvement for existing feature
