The inputs to HSTU are inherently variable-length sequences (tensors), so data-parallel (DP) batches can differ in their memory and compute demands. We need a way to balance the workload that takes both compute and memory resources into account.
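To make the goal concrete, here is a minimal sketch of one possible balancing scheme, not the implementation proposed here. It assumes a hypothetical per-sequence cost model in which compute grows roughly quadratically with sequence length (attention) and memory grows roughly linearly, combined through an assumed `mem_weight` factor. The names `workload` and `balance_sequences` are illustrative only. Sequences are assigned greedily, largest cost first, to the DP rank with the smallest accumulated cost.

```python
import heapq
from typing import List, Sequence


def workload(seq_len: int, mem_weight: float = 1.0) -> float:
    """Hypothetical cost of one sequence (an assumption, not a measured model).

    Compute is approximated as seq_len^2 and memory as seq_len;
    mem_weight sets how much the memory term counts relative to compute.
    """
    return float(seq_len * seq_len) + mem_weight * float(seq_len)


def balance_sequences(
    seq_lens: Sequence[int], num_ranks: int, mem_weight: float = 1.0
) -> List[List[int]]:
    """Greedy longest-processing-time assignment of sequences to DP ranks.

    Returns, for each rank, the list of sequence indices assigned to it.
    """
    # Min-heap of (accumulated cost, rank) so the least-loaded rank pops first.
    heap = [(0.0, rank) for rank in range(num_ranks)]
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(num_ranks)]

    # Visit sequences from most to least expensive.
    order = sorted(
        range(len(seq_lens)),
        key=lambda i: workload(seq_lens[i], mem_weight),
        reverse=True,
    )
    for idx in order:
        load, rank = heapq.heappop(heap)
        assignment[rank].append(idx)
        heapq.heappush(heap, (load + workload(seq_lens[idx], mem_weight), rank))
    return assignment


def rank_loads(
    seq_lens: Sequence[int], assignment: List[List[int]], mem_weight: float = 1.0
) -> List[float]:
    """Total hypothetical cost per rank for a given assignment."""
    return [sum(workload(seq_lens[i], mem_weight) for i in idxs) for idxs in assignment]


if __name__ == "__main__":
    lens = [8, 1, 7, 2, 6, 3, 5, 4]
    parts = balance_sequences(lens, num_ranks=2)
    print(parts, rank_loads(lens, parts))
```

A greedy heuristic like this does not guarantee an optimal split, and a real solution would likely need measured cost coefficients and possibly a hard per-rank memory cap in addition to the combined cost.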