Enable post-RHT amax estimation #2578

@negvet

Description

Is your feature request related to a problem? Please describe.

The RHT+amax kernel (random Hadamard transform followed by an amax reduction) prevents fusion.

Describe the solution you'd like

Estimate the post-RHT amax from the pre-RHT amax with a linear function. This eliminates the RHT+amax kernel. Make the feature optional, and make its hyperparameters (the amax estimation scale) tunable.
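
To make the request concrete, here is a rough sketch of the idea in plain Python. It is not existing library code. `AmaxEstimationConfig`, `post_rht_amax`, and the `enabled` / `scale` fields are hypothetical names. The reference path assumes a sign-flipped, orthonormally scaled Hadamard transform. The linear function is assumed to be a single multiplicative scale with no offset.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class AmaxEstimationConfig:
    """Hypothetical knobs for the proposed feature (names are assumptions)."""
    enabled: bool = False  # feature is opt-in; default keeps the exact path
    scale: float = 1.0     # tunable amax estimation scale


def amax(values: Sequence[float]) -> float:
    """Largest absolute value, 0.0 for an empty input."""
    return max((abs(v) for v in values), default=0.0)


def hadamard_transform(values: Sequence[float]) -> List[float]:
    """Fast Walsh-Hadamard transform with 1/sqrt(n) normalization."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    out = list(values)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in out]


def post_rht_amax(
    values: Sequence[float],
    signs: Sequence[int],
    config: AmaxEstimationConfig,
) -> float:
    """Return the post-RHT amax, either estimated or computed exactly."""
    if config.enabled:
        # Estimated path: only the pre-RHT amax is needed, so no separate
        # RHT+amax pass over the data is required.
        return config.scale * amax(values)
    # Reference path: apply the random signs, rotate, then reduce.
    rotated = hadamard_transform([s * v for s, v in zip(signs, values)])
    return amax(rotated)
```

With `enabled=False` the function behaves like the current exact computation, which gives a baseline for tuning `scale` against measured post-RHT amax values.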

Validation

Validate LM loss with dense and MoE models. Ensure that convergence is the same or better.
