[QST] How to configure per-tensor-scaling for NVFP4 #2642

I am working with the NVFP4 data type.
From the documentation, I understand that NVFP4 uses both per-tensor scaling and per-block scaling, and that the scaling values must be stored in memory for the hardware to use them.
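For reference, the two-level scheme can be sketched numerically as below. This is a minimal sketch that assumes the commonly described NVFP4 convention: FP4 E2M1 elements (max magnitude 6), one FP8 E4M3 scale per 16-element block (max 448), and one FP32 per-tensor scale chosen here as `amax / (448 * 6)`. That choice of per-tensor scale is an assumption, and the helper names (`quantize_nvfp4`, `dequantize_nvfp4`, etc.) are hypothetical. The sketch only shows how the three factors combine; it does not model any library's memory layout or kernel arguments.

```python
import math

# Magnitudes representable by FP4 E2M1 (assumed NVFP4 element type).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
E4M3_MAX = 448.0   # largest finite FP8 E4M3 value
BLOCK_SIZE = 16    # assumed NVFP4 block size


def round_to_e2m1(x):
    """Round to the nearest E2M1 value (ties go to the smaller magnitude)."""
    mag = min(E2M1_VALUES, key=lambda v: (abs(abs(x) - v), v))
    return math.copysign(mag, x)


def round_to_e4m3(x):
    """Round a non-negative float to the nearest E4M3 value, saturating at 448."""
    if x <= 0.0:
        return 0.0
    x = min(x, E4M3_MAX)
    exp = max(math.floor(math.log2(x)), -6)  # -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)                  # 3 mantissa bits
    return min(round(x / step) * step, E4M3_MAX)


def quantize_nvfp4(values):
    """Hypothetical two-level quantizer.

    Returns (per_tensor_scale, block_scales, codes) such that
    values[i] ~= codes[i] * block_scales[i // 16] * per_tensor_scale.
    """
    amax = max(abs(v) for v in values)
    # FP32 per-tensor scale: maps amax onto 448 * 6, the largest value
    # reachable by an E4M3 block scale times an E2M1 element.
    per_tensor_scale = amax / (E4M3_MAX * E2M1_MAX) if amax > 0 else 1.0
    block_scales, codes = [], []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        block_amax = max(abs(v) for v in block)
        # E4M3 per-block scale, expressed relative to the per-tensor scale.
        s = round_to_e4m3(block_amax / E2M1_MAX / per_tensor_scale)
        block_scales.append(s)
        denom = s * per_tensor_scale
        codes.extend(round_to_e2m1(v / denom) if denom > 0 else 0.0 for v in block)
    return per_tensor_scale, block_scales, codes


def dequantize_nvfp4(per_tensor_scale, block_scales, codes):
    """Reconstruct approximate values from the three stored components."""
    return [c * block_scales[i // BLOCK_SIZE] * per_tensor_scale
            for i, c in enumerate(codes)]


if __name__ == "__main__":
    data = [float(i) - 8.0 for i in range(32)]  # two 16-element blocks
    g, bs, q = quantize_nvfp4(data)
    print(g, bs)
    print(dequantize_nvfp4(g, bs, q))
```

In this sketch, only the single FP32 `per_tensor_scale` is shared across the whole tensor, while `block_scales` holds one E4M3 value per 16 elements; those are the two sets of values that would need to be in memory alongside the packed FP4 data.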

However, I could not find any clear API or example showing how to actually set the per-tensor scaling values in memory before launching a GEMM kernel.

What is the correct way to store per-tensor scaling values in memory?

An example or reference would be very helpful.

Thank you!
