I am working with the NVFP4 data type.
From the documentation, I understand that NVFP4 uses both per-tensor scaling and per-block scaling, and that the scaling values must be stored in memory for the hardware to use them.
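To make my understanding concrete, here is a minimal sketch of how I assume the two scale levels relate to each other. The constants (16-element blocks, E2M1 maximum of 6.0, E4M3 maximum of 448.0) are my reading of the format, and the function names are hypothetical, not taken from any NVIDIA API. The sketch only computes scale values on the host; it does not round them to FP8 or quantize the data to FP4.

```python
# Assumed NVFP4 constants (my reading of the format, not from an API).
FP4_E2M1_MAX = 6.0     # largest magnitude representable in E2M1
FP8_E4M3_MAX = 448.0   # largest magnitude representable in E4M3
BLOCK_SIZE = 16        # number of elements sharing one block scale


def per_tensor_scale(values):
    """Hypothetical helper: one FP32 scale for the whole tensor, chosen
    so that the largest block scale still fits in the E4M3 range."""
    amax = max(abs(v) for v in values)
    return amax / (FP4_E2M1_MAX * FP8_E4M3_MAX) if amax > 0 else 1.0


def per_block_scales(values, tensor_scale):
    """Hypothetical helper: one scale per block of BLOCK_SIZE elements,
    expressed relative to the per-tensor scale (E4M3 rounding omitted)."""
    scales = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        block_amax = max(abs(v) for v in block)
        scales.append(block_amax / FP4_E2M1_MAX / tensor_scale)
    return scales


values = [0.5 * i for i in range(-16, 16)]  # 32 elements, so 2 blocks
ts = per_tensor_scale(values)
bs = per_block_scales(values, ts)
```

If this is right, a value would be recovered as `fp4_value * block_scale * tensor_scale`, which is why I expect both kinds of scale to be visible to the GEMM in some form.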
However, I could not find any clear API or example showing how to actually set the per-tensor scaling values in memory before launching a GEMM kernel.
What is the correct way to store per-tensor scaling values in memory?
An example or reference would be very helpful.
Thank you!