Thank you for your impressive work.
I’m currently trying to reproduce the TOPS results reported in the SageAttention3 paper. I followed the instructions in the README, but I haven’t been able to match the reported performance. My test script is adapted from your benchmark file, and I’ve confirmed that my environment is set up correctly. Perhaps there’s an issue with my test script; if you could provide an official one, that would be even better.
I’d really appreciate your help in reproducing the TOPS numbers reported in the paper.