I was told to run `pip install flash-attn-3 --index https://download.pytorch.org/whl/cu129/ --force-reinstall --no-deps`. This wheel has a different API from the FA3 build in CoreWeave's image, and it also worked with `torch.compile`.
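Because the two FA3 builds expose different APIs, it can help to confirm which distribution is actually installed after the force-reinstall. The sketch below uses only the standard library (`importlib.metadata`). It assumes the wheel registers itself under the distribution name `flash-attn-3`, which is the name from the pip command. Checking `torch` alongside it is my own addition, and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "flash-attn-3" is the distribution name from the pip command (assumed
    # to match the installed metadata); "torch" is listed so the reader can
    # check that its version string matches the build from the cu129 index.
    for name in ("flash-attn-3", "torch"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

This only reports package metadata. It does not tell you which API the installed module exposes, so import the module and try a call before relying on it.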