Would you be so kind as to provide an example of using this MetalFlashAttention implementation, compiled as a library, as a drop-in replacement for the original flash_attn in Python code that runs a model such as https://huggingface.co/microsoft/Phi-4-multimodal-instruct?
And thanks for your article!
Thank you!