In the flash-attention repo here, there is now a note that the fused CUDA layer norm op has been replaced with a Triton implementation.
In light of that, would it now be reasonable to drop the suggestion to `pip install` the layer norm op from the dependencies section of this README?
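For context, the step in question is presumably the usual build-from-source install of the fused extension. A minimal sketch, assuming the standard upstream flash-attention repo layout (the exact wording in this README may differ):

```bash
# Hypothetical example of the dependency step under discussion — assumes the
# fused layer norm extension lives under csrc/layer_norm, as in the upstream
# flash-attention repo; not quoted from this README.
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/csrc/layer_norm
pip install .
```

If the Triton implementation now ships with the main `flash-attn` package, as the note seems to suggest, then a plain `pip install flash-attn` would cover it and this extra build step would be redundant.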