Extended Description
A simple FMA loop operating on data aligned with alignas(64) generates vmovups instructions where it could, and should, generate vmovaps.
Since the Nehalem architecture, the cost of vmovups relative to vmovaps is not as large as it once was, but it is still noticeable. If the data is guaranteed to be aligned, vmovaps should be the instruction used.
https://www.godbolt.org/z/CCcjYu
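
For readers who cannot open the link, here is a minimal sketch of the kind of loop described above. It is not the code from the linked reproducer: the names Buffer, N, and fma_loop are hypothetical, and the element type and array size are assumptions chosen only for illustration. Compiling something like this with optimization and AVX/FMA enabled (for example -O3 -mavx2 -mfma, also an assumption) should show whether the loads and stores are emitted as vmovups or vmovaps.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical element count; a multiple of the vector width keeps the loop simple.
constexpr std::size_t N = 1024;

// Wrapping the array in a struct marked alignas(64) guarantees that every
// Buffer object, and therefore its data member, starts on a 64-byte boundary.
struct alignas(64) Buffer {
    float data[N];
};

static_assert(alignof(Buffer) == 64, "Buffer must be 64-byte aligned");

// Hypothetical FMA kernel: out[i] = a[i] * b[i] + c[i].
// All four operands are known at compile time to be 64-byte aligned.
void fma_loop(const Buffer& a, const Buffer& b, const Buffer& c, Buffer& out) {
    for (std::size_t i = 0; i < N; ++i) {
        out.data[i] = std::fma(a.data[i], b.data[i], c.data[i]);
    }
}
```

Because the alignment is part of the type, the compiler does not need any runtime check to prove that aligned moves are safe here.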