Hi,
Do the kernels for the AArch64 architecture use NEON instructions? The assembly code in https://github.com/google/ruy/blob/master/ruy/kernel_arm64.cc doesn't appear to contain NEON mnemonics such as VADD or VMUL. How is vectorization performed for 64-bit Arm architectures?