- Added Arm Neon support. (@yuygfgg)
- Added sample_mode=6. (#13)
- Added sample_mode=7. (#13)
- Reduced the cache size to only what is required.
- sample_mode=5: made the output of scalar and SSE code identical.
- Fixed CPU instruction detection when MSVC is used.
- Added AVX2 and AVX512 code for sample_mode=5.
- Changed rangelimit. (#24)