This project performs a large amount of memory copying, yet it does not take advantage of the CPU's SIMD instructions. I created a sample version that uses ARM SIMD and confirmed that it improves performance. However, that version is ARM-specific rather than generic.
Therefore, we should abstract memcpy and memset behind a common interface so that they can use SIMD on any supported architecture.
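One way such an abstraction could look is sketched below. It is a minimal illustration under stated assumptions, not the project's actual code. The names `simd_memcpy`, `simd_memset` and `SIMD_BACKEND` are hypothetical. The backend is chosen at compile time: ARM NEON intrinsics when `__ARM_NEON` is defined, SSE2 intrinsics when `__SSE2__` is defined, and a plain byte loop otherwise. Each SIMD path moves 16 bytes per step, and a scalar loop handles the remaining tail bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compile-time backend selection (assumption: the project
 * can rely on the compiler's predefined target macros). */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SIMD_BACKEND "neon"
#elif defined(__SSE2__)
#include <emmintrin.h>
#define SIMD_BACKEND "sse2"
#else
#define SIMD_BACKEND "scalar"
#endif

/* Hypothetical portable memcpy: copies n bytes from src to dst.
 * As with memcpy, the two regions must not overlap. */
void *simd_memcpy(void *dst, const void *src, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 16 <= n; i += 16)              /* 16-byte NEON load/store */
        vst1q_u8(d + i, vld1q_u8(s + i));
#elif defined(__SSE2__)
    for (; i + 16 <= n; i += 16)              /* 16-byte unaligned SSE2 load/store */
        _mm_storeu_si128((__m128i *)(d + i),
                         _mm_loadu_si128((const __m128i *)(s + i)));
#endif
    for (; i < n; i++)                        /* tail, or whole copy without SIMD */
        d[i] = s[i];
    return dst;
}

/* Hypothetical portable memset: fills n bytes of dst with (uint8_t)value. */
void *simd_memset(void *dst, int value, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    uint8_t v = (uint8_t)value;
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x16_t vv = vdupq_n_u8(v);            /* broadcast byte to 16 lanes */
    for (; i + 16 <= n; i += 16)
        vst1q_u8(d + i, vv);
#elif defined(__SSE2__)
    __m128i vv = _mm_set1_epi8((char)v);      /* broadcast byte to 16 lanes */
    for (; i + 16 <= n; i += 16)
        _mm_storeu_si128((__m128i *)(d + i), vv);
#endif
    for (; i < n; i++)                        /* tail, or whole fill without SIMD */
        d[i] = v;
    return dst;
}
```

Callers would use `simd_memcpy` and `simd_memset` exactly like `memcpy` and `memset`, while the architecture-specific code stays in one place. Supporting another instruction set would, under this design, mean adding one more `#elif` branch to each function; a runtime CPU-feature dispatch is an alternative if a single binary must run on several CPU generations.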