I think the tmp variable should be hoisted out of the inner loop. The current implementation moves the selected element into place by repeated swaps, instead of shifting the larger elements and writing it once at its final position. Is it even insertion sort as written?
The change needs to be tested against the benchmarks.
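
To make the distinction concrete, here is a rough sketch of the two variants. It is written in Python purely for illustration and is not the code under review; the function names insertion_sort_swap and insertion_sort_shift are hypothetical. The first does three assignments per inner-loop step, the second saves the element once and does one assignment per step plus a single final write.

```python
def insertion_sort_swap(a):
    """Swap-based variant: the element moves left one swap at a time."""
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j - 1] > a[j]:
            tmp = a[j]           # tmp is created on every inner-loop step
            a[j] = a[j - 1]
            a[j - 1] = tmp
            j -= 1
    return a


def insertion_sort_shift(a):
    """Shift-based variant: save the element once, write it once."""
    for i in range(1, len(a)):
        tmp = a[i]               # saved once, outside the inner loop
        j = i
        while j > 0 and a[j - 1] > tmp:
            a[j] = a[j - 1]      # shift the larger element one slot right
            j -= 1
        a[j] = tmp               # single write at the final position
    return a
```

Both produce the same result and perform the same comparisons; they differ only in the number of writes, which is why a benchmark run is the right way to decide whether the change matters here.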