We could interpret the `*_thickness.npy` NumPy array as a grayscale image and use an ITK image comparison filter to run a regression test against known-good output.
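To make the idea concrete, here is a rough sketch of the comparison logic in plain NumPy. It is a stand-in for the ITK filter, not the ITK API itself: it counts the pixels whose absolute difference from the baseline exceeds a tolerance, which is roughly how such a comparison filter is used in regression tests. The function name `compare_to_baseline` and the parameters `intensity_tolerance` and `max_failed_pixels` are assumptions made for illustration.

```python
import numpy as np


def compare_to_baseline(test, baseline, intensity_tolerance=1e-6, max_failed_pixels=0):
    """Compare a thickness array to a known-good baseline, pixel by pixel.

    Hypothetical helper (not part of ITK). Returns a tuple
    (passed, n_failed, total_abs_diff).
    """
    test = np.asarray(test, dtype=np.float64)
    baseline = np.asarray(baseline, dtype=np.float64)
    if test.shape != baseline.shape:
        raise ValueError(f"shape mismatch: {test.shape} vs {baseline.shape}")

    diff = np.abs(test - baseline)
    # Pixels that are NaN in both arrays are treated as matching.
    diff[np.isnan(test) & np.isnan(baseline)] = 0.0

    # A pixel fails unless its difference is within tolerance; writing the
    # check this way also fails pixels that are NaN in only one array.
    failed = ~(diff <= intensity_tolerance)
    n_failed = int(np.count_nonzero(failed))
    total_abs_diff = float(np.nansum(diff))
    return n_failed <= max_failed_pixels, n_failed, total_abs_diff


if __name__ == "__main__":
    # Synthetic data for demonstration; a real test would np.load() the
    # generated *_thickness.npy file and a stored known-good array.
    baseline = np.ones((4, 4))
    test = baseline.copy()
    test[1, 2] = 1.5  # simulate a regression in one pixel
    print(compare_to_baseline(test, baseline, intensity_tolerance=0.1))
```

If we go with ITK proper, the arrays could be converted with `itk.image_from_array` and passed to its comparison filter in place of the NumPy logic above; the tolerance and allowed-failure count would still need to be chosen for our data.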