
There seems to be an error in how start_index and total_frame are parsed in the pretrain dataset. The value read as start_index is actually the total frame count, and the value read as total_frame is actually the label.

Otherwise, start_index ends up greater than total_frame.
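
To make the suspected mismatch concrete, here is a minimal sketch. It assumes an annotation line of the form `path total_frame label`; the example line, its values, and both function names are hypothetical and are not taken from the repository's code.

```python
# Hypothetical annotation line, assumed layout: <path> <total_frame> <label>
line = "videos/example 300 7"


def parse_as_start_total(line):
    """Read fields 1 and 2 as (start_index, total_frame), the suspected current behavior."""
    fields = line.split()
    return {
        "path": fields[0],
        "start_index": int(fields[1]),
        "total_frame": int(fields[2]),
    }


def parse_as_total_label(line):
    """Read fields 1 and 2 as (total_frame, label), the interpretation proposed here."""
    fields = line.split()
    return {
        "path": fields[0],
        "total_frame": int(fields[1]),
        "label": int(fields[2]),
    }


suspected = parse_as_start_total(line)
expected = parse_as_total_label(line)

# Under the first reading, start_index (300) exceeds total_frame (7).
print(suspected)
print(expected)
```

If the annotation file really uses a different layout, the first reading may be the correct one and this sketch does not apply.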
Is my understanding correct?