Is it necessary to use all the data shown in the figure below for training and testing? The dataset is 359 GB, which is very large.