Just wondering whether packing this way could bias the aggregated batches (across all GPUs) compared to simple random sampling. Ideally, the probability of any given sample landing in a given batch should be the same as under random sampling, but I can't quite work out the math behind it.
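Absent a closed-form answer, one way to get a feel for this is a quick Monte Carlo check. The sketch below is only an illustration under assumptions that may not match the packing discussed here: it assumes a greedy first-fit packer over a shuffled dataset, with the first `num_gpus` bins forming the first global step. The names `first_fit_pack` and `first_step_frequency` are hypothetical helpers, not part of any library. Under simple random sampling every sample would have the same chance of being in the first step, so any gap between short and long samples hints at a bias.

```python
import random

def first_fit_pack(lengths, order, capacity):
    """Greedy first-fit: put each sample into the first bin with enough room.

    This packing rule is an assumption made for illustration only.
    """
    bins = []  # each entry: [remaining_capacity, [sample indices]]
    for idx in order:
        for b in bins:
            if b[0] >= lengths[idx]:
                b[0] -= lengths[idx]
                b[1].append(idx)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([capacity - lengths[idx], [idx]])
    return [b[1] for b in bins]

def first_step_frequency(lengths, capacity, num_gpus, trials, seed=0):
    """Estimate how often each sample lands in the first global step.

    The first global step is taken to be the first `num_gpus` bins
    (one packed bin per GPU), which is also an assumption.
    """
    rng = random.Random(seed)
    n = len(lengths)
    counts = [0] * n
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)
        bins = first_fit_pack(lengths, order, capacity)
        for b in bins[:num_gpus]:
            for idx in b:
                counts[idx] += 1
    return [c / trials for c in counts]

# Toy dataset: 50 short samples (length 10) and 50 long ones (length 100).
lengths = [10] * 50 + [100] * 50
freq = first_step_frequency(lengths, capacity=128, num_gpus=2, trials=500)

short_mean = sum(freq[:50]) / 50
long_mean = sum(freq[50:]) / 50
print(f"short samples in first step: {short_mean:.3f}")
print(f"long samples in first step:  {long_mean:.3f}")
```

In this toy setup the short samples should show up in the first step more often than the long ones, because first-fit keeps backfilling the earliest bins with whatever still fits. Over a full epoch every sample is still seen exactly once, so the effect is on which step a sample lands in and which samples it is batched with, not on whether it gets seen. A different packer (for example, one that only appends to the current bin) may behave differently, so it is worth swapping in the actual packing logic before drawing conclusions.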