We're currently reading in all sample chunks for simplicity, but this is quite wasteful and could be much more efficient for subsets.