Before the PCA or PBS, a scientist would complete several rounds of exploratory statistics and filtering:

1. Look at distributions of the data
2. Filter variants based on quality
3. Subset genotypes based on previous variant statistics
4. Filter samples (missingness)
5. Subset genotypes based on previous sample statistics

[Reference the Breaking down the PCA PR](https://github.com/pystatgen/sgkit-requirements/pull/4)
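
The five filtering rounds listed above can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the sgkit API: the toy genotype matrix, the `-1` encoding for missing calls, the per-variant quality scores, and both thresholds (`MIN_QUALITY`, `MAX_SAMPLE_MISSINGNESS`) are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 6 variants (rows) x 4 samples (columns).
# Entries are alternate-allele counts (0, 1, 2); -1 marks a missing call.
genotypes = np.array([
    [0, 1, 2, -1],
    [0, 0, 1, -1],
    [2, 2, 1, -1],
    [1, -1, 0, -1],
    [0, 1, 1, 0],
    [2, 0, 0, -1],
])

# Hypothetical per-variant quality scores, one per row of `genotypes`.
variant_quality = np.array([50.0, 12.0, 80.0, 35.0, 9.0, 60.0])

# Assumed thresholds, chosen only for illustration.
MIN_QUALITY = 30.0
MAX_SAMPLE_MISSINGNESS = 0.5


def summarize(values):
    """Return a minimal description of a distribution (min, median, max)."""
    return {
        "min": float(np.min(values)),
        "median": float(np.median(values)),
        "max": float(np.max(values)),
    }


# 1. Look at distributions of the data.
quality_summary = summarize(variant_quality)

# 2. Filter variants based on quality.
variant_mask = variant_quality >= MIN_QUALITY

# 3. Subset genotypes based on the variant statistics.
genotypes_by_variant = genotypes[variant_mask, :]

# 4. Filter samples on missingness, computed over the retained variants only.
sample_missingness = (genotypes_by_variant < 0).mean(axis=0)
sample_mask = sample_missingness <= MAX_SAMPLE_MISSINGNESS

# 5. Subset genotypes based on the sample statistics.
filtered_genotypes = genotypes_by_variant[:, sample_mask]

print(quality_summary)
print(filtered_genotypes.shape)
```

Note that the order matters in this sketch: sample missingness is computed after low-quality variants are dropped, so the sample statistics reflect only the variants that survive the first filter.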