It would be great if we could have an option to select how many documents we want to use for extraction. For example, if there are 1000 files indexed in Solr, we might only want to extract from 50 of them.
@r-clancy has some previous work on sampling from Solr, but it hasn't been merged yet: https://github.com/dstlry/dstlr/tree/sample-solr
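
To make the request concrete, here is a minimal sketch of one way to pick a fixed number of documents uniformly at random from a stream of document IDs, using reservoir sampling. It is only an illustration and not necessarily what the `sample-solr` branch does; the function name `sample_documents`, its parameters, and the `doc-N` IDs are hypothetical, and fetching the IDs from Solr is left out.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_documents(doc_ids: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from doc_ids.

    Hypothetical helper: uses reservoir sampling (Algorithm R), so the
    full list of IDs never has to be held in memory at once.
    """
    if k < 0:
        raise ValueError("k must be non-negative")

    rng = random.Random(seed)  # seeded so a sample can be reproduced
    reservoir: List[T] = []

    for i, doc_id in enumerate(doc_ids):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(doc_id)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = doc_id

    return reservoir


# Example: 1000 indexed documents, extract from only 50 of them.
all_ids = (f"doc-{n}" for n in range(1000))
sample = sample_documents(all_ids, k=50, seed=42)
```

If fewer than `k` documents are available, the sketch simply returns all of them. A fixed `seed` keeps the sample stable across runs, which may be useful when comparing extraction results.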