With a large number of datasets (e.g. 100,000+), get_dataset_collection shouldn't return all of them at once, since loading everything into memory will cause memory issues. A cursor-based approach should work.
TODO:
- Make sure we have an incremental, sortable id (like uuid7) so we can sort on dataset_id and keep a stable order while slicing
- Add a parameter to get_dataset_collection to enable pagination
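
A minimal sketch of how the cursor could work, assuming dataset_id sorts in creation order. The names `Dataset`, `get_dataset_page`, `iter_datasets`, `page_size` and `after_id` are hypothetical and not part of the current API; the in-memory list stands in for the real store, which would apply the same filter, ordering and limit in its query.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    # Assumption: dataset_id sorts in creation order (e.g. a uuid7 string).
    dataset_id: str
    name: str


def get_dataset_page(
    datasets: List[Dataset],
    page_size: int,
    after_id: Optional[str] = None,
) -> Tuple[List[Dataset], Optional[str]]:
    """Return one page of datasets plus the cursor for the next page.

    Hypothetical stand-in for a paginated get_dataset_collection. A real
    store would do the equivalent of:
    WHERE dataset_id > :after_id ORDER BY dataset_id LIMIT :page_size
    """
    ordered = sorted(datasets, key=lambda d: d.dataset_id)
    if after_id is not None:
        # Keyset filter: only rows strictly after the cursor.
        ordered = [d for d in ordered if d.dataset_id > after_id]
    page = ordered[:page_size]
    # A full page may have more rows behind it; a short page is the last one.
    next_cursor = page[-1].dataset_id if len(page) == page_size else None
    return page, next_cursor


def iter_datasets(datasets: List[Dataset], page_size: int = 1000) -> Iterator[Dataset]:
    """Yield all datasets while holding only one page in memory at a time."""
    cursor: Optional[str] = None
    while True:
        page, cursor = get_dataset_page(datasets, page_size, cursor)
        yield from page
        if cursor is None:
            break
```

Filtering on `dataset_id > after_id` instead of using an offset keeps the order stable when new datasets are added during iteration, which is why the sortable id in the first TODO item matters.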