Closed
Description
When I call `get_collections` directly on the client, it is very slow and returns only the first page of collections. If I re-implement `get_collections` as a standalone function, it runs almost instantaneously and returns the entire list of collections.
Here is a minimal example:
This takes almost 30 seconds to run and returns only the first page of collections (10 in total).
```python
from pystac_client import Client

STAC_URL = 'https://cmr.earthdata.nasa.gov/stac'
catalog = Client.open(f'{STAC_URL}/OB_DAAC')

for c in catalog.get_collections():
    print(c)
```
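To put numbers on the slowdown and the item count, something like the following stdlib-only helper could be wrapped around either iterator. This is a minimal sketch: `count_and_time` is a hypothetical name, not part of pystac-client, and the commented-out call assumes the `catalog` object from the example above.

```python
import time
from typing import Iterable, Tuple


def count_and_time(items: Iterable) -> Tuple[int, float]:
    """Consume an iterable and return (number of items, elapsed seconds)."""
    start = time.perf_counter()
    count = sum(1 for _ in items)  # exhausts the iterator, triggering every page fetch
    elapsed = time.perf_counter() - start
    return count, elapsed


# Offline check with a plain range, so the helper can be tried without network access.
count, elapsed = count_and_time(range(25))
print(f"{count} items in {elapsed:.4f} s")

# With the catalog from the example above (requires network access):
# count, elapsed = count_and_time(catalog.get_collections())
```

Running the same helper over both the built-in method and the re-implementation gives a like-for-like comparison of how many collections each yields and how long each takes.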
This is the re-implementation which runs as expected:
```python
from pystac_client import Client, CollectionClient
from pystac_client._utils import Modifiable, call_modifier

STAC_URL = 'https://cmr.earthdata.nasa.gov/stac'
catalog = Client.open(f'{STAC_URL}/OB_DAAC')


def get_collections(catalog):
    url = f"{catalog.get_self_href()}/collections"
    for page in catalog._stac_io.get_pages(url):
        if "collections" not in page:
            continue
        for col in page["collections"]:
            collection = CollectionClient.from_dict(
                col, root=catalog, modifier=catalog.modifier
            )
            call_modifier(catalog.modifier, collection)
            yield collection


for col in get_collections(catalog):
    print(col)
```
This implementation is essentially copied and pasted from the library's source code, so I am not sure why calling the method directly on the class doesn't give the same performance or output.
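The page-walking logic of the re-implementation can also be exercised offline, which may help separate paging behaviour from network effects. The sketch below assumes each page is a dict with an optional `"collections"` list, as in the code above. `iter_collection_dicts` and the sample pages are hypothetical and stand in for what `get_pages` would return.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_collection_dicts(pages: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every collection dict from every page, skipping pages without the key."""
    for page in pages:
        if "collections" not in page:
            continue
        yield from page["collections"]


# Hypothetical sample data: two pages with collections and one page without the key.
sample_pages = [
    {"collections": [{"id": "a"}, {"id": "b"}]},
    {"links": []},
    {"collections": [{"id": "c"}]},
]

ids = [col["id"] for col in iter_collection_dicts(sample_pages)]
print(ids)
```

If the built-in method stops after the first page while a loop like this one would continue, the difference presumably lies in which code path the client takes rather than in the paging loop itself.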
Labels: enhancement