Issue with get_collections #320

@system123

I am seeing a problem where calling get_collections directly is very slow and returns only the first page of collections. If I re-implement the get_collections function myself, it runs almost instantaneously and returns the entire list of collections.

Here is a minimal example:

This takes almost 30 seconds to run and returns only the first page of collections (10 in total).

```python
from pystac_client import Client

STAC_URL = 'https://cmr.earthdata.nasa.gov/stac'
catalog = Client.open(f'{STAC_URL}/OB_DAAC')

for c in catalog.get_collections():
    print(c)
```

This is the re-implementation, which runs as expected:

```python
from pystac_client import Client, CollectionClient
from pystac_client._utils import call_modifier

STAC_URL = 'https://cmr.earthdata.nasa.gov/stac'
catalog = Client.open(f'{STAC_URL}/OB_DAAC')

def get_collections(catalog):
    # Query the /collections endpoint directly and follow its pagination.
    url = f"{catalog.get_self_href()}/collections"

    for page in catalog._stac_io.get_pages(url):
        # Skip pages that carry no collections.
        if "collections" not in page:
            continue

        for col in page["collections"]:
            collection = CollectionClient.from_dict(
                col, root=catalog, modifier=catalog.modifier
            )
            call_modifier(catalog.modifier, collection)
            yield collection

for col in get_collections(catalog):
    print(col)
```

This implementation is basically copied and pasted from the pystac_client source, so I am not sure why calling the method directly on the class doesn't give the same performance or output.
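To put numbers on the difference, a small helper along these lines could count the items each approach yields and time how long it takes. This is only a sketch: `count_and_time` and `fake_pages` are hypothetical names that are not part of pystac_client, and the stand-in generator is there only so the snippet runs without network access.

```python
import time
from typing import Callable, Iterable, Tuple


def count_and_time(make_iterable: Callable[[], Iterable]) -> Tuple[int, float]:
    """Consume the iterable returned by make_iterable; return (item count, seconds)."""
    start = time.perf_counter()
    count = sum(1 for _ in make_iterable())
    elapsed = time.perf_counter() - start
    return count, elapsed


# Hypothetical stand-in for a paginated source (two pages, five items).
def fake_pages():
    for page in ([1, 2, 3], [4, 5]):
        yield from page


count, seconds = count_and_time(fake_pages)
print(f"{count} items in {seconds:.3f}s")
```

With the catalog from the examples above, `count_and_time(catalog.get_collections)` and `count_and_time(lambda: get_collections(catalog))` should give the two pairs of numbers to compare.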

Labels: enhancement