
[bug]: SearchClient.browse_objects only returns last page #573

@connesy

### Description

When I use `SearchClientSync.browse_objects` to retrieve all records (as the documentation suggests: https://www.algolia.com/doc/libraries/python/v4/helpers/#browse-for-records), the returned `BrowseResponse` contains only the hits from the last page. The hits from all earlier pages are discarded:

```python
from algoliasearch.search.client import SearchClientSync
from algoliasearch.search.models.browse_params_object import BrowseParamsObject

client = SearchClientSync(application_id, api_key)
records = client.browse_objects(
    index_name="my_index",
    aggregator=None,
    browse_params=BrowseParamsObject(
        query="",
        attributes_to_retrieve=["some_column"],
    ),
)

print(len(records.hits))  # 227
print(records.page)       # 38
print(records.nb_hits)    # 38940
print(records.nb_pages)   # 39
```

The function called for each request, in `algoliasearch/search/client.py`, does not keep the records from the previous response.

The function `_func`, which is passed to `create_iterable_sync` in `algoliasearch/http/helpers.py`, does not carry the hits of the previous response `_prev` (which `retry` passes to it) over into the next response.

As a result, only the last response from `self.browse(...)` is returned; all earlier responses are discarded.
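To make the mechanism concrete, here is a simplified, self-contained model of the paging loop. It is not the library's code: `Page`, `fake_browse`, `iterate` and the two `func_*` variants are hypothetical stand-ins, and the model assumes (as cursor-based pagination requires) that only the cursor is read from the previous response. It shows that when each call ignores the hits in `prev`, the loop returns only the last page, whereas merging `prev.hits` into each new response returns every record:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for a BrowseResponse: one page of hits plus a cursor."""
    hits: List[int]
    cursor: Optional[str] = None


# Hypothetical index of 5 records, served 2 per page.
RECORDS = [1, 2, 3, 4, 5]
PAGE_SIZE = 2


def fake_browse(cursor: Optional[str]) -> Page:
    """Hypothetical stand-in for self.browse(...): returns the page starting at `cursor`."""
    start = int(cursor) if cursor else 0
    end = start + PAGE_SIZE
    next_cursor = str(end) if end < len(RECORDS) else None
    return Page(hits=RECORDS[start:end], cursor=next_cursor)


def iterate(func: Callable[[Optional[Page]], Page],
            aggregator: Optional[Callable[[Page], None]] = None) -> Page:
    """Simplified model of the retry loop: call func until no cursor remains."""
    prev: Optional[Page] = None
    while True:
        resp = func(prev)
        if aggregator is not None:
            aggregator(resp)  # the aggregator sees every page
        if resp.cursor is None:
            return resp  # only this final response is returned
        prev = resp


def func_ignoring_prev(prev: Optional[Page]) -> Page:
    # Reported behavior: only the cursor is taken from `prev`; its hits are dropped.
    return fake_browse(prev.cursor if prev else None)


def func_accumulating(prev: Optional[Page]) -> Page:
    # Possible alternative: prepend the hits already collected in `prev`.
    resp = fake_browse(prev.cursor if prev else None)
    if prev is not None:
        resp.hits = prev.hits + resp.hits
    return resp


print(iterate(func_ignoring_prev).hits)  # [5]
print(iterate(func_accumulating).hits)   # [1, 2, 3, 4, 5]
```

The accumulating variant is one possible change, not necessarily how the maintainers would fix it; the aggregator-based workaround reaches the same result without modifying the library.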

I found a workaround: pass an aggregator that appends the hits from each response to a list in the enclosing scope. That doesn't seem like it should be necessary, though:

```python
results = []

def agg(response) -> None:
    results.extend(response.hits)

client.browse_objects(
    index_name="my_index",
    aggregator=agg,
    browse_params=BrowseParamsObject(
        query="",
        attributes_to_retrieve=["some_column"],
    ),
)

print(len(results))  # 38940
```
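The same workaround can be packaged without a free-standing list. This is a minimal sketch, assuming only that the aggregator is called once per `BrowseResponse` and that each response exposes a `hits` list; the `HitCollector` name is hypothetical, and `SimpleNamespace` objects stand in for real responses:

```python
from types import SimpleNamespace
from typing import Any, List


class HitCollector:
    """Hypothetical callable aggregator that accumulates the hits of every page it receives."""

    def __init__(self) -> None:
        self.hits: List[Any] = []

    def __call__(self, response: Any) -> None:
        # Assumes `response` has a `hits` list, as BrowseResponse does.
        self.hits.extend(response.hits)


# Simulated pages standing in for successive BrowseResponse objects.
collector = HitCollector()
for page in (SimpleNamespace(hits=[{"objectID": "1"}, {"objectID": "2"}]),
             SimpleNamespace(hits=[{"objectID": "3"}])):
    collector(page)

print(len(collector.hits))  # 3
```

With the real client you would pass `aggregator=collector` to `client.browse_objects(...)` and read `collector.hits` afterwards; this assumes `browse_objects` accepts any callable as its `aggregator`, as the `agg` function above suggests.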

### Client

Search

### Version

4.6.2

### Relevant log output

_No response_

Metadata

- Assignees: No one assigned
- Labels: No labels
- Type: No type
- Projects: No projects
- Milestone: No milestone
- Relationships: None yet
- Development: No branches or pull requests