### Description
When using `SearchClientSync.browse_objects` to retrieve all records (as suggested by the documentation: https://www.algolia.com/doc/libraries/python/v4/helpers/#browse-for-records), the `BrowseResponse` that is returned only contains the hits from the last page. Hits on all prior pages are discarded:
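```python
from algoliasearch.search.client import SearchClientSync
# Import path for the model is assumed; adjust if your installed version differs.
from algoliasearch.search.models.browse_params_object import BrowseParamsObject

client = SearchClientSync(application_id, api_key)
records = client.browse_objects(
    index_name="my_index",
    aggregator=None,
    browse_params=BrowseParamsObject(
        query="",
        attributes_to_retrieve=["some_column"],
    ),
)
print(len(records.hits))  # 227
print(records.page)       # 38
print(records.nb_hits)    # 38940
print(records.nb_pages)   # 39
```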
The function that is called for every request doesn't keep the records from the previous response:
algoliasearch/search/client.py
The function `_func`, which is passed to `create_iterable_sync`, doesn't use the previous response `_prev`, which it gets passed in `retry`:
algoliasearch/http/helpers.py
The result is that only the last response from `self.browse(...)` is actually returned. All other responses are discarded.
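The behaviour can be reproduced without the library. The following is a minimal sketch, not the actual code in `algoliasearch/http/helpers.py`: `Page`, `fetch`, `iterate`, and `is_last` are hypothetical stand-ins. It assumes a loop that calls a fetch function with the previous response, optionally hands each response to an aggregator, and returns whichever response ends the iteration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for one browse response."""
    hits: List[int]
    cursor: Optional[int]  # index of the next page, None on the last page


# Three fake pages of data: 2 + 2 + 1 hits.
PAGES = [Page([1, 2], 1), Page([3, 4], 2), Page([5], None)]


def fetch(prev: Optional[Page]) -> Page:
    """Fetch the page after `prev`; only prev.cursor is used, prev.hits is dropped."""
    return PAGES[0] if prev is None else PAGES[prev.cursor]


def iterate(
    func: Callable[[Optional[Page]], Page],
    validate: Callable[[Page], bool],
    aggregator: Optional[Callable[[Page], None]] = None,
) -> Page:
    """Call func until validate(response) is true, then return that response."""
    prev: Optional[Page] = None
    while True:
        resp = func(prev)
        if aggregator is not None:
            aggregator(resp)
        if validate(resp):
            return resp  # earlier responses are not merged into this one
        prev = resp


def is_last(page: Page) -> bool:
    return page.cursor is None


# Without an aggregator, only the final page survives.
last = iterate(fetch, is_last)
print(len(last.hits))  # 1

# With an aggregator, every page's hits can be collected by the caller.
collected: List[int] = []
iterate(fetch, is_last, aggregator=lambda page: collected.extend(page.hits))
print(len(collected))  # 5
```

In this model the return value is always the last page, so the caller only sees all hits if the aggregator collects them as a side effect.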
I found a workaround, where I create an "aggregator" that appends the hits from each response to a non-local list, but that doesn't seem like it should be necessary:
```python
results = []

def agg(response) -> None:
    results.extend(response.hits)

client.browse_objects(
    index_name="my_index",
    aggregator=agg,
    browse_params=BrowseParamsObject(
        query="",
        attributes_to_retrieve=["some_column"],
    ),
)
print(len(results))  # 38940
```
### Client
Search
### Version
4.6.2
### Relevant log output
_No response_