Description
`Search.scan` was the previous way to use the Scroll API via `elasticsearch-dsl`. However, the Scroll API has, for many use cases, been superseded by the `search_after` approach. To facilitate this, `elasticsearch-dsl` has a `Search.iterate` method which handles pagination for the user with default settings only, in that you can't set the page size.
Now, suppose you have a `Search` object. You can set the index via `Search(index='some-index')` or with `Search().index('some-index')`. Either way, you have a `Search` object on an index, and you can then iterate over the documents in that index:
for x in Search(index='some-index').iterate():
    pass  # do thing
However, this does not behave as I expect it to. In `iterate` (`def iterate(self, keep_alive: str = "1m") -> Iterator[_R]:`), a point-in-time (PIT) is opened, which makes sense because it keeps the data from changing under you.
However, my issue lies within the `point_in_time` method (`def point_in_time(self, keep_alive: str = "1m") -> Iterator[Self]:`).
It opens the point in time with the appropriate index. Next, however, notice how it takes `self`, i.e. the current `Search` object, and clears the index. It then yields this search object. This might make sense in the situation the docstring describes, where you are constructing a point in time for multiple queries, e.g.
with s.point_in_time() as neo_s:
    neo_s.index('a').execute()
    neo_s.index('b').execute()
However, in the context of `iterate`, this causes problems because the index is never set again. Thus, each `/_search` request made by `iterate` will be against all indices, which could fail if the user doesn't have permission to read from all indices.
Please correct me if I'm wrong; this is just what seemed to be the issue when I tried to iterate over an index as a user with fixed read permissions.
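To make the concern concrete, here is a toy model of the behavior I am describing. It is not the elasticsearch-dsl source: `FakeClient` and `ToySearch` are hypothetical stand-ins I made up for illustration, and the only thing the sketch records is which request paths would be hit if the index is cleared inside `point_in_time` and never restored by `iterate`.

```python
from contextlib import contextmanager


class FakeClient:
    """Hypothetical stand-in for an Elasticsearch client; only records paths."""

    def __init__(self):
        self.requests = []

    def open_point_in_time(self, index):
        self.requests.append(f"POST /{index}/_pit")
        return {"id": "fake-pit-id"}

    def search(self, index, body):
        # With no index set, the request goes to the bare /_search endpoint.
        path = f"/{index}/_search" if index else "/_search"
        self.requests.append(f"POST {path}")
        return {"hits": []}


class ToySearch:
    """Toy model of a search object; NOT the elasticsearch-dsl implementation."""

    def __init__(self, client, index=None, extra=None):
        self.client = client
        self._index = index
        self._extra = dict(extra or {})

    def index(self, name=None):
        # Calling index() with no argument returns a copy without an index.
        return ToySearch(self.client, name, self._extra)

    @contextmanager
    def point_in_time(self):
        pit = self.client.open_point_in_time(index=self._index or "*")
        s = self.index()  # the index is cleared here
        s._extra["pit"] = {"id": pit["id"]}
        yield s

    def execute(self):
        return self.client.search(self._index, self._extra)

    def iterate(self):
        with self.point_in_time() as s:
            s.execute()  # the index is never set again before searching


client = FakeClient()
ToySearch(client, index="some-index").iterate()
print(client.requests)
```

In this model the PIT is opened on `some-index`, but the search that follows is sent without any index in the path, which is the situation I believe I ran into.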
As an aside, since `search_after` uses the sort values of the last hit in a response (per https://www.elastic.co/guide/en/elasticsearch/reference/8.18/paginate-search-results.html#search-after ), I am a bit confused as to why `s = s.search_after()` uses the `s` object from the context manager, as opposed to `r` from the response, i.e. why it isn't `r.search_after()`, per `def search_after(self) -> "SearchBase[_R]":`.
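For reference, my understanding of the `search_after` mechanism is roughly the following. This is a minimal in-memory sketch, not library code: `DOCS`, `search`, and `iterate_all` are hypothetical names, and the "index" is just a Python list. The point is only that the cursor for the next page comes from the sort values of the last hit of the previous response.

```python
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical documents standing in for an index.
DOCS = [{"id": i, "ts": 100 + i} for i in range(7)]


def search(size: int, search_after: Optional[List[Any]] = None) -> List[Dict[str, Any]]:
    """In-memory stand-in for a search request sorted by (ts, id)."""
    ordered = sorted(DOCS, key=lambda d: (d["ts"], d["id"]))
    hits = [{"_source": d, "sort": [d["ts"], d["id"]]} for d in ordered]
    if search_after is not None:
        # Keep only hits that sort strictly after the cursor.
        hits = [h for h in hits if h["sort"] > search_after]
    return hits[:size]


def iterate_all(size: int = 3) -> Iterator[Dict[str, Any]]:
    """Page through every document, one page of `size` hits at a time."""
    after = None
    while True:
        page = search(size, after)
        if not page:
            break
        for hit in page:
            yield hit["_source"]
        # The next cursor is taken from the last hit of this response.
        after = page[-1]["sort"]


ids = [d["id"] for d in iterate_all()]
print(ids)
```

Since the cursor is derived from the response, that is why I would have expected the next-page search to be built from `r` rather than from `s`.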