Paging through the CPAN releases API no longer works past the first 10,000 results: any request where from + size exceeds 10,000 fails.
Code location: https://github.com/librariesio/libraries.io/blob/master/app/models/package_manager/cpan.rb#L17
Example url:
https://fastapi.metacpan.org/v1/release/_search?fields=distribution&from=10000&q=status%3Alatest&size=5000&sort=date%3Adesc
Error:
{
"message": "[Request] ** [http://127.0.0.1:9200]-[500] {\"error\":{\"root_cause\":[{\"type\":\"query_phase_execution_exception\",\"reason\":\"Result window is too large, from + size must be less than or equal to: [10000] but was [15000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter.\"}],\"type\":\"search_phase_execution_exception\",\"reason\":\"all shards failed\",\"phase\":\"query\",\"grouped\":true,\"failed_shards\":[{\"shard\":0,\"index\":\"cpan_v1_01\",\"node\":\"euEoqisPSk68CnedNAzoZA\",\"reason\":{\"type\":\"query_phase_execution_exception\",\"reason\":\"Result window is too large, from + size must be less than or equal to: [10000] but was [15000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter.\"}}]},\"status\":500}, called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /home/metacpan/metacpan-api/lib/MetaCPAN/Server/Controller.pm line 125. With vars: {'request' => {'method' => 'GET','ignore' => [],'path' => '/cpan/release/_search','serialize' => 'std','qs' => {'q' => 'status:latest','fields' => 'distribution','sort' => 'date:desc','size' => 5000,'from' => 10000},'body' => undef},'status_code' => 500}\n"
}
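To make the limit in that error concrete, here is a minimal sketch of the arithmetic. The constant and both helper names are hypothetical (they are not part of libraries.io or MetaCPAN); the 10,000 figure is taken from the error message above.

```ruby
# Result window limit quoted in the error above (index.max_result_window).
MAX_RESULT_WINDOW = 10_000

# True when a from/size page still fits inside the result window.
def within_result_window?(from, size, limit = MAX_RESULT_WINDOW)
  from + size <= limit
end

# Every `from` offset that plain from/size paging can still reach
# for a given page size.
def reachable_offsets(size, limit = MAX_RESULT_WINDOW)
  (0..(limit - size)).step(size).to_a
end
```

With size=5000, only offsets 0 and 5000 are reachable, so the example URL (from=10000, size=5000, a window of 15,000) is rejected.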
The MetaCPAN docs suggest using the scroll API: https://github.com/metacpan/metacpan-api/blob/master/docs/API-docs.md#being-polite but the links to the Elasticsearch docs there are dead.
More recent scroll API docs are here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html but I couldn't get the search endpoint to accept scroll_id as a parameter:
{
"message": "[Param] ** Unknown param (scroll_id) in (search) request. See docs at: http://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html, called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /home/metacpan/metacpan-api/lib/MetaCPAN/Server/Controller.pm line 125."
}
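That error is consistent with how the Elasticsearch scroll API is structured: scroll_id is not a parameter of _search. The first request passes scroll=<keep-alive> to _search, and each follow-up request sends the returned _scroll_id to a separate _search/scroll endpoint. Below is a rough sketch of that loop. It assumes MetaCPAN exposes the scroll endpoint at /v1/_search/scroll and accepts these values as query parameters, which this issue does not confirm. The method name, the BASE_URL constant, and the injectable fetch lambda are all hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed base URL. The /_search/scroll path mirrors Elasticsearch and is
# NOT confirmed for MetaCPAN in this issue.
BASE_URL = "https://fastapi.metacpan.org/v1"

# Default fetcher: GET a URL and parse the JSON body.
HTTP_GET_JSON = lambda do |url|
  JSON.parse(Net::HTTP.get(URI(url)))
end

# Yields each batch of hits until the scroll returns an empty page.
# `fetch` is injectable so the loop can be exercised without network access.
def each_scroll_batch(query:, size: 1000, keep_alive: "1m", fetch: HTTP_GET_JSON)
  # First request: a normal search plus scroll=<keep-alive>.
  params = URI.encode_www_form(q: query, fields: "distribution",
                               size: size, scroll: keep_alive)
  page = fetch.call("#{BASE_URL}/release/_search?#{params}")

  loop do
    hits = page.dig("hits", "hits") || []
    break if hits.empty?

    yield hits

    scroll_id = page["_scroll_id"]
    break if scroll_id.nil?

    # Follow-up requests go to the scroll endpoint, not to _search.
    next_params = URI.encode_www_form(scroll: keep_alive, scroll_id: scroll_id)
    page = fetch.call("#{BASE_URL}/_search/scroll?#{next_params}")
  end
end
```

A caller would use it as each_scroll_batch(query: "status:latest") { |hits| ... }. If the endpoint only accepts the scroll ID in a request body, the fetch lambda is the one place that would need to change.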