[gdi-userportal-dataset-discovery-service] Remove CKAN max rows limitation #104

@admy7

Description

🎯 What? (Story Description)

  • Find a way to retrieve all the records from package_search

  • Apply it in CkanDatasetsIdCollector to retrieve all the dataset Ids from CKAN

💡 Why? (Justification)

This service lets the user search for datasets, possibly across several data sources (e.g. CKAN, Beacon).

Internally, the first step is to find the ids of the datasets that match the user query, for each data source separately.
Then, we reconcile the different id sets by taking their intersection.
Consequently, the final result can contain fewer datasets than the query asked for.
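
To illustrate the effect of a truncated id set on the intersection, here is a minimal sketch in Python. The `reconcile` helper and the `ds-N` ids are hypothetical and only serve the example; they are not taken from the service code.

```python
def reconcile(id_sets):
    """Intersect the per-source id sets; no sources yields an empty set."""
    sets = [set(s) for s in id_sets]
    return set.intersection(*sets) if sets else set()


# Hypothetical data: CKAN matches 1500 datasets, Beacon matches 600 of them.
ckan_full = {f"ds-{i}" for i in range(1500)}
beacon = {f"ds-{i}" for i in range(900, 1500)}

# Assumption for the example: CKAN only hands back the first 1000 ids.
ckan_truncated = {f"ds-{i}" for i in range(1000)}

full_result = reconcile([ckan_full, beacon])            # 600 ids
truncated_result = reconcile([ckan_truncated, beacon])  # 100 ids
```

With the truncated CKAN set, 500 datasets that match in both sources are silently dropped from the result.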

For this reason, we would like to retrieve ALL the matching dataset ids from every data source before the merge, to minimise the chance of returning fewer records than requested.
CKAN is problematic because it limits us to a maximum of 1000 records (by default).
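
One possible approach, sketched here in Python for brevity, is to page through package_search with its `start` and `rows` parameters until the reported `count` is reached. This is a sketch under assumptions, not the service's implementation: `collect_all_ids` and `make_ckan_fetcher` are hypothetical names, the base URL is whatever CKAN instance is configured, and the loop assumes the standard package_search response shape (`result.count` and `result.results`).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def make_ckan_fetcher(base_url, query):
    """Hypothetical helper: returns a function fetching one package_search page."""
    def fetch(start, rows):
        params = urlencode({"q": query, "start": start, "rows": rows})
        url = f"{base_url}/api/3/action/package_search?{params}"
        with urlopen(url) as resp:
            return json.load(resp)["result"]
    return fetch


def collect_all_ids(fetch_page, page_size=1000):
    """Collect every dataset id by requesting pages until `count` is reached.

    `fetch_page(start, rows)` must return a dict with `count` and `results`.
    """
    ids = []
    start = 0
    while True:
        result = fetch_page(start, page_size)
        page = result.get("results", [])
        ids.extend(dataset["id"] for dataset in page)
        # Advance by what was actually returned, in case the server
        # caps `rows` below the requested page size.
        start += len(page)
        if not page or start >= result.get("count", 0):
            break
    return ids
```

Usage would look like `collect_all_ids(make_ckan_fetcher("https://ckan.example.org", "cancer"))`, where the URL and query are placeholders. Passing the fetcher in as a function keeps the paging loop testable without a live CKAN instance. Note that the index can change between requests, so ids may be missed or repeated if datasets are added or removed while paging.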

🔨 Tasks (Breakdown)

  • Find a way to retrieve all the records from package_search

  • Apply it in CkanDatasetsIdCollector to retrieve all the dataset ids from CKAN

✅ Acceptance Criteria

Can we retrieve all the dataset ids from CKAN?

➕ Additional Information

No response
