MDC - updateCitationsForDataset API may end up in infinite loop #11533
Hi everyone :)
Our team identified an issue with the `api/admin/makeDataCount/:persistentId/updateCitationsForDataset?persistentId=$DOI` endpoint: it can end up in an infinite loop when the DataCite result is paginated. There are two reasons for this:
1. Dataverse depends on the DataCite JSON response content to get out of a while loop: https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/api/MakeDataCountApi.java#L172-L201
2. `if (links.containsKey("next")) {` can always be true. DataCite seems to have an issue with its HATEOAS implementation: the response always includes a `next` link, even when you are out of the page range or on the last page. The standard recommendation is to remove the `next` link, or set it to empty or null, on the last page. I'll report this issue to DataCite.

Example:
First page: https://api.datacite.org/events?doi=10.12763/SMDGR1&source=crossref&page[size]=1000
Page number 11: https://api.datacite.org/events?doi=10.12763%2FSMDGR1&page%5Bnumber%5D=11&page%5Bsize%5D=1000
Page 11 still returns a `next` link in its response.
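To make the failure mode concrete, here is a minimal sketch of the pagination pattern at issue. The class name and the `fetchPage` helper are hypothetical names of mine, not the actual Dataverse code (see the `MakeDataCountApi` lines linked above for the real implementation); the point is only that the loop's exit depends on the `next` link disappearing, which never happens with DataCite's current responses:

```java
import jakarta.json.Json;
import jakarta.json.JsonObject;
import jakarta.json.JsonReader;
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class UnboundedCitationFetch {

    static final HttpClient CLIENT = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        String url = "https://api.datacite.org/events"
                + "?doi=10.12763%2FSMDGR1&source=crossref&page%5Bsize%5D=1000";
        while (url != null) {
            JsonObject response = fetchPage(url);
            // ... process response.getJsonArray("data") here ...
            JsonObject links = response.getJsonObject("links");
            // DataCite includes "next" even on and past the last page,
            // so this assignment never yields null: the loop never exits.
            url = links.containsKey("next") ? links.getString("next") : null;
        }
    }

    // Hypothetical helper: GET the page and parse the JSON body.
    static JsonObject fetchPage(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        try (JsonReader reader = Json.createReader(new StringReader(response.body()))) {
            return reader.readObject();
        }
    }
}
```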
This change aims both to work better with DataCite as it is today and to add a default maximum-iteration condition to avoid infinite loops, as sketched below.
Note: as a worst-case scenario, a default maximum of 1000 pages of 1000 items each seems more than enough for Crossref citation volumes...
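As a hedged sketch of that safeguard, again with illustrative names (`BoundedCitationFetch`, `MAX_PAGES`) rather than the actual Dataverse identifiers, the loop can be bounded by a maximum page count and can additionally stop as soon as a page comes back empty, so a spurious `next` link can no longer keep it alive:

```java
import jakarta.json.Json;
import jakarta.json.JsonArray;
import jakarta.json.JsonObject;
import jakarta.json.JsonReader;
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BoundedCitationFetch {

    // Illustrative default cap: 1000 pages x 1000 items per page (see the
    // worst-case note above). The cap alone guarantees termination even if
    // DataCite keeps advertising a "next" link forever.
    static final int MAX_PAGES = 1000;

    static final HttpClient CLIENT = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        String url = "https://api.datacite.org/events"
                + "?doi=10.12763%2FSMDGR1&source=crossref&page%5Bsize%5D=1000";
        int page = 0;
        while (url != null && page < MAX_PAGES) {
            JsonObject response = fetchPage(url);
            JsonArray data = response.getJsonArray("data");
            // An empty page means we are past the real results, even if
            // DataCite still advertises a "next" link: stop here.
            if (data == null || data.isEmpty()) {
                break;
            }
            System.out.println("page " + page + ": " + data.size() + " events");
            JsonObject links = response.getJsonObject("links");
            url = (links != null && links.containsKey("next"))
                    ? links.getString("next")
                    : null;
            page++;
        }
    }

    // Same hypothetical helper as in the previous snippet.
    static JsonObject fetchPage(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        try (JsonReader reader = Json.createReader(new StringReader(response.body()))) {
            return reader.readObject();
        }
    }
}
```

The page cap is the hard guarantee; the empty-page check is an extra heuristic that ends pagination early on well-behaved responses instead of burning requests up to the cap.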