Improve logging for harvester UUID collisions #9188
log.info(String.format("UUID collision detected for record with uuid '%s'. Record already exists in the catalogue but does not belong to this harvester (%s).",
    ri.uuid, params.getName()));

switch (params.getOverrideUuid()) {
I'm not sure about changing the log level here from debug to info. The merge policy returned by params.getOverrideUuid() is a harvester-level parameter, so logging this information for every metadata record seems redundant and, in large catalogues, will generate a lot of log entries.
Please check the similar changes in the other files.
I’ve updated the PR so only SKIP collision logs are changed to info. Would this be more acceptable?
OVERRIDE and RANDOM are expected to handle duplicates, so I left those unchanged.
In our case we use SKIP because we cannot overwrite or generate new UUIDs. The issue is that collisions are currently silent, which makes it hard to understand why some records were not harvested.
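The policy described above (SKIP collisions at info, OVERRIDE and RANDOM left at debug) could be sketched roughly as follows. This is only an illustration, not the code in this PR: the `CollisionLogSketch` class, the `OverrideUuid` enum, and both helper methods are hypothetical stand-ins, and `java.util.logging` is used so the snippet is self-contained, with `Level.FINE` assumed to play the role of debug.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class CollisionLogSketch {
    // Hypothetical stand-in for the harvester's UUID merge policy values.
    public enum OverrideUuid { SKIP, OVERRIDE, RANDOM }

    // Chooses the log level for a UUID collision under the given policy.
    // SKIP silently drops the record, so it is raised to INFO;
    // OVERRIDE and RANDOM are expected to handle duplicates and stay at FINE (debug).
    public static Level collisionLogLevel(OverrideUuid policy) {
        switch (policy) {
            case SKIP:
                return Level.INFO;
            case OVERRIDE:
            case RANDOM:
            default:
                return Level.FINE;
        }
    }

    // Builds the collision message quoted in the diff under review.
    public static String collisionMessage(String uuid, String harvesterName) {
        return String.format(
            "UUID collision detected for record with uuid '%s'. Record already exists "
                + "in the catalogue but does not belong to this harvester (%s).",
            uuid, harvesterName);
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("harvester.sketch");
        // Example values only; a real harvester would take these from the record and its params.
        for (OverrideUuid policy : OverrideUuid.values()) {
            log.log(collisionLogLevel(policy), collisionMessage("example-uuid", "example-harvester"));
        }
    }
}
```

With this split, a harvester configured with SKIP leaves a trace for every record it drops, while the other policies add no extra entries at the default log level.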
b22dbf7 to 9e84664
jodygarnett left a comment
Thanks @tylerjmchugh, feedback addressed. Logging skipped records at INFO (rather than staying quiet, or using WARNING) makes sense.
Currently when a UUID collision occurs during harvesting, it is handled internally according to the UUID merge policy (skip, overwrite, etc.), but without any explicit log entry.
As a result, records may be skipped, replaced, or otherwise handled without any visibility in the logs, making it difficult to understand why certain records were not created or were modified.
This PR aims to fix this issue by consistently logging info messages whenever a UUID collision occurs. This makes collision-related behavior easier to trace and debug.
Checklist
main branch, backports managed with label
README.md files
pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation