Description
Describe the bug
Using the "GeoNetwork 4.0" harvester type on GeoNetwork 4.2.14, I set up instance-A to harvest from instance-B. I have 54 records on instance-B, but only 53 records are ever retrieved by instance-A. I have ensured that all 54 records are spotless with respect to validation, groups, category, etc. and are publicly available to 'All'.
I'm increasingly convinced it has nothing to do with the records themselves. When I identify the missing record and re-harvest, it often appears, but another record is then dropped. The behaviour does not appear deterministic, but random. I have tried both UUID collision options, "Skip" and "Overwrite". I'm chasing ghosts.
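For anyone trying to reproduce this, the missing record can be identified by comparing the UUID lists of the two instances. The following is a minimal sketch; `diff_uuids` is a hypothetical helper, and it assumes the two UUID lists have already been exported from each catalogue by whatever means is convenient.

```python
def diff_uuids(source_uuids, harvested_uuids):
    """Return (missing, extra): UUIDs only on the source, and only on the harvester side."""
    source = set(source_uuids)
    harvested = set(harvested_uuids)
    missing = sorted(source - harvested)  # on instance-B but not on instance-A
    extra = sorted(harvested - source)    # on instance-A but not on instance-B
    return missing, extra


# Illustrative placeholder UUIDs, not real record identifiers
instance_b = ["aaa", "bbb", "ccc", "ddd"]
instance_a = ["aaa", "bbb", "ddd"]

missing, extra = diff_uuids(instance_b, instance_a)
print("missing on instance-A:", missing)
print("unexpected on instance-A:", extra)
```

Running the comparison after each harvest shows whether the dropped record changes from run to run.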
Depending on the run, the harvester report looks like either of these:
53 record(s) harvested in 135 seconds
3 minutes ago
privilegesAppendedOnExistingRecord: 53
total: 53
unchanged: 53
53 record(s) harvested in 132 seconds
20 hours ago
added: 1
privilegesAppendedOnExistingRecord: 52
removed: 1
total: 53
unchanged: 52
I suspect pagination off-by-one.
If I'm correct, can the harvester be reconfigured to request larger pages as a workaround? The default in the UI search settings is 30. Does the harvester use the same pagination default as the UI? Or is there an XML or JSON setting in the software distribution?
Another workaround could be some sort of 'do not delete' setting. I read in the docs that such a setting should exist, but it is not available in the harvester's UI settings. If I could configure the harvester not to delete records that were already harvested but are missing from the current retrieval, it might keep that extra record after each harvest.
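To make the pagination suspicion above concrete, here is a minimal simulation of one way a paged harvest could lose exactly one record in a non-deterministic fashion. It is not GeoNetwork code and it is not strictly an off-by-one: it assumes offset-based paging with a page size of 30 and a result order that is not stable between page requests (for example, two records tying on the sort key). Whether the harvester actually behaves like this is an assumption; all names are hypothetical.

```python
def fetch_page(ordered, start, size):
    """Return one page of an offset-paginated result list."""
    return ordered[start:start + size]


def harvest(orderings, page_size):
    """Collect UUIDs page by page; orderings[i] is the order the server uses for request i."""
    collected = []
    for i, ordered in enumerate(orderings):
        collected.extend(fetch_page(ordered, i * page_size, page_size))
    return collected


uuids = [f"uuid-{n:02d}" for n in range(54)]
page_size = 30

# Request 1 sees the records in their original order.
first_order = list(uuids)

# Request 2 sees the two records around the page boundary swapped,
# as could happen if they tie on the sort key.
second_order = list(uuids)
second_order[29], second_order[30] = second_order[30], second_order[29]

harvested = harvest([first_order, second_order], page_size)
unique = set(harvested)
missing = set(uuids) - unique

print(len(harvested), "results returned")
print(len(unique), "unique records")
print("missing:", sorted(missing))
```

In this simulation 54 results come back, but one UUID arrives twice and another never arrives, so only 53 unique records are stored regardless of the "Skip" or "Overwrite" collision option. If this is the mechanism, a page size larger than the record count would hide the problem, and a unique tie-breaker in the sort would remove it.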
To Reproduce
.
Expected behavior
I hoped to have all 54 available valid records harvested
Screenshots
Unauthenticated on instance-B, I see my 54 records publicly available.
On harvested instance-A side:
Log file
Log files with overwrite or skip.
harvester_geonetwork40_wf_test_records_from_DEV_nicebay__20260220142843.log
harvester_geonetwork40_wf_test_records_from_DEV_nicebay__20260219183545.log
Desktop (please complete the following information):
- Browser Edge
- GeoNetwork Version 4.2.14
- Schema iso19139.ca.HNAP 4.2.14
- Server Application Tomcat 9.0.106; Java Adoptium 8u462b08; ElasticSearch 7.17.15
Additional context
.