Hi,
In my case, I first created a health report with no extra parameters:
/dspace/bin/dspace health-report -e kuchtiak@ufal.mff.cuni.cz
(so the script executed all default checks, 0..4)
The result was stored in the database with timestamp: 2026-02-12 16:08:34.830
The second report was created a few days later for one specific check only, check 1 (Item summary):
/dspace/bin/dspace health-report -e kuchtiak@ufal.mff.cuni.cz --check 1
This result was stored in the database with timestamp: 2026-02-18 11:15:09.307
Some new bitstreams were created between those dates.
Then I ran the report-diff script, expecting a reasonable comparison table, at least for the Item summary:
/dspace/bin/dspace report-diff -e kuchtiak@ufal.mff.cuni.cz --from 2026-02-12 16:08:34.830 --to 2026-02-18 11:15:09.307
Unfortunately, the resulting table looked like this:
Key Changes Between Reports
===============================================================================================
| Field | 2026-02-12 16:08:34.830 | 2026-02-18 11:15:09.307 | Difference |
===============================================================================================
| Assetstore Size (bytes) | 3771032601 | null | Changed |
| Log Directory Size (bytes) | 66849188 | null | Changed |
| Communities | 2 | null | Changed |
| Collections | 2 | null | Changed |
| Total Content Size | 3 GB | null | Changed |
| Items | 2577 | null | Changed |
| Published Items | 2543 | null | Changed |
| Unpublished Items | 34 | null | Changed |
| Withdrawn Items | 0 | null | Changed |
| Workspace Items | 26 | null | Changed |
| Workflow Items | 5 | null | Changed |
| Bitstreams | 534 | null | Changed |
| Bundles | 77 | null | Changed |
| Orphaned Bitstreams | 0 | null | Changed |
| Deleted Bitstreams | 137 | null | Changed |
| Metadata Values | 85819 | null | Changed |
| Handles | 2594 | null | Changed |
| Users | 7 | null | Changed |
| Groups | 7 | null | Changed |
| Self Registered Users | 0 | null | Changed |
| Subscribers | | null | Changed |
| Subscribed Collections | | null | Changed |
| Empty Groups | | null | Changed |
| Licenses | | null | Changed |
===============================================================================================
I think the problem is that the first report contains five check elements, while the second report contains only one.
In addition, the report-diff-fields.json file defines fieldMappings with hard-coded positions of the fields in the report:
"fieldMappings": {
"/checks/0/report/directoryStats/0/size_bytes": "Assetstore Size (bytes)",
"/checks/0/report/directoryStats/1/size_bytes": "Log Directory Size (bytes)",
"/checks/1/report/communitiesCount": "Communities",
"/checks/1/report/collectionsCount": "Collections",
"/checks/1/report/collectionsSizesInfo/totalSize": "Total Content Size",
"/checks/1/report/itemsCount": "Items",
"/checks/1/report/publishedItems": "Published Items",
"/checks/1/report/notPublishedItems": "Unpublished Items",
"/checks/1/report/withdrawnItems": "Withdrawn Items",
"/checks/1/report/workspaceItemsCount": "Workspace Items",
"/checks/1/report/waitingForApproval": "Workflow Items",
"/checks/1/report/bitstreamsCount": "Bitstreams",
"/checks/1/report/bundlesCount": "Bundles",
"/checks/1/report/collectionsSizesInfo/orphanBitstreamsCount": "Orphaned Bitstreams",
"/checks/1/report/collectionsSizesInfo/deletedBitstreams": "Deleted Bitstreams",
"/checks/1/report/metadataValuesCount": "Metadata Values",
"/checks/1/report/handlesCount": "Handles",
"/checks/1/report/ePersonsCount": "Users",
"/checks/1/report/groupsCount": "Groups",
"/checks/2/report/selfRegistered": "Self Registered Users",
"/checks/2/report/subscribers": "Subscribers",
"/checks/2/report/subscribedCollections": "Subscribed Collections",
"/checks/2/report/emptyGroups": "Empty Groups",
"/checks/3/report/licenses": "Licenses"
},
That is not good: report-diff only works for "full" health reports, i.e. reports that contain all checks.
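The failure mode can be reproduced outside DSpace with a small sketch. It assumes, based only on the paths in fieldMappings, that a stored report is a JSON object with a `checks` array whose elements carry a `name` and a `report` object; the helper `resolve_pointer` and the second item count are hypothetical and not taken from the DSpace code base.

```python
def resolve_pointer(doc, pointer):
    """Resolve a JSON-Pointer-like path; return None when any step is missing."""
    node = doc
    for token in pointer.lstrip("/").split("/"):
        if isinstance(node, list):
            # Array steps must be a valid index into the list.
            if not token.isdigit() or int(token) >= len(node):
                return None
            node = node[int(token)]
        elif isinstance(node, dict):
            if token not in node:
                return None
            node = node[token]
        else:
            return None
    return node


# Heavily trimmed, assumed report shapes; only the array positions matter here.
full_report = {"checks": [
    {"name": "General Information", "report": {}},
    {"name": "Item summary", "report": {"itemsCount": 2577}},
]}
# A report created with --check 1 holds a single element, at position 0.
# The value 2580 is made up for illustration.
single_check_report = {"checks": [
    {"name": "Item summary", "report": {"itemsCount": 2580}},
]}

pointer = "/checks/1/report/itemsCount"
print(resolve_pointer(full_report, pointer))          # 2577
print(resolve_pointer(single_check_report, pointer))  # None
```

Under these assumptions, the "Item summary" data in the second report sits at `/checks/0`, so every positional path resolves to nothing, which would explain the `null` column.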
Moreover, the healthcheck.cfg file allows users to specify their own set of default checks, so even the positions in a "full" report depend on the configuration.
See the healthcheck.cfg file:
# You can configure which module should be used during healthcheck.
# Names must match plugin.named below.
# If you use the Pre-DSpace-3.0 embargo feature, you might want to
# add 'Embargo items (Pre-3.0),' to the following list.
healthcheck.checks = General Information,\
Item summary,\
User summary,\
License summary,\
Embargo check
plugin.named.org.dspace.health.Check = \
org.dspace.health.InfoCheck = General Information,\
org.dspace.health.ChecksumCheck = Checksum,\
org.dspace.health.EmbargoCheck = Embargo items (Pre-3.0),\
org.dspace.health.EmbargoInfoCheck = Embargo check,\
org.dspace.health.ItemCheck = Item summary,\
org.dspace.health.UserCheck = User summary,\
org.dspace.health.LogAnalyserCheck = Log Analyser Check,\
org.dspace.health.LicenseCheck = License summary
So I think report-diff-fields.json should be written more generically, and the field paths shouldn't rely on hard-coded positions.
Something like:
"fieldMappings": {
"/checks/[name='General Information']/report/directoryStats/0/size_bytes": "Assetstore Size (bytes)",
"/checks/[name='General Information']/report/directoryStats/1/size_bytes": "Log Directory Size (bytes)",
"/checks/[name='Item summary']/report/communitiesCount": "Communities",
"/checks/[name='Item summary']/report/collectionsCount": "Collections",
...
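A name-based selector of this kind could be resolved roughly as follows. This is only a sketch: it assumes each element of `checks` has a `name` field matching the names in healthcheck.cfg, the function `resolve_by_name` is hypothetical, and the selector syntax is the one suggested above rather than an existing standard such as JSON Pointer.

```python
import re

# Matches a selector step such as [name='Item summary'] or [name="Item summary"].
SELECTOR = re.compile(r"""^\[name=(['"])(.*)\1\]$""")


def resolve_by_name(doc, path):
    """Resolve a path whose array steps may select an element by its name."""
    node = doc
    # Note: splitting on "/" means check names must not contain a slash.
    for token in path.lstrip("/").split("/"):
        match = SELECTOR.match(token)
        if match:
            if not isinstance(node, list):
                return None
            wanted = match.group(2)
            # Pick the first element whose "name" equals the requested name.
            node = next(
                (item for item in node
                 if isinstance(item, dict) and item.get("name") == wanted),
                None,
            )
            if node is None:
                return None
        elif isinstance(node, list):
            if not token.isdigit() or int(token) >= len(node):
                return None
            node = node[int(token)]
        elif isinstance(node, dict):
            if token not in node:
                return None
            node = node[token]
        else:
            return None
    return node


# Assumed, trimmed report shapes; the value 2580 is made up for illustration.
full_report = {"checks": [
    {"name": "General Information", "report": {}},
    {"name": "Item summary", "report": {"itemsCount": 2577}},
]}
single_check_report = {"checks": [
    {"name": "Item summary", "report": {"itemsCount": 2580}},
]}

path = "/checks/[name='Item summary']/report/itemsCount"
print(resolve_by_name(full_report, path))          # 2577
print(resolve_by_name(single_check_report, path))  # 2580
```

With this lookup, both reports yield a value for the same mapping key regardless of how many checks were run or in which order they are configured.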
Or, as another option, the JSON structure of the report should be changed so that checks are not addressed by position.
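One possible shape for such a change is to key the checks by name instead of storing them in an array. The sketch below is an illustration under assumptions, not a description of the actual report format: `key_checks_by_name`, `diff_rows`, and the `FIELDS` table are hypothetical, and the values in the second report are made up.

```python
# Hypothetical mapping: (check name, field in that check's report) -> table label.
FIELDS = {
    ("Item summary", "itemsCount"): "Items",
    ("Item summary", "bitstreamsCount"): "Bitstreams",
    ("User summary", "selfRegistered"): "Self Registered Users",
}


def key_checks_by_name(report):
    """Return a copy of the report whose "checks" is a dict keyed by check name."""
    keyed = {check["name"]: check for check in report.get("checks", [])}
    return {**report, "checks": keyed}


def diff_rows(old, new):
    """Build (label, old value, new value) rows for checks present in both reports."""
    old_checks = key_checks_by_name(old)["checks"]
    new_checks = key_checks_by_name(new)["checks"]
    rows = []
    for (check, field), label in FIELDS.items():
        if check not in old_checks or check not in new_checks:
            continue  # the check was not run in both reports, nothing to compare
        rows.append((
            label,
            old_checks[check]["report"].get(field),
            new_checks[check]["report"].get(field),
        ))
    return rows


# Assumed, trimmed report shapes; 2580 and 540 are made-up values.
full_report = {"checks": [
    {"name": "Item summary", "report": {"itemsCount": 2577, "bitstreamsCount": 534}},
    {"name": "User summary", "report": {"selfRegistered": 0}},
]}
single_check_report = {"checks": [
    {"name": "Item summary", "report": {"itemsCount": 2580, "bitstreamsCount": 540}},
]}

for row in diff_rows(full_report, single_check_report):
    print(row)
# ('Items', 2577, 2580)
# ('Bitstreams', 534, 540)
```

Skipping checks that are missing from either report is one design choice; the diff could instead print such rows with an explicit "not run" marker so they are not confused with real changes.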