The report-diff script not working properly for different types of health-reports #1334

@kuchtiak-ufal

Description

Hi,

In my case, I first created a health-report with no extra parameters:
/dspace/bin/dspace health-report -e kuchtiak@ufal.mff.cuni.cz
(so all default checks, 0..4, were executed by this script)
The result was stored in the database with timestamp: 2026-02-12 16:08:34.830

The second report was created a few days later for one specific check only, check 1 (Item summary):
/dspace/bin/dspace health-report -e kuchtiak@ufal.mff.cuni.cz --check 1
This result was stored in the database with timestamp: 2026-02-18 11:15:09.307

Some new bitstreams were created between those dates.

Then I ran the report-diff script, expecting a reasonable comparison table, at least for the Item summary:
/dspace/bin/dspace report-diff -e kuchtiak@ufal.mff.cuni.cz --from 2026-02-12 16:08:34.830 --to 2026-02-18 11:15:09.307

Unfortunately, the resulting table looked like this:

Key Changes Between Reports
===============================================================================================
| Field                      | 2026-02-12 16:08:34.830 | 2026-02-18 11:15:09.307 | Difference |
===============================================================================================
| Assetstore Size (bytes)    | 3771032601              | null                    | Changed    |
| Log Directory Size (bytes) | 66849188                | null                    | Changed    |
| Communities                | 2                       | null                    | Changed    |
| Collections                | 2                       | null                    | Changed    |
| Total Content Size         | 3 GB                    | null                    | Changed    |
| Items                      | 2577                    | null                    | Changed    |
| Published Items            | 2543                    | null                    | Changed    |
| Unpublished Items          | 34                      | null                    | Changed    |
| Withdrawn Items            | 0                       | null                    | Changed    |
| Workspace Items            | 26                      | null                    | Changed    |
| Workflow Items             | 5                       | null                    | Changed    |
| Bitstreams                 | 534                     | null                    | Changed    |
| Bundles                    | 77                      | null                    | Changed    |
| Orphaned Bitstreams        | 0                       | null                    | Changed    |
| Deleted Bitstreams         | 137                     | null                    | Changed    |
| Metadata Values            | 85819                   | null                    | Changed    |
| Handles                    | 2594                    | null                    | Changed    |
| Users                      | 7                       | null                    | Changed    |
| Groups                     | 7                       | null                    | Changed    |
| Self Registered Users      | 0                       | null                    | Changed    |
| Subscribers                |                         | null                    | Changed    |
| Subscribed Collections     |                         | null                    | Changed    |
| Empty Groups               |                         | null                    | Changed    |
| Licenses                   |                         | null                    | Changed    |
===============================================================================================

I think the problem is that the first report contains 5 check elements, and the second report only one.

The report-diff-fields.json file contains fieldMappings that hard-code the position of each check in the report:

  "fieldMappings": {
    "/checks/0/report/directoryStats/0/size_bytes": "Assetstore Size (bytes)",
    "/checks/0/report/directoryStats/1/size_bytes": "Log Directory Size (bytes)",
    "/checks/1/report/communitiesCount": "Communities",
    "/checks/1/report/collectionsCount": "Collections",
    "/checks/1/report/collectionsSizesInfo/totalSize": "Total Content Size",
    "/checks/1/report/itemsCount": "Items",
    "/checks/1/report/publishedItems": "Published Items",
    "/checks/1/report/notPublishedItems": "Unpublished Items",
    "/checks/1/report/withdrawnItems": "Withdrawn Items",
    "/checks/1/report/workspaceItemsCount": "Workspace Items",
    "/checks/1/report/waitingForApproval": "Workflow Items",
    "/checks/1/report/bitstreamsCount": "Bitstreams",
    "/checks/1/report/bundlesCount": "Bundles",
    "/checks/1/report/collectionsSizesInfo/orphanBitstreamsCount": "Orphaned Bitstreams",
    "/checks/1/report/collectionsSizesInfo/deletedBitstreams": "Deleted Bitstreams",
    "/checks/1/report/metadataValuesCount": "Metadata Values",
    "/checks/1/report/handlesCount": "Handles",
    "/checks/1/report/ePersonsCount": "Users",
    "/checks/1/report/groupsCount": "Groups",
    "/checks/2/report/selfRegistered": "Self Registered Users",
    "/checks/2/report/subscribers": "Subscribers",
    "/checks/2/report/subscribedCollections": "Subscribed Collections",
    "/checks/2/report/emptyGroups": "Empty Groups",
    "/checks/3/report/licenses": "Licenses"
  },

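That is not good. In the second report the Item summary is the only check element, so presumably it sits at /checks/0, and every hard-coded /checks/1/... path resolves to null. The report-diff therefore only works for "full" health-reports (health-reports containing all checks).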
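The failure can be reproduced outside DSpace with a minimal sketch. The report shape used here (a `checks` array whose elements carry `name` and `report` keys) is an assumption inferred from the paths in report-diff-fields.json, `resolve_pointer` is a hypothetical helper rather than the DSpace code, and the item count in the second report is a made-up illustrative value:

```python
def resolve_pointer(doc, pointer):
    """Resolve a JSON-Pointer-like path; return None if any step is missing."""
    node = doc
    for token in pointer.strip("/").split("/"):
        if isinstance(node, list):
            # Array step: the token must be an index that exists.
            if not token.isdigit() or int(token) >= len(node):
                return None
            node = node[int(token)]
        elif isinstance(node, dict):
            if token not in node:
                return None
            node = node[token]
        else:
            return None
    return node


# Assumed shape of a full report (truncated to two checks) ...
full_report = {"checks": [
    {"name": "General Information", "report": {}},
    {"name": "Item summary", "report": {"itemsCount": 2577}},
]}
# ... and of a report created with "--check 1" (2580 is illustrative only).
single_check_report = {"checks": [
    {"name": "Item summary", "report": {"itemsCount": 2580}},
]}

path = "/checks/1/report/itemsCount"
old_value = resolve_pointer(full_report, path)          # found at index 1
new_value = resolve_pointer(single_check_report, path)  # index 1 does not exist
print(old_value, new_value)
```

With these inputs the positional path finds the item count in the full report but nothing in the single-check report, which would match the null column in the table above.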

Moreover, the healthcheck.cfg file allows users to specify their own set of default checks, so the positions can differ even between reports created without the --check option.

See healthcheck.cfg file:

# You can configure which module should be used during healthcheck.
# Names must match plugin.named below.
# If you use the Pre-DSpace-3.0 embargo feature, you might want to
# add 'Embargo items (Pre-3.0),' to the following list.
healthcheck.checks = General Information,\
    Item summary,\
    User summary,\
    License summary,\
    Embargo check

plugin.named.org.dspace.health.Check = \
    org.dspace.health.InfoCheck =                     General Information,\
    org.dspace.health.ChecksumCheck =                 Checksum,\
    org.dspace.health.EmbargoCheck =                  Embargo items (Pre-3.0),\
    org.dspace.health.EmbargoInfoCheck =              Embargo check,\
    org.dspace.health.ItemCheck =                     Item summary,\
    org.dspace.health.UserCheck =                     User summary,\
    org.dspace.health.LogAnalyserCheck =              Log Analyser Check,\
    org.dspace.health.LicenseCheck =                  License summary

So I think report-diff-fields.json should be written more generically, and the positions of the checks in the field paths shouldn't be hard-coded.

Something like:

  "fieldMappings": {
    "/checks/[name='General Information']/report/directoryStats/0/size_bytes": "Assetstore Size (bytes)",
    "/checks/[name='General Information']/report/directoryStats/1/size_bytes": "Log Directory Size (bytes)",
    "/checks/[name='Item summary']/report/communitiesCount": "Communities",
    "/checks/[name='Item summary']/report/collectionsCount": "Collections",
    ...
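A rough sketch of how such name-based paths could be resolved is shown below. It assumes that each element of `checks` has a `name` key holding the check name from healthcheck.cfg, and it writes the selector with single quotes so the mapping key stays valid JSON. The selector syntax, `SELECTOR` and `resolve_by_name` are hypothetical, not an existing DSpace or JSON Pointer feature, and the 2580 count is illustrative:

```python
import re

# Hypothetical selector token, e.g. [name='Item summary'].
# Check names containing "/" would need escaping; that is not handled here.
SELECTOR = re.compile(r"^\[name='(.*)'\]$")


def resolve_by_name(doc, path):
    """Walk a path whose tokens are keys, array indices or [name='...'] selectors."""
    node = doc
    for token in path.strip("/").split("/"):
        selector = SELECTOR.match(token)
        if selector and isinstance(node, list):
            # Pick the first array element whose "name" equals the selector value.
            wanted = selector.group(1)
            node = next(
                (item for item in node
                 if isinstance(item, dict) and item.get("name") == wanted),
                None,
            )
            if node is None:
                return None
        elif isinstance(node, list) and token.isdigit() and int(token) < len(node):
            node = node[int(token)]
        elif isinstance(node, dict) and token in node:
            node = node[token]
        else:
            return None
    return node


# Assumed report shapes: a full report (truncated) and a "--check 1" report.
full_report = {"checks": [
    {"name": "General Information", "report": {}},
    {"name": "Item summary", "report": {"itemsCount": 2577}},
]}
single_check_report = {"checks": [
    {"name": "Item summary", "report": {"itemsCount": 2580}},
]}

path = "/checks/[name='Item summary']/report/itemsCount"
old_value = resolve_by_name(full_report, path)
new_value = resolve_by_name(single_check_report, path)
print(old_value, new_value)
```

Because the check is looked up by name, both reports yield a value regardless of how many checks were run or in which order, and a check that is absent from one report still resolves to None.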

Or (another option) the JSON structure of the report should be changed.
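For this second option, one possible shape (an assumption for illustration, not the current DSpace report format) is to key the checks by name instead of storing them in an array. The sketch below converts the assumed array form into that shape. `index_checks_by_name` is a hypothetical helper and 2580 is an illustrative value:

```python
def index_checks_by_name(report):
    """Return a copy of the report whose "checks" array becomes a dict keyed by check name."""
    reshaped = dict(report)
    reshaped["checks"] = {check["name"]: check for check in report.get("checks", [])}
    return reshaped


# Assumed shape of a report created with "--check 1".
single_check_report = {"checks": [
    {"name": "Item summary", "report": {"itemsCount": 2580}},
]}

reshaped = index_checks_by_name(single_check_report)
items = reshaped["checks"]["Item summary"]["report"]["itemsCount"]
print(items)
```

With checks keyed by name, a plain path such as /checks/Item summary/report/itemsCount would address the same check in every report, so the mapping file would need no selector syntax.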
