Prometheus monitoring

Orsiris de Jong edited this page Mar 10, 2026 · 8 revisions

Why

When using the NPBackup client, you may want to expose metrics to Prometheus.
NPBackup has two ways of creating metrics:

  • metrics file: Used on servers with node_exporter installed
  • push gateway: Used on clients without node_exporter

Metrics file

In the configuration, add a file path to the destination field in the global_prometheus section. Example:

destination: /var/lib/node_exporter/textfile_collector/npbackup.prom

On every NPBackup run, the above file is created and filled with Prometheus metrics.
node_exporter picks this file up if its textfile collector is enabled with the argument --collector.textfile.directory=/var/lib/node_exporter/textfile_collector

Push gateway

In the configuration, add a URI pointing to your Prometheus Push Gateway, in the following form:

https://push.mydomain.tld/metrics/job/${BACKUP_JOB}

The variable ${BACKUP_JOB} is populated from the prometheus section of a repo or group, and defaults to the ${MACHINE_ID} variable which comes from the identity section. Of course, you can override any of those variables with whatever you want.
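The fallback described above can be illustrated with a short Python sketch. This is not NPBackup's own code: the function name build_push_url and its parameters are hypothetical, and the sketch only assumes that ${BACKUP_JOB} falls back to the ${MACHINE_ID} value when no job name is set.

```python
from string import Template
from typing import Optional


def build_push_url(url_template: str, machine_id: str,
                   backup_job: Optional[str] = None) -> str:
    """Expand ${BACKUP_JOB} and ${MACHINE_ID} in a Push Gateway URL template.

    Hypothetical helper for illustration only, not part of NPBackup.
    """
    # Assumption: an unset backup job falls back to the machine id.
    job = backup_job or machine_id
    return Template(url_template).substitute(
        BACKUP_JOB=job,
        MACHINE_ID=machine_id,
    )


template = "https://push.mydomain.tld/metrics/job/${BACKUP_JOB}"
print(build_push_url(template, machine_id="somehost"))
print(build_push_url(template, machine_id="somehost", backup_job="myjob"))
```

The first call ends in /job/somehost because no job name was given; the second ends in /job/myjob.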

Note: Setting up HTTPS is outside the scope of this wiki. It is usually done with an HTTPS proxy such as HAProxy.

Produced metrics

NPBackup parses restic output to create the following metrics when using the backup function:

  • restic_files{instance="",backup_job="",state="",action="backup"}: Number of files added, changed or unmodified
    • States: new, changed, unmodified, and total
  • restic_dirs{instance="",backup_job="",state="",action="backup"}: Number of directories added, changed or unmodified
    • States: new, changed, unmodified
  • restic_snasphot_size_bytes{instance="",backup_job="",action="backup",type="processed"}: Total data volume in bytes
  • restic_total_duration_seconds{instance="",backup_job="",action="backup"}: Backup duration in seconds
  • restic_data_added{instance="",backup_job="",action="backup"}: Data volume added in bytes

Additionally, NPBackup creates the following metrics itself for every run action:

  • npbackup_exec_state{npversion="npbackup3.0.0-rc13-pub",instance="",backup_job="",action="",repo_name="",timestamp=""}
    • Metric value is the execution state
      • 0: Ok
      • 1: Warnings
      • 2: Errors
      • 3: Critical error
  • npbackup_exec_time{action="",repo_name="",timestamp=""}
    • Metric value is the execution time in seconds

Valid actions are init, backup, has_recent_snapshot, snapshots, stats, ls, find, restore, dump, check, recover, list, unlock, repair, forget, housekeeping, prune, raw, and upgrade.
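When checking a metrics file or a Push Gateway by hand, it can help to turn an npbackup_exec_state sample back into a readable status. The following Python sketch does that for a single line of the Prometheus text format. The helper names are hypothetical, and the regular expressions handle only the simple labelled form shown on this page, not the full exposition format.

```python
import re

# Execution states as listed above.
EXEC_STATES = {0: "ok", 1: "warnings", 2: "errors", 3: "critical error"}

# Simplified patterns: one labelled sample per line, no trailing timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split a sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


def describe_exec_state(line):
    """Return (action, state text) for an npbackup_exec_state sample."""
    name, labels, value = parse_sample(line)
    if name != "npbackup_exec_state":
        raise ValueError(f"unexpected metric: {name}")
    return labels.get("action", ""), EXEC_STATES.get(int(value), "unknown")


sample = (
    'npbackup_exec_state{npversion="npbackup3.0.0-rc13-pub",instance="somehost",'
    'backup_job="myjob",action="backup",repo_name="default",'
    'timestamp="1736882285"} 2'
)
print(describe_exec_state(sample))
```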

Additional labels

The configuration allows adding arbitrary labels to Prometheus metrics.
The following example:

repos:
  default:
    monitoring:
        backup_job: myjob
        instance: ${MACHINE_ID}
        group: ${MACHINE_GROUP}
        additional_labels:
        - host_type: hypervisor
        - backup_type: baremetal
global_prometheus:
  enabled: true

Will lead to the creation of metrics that look like:

npbackup_exec_state{npversion="npbackup3.0.0-rc13-pub",instance="somehost",backup_job="myjob",host_type="hypervisor",backup_type="baremetal",action="upgrade",repo_name="default",timestamp="1736882285"} 0
npbackup_exec_time{npversion="npbackup3.0.0-rc13-pub",instance="somehost",backup_job="myjob",host_type="hypervisor",backup_type="baremetal",action="snapshots",repo_name="default",timestamp="1736882285"} 0.0
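To make the mapping from configuration to label set explicit, here is a Python sketch that merges a list of single-key additional_labels entries into the other labels and renders one sample line. It is not NPBackup's implementation: the function names are hypothetical, and the label order is chosen only to resemble the output above (Prometheus does not treat label order as significant).

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline for the text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def flatten_additional_labels(entries):
    """Turn [{'host_type': 'hypervisor'}, ...] into one flat dict."""
    flat = {}
    for entry in entries:
        flat.update(entry)
    return flat


def render_sample(name, labels, value):
    """Render 'name{k="v",...} value' from a dict of labels."""
    body = ",".join(
        f'{key}="{escape_label_value(str(val))}"' for key, val in labels.items()
    )
    return f"{name}{{{body}}} {value}"


# Values taken from the example configuration above.
labels = {"instance": "somehost", "backup_job": "myjob"}
labels.update(flatten_additional_labels(
    [{"host_type": "hypervisor"}, {"backup_type": "baremetal"}]
))
labels.update({"action": "upgrade", "repo_name": "default"})
print(render_sample("npbackup_exec_state", labels, 0))
```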

Grafana Dashboard

There is an example Grafana dashboard in the examples directory. It has been tested with Grafana v10+.
