Prometheus monitoring

Why

When using the NPBackup client, you may want to export metrics to Prometheus.
NPBackup can produce metrics in two ways:

metrics file: Used on servers with node_exporter installed
push gateway: Used on clients without node_exporter

Metrics file

In the configuration, add a file path to the destination field in the global_prometheus section. Example:

destination: /var/lib/node_exporter/textfile_collector/npbackup.prom

On every NPBackup run, this file is written with Prometheus metrics.
node_exporter picks the file up if its textfile collector is enabled with the argument --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
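Because the metrics file is rewritten on every run, its modification time gives a rough hint of whether backups are still running. The following is a minimal sketch of such a check, not part of NPBackup; the function names and the 24 hour default threshold are assumptions made for illustration.

```python
import os
import time


def metrics_file_age_seconds(path, now=None):
    """Return how many seconds ago the metrics file was last written."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_metrics_file_fresh(path, max_age_seconds=86400):
    """Return True if the file exists and was written within max_age_seconds.

    The 86400 second (24 hour) default is an assumption; pick a value
    that matches your own backup schedule.
    """
    if not os.path.isfile(path):
        return False
    return metrics_file_age_seconds(path) <= max_age_seconds


if __name__ == "__main__":
    # Path taken from the destination example above
    prom_file = "/var/lib/node_exporter/textfile_collector/npbackup.prom"
    print(is_metrics_file_fresh(prom_file))
```

A check like this can run from cron or a monitoring agent on hosts where you suspect the scheduled backup has stopped.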

Push gateway

In the configuration, add the URI of your Prometheus Push Gateway, in the following form:

https://push.mydomain.tld/metrics/job/${BACKUP_JOB}

The variable ${BACKUP_JOB} is populated from the prometheus section of a repo or group, and defaults to the ${MACHINE_ID} variable which comes from the identity section. Of course, you can override any of those variables with whatever you want.

Note: Setting up HTTPS is outside the scope of this wiki. It is usually done with an HTTPS reverse proxy such as HAProxy.
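The way ${BACKUP_JOB} falls back to ${MACHINE_ID} can be sketched as follows. This is an illustration of the substitution described above, not NPBackup's actual implementation; the resolve_push_url function is hypothetical.

```python
from string import Template


def resolve_push_url(url_template, machine_id, backup_job=None):
    """Fill in ${BACKUP_JOB}, falling back to the machine id when it is unset."""
    values = {
        "MACHINE_ID": machine_id,
        "BACKUP_JOB": backup_job or machine_id,
    }
    # safe_substitute leaves any unknown ${...} placeholder untouched
    return Template(url_template).safe_substitute(values)


url = "https://push.mydomain.tld/metrics/job/${BACKUP_JOB}"
print(resolve_push_url(url, machine_id="somehost"))
# https://push.mydomain.tld/metrics/job/somehost
print(resolve_push_url(url, machine_id="somehost", backup_job="myjob"))
# https://push.mydomain.tld/metrics/job/myjob
```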

Produced metrics

NPBackup parses restic output to create the following metrics when running the backup action:

restic_files{instance="",backup_job="",state="",action="backup"}: Number of files added, changed or unmodified
- States: new, changed, unmodified, and total
restic_dirs{instance="",backup_job="",state="",action="backup"}: Number of directories added, changed or unmodified
- States: new, changed, unmodified
restic_snapshot_size_bytes{instance="",backup_job="",action="backup",type="processed"}: Total data volume in bytes
restic_total_duration_seconds{instance="",backup_job="",action="backup"}: Backup duration in seconds
restic_data_added{instance="",backup_job="",action="backup"}: Data volume added in bytes

Additionally, NPBackup creates the following metrics itself for every run action:

npbackup_exec_state{npversion="npbackup3.0.0-rc13-pub",instance="",backup_job="",action="",repo_name="",timestamp=""}
- Metric value is the execution state
  - 0: Ok
  - 1: Warnings
  - 2: Errors
  - 3: Critical error
npbackup_exec_time{action="",repo_name="",timestamp=""}
- Metric value is the execution time in seconds

Valid actions are init, backup, has_recent_snapshot, snapshots, stats, ls, find, restore, dump, check, recover, list, unlock, repair, forget, housekeeping, prune, raw, and upgrade.
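To check what a given run reported, a metric line can be split into its name, labels and value, and the value mapped to the execution states listed above. The following is a minimal sketch, assuming simple label values without escaped characters; the parse_sample and describe_exec_state helpers are hypothetical and do not ship with NPBackup.

```python
import re

# State codes as documented for npbackup_exec_state
EXEC_STATES = {0: "Ok", 1: "Warnings", 2: "Errors", 3: "Critical error"}

# Matches: name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one labelled metric line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled metric sample: {line!r}")
    # Escape sequences inside label values are kept as-is, not decoded
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


def describe_exec_state(line):
    """Return 'repo/action: state' for an npbackup_exec_state line."""
    name, labels, value = parse_sample(line)
    if name != "npbackup_exec_state":
        raise ValueError(f"unexpected metric: {name}")
    state = EXEC_STATES.get(int(value), "Unknown")
    return f'{labels.get("repo_name", "")}/{labels.get("action", "")}: {state}'


sample = (
    'npbackup_exec_state{instance="somehost",backup_job="myjob",'
    'action="backup",repo_name="default",timestamp="1736882285"} 2'
)
print(describe_exec_state(sample))
# default/backup: Errors
```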

Additional labels

The configuration allows adding custom labels to Prometheus metrics.
The following example:

repos:
  default:
    monitoring:
      backup_job: myjob
      instance: ${MACHINE_ID}
      group: ${MACHINE_GROUP}
      additional_labels:
        - host_type: hypervisor
        - backup_type: baremetal
global_prometheus:
  enabled: true

This produces metrics that look like:

npbackup_exec_state{npversion="npbackup3.0.0-rc13-pub",instance="somehost",backup_job="myjob",host_type="hypervisor",backup_type="baremetal",action="upgrade",repo_name="default",timestamp="1736882285"} 0
npbackup_exec_time{npversion="npbackup3.0.0-rc13-pub",instance="somehost",backup_job="myjob",host_type="hypervisor",backup_type="baremetal",action="snapshots",repo_name="default",timestamp="1736882285"} 0.0
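The mapping from the additional_labels list to the label set in the output can be sketched as follows. This is an illustration only, not NPBackup's code: build_labels and render_sample are hypothetical helpers, the sketch leaves out the npversion, action, repo_name and timestamp labels that NPBackup adds itself, and it does not escape special characters in label values.

```python
def build_labels(monitoring):
    """Flatten a monitoring section into an ordered label dict."""
    labels = {
        "instance": monitoring["instance"],
        "backup_job": monitoring["backup_job"],
    }
    # additional_labels is a list of single-key mappings, as in the example
    for entry in monitoring.get("additional_labels", []):
        labels.update(entry)
    return labels


def render_sample(name, labels, value):
    """Render one metric line in the Prometheus text format."""
    rendered = ",".join(f'{key}="{val}"' for key, val in labels.items())
    return f"{name}{{{rendered}}} {value}"


# ${MACHINE_ID} is assumed to have been resolved to "somehost" already
monitoring = {
    "backup_job": "myjob",
    "instance": "somehost",
    "additional_labels": [
        {"host_type": "hypervisor"},
        {"backup_type": "baremetal"},
    ],
}
print(render_sample("npbackup_exec_state", build_labels(monitoring), 0))
# npbackup_exec_state{instance="somehost",backup_job="myjob",host_type="hypervisor",backup_type="baremetal"} 0
```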

Grafana Dashboard

There is an example Grafana dashboard in the examples directory that has been tested with Grafana v10+.
