206 changes: 205 additions & 1 deletion modules/ROOT/pages/backup-restore/online-backup.adoc
@@ -161,7 +161,7 @@ Note: this is an EXPERIMENTAL option. Consult Neo4j support before use.
|false

|--prefer-diff-as-parent
|label:new[Introduced in 2025.04] When performing a differential backup, prefer the latest non-empty differential backup as the parent instead of the latest full backup.
|label:new[Introduced in 2025.04] When performing a differential backup, prefer the latest non-empty differential backup as the parent instead of the latest backup.
|false

|--temp-path=<path>
@@ -459,3 +459,207 @@ bin/neo4j-admin database backup --to-path=azb://myStorageAccount/myContainer/myD
----
======
=====


[role=label--new-2025.04]
[[diff-backup-as-parent]]
=== Perform a differential backup using the `--prefer-diff-as-parent` option

When you take a differential backup with the `--type=DIFF` option, the parent is, by default, the *most recent non-empty* backup in the directory, whether it is full or differential.
In some cases, you may prefer to use the *latest non-empty differential* backup as the parent, even if a more recent full backup exists.
To do this, set the `--prefer-diff-as-parent` option to `true`.

This ensures that every transaction is covered by a differential backup, which allows you to restore to any point in time.
Otherwise, the transactions written between the previous differential backup and a more recent full backup are not backed up as individual transactions.
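
To see what this means for a backup chain, consider a minimal sketch that lists the transaction ranges not covered by any differential backup.
It is an illustration only: `diff_gaps` is a hypothetical helper, not a `neo4j-admin` command, and the ranges are example values in which transactions 31 to 40 of the first chain are assumed to be captured only by a full backup.

```shell
#!/bin/sh
# Illustrative only: diff_gaps is a hypothetical helper, not part of neo4j-admin.
# Input: one differential backup per line as "lower_tx upper_tx", oldest first.
# Output: each range of transaction IDs that no differential backup covers.
diff_gaps() {
  awk '
    NR > 1 && $1 > prev_upper + 1 { printf "%d-%d\n", prev_upper + 1, $1 - 1 }
    { prev_upper = $2 }'
}

# The newest differential backup used a full backup as its parent:
printf '11 20\n21 30\n41 50\n' | diff_gaps   # prints 31-40

# The newest differential backup used the previous differential backup as its parent:
printf '11 20\n21 30\n31 50\n' | diff_gaps   # prints nothing
```

In the first chain, transactions 31 to 40 exist only inside the full backup, so they cannot be replayed one by one.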

The examples below cover different scenarios for using the `--prefer-diff-as-parent` option.

[.tabbed-example]
=====
[role=include-with-Chain-with-full-and-differential-backups]
======

Let's assume that you write 10 transactions to the `neo4j` database every hour, except between 12:30 and 13:30, when you write no transactions.

A backup job takes a differential backup every hour and a full backup every four hours.
An _empty_ backup is a backup that contains no transactions, meaning that both its lower transaction ID and its upper transaction ID are zero.

Imagine you have the following backup chain:

[cols="h,e,m,h,h"]
|===
|Timestamp | Backup name | Backup type | Lower Transaction ID | Upper Transaction ID

| 10:30
| backup1
| FULL
| 1
| 10

| 11:30
| backup2
| DIFF
| 11
| 20

| 12:30
| backup3
| DIFF
| 21
| 30

| 13:30
| backup4
| DIFF
| 0
| 0

| 14:30
| backup5
| FULL
| 1
| 40

|===

At 15:30, you execute the following backup command:

[source,shell]
----
neo4j-admin database backup --from=<address:port> --to-path=<targetPath> --type=DIFF neo4j
----

The result would be:

[cols="h,e,m,h,h"]
|===
|Timestamp | Backup name | Backup type | Lower Transaction ID | Upper Transaction ID

| 15:30
| backup6
| DIFF
| 41
| 50
|===

This result means that `backup5` was selected as the parent of the differential backup `backup6`, because `backup5` is the *latest non-empty* backup.

However, if you execute the following command with the `--prefer-diff-as-parent` option:

[source,shell]
----
neo4j-admin database backup --from=<address:port> --to-path=<targetPath> --type=DIFF --prefer-diff-as-parent neo4j
----

The result would be:

[cols="h,e,m,h,h"]
|===
|Timestamp | Backup name | Backup type | Lower Transaction ID | Upper Transaction ID

| 15:30
| backup6
| DIFF
| 31
| 50
|===

In this case, `backup3` is selected as the parent because it is the *latest non-empty differential* backup, so `backup6` starts at transaction 31.
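
The two outcomes above can be reproduced with a small model of the selection rule.
The sketch below is not Neo4j code: the `pick_parent` and `next_lower` helpers and the one-line-per-backup chain format are hypothetical, introduced only to show how the chosen parent determines the lower transaction ID of the new differential backup.

```shell
#!/bin/sh
# Simplified model of the parent selection described in this section.
# Not Neo4j code: pick_parent, next_lower, and the chain format are hypothetical.

# One backup per line: name type lower_tx upper_tx (oldest first).
chain='backup1 FULL 1 10
backup2 DIFF 11 20
backup3 DIFF 21 30
backup4 DIFF 0 0
backup5 FULL 1 40'

# pick_parent <true|false>
# Reads a chain on stdin and prints the line of the selected parent.
pick_parent() {
  awk -v prefer="$1" '
    $3 == 0 && $4 == 0 { next }          # skip empty backups
    { latest = $0 }                      # latest non-empty backup of any type
    $2 == "DIFF" { latest_diff = $0 }    # latest non-empty differential backup
    END {
      # Without any non-empty differential backup, the option has no effect.
      if (prefer == "true" && latest_diff != "") print latest_diff
      else if (latest != "") print latest
    }'
}

# next_lower <true|false>
# Prints the first transaction ID of the new differential backup.
next_lower() {
  printf '%s\n' "$chain" | pick_parent "$1" | awk '{ print $4 + 1 }'
}

for prefer in false true; do
  parent=$(printf '%s\n' "$chain" | pick_parent "$prefer")
  echo "prefer-diff-as-parent=$prefer -> parent: ${parent%% *}, lower transaction ID: $(next_lower "$prefer")"
done
```

Running the script reports `backup5` and `41` for the default behavior, and `backup3` and `31` when the option is enabled, matching the two result tables.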

======
[role=include-with-Chain-with-only-full-backups]
======

Let's assume that you write 10 transactions to the `neo4j` database every hour and trigger an hourly full backup.

[cols="h,e,m,h,h"]
|===
|Timestamp | Backup name | Backup type | Lower Transaction ID | Upper Transaction ID

| 10:30
| backup1
| FULL
| 1
| 10

| 11:30
| backup2
| FULL
| 11
| 20
|===

In this case, the chain contains no differential backup.
Therefore, the `--prefer-diff-as-parent` option has no effect, and the command behaves as it does by default.

[source,shell]
----
neo4j-admin database backup \
--from=<address:port> --to-path=<targetPath> \
--type=DIFF --prefer-diff-as-parent \
neo4j
----

The result would be (with or without the `--prefer-diff-as-parent` option):

[cols="h,e,m,h,h"]
|===
|Timestamp | Backup name | Backup type | Lower Transaction ID | Upper Transaction ID

| 12:30
| backup3
| DIFF
| 21
| 30
|===

======
[role=include-with-Chain-with-only-empty-full-backups]
======

Let's assume that the database is empty and you do not write anything to it, while still taking hourly full backups.

[cols="h,e,m,h,h"]
|===
|Timestamp | Backup name | Backup type | Lower Transaction ID | Upper Transaction ID

| 10:30
| backup1
| FULL
| 0
| 0

| 11:30
| backup2
| FULL
| 0
| 0
|===

In this case, you cannot perform a differential backup: the following command, which uses the `--type=DIFF` option, fails whether or not you specify `--prefer-diff-as-parent`.
This happens because the command looks for the *latest non-empty* backup to use as the parent, and the chain contains only empty backups.

[source,shell]
----
neo4j-admin database backup \
--from=<address:port> --to-path=<targetPath> \
--type=DIFF --prefer-diff-as-parent \
neo4j
----

However, if you use the `--type=AUTO` option, the command succeeds and produces another empty *full* backup.

[source,shell]
----
neo4j-admin database backup \
--from=<address:port> --to-path=<targetPath> \
--type=AUTO --prefer-diff-as-parent \
neo4j
----

The result would be:

[cols="h,e,m,h,h"]
|===
|Timestamp | Backup name | Backup type | Lower Transaction ID | Upper Transaction ID

| 12:30
| backup3
| FULL
| 0
| 0
|===
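
The difference between `--type=DIFF` and `--type=AUTO` on this chain can be sketched as follows.
This is a simplified model, not Neo4j code: `resolve_type` is a hypothetical helper, and the rule that `AUTO` takes a differential backup whenever a non-empty parent exists is an assumption made for the illustration.

```shell
#!/bin/sh
# Simplified model, not Neo4j code: resolve_type is a hypothetical helper.
# Input on stdin: one backup per line as "name type lower_tx upper_tx".
# Argument: the requested --type value (DIFF or AUTO).
# Prints the type of backup that would be taken, or fails if none is possible.
resolve_type() {
  requested=$1
  # Count the backups that contain at least one transaction.
  non_empty=$(awk '!($3 == 0 && $4 == 0) { n++ } END { print n + 0 }')
  if [ "$non_empty" -gt 0 ]; then
    echo DIFF      # a non-empty parent exists (assumed to hold for AUTO as well)
  elif [ "$requested" = AUTO ]; then
    echo FULL      # no usable parent: AUTO falls back to a full backup
  else
    echo "no non-empty parent backup found" >&2
    return 1
  fi
}

chain='backup1 FULL 0 0
backup2 FULL 0 0'

printf '%s\n' "$chain" | resolve_type AUTO
printf '%s\n' "$chain" | resolve_type DIFF || echo "differential backup failed"
```

The first call prints `FULL`, while the second reports a failure, mirroring the two commands above.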

======
=====