208 changes: 207 additions & 1 deletion modules/ROOT/pages/backup-restore/online-backup.adoc
@@ -161,7 +161,7 @@ Note: this is an EXPERIMENTAL option. Consult Neo4j support before use.
|false

|--prefer-diff-as-parent
|label:new[Introduced in 2025.04] When performing a differential backup, prefer the latest non-empty differential backup as the parent instead of the latest full backup.
|label:new[Introduced in 2025.04] When performing a differential backup, prefer the latest non-empty differential backup as the parent instead of the latest backup.
|false

|--temp-path=<path>
@@ -459,3 +459,209 @@ bin/neo4j-admin database backup --to-path=azb://myStorageAccount/myContainer/myD
----
======
=====


[role=label--new-2025.04]
[[diff-backup-as-parent]]
=== Perform a differential backup using the `--prefer-diff-as-parent` option

By default, a differential backup (`--type=DIFF`) uses the *most recent non-empty* backup -- whether full or differential -- in the directory as its parent.

The `--prefer-diff-as-parent` option changes this behavior: the command uses the *latest non-empty differential* backup as the parent, even if a newer full backup exists.

This approach allows you to maintain an unbroken chain of differential backups that covers all transactions, so you can restore to any point in time.
Without this option, the transactions committed between the previous differential backup and the latest full backup cannot be backed up as individual transactions.

To use the `--prefer-diff-as-parent` option, set it to `true`.
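
The parent-selection rule described above can be modeled in a few lines of Python.
This is only an illustrative sketch, not the Neo4j implementation: the `Backup` class and the `choose_parent` function are hypothetical names, and an empty backup is assumed to be one whose lower and upper transaction IDs are both zero.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Backup:
    """Hypothetical model of one backup artifact in the target directory."""
    name: str
    kind: str    # "FULL" or "DIFF"
    lower: int   # lower transaction ID
    upper: int   # upper transaction ID

    @property
    def is_empty(self) -> bool:
        # Assumption: an empty backup reports 0 for both transaction IDs.
        return self.lower == 0 and self.upper == 0


def choose_parent(chain: Sequence[Backup],
                  prefer_diff_as_parent: bool = False) -> Optional[Backup]:
    """Return the parent for the next differential backup, or None.

    `chain` must be ordered from oldest to newest.
    """
    non_empty = [b for b in chain if not b.is_empty]
    if prefer_diff_as_parent:
        diffs = [b for b in non_empty if b.kind == "DIFF"]
        if diffs:
            return diffs[-1]  # latest non-empty differential backup
    # Default rule, also used when the chain holds no differential backup.
    return non_empty[-1] if non_empty else None


# Made-up chain: a differential backup followed by a newer full backup.
chain = [
    Backup("full-a", "FULL", 1, 10),
    Backup("diff-a", "DIFF", 11, 20),
    Backup("full-b", "FULL", 1, 30),
]
print(choose_parent(chain).name)                              # full-b
print(choose_parent(chain, prefer_diff_as_parent=True).name)  # diff-a
```

In this sketch, a return value of `None` stands for the case where no non-empty backup exists and a differential backup has no parent to build on.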

The following examples cover different scenarios for using the `--prefer-diff-as-parent` option.

[.tabbed-example]
=====
[role=include-with-Chain-with-full-and-differential-backups]
======

Let's assume that you write 10 transactions to the `neo4j` database every hour, except from 12:30 to 13:30, when you do not write any transactions.

A backup job takes a differential backup every hour and a full backup every four hours.
An empty backup has no transactions, meaning that both the lower transaction ID and the upper transaction ID are zero.

Imagine you have the following backup chain:

[cols="h,e,m,h,h"]
|===
|Timestamp | Backup name | Backup type | Lower Transaction ID | Upper Transaction ID

| 10:30
| backup1
| FULL
| 1
| 10

| 11:30
| backup2
| DIFF
| 11
| 20

| 12:30
| backup3
| DIFF
| 21
| 30

| 13:30
| backup4
| DIFF
| 0
| 0

| 14:30
| backup5
| FULL
| 1
| 40

|===

At 15:30, you execute the following backup command:

[source,shell]
----
neo4j-admin database backup --from=<address:port> --to-path=<targetPath> --type=DIFF neo4j
----

The result would be:

[cols="h,e,m,h,h"]
|===
| 15:30
| backup6
| DIFF
| 41
| 50
|===

The result means that `backup5` was chosen as the parent of the differential `backup6` because `backup5` is the *latest non-empty* backup.

However, if you execute the following command with the `--prefer-diff-as-parent` option:

[source,shell]
----
neo4j-admin database backup --from=<address:port> --to-path=<targetPath> --type=DIFF --prefer-diff-as-parent neo4j
----

The result would be:

[cols="h,e,m,h,h"]
|===
| 15:30
| backup6
| DIFF
| 31
| 50
|===

In this case, `backup3` is selected as the parent because it is the *latest non-empty differential* backup, while `backup4` is skipped because it is empty.
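
The two results can be reproduced with a small Python model of the chain in the table above.
This is a hedged sketch rather than the actual backup logic: `next_diff` and `is_contiguous` are hypothetical helpers, and the model assumes that a differential backup starts at the transaction right after its parent's upper transaction ID.

```python
# Backup chain from the table: (name, type, lower tx ID, upper tx ID).
CHAIN = [
    ("backup1", "FULL", 1, 10),
    ("backup2", "DIFF", 11, 20),
    ("backup3", "DIFF", 21, 30),
    ("backup4", "DIFF", 0, 0),   # empty backup
    ("backup5", "FULL", 1, 40),
]
LATEST_TX_ID = 50  # last committed transaction at 15:30


def next_diff(chain, latest_tx_id, prefer_diff_as_parent):
    """Return (parent name, lower tx ID, upper tx ID) for the next DIFF backup.

    Assumes the chain holds at least one non-empty backup.
    """
    candidates = [b for b in chain if (b[2], b[3]) != (0, 0)]  # skip empty backups
    if prefer_diff_as_parent:
        diffs = [b for b in candidates if b[1] == "DIFF"]
        candidates = diffs or candidates  # fall back if there is no DIFF
    parent = candidates[-1]
    return parent[0], parent[3] + 1, latest_tx_id


def is_contiguous(ranges):
    """True if every range starts right after the previous one ends."""
    return all(nxt[0] == prev[1] + 1 for prev, nxt in zip(ranges, ranges[1:]))


default = next_diff(CHAIN, LATEST_TX_ID, prefer_diff_as_parent=False)
preferred = next_diff(CHAIN, LATEST_TX_ID, prefer_diff_as_parent=True)
print(default)    # ('backup5', 41, 50)
print(preferred)  # ('backup3', 31, 50)

# Differential ranges only: backup2, backup3, then the new backup6.
print(is_contiguous([(11, 20), (21, 30), default[1:]]))    # False
print(is_contiguous([(11, 20), (21, 30), preferred[1:]]))  # True
```

With the default parent, transactions 31 to 40 exist only inside the full `backup5`, so the differential backups alone leave a gap. With `--prefer-diff-as-parent`, the differential ranges line up without one.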

======
[role=include-with-Chain-with-only-full-backups]
======

Let's assume that you write 10 transactions to the `neo4j` database every hour and trigger an hourly full backup.

[cols="h,e,m,h,h"]
|===
|Timestamp | Backup name | Backup type | Lower Transaction ID | Upper Transaction ID

| 10:30
| backup1
| FULL
| 1
| 10

| 11:30
| backup2
| FULL
| 11
| 20
|===

In this case, there is no differential backup.
Therefore, the `--prefer-diff-as-parent` option has no effect and the behavior is the same as the default.

[source,shell]
----
neo4j-admin database backup \
--from=<address:port> --to-path=<targetPath> \
--type=DIFF --prefer-diff-as-parent \
neo4j
----

The result would be (with or without the `--prefer-diff-as-parent` option):

[cols="h,e,m,h,h"]
|===
| 12:30
| backup3
| DIFF
| 21
| 30
|===

======
[role=include-with-Chain-with-only-empty-full-backups]
======

Let's assume that the database is empty and you do not write anything to it, while still taking hourly full backups.

[cols="h,e,m,h,h"]
|===
|Timestamp | Backup name | Backup type | Lower Transaction ID | Upper Transaction ID

| 10:30
| backup1
| FULL
| 0
| 0

| 11:30
| backup2
| FULL
| 0
| 0
|===

In this case, you cannot perform a differential backup: the following command with `--type=DIFF` fails whether or not you use the `--prefer-diff-as-parent` option.
This is because the command looks for the *latest non-empty* backup to use as the parent, and the chain contains only empty backups.

[source,shell]
----
neo4j-admin database backup \
--from=<address:port> --to-path=<targetPath> \
--type=DIFF --prefer-diff-as-parent \
neo4j
----

However, if you use `--type=AUTO`, the command succeeds and produces another empty *full* backup.

[source,shell]
----
neo4j-admin database backup \
--from=<address:port> --to-path=<targetPath> \
--type=AUTO --prefer-diff-as-parent \
neo4j
----

The result would be:

[cols="h,e,m,h,h"]
|===
| 12:30
| backup3
| FULL
| 0
| 0
|===
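
The difference between `--type=DIFF` and `--type=AUTO` in this scenario can be sketched as follows.
The `resolve_backup_type` function, the `RuntimeError`, and its message are hypothetical and do not reflect the real command output. The sketch also assumes that `AUTO` takes a differential backup whenever a non-empty parent exists and falls back to a full backup otherwise.

```python
def resolve_backup_type(chain, requested_type):
    """Return the type of backup that would be written, or raise on failure.

    `chain` is a list of (type, lower tx ID, upper tx ID) tuples.
    """
    # A parent is usable only if it is non-empty (IDs are not both zero).
    has_parent = any((lower, upper) != (0, 0) for _, lower, upper in chain)
    if requested_type == "FULL":
        return "FULL"
    if has_parent:
        return "DIFF"
    if requested_type == "AUTO":
        return "FULL"  # assumed fallback: no usable parent, take a full backup
    raise RuntimeError("no non-empty backup available as parent for DIFF")


# Chain from the table above: two empty full backups.
empty_chain = [("FULL", 0, 0), ("FULL", 0, 0)]

print(resolve_backup_type(empty_chain, "AUTO"))  # FULL
try:
    resolve_backup_type(empty_chain, "DIFF")
except RuntimeError as error:
    print("DIFF failed:", error)
```

The `--prefer-diff-as-parent` option does not appear in this sketch because, as described above, it makes no difference when the chain contains no non-empty backup.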

======
=====