-
Let's state the benefit of doing so: AFAIU we assume replication will take more time than local backup recovery.
-
It seems that this will be the main RFC, uniting the replicaset and cluster backup. Currently it's very unclear to me what we're doing; way too many questions.

Strict overview of motivation

Let's first figure out why we are implementing this and what we want to achieve in the end. We must strictly describe the goals of the RFC (e.g. do users want Point-In-Time recovery or not; according to https://jira.vk.team/browse/TNTP-2825 they do) and the guarantees we give to users. For guarantees, we can check backup tools for other databases:

And please include the links to the associated GitHub and Jira tickets, it's very difficult to find them now.

How will the process look for the end user?

We must determine how the backup process will look for the end user. Is he going to take the tool from the SDK, configure it and start the backup? In that case the tool should automatically move the needed files to the configured servers, and then the user just calls the tool one more time and it restores the cluster from a backup? Or do we expect the user to call some vshard/aeon function that returns which files should be copied and from which server, then manually go to every instance, copy the files to some servers and use the tool to restore the cluster? Or is it going through TCM and/or ATE? At first glance, it looks like we need all of these.

API of replicaset/cluster

Then we should define how the API of the replicaset/cluster will look; it will be called by a user or by our tool. Will we use …? Will writing the metadata (e.g. timestamp, instance info) of the backup to a file be a separate API? Or will we have …?

Review
It may happen in VShard if it's done without any protections:

For that we could use the already existing …

It's not possible in VShard now, and I'm not sure it's possible to implement at all while preserving safety. If that's needed, we'll have to write a careful RFC to investigate it.
-
Improved this part, hopefully addressed all the questions.

The high-level API is outside of this RFC's goals. Here we only describe the Tarantool API that can be used for backup/restore by a backup agent. At this point it is not clear why we should add an extra API to fetch the instance config. It should already be known due to the config in Tarantool 3, or one can call …

Thanks for the suggestion! This part was abstract, so I added concrete steps for vshard (using …).
-
Is it necessary? Can't we recreate a replicaset with different UUIDs, maybe even with a different replication factor?
-
Do we actually support multimaster setups? Is it possible to configure one with the Tarantool 3.0 config?
-
How is it different from a …?

Does this info survive a node restart?

I think this info should be returned via …

AFAIU this mode only differs in …
-
Reviewers:
TOC
Changelog
v3 23/01/2026: Described incremental backup and introduced `box.backup.info()`.
v2 04/12/2025: Described steps to backup/restore explicitly. Added technical details to make sure xlogs have only committed data in case of a synchronous replicaset. Added multimaster/asynchronous master-replica replicaset backup/restore. Described backup in case of vshard. Made misc changes to improve document structure.
v1: Added initial version.

Links
GitHub issue #11729 (which also holds further references).
Document aims
The purpose of the document is to describe how to backup and restore Tarantool at the instance, replicaset and cluster level. We describe the API only at the instance level (vshard is an exception, it is an example of cluster backup); all other steps should be done by a backup agent. Also we expect the `tt` CLI to provide a more user-friendly interface for backup/restore on the basis of this RFC.

Not all described below is how Tarantool currently works; rather it is how we plan to make it work in terms of backup/restore.
Use cases
Backup is done to restore after all data is lost.
Other known use cases:
Backup consistency
We do not elaborate here on making a replicaset/cluster backup consistent in the sense that it represents the state at some moment in global time, as there is no such notion yet. However, each replica/shard has its data as of the "moment" of backup start. This moment may differ from replica to replica and from shard to shard due to network latencies, replica failures, and internal events which may delay backup start (see technical details for asynchronous replicaset backup).
Scope
Tickets mention PITR (point-in-time recovery). The current agreement is that incremental backups will provide enough points of consistent recovery, so we don't need anything more fine-grained.
Single instance
Backup
When `box.backup.start()` is called, the WAL is rotated and the function returns a list of files required to restore to the current point: the last snapshot and all WAL files after it, up to the rotated one. The backup agent is supposed to copy the listed files. It may optimize backup storage and copy only new WAL files if there was no new snapshot since the last backup. After the files are copied, call `box.backup.stop()`.

Example:
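A possible console session for this flow. The file names are illustrative, and the xlog entries reflect the behavior proposed above rather than what `box.backup.start()` lists today, so treat this as a sketch:

```lua
tarantool> box.backup.start()
---
- - ./00000000000000000014.snap
  - ./00000000000000000014.xlog
  - ./00000000000000000021.xlog
...

tarantool> -- the backup agent copies the listed files, then:
tarantool> box.backup.stop()
---
...
```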
So to backup a single instance we need the next steps:

1. Call `box.backup.start()` on the instance.
2. Copy the listed files to the backup storage.
3. Call `box.backup.stop()` on the instance.

The above example is for memtx-only spaces. In case there are vinyl spaces the list of data files will also include `*.vylog`, `*.index` and `*.run` files.

Incremental backup
In case of incremental backup we should be able to backup only the difference since the last full/incremental backup. The backup procedure is similar to what is described for regular backup, but the backup is started by a `box.backup.start({type = 'full'})` or `box.backup.start({type = 'incremental'})` call.

If one requests an incremental backup when there was no previous full or incremental one, a full backup is done. Also, data files for incremental backup are kept for a limited amount of time set by the new configuration option `backup_base_gc_timeout`. If an incremental backup is requested after this period of time, a full backup is done as well.

To let the client check whether a full or incremental backup was done, we add the key `type` to the table returned by `box.backup.start()`. As when starting a backup, it can have the value `full` or `incremental`. If the backup is neither full nor incremental (regular), then `type = 'default'`.
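For illustration, a backup agent could drive this as sketched below. The option table and the result fields are the ones proposed in this section; `copy_new_files()` and `copy_all_files()` are hypothetical agent-side helpers, not part of the proposed API.

```lua
-- Request an incremental backup; Tarantool may silently fall back to a
-- full backup (no previous base, or the base expired after
-- backup_base_gc_timeout).
local backup = box.backup.start({type = 'incremental'})

if backup.type == 'incremental' then
    copy_new_files(backup)  -- hypothetical: copy only files new since the last backup
else
    copy_all_files(backup)  -- 'full' (or 'default'): copy the whole listed set
end

box.backup.stop()
```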
Introspection

A new `box.backup.info()` function will return the same information as `box.backup.start()` if there is an active backup, or `nil` otherwise.
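A minimal usage sketch, assuming the function returns the same table as `box.backup.start()` (including the proposed `type` field):

```lua
-- Poll the proposed introspection function, e.g. from a monitoring script.
local info = box.backup.info()
if info == nil then
    print('no backup in progress')
else
    -- the same table as returned by box.backup.start(): file list, type, vclocks
    print('backup in progress, type = ' .. tostring(info.type))
end
```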
Recovery

To recover an instance one needs to put all the data files (listed by `box.backup.start()` during backup) into the working directory of the instance before it starts.

Synchronous replicaset
In this case it is enough to backup only the master; we cannot get too-outdated data this way. A master can be up-to-date or not. The latter case is when there is a new term with a new master for this term, and this instance does not know it yet and still considers itself a master. If the master is up-to-date, it holds all the data committed up to now. If the master is not up-to-date, the replicaset can hold newer committed data, but since the master can continue to consider itself a master only for the election timeout, the amount of such data is limited.
So to backup such a replicaset we need the next steps:

… (`box.info.uuid` for example).

To restore such a replicaset we need the next steps:

… instance.

More technical details on replicaset recovery from a single instance backup are in #12039.
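Since the steps above mention recording the instance identity (`box.info.uuid`), here is a hedged sketch of what a backup agent could collect from each instance to pick the master and remember where the backup was taken. `backup_identity` is a hypothetical helper, not part of the proposed API:

```lua
-- Run on each instance of a synchronous replicaset so the backup agent can
-- find the writable master and record its identity next to the backup files
-- (how the agent stores this is outside the RFC).
local function backup_identity()
    return {
        is_master = (box.info.ro == false), -- writable instance
        uuid      = box.info.uuid,
        vclock    = box.info.vclock,
    }
end
```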
Incremental backup
Incremental backup will not be possible if the current master is different from the master at the time the previous incremental/full backup was done. The incremental backup request will not fail though: a full backup will be done instead.
There is another issue we need to take care of due to changing masters. For example, we make a full backup F1 at replica A, then the master changes and we make a full backup F2 at replica B, then A is master again and we request an incremental backup I3 at replica A. I3 is the difference since F1, which by chance is not yet garbage collected. The client probably expects that I3 holds the difference from F2. To help the client handle this case we add `vclock_start` and `vclock_end` to the result of `box.backup.start()`. These are the vclock range of statements present in the data files of the backup. So I3 will start from where F1 ended, not F2, and the client can link the incremental backup properly or request a full backup instead.
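A hedged sketch of the linking check on the backup-agent side (field names as proposed above; the comparison is intentionally simplified):

```lua
-- Decide whether a new incremental backup continues the chain that ends at
-- `prev` (metadata of the previously stored backup). Both tables carry the
-- proposed vclock_start / vclock_end fields. Simplified check: the new
-- backup must start no later than where the previous one ended, for every
-- replica id the previous backup knows about.
local function continues_chain(prev, incr)
    for id, prev_end in pairs(prev.vclock_end) do
        if (incr.vclock_start[id] or 0) > prev_end then
            return false -- gap: relink to another base or request a full backup
        end
    end
    return true
end
```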
Technical details

Without extra precautions the xlogs can contain uncommitted transactions. These transactions can be rolled back later in the replicaset history, but on restore they would be applied. So after restore we may have statements that were never visible in the replicaset history. We can avoid that if we wait for all uncommitted transactions that got into the backup xlog to be committed. If they get rolled back, then `box.backup.start()` should raise an error. There should be a special error code, so that the client can retry starting the backup on this error, as the error is transient.
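On the agent side the retry could look roughly like this; the error name `ER_BACKUP_ROLLBACK` is purely hypothetical, since the RFC only requires that some dedicated transient code exists:

```lua
-- Retry box.backup.start() while the proposed transient "transactions in
-- the backup xlog were rolled back" error is reported.
local fiber = require('fiber')

local function start_backup_with_retry(opts, attempts)
    for _ = 1, attempts do
        local ok, res = pcall(box.backup.start, opts)
        if ok then
            return res
        end
        if res.code ~= box.error.ER_BACKUP_ROLLBACK then -- placeholder error name
            error(res) -- some other error: propagate
        end
        fiber.sleep(0.1) -- transient: wait a bit and retry
    end
    error('backup did not start after ' .. attempts .. ' attempts')
end
```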
Non-synchronous replicaset

This can be a multimaster replicaset or an asynchronous master-replica replicaset. In both cases making a backup of only a single instance from the replicaset, as described above, can miss some data. For example, in the multimaster case replication can be paused due to a long-standing conflict, so instances can have different statements. If we backup only one of the instances we miss statements from the other that are not replicated. As the conflict can exist for a long period of time, we can miss data in backups for this period.
So to backup such a replicaset we need the next steps:

On every replica start the backup with `box.backup.start({mode='replicaset'})`. Backup start will return `vclock_start` and `vclock_end` in this mode. These are the vclock range of statements present in the data files of the backup. The backup agent should check that the intervals of all replicas overlap for each vclock component; in this case there will be no rebootstrap after restore. If the condition is not met, the backup should be restarted (`box.backup.stop()` / `box.backup.start({mode='replicaset'})`).

Example:
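A hedged sketch of the overlap check the backup agent is expected to perform. `results` is assumed to map replica names to their `box.backup.start({mode='replicaset'})` results, with the proposed `vclock_start` / `vclock_end` fields:

```lua
-- Check that the vclock intervals of all replicas overlap for every
-- component. If they do not, the backup should be stopped and restarted.
local function intervals_overlap(results)
    for _, a in pairs(results) do
        for _, b in pairs(results) do
            for id, a_end in pairs(a.vclock_end) do
                -- b must start no later than a ends for this component;
                -- the symmetric condition is checked when the loop visits (b, a)
                if (b.vclock_start[id] or 0) > a_end then
                    return false
                end
            end
        end
    end
    return true
end
```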
To restore such a replicaset we need the next steps:
Technical details
In backup mode `'replicaset'` we list, besides the last snapshot and the xlogs after it, all the extra xlogs the other replicas need to connect without rebootstrap.

There is still a chance that rebootstrap will be required. This can happen due to a race: we make a backup of instance A, then we make a backup of instance B, but before that B advances the gc vclock for A. So the backup of instance B can miss some statements required for A. We can deal with that by inspecting the vclock intervals present in the `box.backup.start()` output. We add `vclock_start` and `vclock_end` to the `box.backup.start()` output in `mode='replicaset'`. There will be no rebootstrap if the intervals of all replicas overlap for each vclock component. This check should be done by the backup agent.

Cluster
A mere backup of every replicaset in the cluster without extra coordination may be inconsistent for two reasons:

1. Data migration between replicasets may be in progress during the backup.
2. The application may have its own consistency requirements across replicasets.
We can take a full cluster write lock during the backup to exclude both cases, but this way the backup may impact cluster performance significantly. At the replicaset level the backup is lightweight.
Another approach can handle issue 1 but not issue 2. We can abort/finish in-progress data migrations and disable new ones before starting the replicaset backups; after the backup is started, data migrations are enabled again. This can be done fast and does not reduce cluster performance. As to issue 2, with this approach we can only rely on the application being able to restore consistency by itself somehow.
vshard
In case of vshard we can use `vshard.router.map_callrw()` to start the backup on every shard. This way all in-progress rebalancing will be finished before starting the backup. vshard consists of synchronous replicasets, so we need a synchronous replicaset backup (as described in the section above) for every shard.
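A hedged sketch of the router-side call described above. The exact `map_callrw()` signature and result layout may differ between vshard versions, and calling the dotted name `'box.backup.start'` directly may require a small wrapper function on the storages:

```lua
-- On the router: start a backup on the master of every shard. map_callrw()
-- finishes in-progress rebalancing before running the mapped function.
local vshard = require('vshard')
local log = require('log')
local json = require('json')

local res, err = vshard.router.map_callrw('box.backup.start', {}, {timeout = 30})
if res == nil then
    error(err)
end
-- `res` is assumed to map each replicaset UUID to the values returned by
-- box.backup.start() on that shard's master. The backup agent copies the
-- listed files for each shard and then calls box.backup.stop() the same way.
for rs_uuid, ret in pairs(res) do
    log.info('shard %s: backup start returned %s', rs_uuid, json.encode(ret))
end
```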
So to backup a vshard cluster we need the next steps:

Call `vshard.router.map_callrw()` with the function `box.backup.start()`. Make each shard backup as described in the section for synchronous replicaset backup.

To restore the cluster we need the next steps: