-
Let's state the benefit of doing so: AFAIU we assume replication will take more time than local backup recovery.
-
It seems that this will be the main RFC, which unites the replicaset and cluster backup. Currently it's very unclear to me what we're doing, way too many questions.

Strict overview of motivation?

Let's first figure out why we implement this and what we want to achieve in the end. We must strictly describe the goals of the RFC (e.g. do users want Point-In-Time recovery or not; according to https://jira.vk.team/browse/TNTP-2825 they do) and the guarantees we give to users. For the guarantees we can check backup tools for other databases:
And please include the links to the associated GitHub and Jira tickets, it's very difficult to find them now.

How will the process look for the end user?

We must determine how the backup process will look for the end user. Is he going to take the tool from the SDK, configure it and start the backup? In that case the tool should automatically move the needed files to the configured servers. Then a user just calls the tool one more time and it restores the cluster from a backup? Or do we expect the user to call some vshard/aeon function that will return which files should be copied and from which server, then the user manually goes to every instance, copies files to some servers and then uses the tool to restore the cluster? Or is it going to be TCM and/or ATE? At first glance, it looks like we need all of them.

API of replicaset/cluster

Then we should define how the API of the replicaset/cluster will look, as it will be called by a user or by our tool. Will we use

Will writing the metadata (e.g. timestamp, instance info) of the backup to a file be a separate API? Or will we have

Review
It may happen in VShard, if it's done without any protections:
For that we could use the already existing
It's not possible in VShard now and I'm not sure it's possible to implement that at all while preserving safety. If that's needed, we'll have to write a careful RFC to investigate it.
-
Improved this part, hopefully addressed all the questions.
The high level API is outside of this RFC's goals. Here we only describe the Tarantool API that can be used for backup/restore by a backup agent. At this point it is not clear why we should add an extra API to fetch the instance config. It should already be known due to the config in Tarantool 3, or one can call
Thanks for the suggestion! This part was abstract, so I added concrete steps for vshard (using
-
Is it necessary? Can't we recreate a replicaset with different UUIDs, maybe even with a different replication factor?
-
Do we actually support multimaster setups? Is it possible to configure one with Tarantool 3.0 config?
-
-
How is it different from a
Does this info survive a node restart?
I think this info should be returned via
AFAIU this mode only differs in
-
Sorry, I still have a lot of questions regarding that feature)
Still, the same comment as in https://github.com/orgs/tarantool/discussions/12039. I don't think we should rely on the tt team to figure this all out if we decided to design such a document, since now it's not obvious how the restore process will look to the end user. Instead, we can invite them to review the document and think about the API together.
It also suspends all removal of outdated backups (I'd rather state that in the RFC) and it may be a problem. I'm afraid there may be users who will go to every master in the cluster and call something like:

```lua
function backup_start()
    box.backup.start()
    return box.backup.info()
end
```

After that they'll go to every instance and copy the files, and then try executing the following on all masters:

```lua
function backup_end()
    box.backup.stop()
end
```

But it may fail on some of them (e.g. no connection) and then we'll get instances which never delete the backup files, which will sooner or later lead to an incident, disk space is not infinite. If we expect the user to use the backup as follows in terms of the cluster:

```lua
function backup_start()
    local files = box.backup.start()
    -- do smth with the `fio` module
    box.backup.stop()
end
```

I'd propose introducing a force close of the backup on exit from the function call, as we do for transactions. However, it may have a performance impact, which I don't really like. If we intentionally allow the user to split
This one is not backward compatible, the first argument to `box.backup.start()`:

```lua
box.backup.start(0, {type = 'full'})
-- or
box.backup.start(nil, {type = 'full'})
```
Did I understand you correctly that if we had the following snap done before:

- ./00000000000000001111.snap
- ./00000000000000001111.xlog
- ./00000000000000002222.xlog

and a new xlog file appeared, then during an incremental backup only it will be returned:

- ./00000000000000005555.xlog

Then the question is how Tarantool will remember what files were in the last backup. Will this info survive an instance restart? If so, where will it be persisted?
I didn't parse that sentence, could you elaborate please?
```yaml
> box.backup.start()
---
- 1: ./00000000000000000777.xlog
  2: ./00000000000000001111.snap
  3: ./00000000000000001111.xlog
  4: ./00000000000000002222.xlog
  type: 'full'
```

Key-value and index-based entries in the same table look way too crutchy. In terms of `box.backup.info()` I'd rather expect:

```yaml
> box.backup.info()
---
- files:
  - ./00000000000000000777.xlog
  - ./00000000000000001111.snap
  - ./00000000000000001111.xlog
  - ./00000000000000002222.xlog
  type: 'full'
  <and so on...>
```

WDYT?
And the last question, about incremental backups. Why do we want it at all? Do we have any users who request it? It's way too complicated to implement such a thing in Tarantool, and for the client application it costs nothing: it will just send the function a list of files it already has, start a new backup and skip the files which are already persisted. That's it. I'd think twice whether we want it in core) Moreover, as you already stated, we have problems with master changes.
And why don't we want to always return that info and not introduce that
Hmm, and it'll be considered a successful full backup and will affect incremental ones after it? Even when the user doesn't like it and didn't copy the files? That's strange. Maybe we need something like
A multi-master setup doesn't exist in production, but the async cluster is obviously the most popular. And here's the question: do our clients take backups from all instances in every replicaset? I'm pretty sure they don't.
I didn't understand what is meant here, and I'm pretty sure
Will we take that info from
This is exactly what I described in the first comment, very dangerous.

Nits
If it's not difficult for you, let's please number the parts of the RFC, e.g. It's not easy to parse the font difference between chapters and it's not very obvious what subsection I'm reading.
-
This RFC is outdated. The new one is here.
Reviewers:
TOC
Changelog
- v4 12/02/2026: Kept `box.backup.start()` result backward compatible, added more info to `box.backup.info()`, dropped the requirement to make the backup from the master for a synchronous replicaset.
- v3 23/01/2026: Described incremental backup and introduced `box.backup.info()`.
- v2 04/12/2025: Described steps to backup/restore explicitly. Added technical details to make sure xlogs have only committed data in case of a synchronous replicaset. Added multimaster/asynchronous master-replica replicaset backup/restore. Described backup in case of vshard. Made misc changes to improve document structure.
- v1: Added initial version.

Links
GitHub issue #11729 (which also holds further references).
Document aims
The purpose of the document is to describe how to back up and restore Tarantool at the instance, replicaset and cluster level. We describe the API only at the instance level (vshard is an exception, it is an example of a cluster backup); all other steps should be done by a backup agent. Also we expect the `tt` CLI will provide a more user-friendly interface for backup/restore on the base of this RFC.

Not all of what is described below is how Tarantool currently works; rather it is how we plan to make it work in terms of backup/restore.
Use cases
Backup is done to restore after all data is lost.
Other known use cases:
Backup consistency
We do not elaborate here on making a replicaset/cluster backup consistent in the sense that it represents the state at some moment in global time, as there is no such notion yet. However, each replica/shard has the data at the "moment" of backup start. This moment may differ from replica to replica and from shard to shard due to network latencies, replica failures and internal events which may delay the backup start (see the technical details for asynchronous replicaset backup).
Scope
Tickets mention PITR (point-in-time recovery). The current agreement is that incremental backups will provide enough points of consistent recovery, so we don't need anything more fine-grained.
Single instance
Backup
When `box.backup.start()` is called, the WAL is rotated and the function returns a list of files required to restore to the current point. It is the last snapshot and all WAL files after it, up to the rotated one. The backup agent is supposed to copy the listed files. It may optimize backup storage and copy only new WAL files if there was no new snapshot since the last backup. After the files are copied, call `box.backup.stop()`.

Example:
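A minimal sketch of this flow from a backup agent's point of view (the destination directory and the helper function name are illustrative, not part of the proposed API):

```lua
local fio = require('fio')

-- Illustrative agent step: rotate the WAL, copy the returned files to a
-- backup directory, then let garbage collection resume.
local function backup_instance(dst_dir)
    local files = box.backup.start()
    for _, path in ipairs(files) do
        fio.copyfile(path, fio.pathjoin(dst_dir, fio.basename(path)))
    end
    box.backup.stop()
    return files
end

backup_instance('/mnt/backups/instance-001')
```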
So to backup an instance we need the next steps:

1. Call `box.backup.start()` on the instance.
2. Copy the listed files to the backup storage.
3. Call `box.backup.stop()`.

The above example is for memtx-only spaces. In case there are vinyl spaces the list of data files will include `*.vylog`, `*.index` and `*.run` files.

Technical details
In case of vinyl the list will not include `*.index` and `*.run` files not included in `*.vylog`, for example files dumped since the last snapshot. We are going to use `*.xlog` to restore data written since the last snapshot. Currently vinyl cannot dump memory during xlog recovery, which may be triggered, so restoring from such a backup is not possible now. We are going either to support dump during recovery or to require that recovery be done without xlog and the xlog be applied afterwards using tooling (tt).

Incremental backup
In case of incremental backup we should be able to back up only the difference since the last full/incremental backup. In this case the backup procedure is similar to the one described for regular backup, but the backup is started by a `box.backup.start({type = 'full'})` or `box.backup.start({type = 'incremental'})` call.

If one requests an incremental backup when there was no previous full or incremental one, then a full backup is done. Also, data files for incremental backup are kept for a limited amount of time set by the new configuration option `backup_gc_timeout`. If an incremental backup is requested after this period of time, then a full backup is done as well.

To let the client check whether a full or incremental backup was done, we add the key `type` to the table returned by `box.backup.info()`. Like in the case of starting a backup, it can have the value `full` or `incremental`. If the backup is neither full nor incremental (regular), then `type = 'default'`.
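A sketch of how a backup agent might use this (the options-table form of the call follows the examples elsewhere in this RFC; it is not the current `box.backup` API):

```lua
local log = require('log')

-- Ask for an incremental backup. Per this RFC a full backup is done instead
-- if there was no previous full/incremental backup or if backup_gc_timeout
-- has expired since the last one.
local files = box.backup.start({type = 'incremental'})
if box.backup.info().type == 'full' then
    log.info('backup chain could not be continued, a full backup was made')
end
-- ... copy `files` to the backup storage ...
box.backup.stop()
```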
Introspection

The new `box.backup.info()` function will return the list of backup files like in `box.backup.start()` and extra information. If there is no active backup then `nil` is returned.

Example:
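An illustrative console session, assuming the file list is returned under a `files` key together with the fields described below; the exact layout is a sketch, not the final format (`mode` is omitted since its value for a plain instance backup is not specified in this RFC):

```yaml
tarantool> box.backup.info()
---
- files:
  - ./00000000000000001111.snap
  - ./00000000000000001111.xlog
  - ./00000000000000002222.xlog
  vclock_start: {1: 1111}
  vclock_end: {1: 2222}
  time: 1739000000
  type: 'full'
...
```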
Here:
- `vclock_start` - vclock of the first statement in the backup files
- `vclock_end` - vclock of the last statement in the backup files
- `time` - Unix time when the backup was started
- `type` - type of the backup (default/full/incremental, check the incremental backup section)
- `mode` - check the backup of non-synchronous replicaset section

Recovery
To recover an instance one needs to put all the data files (listed on backup in `box.backup.start()`) into the working directory of the instance before it starts, if the backup is full. If the backup is incremental then one needs to put the files from the full backup and all incremental backups since it, up to the required point.
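A sketch of the restore step under these assumptions (paths are illustrative; the instance must be stopped and its working directory empty before the files are put in place):

```lua
local fio = require('fio')

-- Illustrative restore helper: copy every file of the (full) backup into the
-- instance's working directory before the instance is started.
local function restore_instance(backup_dir, work_dir)
    for _, name in ipairs(fio.listdir(backup_dir)) do
        fio.copyfile(fio.pathjoin(backup_dir, name),
                     fio.pathjoin(work_dir, name))
    end
end

restore_instance('/mnt/backups/instance-001', '/var/lib/tarantool/instance-001')
```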
Synchronous replicaset

In this case it is enough to back up only the master. We cannot have too outdated data in this case. The master can be up-to-date or not. The latter case is when there is a new term and a new master in this term, and this instance does not know it yet and considers itself a master. If the master is up-to-date then it holds all the committed data up to now. If the master is not up-to-date then the replicaset can hold newer committed data, but as the master can continue to consider itself a master only for the election timeout, the amount of this data is limited.
In this case it is enough to back up only a single replica. It can be the master, or a replica if it is in the `follow` state.

So to backup such a replicaset we need the next steps:

- … (`box.info.uuid` for example).

To restore such a replicaset we need the next steps:

- … Instance.

More technical details on replicaset recovery from a single instance backup are in #12039.
Incremental backup
An incremental backup will not be possible if the backup is started on a replica different from the one used for the previous incremental/full backup. Though the incremental backup will not fail: a full backup will be done instead.
There is another issue we need to take care of due to changing the replica being backed up. For example, we make full backup F1 at replica A, then we make full backup F2 at replica B, then we request incremental backup I3 at replica A again. I3 is the difference since F1, which by chance is not yet garbage collected. The client probably expects that I3 holds the difference from F2. Obviously the client can tackle this situation and not request an incremental backup after changing the replica being backed up.
Technical details
Without extra precautions the xlogs can contain uncommitted transactions. These transactions can be rolled back later in the replicaset history, but on restore they can be applied. So after restore we may have statements that will never be visible in the replicaset history. We can avoid that if we wait for all uncommitted transactions that get into the backup xlog to be committed. If they get rolled back then `box.backup.start()` should raise an error. There should be a special error code, so the client can retry starting the backup on this error, as the error is transient.
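A sketch of what the retry could look like on the agent side; the dedicated error code is not defined yet, so this sketch simply retries on any error from `box.backup.start()`:

```lua
local fiber = require('fiber')

-- Retry starting the backup while the transient "transactions in the backup
-- xlog were rolled back" error is raised. A real agent would check the
-- dedicated error code and propagate any other error immediately.
local function start_backup_with_retry(opts, retries)
    local ok, res
    for _ = 1, retries do
        ok, res = pcall(box.backup.start, opts)
        if ok then
            return res
        end
        fiber.sleep(0.1)
    end
    error(res)
end
```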
Non synchronous replicaset

This can be a multimaster replicaset or an asynchronous master-replica replicaset. In both cases making a backup of only a single instance of the replicaset as described above can miss some data. For example, in case of multimaster the replication can be paused due to a long-standing conflict, so instances can have different statements. If we back up only one of the instances we miss the statements from the other that are not replicated. As the conflict can exist for a long period of time, we can miss data in backups for this period.
So to backup such a replicaset we need the next steps:

- Call `box.backup.start({mode = 'replicaset'})`, then call `box.backup.info()` to get extra information for the backup. It will have `vclock_start` and `vclock_end` keys in this mode. These are the vclock range of statements present in the data files of the backup. The backup agent should check that the intervals of all replicas overlap for each vclock component; in this case there will be no rebootstrap after restore. In case the condition is not met, the backup should be restarted (`box.backup.stop()` / `box.backup.start({mode = 'replicaset'})`).

Example:
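A sketch of the overlap check the backup agent could perform, given the `box.backup.info()` results collected from every replica (plain Lua; how missing vclock components are treated is an assumption):

```lua
-- infos: array of box.backup.info() results, one per replica.
-- The [vclock_start, vclock_end] intervals of all replicas must intersect
-- for every vclock component, i.e. max(starts) <= min(ends).
local function backups_overlap(infos)
    local components = {}
    for _, info in ipairs(infos) do
        for id in pairs(info.vclock_start) do components[id] = true end
        for id in pairs(info.vclock_end) do components[id] = true end
    end
    for id in pairs(components) do
        local max_start, min_end = 0, math.huge
        for _, info in ipairs(infos) do
            max_start = math.max(max_start, info.vclock_start[id] or 0)
            min_end = math.min(min_end, info.vclock_end[id] or math.huge)
        end
        if max_start > min_end then
            return false
        end
    end
    return true
end
```

If `backups_overlap()` returns `false`, the agent restarts the backup as described above.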
To restore such a replicaset we need the next steps:
Technical details
In backup mode `'replicaset'` we list all extra xlogs the other replicas need to connect without rebootstrap, besides the last snapshot and the xlogs after it.

There is still a chance that a rebootstrap will be required. This can happen due to a race. We make a backup of instance A, then we make a backup of instance B. Before that, B advances the gc vclock for A. So the backup of instance B can miss some statements required for A. We can deal with that by inspecting the vclock intervals present in the `box.backup.info()` output. We add `vclock_start` and `vclock_end` to the `box.backup.info()` output in `mode = 'replicaset'`. There will be no rebootstrap if the intervals of all replicas overlap for each vclock component. This check should be done by the backup agent.

Cluster
A mere backup of every replicaset in the cluster without extra coordination may be inconsistent for two reasons.
We can take a full cluster write lock during backup to exclude both cases, but this way the backup may impact cluster performance significantly. At the replicaset level the backup is lightweight.
Another approach can handle issue 1 but not issue 2. We can abort/finish in-progress data migrations and disable new ones before starting the replicasets' backup. After it is started, data migrations are enabled again. This can be done fast and does not reduce cluster performance. As to issue 2, in the latter approach we can only rely on the application, i.e. that the application can restore consistently by itself somehow.
vshard
In case of vshard we can use `vshard.router.map_callrw()` to start the backup on every shard. This way all in-progress rebalancing will be finished before starting the backup. vshard consists of synchronous replicasets, so we need a synchronous replicaset backup (as described in the section above) for every shard.
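A sketch of the router-side call, assuming a small wrapper function is defined on every storage (the wrapper name and the timeout are illustrative):

```lua
-- On every storage instance:
function backup_start_on_storage()
    return box.backup.start()
end

-- On the router: run the wrapper on the master of every replicaset.
local vshard = require('vshard')
local res, err = vshard.router.map_callrw('backup_start_on_storage', {},
                                          {timeout = 30})
assert(res ~= nil, err)
-- `res` maps each replicaset to the result of the call on its master.
```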
So to backup a vshard cluster we need the next steps:

- Call `vshard.router.map_callrw()` with the function `box.backup.start()`. Make each shard backup as described in the section for synchronous replicaset backup.

To restore the cluster we need the next steps: