wg-data-protection/data-protection-workflows-white-paper.md
76 additions & 76 deletions
@@ -5,80 +5,81 @@
 This document answers the following questions: why do we need data protection in Kubernetes, what is currently available in Kubernetes, and what functionality is missing in Kubernetes to support data protection? It describes how to identify resources for data protection, the volume backup and restore workflow, and the application snapshot, backup, and restore workflow.
 - [Workflow for restore PVC-snapshot](#workflow-for-restore-pvc-snapshot)
 - [Restore To New](#restore-to-new)
 - [Restore to Production (partial restore)](#restore-to-production-partial-restore)
 - [Workflow for restore logical-dump](#workflow-for-restore-logical-dump)
 - [Restore to New](#restore-to-new-1)
 - [Restore to Production](#restore-to-production)
-- [Appendix](#appendix)
-- [Backup and Restore of Different Databases](#backup-and-restore-of-different-databases)
-- [Relational](#relational)
-- [Mysql](#mysql)
-- [Time series](#time-series)
-- [NuoDB](#nuodb)
-- [Prometheus](#prometheus)
-- [InfluxDB](#influxdb)
-- [Key value store](#key-value-store)
-- [etcd](#etcd)
-- [Message queues](#message-queues)
-- [Kafka](#kafka)
-- [Distributed databases](#distributed-databases)
-- [Mongo](#mongo)
-- [References](#references)
+- [Appendix](#appendix)
+- [Backup and Restore of Different Databases](#backup-and-restore-of-different-databases)
+- [Relational](#relational)
+- [Mysql](#mysql)
+- [Time series](#time-series)
+- [NuoDB](#nuodb)
+- [Prometheus](#prometheus)
+- [InfluxDB](#influxdb)
+- [Key value store](#key-value-store)
+- [etcd](#etcd)
+- [Message queues](#message-queues)
+- [Kafka](#kafka)
+- [Distributed databases](#distributed-databases)
+- [Mongo](#mongo)
+- [References](#references)
 <!-- /toc -->

 ## Data Protection Definition

-The **_Data Protection_** term in the Kubernetes context is the process of protecting valuable data and configs of applications running in a Kubernetes cluster. The result of the data protection process is typically called as a backup. When unexpected scenarios occur, for example data corruption by a malfunctioning software, data loss during a disaster, such a backup can be used to restore the protected workload to the states preserved in the backup.
+The **Data Protection** term in the Kubernetes context is the process of protecting valuable data and configs of applications running in a Kubernetes cluster. The result of the data protection process is typically called a backup. When unexpected scenarios occur, for example data corruption by malfunctioning software or data loss during a disaster, such a backup can be used to restore the protected workload to the state preserved in the backup.

 In Kubernetes, a stateful application contains two primary pieces of data:

@@ -385,7 +386,7 @@ For brevity's sake, `snapshot` will be used to mean `volume snapshot` and `backu
 8. It might be desirable to try and standardize some common attributes of backups (e.g., object storage buckets, regions where backups are stored, number of copies of each backup, etc.). However, zeal for pursuing a deep level of such standardization should be tempered by the desire to open up a marketplace of competitive offerings that allow for a healthy degree of freedom for innovation and opportunities for competitive differentiation.

-### CBT
+### Change Block Tracking

 #### Motivation

@@ -649,15 +650,14 @@ Proposes a Kubernetes API for stateful application data management that supports
 This KEP (https://github.com/kubernetes/enhancements/pull/1051) proposes a Kubernetes Stateful Application Data Management API consisting of a set of [CustomResourceDefinitions (CRD)](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) that collectively define the notion of stateful applications, i.e., applications that maintain persistent state, and a set of data management semantics on stateful applications such as snapshot, backup, restoration, and clone. A snapshot of a stateful application is defined as a point-in-time capture of the state of the application, taken in an application-consistent manner. It captures both the application configurations (definitions of Kubernetes resources that make up the application, e.g., StatefulSets, Services, ConfigMaps, Secrets, etc.) and persistent data contained within the application (via persistent volumes).

-#
-Application Backup and Restore workflow
+# Application Backup and Restore workflows

-Since there are 2 general methods of backup and restore applications: logical-dump operation and PVC-snapshot, the backup and restore also have types of 2 workflows:
+Since there are two general methods of backing up and restoring applications, logical-dump operation and PVC-snapshot, backup and restore also have two types of workflows:

 * Logical dump workflows for backup and restore
 * PVC-snapshot workflow for backup and restore

-### Backup Application workflows
+## Application Backup workflows

 The application backup workflows are facilitated by a Data Protection Controller which listens for Backup object creation on the Kubernetes API Server and executes a backup workflow to back up an application. The workflow may involve a component called a “data-mover” pod which can connect to backup devices (backup repositories) to back up an application’s persistent volume data.

@@ -678,13 +678,13 @@ An application backup workflow may involve the following steps:
 * Controller then deletes the data-mover pod and the snapshot.
 4. Run an application-level action, such as executing a command to enable load balancing, after the backup data step has finished, regardless of whether the backup succeeded. This step is only needed if step 2 was executed.

-### Restore Application workflows
+## Application Restore workflows

 Restoring an application generally does not require quiesce and unquiesce. One potential scenario in which quiesce might need to take place is for applications using ReadWriteMany PVCs (discussed below). The restore workflows are executed by the Data Protection Controller and involve a component called a “data-mover” pod, which can connect to backup devices (backup repositories) to retrieve metadata and data of the backup images. This data-mover pod has to be created on demand so that it can access the PVCs to restore data.

-####Workflow for restore PVC-snapshot
+### Workflow for restore PVC-snapshot

-#####Restore To New
+#### Restore To New

 The general application restore-to-new process is as follows:

@@ -696,7 +696,7 @@ The general application restore to new process is done as following
 * Controller then terminates the data-mover pod, thus unmounting all PVCs.
 * Then the controller restores the pods, StatefulSets, services, etc. that use the PVCs restored in the previous steps.

-#####Restore to Production (partial restore)
+#### Restore to Production (partial restore)

 However, when we restore to an existing/production environment where the application is running, if the application uses PVCs with access mode ReadWriteOnce (most common), we cannot directly restore data to those PVCs. The steps are as follows:

@@ -709,17 +709,17 @@ However, when we restore to an existing/production environment where the applica
 * Controller then terminates the data-mover pod, effectively unmounting all PVCs.
 * Finally the controller scales up the application deployment to the original replica count. (Skip this step if PVCs are ReadWriteMany.)

-####Workflow for restore logical-dump
+### Workflow for restore logical-dump

 This workflow is much simpler and can be done by the controller itself, because the logical-dump operations can be done over a network connection to the application server pod without the need to access the application PVCs. Most applications that support a logical-dump operation also support the same operation in the reverse direction. The backup images are generally local files or file streams from remote backup devices.

-#####Restore to New
+#### Restore to New

 * First, the controller restores the namespace, cluster resources, etc. that will be used by the application.
 * Controller then restores application pods/deployments.
 * Then the controller runs a database/application client with these local files/file streams as input and executes the logical-dump operation in the reverse direction back to the database/application.

-#####Restore to Production
+#### Restore to Production

 * Controller simply runs the logical-dump operation in the reverse direction as mentioned above.
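
The two logical-dump restore paths above differ only in whether the controller first recreates cluster resources and workloads. A minimal Python sketch of that ordering follows, assuming hypothetical step descriptions and a hypothetical `plan_logical_dump_restore` helper (neither is part of Kubernetes or of any existing controller):

```python
from enum import Enum


class RestoreTarget(Enum):
    NEW = "new"                # restore into a fresh environment
    PRODUCTION = "production"  # restore into the running application


def plan_logical_dump_restore(target: RestoreTarget) -> list[str]:
    """Return the ordered restore steps (illustrative step names only)."""
    steps = []
    if target is RestoreTarget.NEW:
        # Restore to New: recreate the environment before replaying data.
        steps.append("restore namespace and cluster resources")
        steps.append("restore application pods/deployments")
    # Both paths end by running the logical dump in the reverse direction.
    steps.append("replay logical dump into the application")
    return steps


if __name__ == "__main__":
    for target in RestoreTarget:
        print(target.value, "->", plan_logical_dump_restore(target))
```

Keeping the replay step common to both paths mirrors the text: restore to production is the final step of restore to new, run against an application that already exists.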