Commit 48ee31e

Update white paper
1 parent da611cd commit 48ee31e

1 file changed

+76
-76
lines changed

wg-data-protection/data-protection-workflows-white-paper.md

Lines changed: 76 additions & 76 deletions
@@ -5,80 +5,81 @@
55
This document answers the following questions: why do we need data protection in Kubernetes, what is currently available in Kubernetes, what functionalities are missing in Kubernetes to support data protection? We will describe how to identify resources for data protection, what is the volume backup and restore workflow, and what is the application snapshot, backup, and restore workflow.
66

77
<!-- toc -->
8-
- [Data Protection Definition](#data-protection-definition)
9-
- [Why do we need Data Protection in Kubernetes?](#why-do-we-need-data-protection-in-kubernetes)
10-
- [Cloud Native Applications vs Traditional Data Protection](#cloud-native-applications-vs-traditional-data-protection)
11-
- [Application Evolution](#application-evolution)
12-
- [Legacy Technologies](#legacy-technologies)
13-
- [Stateful vs Stateless Applications](#stateful-vs-stateless-applications)
14-
- [Roles and Scopes in IT](#roles-and-scopes-in-it)
15-
- [Use Cases](#use-cases)
16-
- [User Personas in Kubernetes Data Protection](#user-personas-in-kubernetes-data-protection)
17-
- [Application Protection](#application-protection)
18-
- [Application Definition](#application-definition)
19-
- [Application Backup Definition](#application-backup-definition)
20-
- [Application Disaster Recovery](#application-disaster-recovery)
21-
- [Application Rollback](#application-rollback)
22-
- [Application Migration](#application-migration)
23-
- [Application Cloning](#application-cloning)
24-
- [Application Retrieval](#application-retrieval)
25-
- [Resource Recovery](#resource-recovery)
26-
- [Namespace Protection](#namespace-protection)
27-
- [Cluster Protection](#cluster-protection)
28-
- [What is currently available in Kubernetes?](#what-is-currently-available-in-kubernetes)
29-
- [What are the missing building blocks in Kubernetes?](#what-are-the-missing-building-blocks-in-kubernetes)
30-
- [Volume Backups](#volume-backups)
31-
- [Motivation](#motivation)
32-
- [Desirable Characteristics of Volume Backups](#desirable-characteristics-of-volume-backups)
33-
- [CBT](#cbt)
34-
- [Motivation](#motivation-1)
35-
- [Sample Backup workflow with Differential Snapshots Service](#sample-backup-workflow-with-differential-snapshots-service)
36-
- [Volume Populator](#volume-populator)
37-
- [Motivation](#motivation-2)
38-
- [Status](#status)
39-
- [Quiesce and Unquiesce Hooks](#quiesce-and-unquiesce-hooks)
40-
- [Motivation](#motivation-3)
41-
- [Background](#background)
42-
- [Container Notifier](#container-notifier)
43-
- [Volume Group and Group Snapshot](#volume-group-and-group-snapshot)
44-
- [Motivation](#motivation-4)
45-
- [Goals](#goals)
46-
- [Status](#status-1)
47-
- [Backup Repositories](#backup-repositories)
48-
- [Why do we need backup repositories](#why-do-we-need-backup-repositories)
49-
- [Motivation/Objective](#motivationobjective)
50-
- [Application Snapshots and Backups](#application-snapshots-and-backups)
51-
- [Motivation](#motivation-5)
52-
- [Goals](#goals-1)
53-
- [Status](#status-2)
54-
- [Backup Application workflows](#backup-application-workflows)
55-
- [Restore Application workflows](#restore-application-workflows)
8+
- [Data Protection Definition](#data-protection-definition)
9+
- [Why do we need Data Protection in Kubernetes?](#why-do-we-need-data-protection-in-kubernetes)
10+
- [Cloud Native Applications vs Traditional Data Protection](#cloud-native-applications-vs-traditional-data-protection)
11+
- [Application Evolution](#application-evolution)
12+
- [Legacy Technologies](#legacy-technologies)
13+
- [Stateful vs Stateless Applications](#stateful-vs-stateless-applications)
14+
- [Roles and Scopes in IT](#roles-and-scopes-in-it)
15+
- [Use Cases](#use-cases)
16+
- [User Personas in Kubernetes Data Protection](#user-personas-in-kubernetes-data-protection)
17+
- [Application Protection](#application-protection)
18+
- [Application Definition](#application-definition)
19+
- [Application Backup Definition](#application-backup-definition)
20+
- [Application Disaster Recovery](#application-disaster-recovery)
21+
- [Application Rollback](#application-rollback)
22+
- [Application Migration](#application-migration)
23+
- [Application Cloning](#application-cloning)
24+
- [Application Retrieval](#application-retrieval)
25+
- [Resource Recovery](#resource-recovery)
26+
- [Namespace Protection](#namespace-protection)
27+
- [Cluster Protection](#cluster-protection)
28+
- [What is currently available in Kubernetes?](#what-is-currently-available-in-kubernetes)
29+
- [What are the missing building blocks in Kubernetes?](#what-are-the-missing-building-blocks-in-kubernetes)
30+
- [Volume Backups](#volume-backups)
31+
- [Motivation](#motivation)
32+
- [Desirable Characteristics of Volume Backups](#desirable-characteristics-of-volume-backups)
33+
- [Change Block Tracking](#change-block-tracking)
34+
- [Motivation](#motivation-1)
35+
- [Sample Backup workflow with Differential Snapshots Service](#sample-backup-workflow-with-differential-snapshots-service)
36+
- [Volume Populator](#volume-populator)
37+
- [Motivation](#motivation-2)
38+
- [Status](#status)
39+
- [Quiesce and Unquiesce Hooks](#quiesce-and-unquiesce-hooks)
40+
- [Motivation](#motivation-3)
41+
- [Background](#background)
42+
- [Container Notifier](#container-notifier)
43+
- [Volume Group and Group Snapshot](#volume-group-and-group-snapshot)
44+
- [Motivation](#motivation-4)
45+
- [Goals](#goals)
46+
- [Status](#status-1)
47+
- [Backup Repositories](#backup-repositories)
48+
- [Why do we need backup repositories](#why-do-we-need-backup-repositories)
49+
- [Motivation/Objective](#motivationobjective)
50+
- [Application Snapshots and Backups](#application-snapshots-and-backups)
51+
- [Motivation](#motivation-5)
52+
- [Goals](#goals-1)
53+
- [Status](#status-2)
54+
- [Application Backup and Restore workflows](#application-backup-and-restore-workflows)
55+
- [Application Backup workflows](#application-backup-workflows)
56+
- [Application Restore workflows](#application-restore-workflows)
5657
- [Workflow for restore PVC-snapshot](#workflow-for-restore-pvc-snapshot)
5758
- [Restore To New](#restore-to-new)
5859
- [Restore to Production (partial restore)](#restore-to-production-partial-restore)
5960
- [Workflow for restore logical-dump](#workflow-for-restore-logical-dump)
6061
- [Restore to New](#restore-to-new-1)
6162
- [Restore to Production](#restore-to-production)
62-
- [Appendix](#appendix)
63-
- [Backup and Restore of Different Databases](#backup-and-restore-of-different-databases)
64-
- [Relational](#relational)
65-
- [Mysql](#mysql)
66-
- [Time series](#time-series)
67-
- [NuoDB](#nuodb)
68-
- [Prometheus](#prometheus)
69-
- [InfluxDB](#influxdb)
70-
- [Key value store](#key-value-store)
71-
- [etcd](#etcd)
72-
- [Message queues](#message-queues)
73-
- [Kafka](#kafka)
74-
- [Distributed databases](#distributed-databases)
75-
- [Mongo](#mongo)
76-
- [References](#references)
63+
- [Appendix](#appendix)
64+
- [Backup and Restore of Different Databases](#backup-and-restore-of-different-databases)
65+
- [Relational](#relational)
66+
- [Mysql](#mysql)
67+
- [Time series](#time-series)
68+
- [NuoDB](#nuodb)
69+
- [Prometheus](#prometheus)
70+
- [InfluxDB](#influxdb)
71+
- [Key value store](#key-value-store)
72+
- [etcd](#etcd)
73+
- [Message queues](#message-queues)
74+
- [Kafka](#kafka)
75+
- [Distributed databases](#distributed-databases)
76+
- [Mongo](#mongo)
77+
- [References](#references)
7778
<!-- /toc -->
7879

7980
## Data Protection Definition
8081

81-
The **_ Data Protection_** term in the Kubernetes context is the process of protecting valuable data and configs of applications running in a Kubernetes cluster. The result of the data protection process is typically called as a backup. When unexpected scenarios occur, for example data corruption by a malfunctioning software, data loss during a disaster, such a backup can be used to restore the protected workload to the states preserved in the backup.
82+
The **Data Protection** term in the Kubernetes context is the process of protecting valuable data and configs of applications running in a Kubernetes cluster. The result of the data protection process is typically called as a backup. When unexpected scenarios occur, for example data corruption by a malfunctioning software, data loss during a disaster, such a backup can be used to restore the protected workload to the states preserved in the backup.
8283

8384
In Kubernetes, a stateful application contains two primary pieces of data:
8485

@@ -385,7 +386,7 @@ For brevity's sake, `snapshot` will be used to mean `volume snapshot` and `backu
385386

386387
8. It might be desirable to try to standardize some common attributes of backups (e.g., object storage buckets, regions where backups are stored, number of copies of each backup, etc.). However, zeal for pursuing a deep level of such standardization should be tempered by the desire to open up a marketplace of competitive offerings that allow for a healthy degree of freedom for innovation and opportunities for competitive differentiation.
387388

388-
### CBT
389+
### Change Block Tracking
389390

390391
#### Motivation
391392

@@ -649,15 +650,14 @@ Proposes a Kubernetes API for stateful application data management that supports
649650

650651
This KEP (https://github.com/kubernetes/enhancements/pull/1051) proposes a Kubernetes Stateful Application Data Management API consisting of a set of [CustomResourceDefinitions (CRD)](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) that collectively define the notion of stateful applications, i.e., applications that maintain persistent state, and a set of data management semantics on stateful applications such as snapshot, backup, restoration, and clone. A snapshot of a stateful application is defined as a point-in-time capture of the state of the application, taken in an application-consistent manner. It captures both the application configurations (definitions of Kubernetes resources that make up the application, e.g., StatefulSets, Services, ConfigMaps, Secrets, etc.) and persistent data contained within the application (via persistent volumes).
651652

652-
#
653-
Application Backup and Restore workflow
653+
# Application Backup and Restore workflows
654654

655-
Since there are 2 general methods of backup and restore applications: logical-dump operation and PVC-snapshot, the backup and restore also have types of 2 workflows:
655+
Since there are 2 general methods of backup and restore applications: logical-dump operation and PVC-snapshot, the backup and restore also have 2 types of workflows:
656656

657657
* Logical dump workflows for backup and restore
658658
* PVC-snapshot workflow for backup and restore
659659

660-
### Backup Application workflows
660+
## Application Backup workflows
661661

662662
The application backup workflows are facilitated by a Data Protection Controller, which listens for Backup object creation on the Kubernetes API Server and executes a backup workflow to back up an application. The workflow may involve a component called a “data-mover” pod, which can connect to backup devices (backup repositories) to back up an application’s persistent volume data.
663663

@@ -678,13 +678,13 @@ An application backup workflow may involve the following steps:
678678
* Controller then deletes the data mover pod and the snapshot.
679679
4. Run an application-level action, such as executing a command to re-enable load balancing, after the backup data step has finished, regardless of whether the backup succeeded. This step is only needed if step 2 was executed.
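
The rule in step 4 (always run the post-backup action if the pre-backup action ran, whether or not the backup succeeded) maps naturally onto a `try`/`finally` block. The following is a minimal sketch, not part of the white paper: the function name `run_backup`, the callables `pre_action`, `backup_data`, and `post_action`, and the log strings are all hypothetical, and it assumes that step 2 is the matching pre-backup application-level action.

```python
from typing import Callable, List, Optional


def run_backup(
    backup_data: Callable[[], None],
    pre_action: Optional[Callable[[], None]] = None,
    post_action: Optional[Callable[[], None]] = None,
    log: Optional[List[str]] = None,
) -> bool:
    """Run the backup-data step and return True if it succeeded.

    Hypothetical sketch: pre_action stands in for the optional step 2,
    post_action for step 4. post_action runs only if pre_action ran.
    """
    log = log if log is not None else []
    pre_action_ran = False
    if pre_action is not None:
        pre_action()
        pre_action_ran = True
        log.append("pre-action")
    try:
        backup_data()
        log.append("backup-ok")
        return True
    except Exception as exc:  # a failed backup must not skip the post-action
        log.append(f"backup-failed: {exc}")
        return False
    finally:
        # Executes on both the success and the failure path.
        if pre_action_ran and post_action is not None:
            post_action()
            log.append("post-action")
```

Passing a `backup_data` callable that raises still results in `post_action` being called, which is the behavior step 4 asks for.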
680680

681-
### Restore Application workflows
681+
## Application Restore workflows
682682

683683
Restoring an application in general does not require quiesce and unquiesce. One potential scenario in which quiesce might be needed is for applications using ReadWriteMany PVCs (discussed below). The restore workflows are executed by the Data Protection Controller and involve a component called a “data-mover” pod, which can connect to backup devices (backup repositories) to retrieve the metadata and data of the backup images. This data-mover pod has to be created on demand so that it can access the PVCs to restore data.
684684

685-
#### Workflow for restore PVC-snapshot
685+
### Workflow for restore PVC-snapshot
686686

687-
##### Restore To New
687+
#### Restore To New
688688

689689
The general application restore-to-new process is as follows:
690690

@@ -696,7 +696,7 @@ The general application restore to new process is done as following
696696
* The controller then terminates the data-mover pod, which unmounts all PVCs.
697697
* Then restore the pods, statefulsets, services, etc. that use the PVCs restored in the previous steps.
698698

699-
##### Restore to Production (partial restore)
699+
#### Restore to Production (partial restore)
700700

701701
However, when we restore to an existing/production environment where the application is running, if the application uses PVCs with access mode ReadWriteOnce (the most common mode), we cannot directly restore data to those PVCs. The steps are as follows:
702702

@@ -709,17 +709,17 @@ However, when we restore to an existing/production environment where the applica
709709
* The controller then terminates the data-mover pod, effectively unmounting all PVCs.
710710
* Finally, the controller scales the application deployment back up to its original replica count. (Skip this step if the PVCs are ReadWriteMany.)
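
The access-mode branching in this workflow can be sketched as a small planning function. This is an illustration only: the function name `plan_production_restore` and the step strings are hypothetical, and the initial scale-down step is an assumption inferred from the final scale-up step, since the middle of the list is not shown in this diff.

```python
from typing import List


def plan_production_restore(access_mode: str, original_replicas: int) -> List[str]:
    """Return an ordered list of restore steps for an in-place restore.

    Hypothetical sketch: scaling steps are skipped for ReadWriteMany PVCs,
    because the data-mover pod can mount them alongside the application.
    """
    needs_scaling = access_mode != "ReadWriteMany"
    steps: List[str] = []
    if needs_scaling:
        # Assumed step: release the PVCs so the data-mover pod can mount them.
        steps.append("scale application to 0 replicas")
    steps.append("start data-mover pod and mount PVCs")
    steps.append("restore data into PVCs")
    steps.append("terminate data-mover pod (unmounts PVCs)")
    if needs_scaling:
        steps.append(f"scale application back to {original_replicas} replicas")
    return steps
```

For a ReadWriteOnce volume the plan brackets the data-mover steps with a scale-down and a scale-up; for ReadWriteMany only the three data-mover steps remain.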
711711

712-
#### Workflow for restore logical-dump
712+
### Workflow for restore logical-dump
713713

714714
This workflow is much simpler and can be done by the controller itself, because logical-dump operations can be performed over a network connection to the application server pod without needing access to the application PVCs. Most applications that support a logical-dump operation also support the same operation in the reverse direction. The backup images are generally local files or file streams from remote backup devices.
715715

716-
##### Restore to New
716+
#### Restore to New
717717

718718
* First, the controller restores the namespace, cluster resources, etc. that will be used by the application.
719719
* Controller then restores application pods/deployments.
720720
* Then the controller runs a database/application client with these local files/file streams as input and executes the logical-dump operation in the reverse direction back to the database/application.
721721

722-
##### Restore to Production
722+
#### Restore to Production
723723

724724
* The controller simply runs the logical-dump operation in the reverse direction, as mentioned above.
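
Running a client with a dump file as input, as described for the logical-dump restore, can be sketched with a subprocess whose standard input is the dump file. This is a minimal sketch under stated assumptions: `restore_from_dump` is a hypothetical helper, the client command is supplied by the caller (the white paper does not prescribe one), and the dump is assumed to be a local file.

```python
import subprocess
from typing import Sequence, Tuple


def restore_from_dump(client_cmd: Sequence[str], dump_path: str) -> Tuple[int, bytes]:
    """Stream a dump file into a client process on stdin.

    Returns the client's exit code and whatever it wrote to stdout.
    Hypothetical helper: client_cmd is whichever database/application
    client replays the dump; none is assumed here.
    """
    with open(dump_path, "rb") as dump:
        result = subprocess.run(
            client_cmd,
            stdin=dump,           # the dump is read by the client, not by us
            capture_output=True,  # collect stdout/stderr for inspection
            check=False,          # let the caller decide how to handle failure
        )
    return result.returncode, result.stdout
```

Because the file handle is passed directly as `stdin`, the dump is streamed by the operating system rather than loaded into the controller's memory, which matters for large backup images.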
725725
