adding support for workload identity in neo4j admin backup (#1131) (#1154)

renetapopova · web-flow · commit c393a8c11927 · 2023-10-24T11:34:29.000+01:00
Cherry-picks #1131
diff --git a/modules/ROOT/pages/kubernetes/operations/backup-restore.adoc b/modules/ROOT/pages/kubernetes/operations/backup-restore.adoc
@@ -13,29 +13,28 @@ For more information, see xref:kubernetes/accessing-neo4j.adoc[Accessing Neo4j].
 
 You can perform a backup of a Neo4j database(s) to any cloud provider (AWS, GCP, and Azure) bucket using the _neo4j/neo4j-admin_ Helm chart.
 From Neo4j 5.10.0, the _neo4j/neo4j-admin_ Helm chart also supports performing a backup of multiple databases.
+From Neo4j 5.13.0, the _neo4j/neo4j-admin_ Helm chart also supports workload identity integration for GCP, AWS, and Azure.
 
 === Prerequisites
 
 Before you can back up a database and upload it to your bucket, verify that you have the following:
 
 * A cloud provider bucket (AWS, GCP, or Azure) with read and write access to be able to upload the backup.
 * Credentials to access the cloud provider bucket, such as a service account JSON key file for GCP, a credentials file for AWS, or storage account credentials for Azure.
+* A service account with workload identity if you want to use workload identity integration to access the cloud provider bucket.
+** For more information on setting up a service account with workload identity on GCP and AWS, see:
+*** link:https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity[Google Kubernetes Engine (GKE) -> Use Workload Identity]
+*** link:https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html[Amazon EKS -> Configuring a Kubernetes service account to assume an IAM role]
+** For more information on setting up an Azure storage account with workload identity, see link:https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview?tabs=go[Microsoft Azure -> Use Microsoft Entra Workload ID with Azure Kubernetes Service (AKS)].
 * A Kubernetes cluster running on one of the cloud providers with the Neo4j Helm chart installed.
 For more information, see xref:kubernetes/quickstart-standalone/index.adoc[Quickstart: Deploy a standalone instance] or xref:kubernetes/quickstart-cluster/index.adoc[Quickstart: Deploy a cluster].
+* The latest Neo4j Helm charts.
+You can update the repository to get the latest charts using `helm repo update`.
 
-=== Steps
+=== Create a Kubernetes secret
 
-To perform a backup of a Neo4j database to any cloud provider (AWS, GCP, and Azure) bucket, follow these steps:
+You can create a Kubernetes secret with the credentials that can access the cloud provider bucket using one of the following options:
 
-. Update the repository to get the latest charts:
-+
-[source, shell, role='noheader']
-----
-helm repo update
-----
-
-. Create a Kubernetes secret with the credentials to access the cloud provider bucket using one of the following options:
-+
 [.tabbed-example]
 =====
 [.include-with-gke]
@@ -86,14 +85,19 @@ kubectl create secret generic azurecred --from-file=credentials=/path/to/your/cr
 ======
 =====
 
-. Configure the backup parameters in the _backup-values.yaml_ file using one of the following options:
-+
+=== Configure the backup parameters
+
+You can configure the backup parameters in the _backup-values.yaml_ file either by using the `secretName` and `secretKeyName` parameters or by mapping the Kubernetes service account
+to the workload identity integration.
+
 [NOTE]
 ====
 The following examples show the minimum configuration required to perform a backup to a cloud provider bucket.
 For more information about the available backup parameters, see <<kubernetes-neo4j-backup-parameters, Backup parameters>>.
 ====
-+
+
+==== Configure the _backup-values.yaml_ file using the `secretName` and `secretKeyName` parameters
+
 [.tabbed-example]
 =====
 [.include-with-gke]
@@ -171,36 +175,117 @@ consistencyCheck:
 ----
 ======
 =====
-+
+
+==== Configure the _backup-values.yaml_ file using service account workload identity integration
+
+In certain situations, it may be useful to assign a Kubernetes service account with workload identity integration to the Neo4j backup pod.
+This is particularly relevant when you want to improve security and have more precise access control for the pod.
+Doing so ensures that secure access to resources is granted based on the pod's identity within the cloud ecosystem.
+For more information on setting up a service account with workload identity, see https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity[Google Kubernetes Engine (GKE) -> Use Workload Identity], https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html[Amazon EKS -> Configuring a Kubernetes service account to assume an IAM role], and https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview?tabs=go[Microsoft Azure -> Use Microsoft Entra Workload ID with Azure Kubernetes Service (AKS)].
+
+To configure the Neo4j backup pod to use a Kubernetes service account with workload identity, set `serviceAccountName` to the name of the service account to use.
+For Azure deployments, you also need to set the `azureStorageAccountName` parameter to the name of the Azure storage account where the backup files will be uploaded.
+For example:
+
+[.tabbed-example]
+=====
+[.include-with-gke]
+======
+[source, yaml, role='noheader']
+----
+neo4j:
+  image: "neo4j/helm-charts-backup"
+  imageTag: "5.13.0"
+  jobSchedule: "* * * * *"
+  successfulJobsHistoryLimit: 3
+  failedJobsHistoryLimit: 1
+  backoffLimit: 3
+
+backup:
+  bucketName: "my-bucket"
+  databaseAdminServiceName: "standalone-admin" # This is the Neo4j Admin Service name.
+  database: "neo4j,system"
+  cloudProvider: "gcp"
+  secretName: ""
+  secretKeyName: ""
+
+consistencyCheck:
+  enabled: true
+
+serviceAccountName: "demo-service-account"
+----
+======
+
+[.include-with-aws]
+======
+[source, yaml, role='noheader']
+----
+neo4j:
+  image: "neo4j/helm-charts-backup"
+  imageTag: "5.13.0"
+  jobSchedule: "* * * * *"
+  successfulJobsHistoryLimit: 3
+  failedJobsHistoryLimit: 1
+  backoffLimit: 3
+
+backup:
+  bucketName: "my-bucket"
+  databaseAdminServiceName: "standalone-admin"
+  database: "neo4j,system"
+  cloudProvider: "aws"
+  secretName: ""
+  secretKeyName: ""
+
+consistencyCheck:
+  enabled: true
+
+serviceAccountName: "demo-service-account"
+----
+======
+
+[.include-with-azure]
+======
+[source, yaml, role='noheader']
+----
+neo4j:
+  image: "neo4j/helm-charts-backup"
+  imageTag: "5.13.0"
+  jobSchedule: "* * * * *"
+  successfulJobsHistoryLimit: 3
+  failedJobsHistoryLimit: 1
+  backoffLimit: 3
+
+backup:
+  bucketName: "my-bucket"
+  databaseAdminServiceName: "standalone-admin"
+  database: "neo4j,system"
+  cloudProvider: "azure"
+  azureStorageAccountName: "storageAccountName"
+
+consistencyCheck:
+  enabled: true
+
+serviceAccountName: "demo-service-account"
+----
+======
+=====
 The _/backups_ mount created by default is an _emptyDir_ type volume.
 This means that the data stored in this volume is not persistent and will be lost when the pod is deleted.
 To use a persistent volume for backups add the following section to the _backup-values.yaml_ file:
-+
+
 [source, yaml, role='noheader']
 ----
 tempVolume:
   persistentVolumeClaim:
     claimName: backup-pvc
 ----
-+
+
 [NOTE]
 ====
 You need to create the persistent volume and persistent volume claim before installing the _neo4j-admin_ Helm chart.
 For more information, see xref:kubernetes/persistent-volumes.adoc[Volume mounts and persistent volumes].
 ====
 
-. Install _neo4j-admin_ Helm chart using the _backup-values.yaml_ file:
-+
-[source, shell, role='noheader']
-----
-helm install backup-name neo4j-admin -f /path/to/your/backup-values.yaml
-----
-+
-The _neo4j/neo4j-admin_ Helm chart installs a cronjob that launches a pod based on the job schedule. This pod performs a backup of one or multiple databases, a consistency check of the backup file(s),  and uploads them to the cloud provider bucket.
-
-. Monitor the backup pod logs using `kubectl logs pod/<neo4j-backup-pod-name>` to check the progress of the backup.
-. Check that the backup files and the consistency check reports have been uploaded to the cloud provider bucket.
-
 [[kubernetes-neo4j-backup-parameters]]
 === Backup parameters
 
@@ -228,7 +313,7 @@ disableLookups: false
 
 neo4j:
   image: "neo4j/helm-charts-backup"
-  imageTag: "5.11.0"
+  imageTag: "5.13.0"
   podLabels: {}
 #    app: "demo"
 #    acac: "dcdddc"
@@ -303,7 +388,9 @@ backup:
   secretName: ""
   # provide the keyname used in the above secret
   secretKeyName: ""
-
+  # provide the Azure storage account name
+  # required when you use workload identity integration for Azure
+  azureStorageAccountName: ""
   #setting this to true will not delete the backup files generated at the /backup mount
   keepBackupFiles: true
 
@@ -334,6 +421,10 @@ consistencyCheck:
   verbose: true
 
 # Set to name of an existing Service Account to use if desired
+# See the following links for setting up a service account with workload identity
+# Azure - https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview?tabs=go
+# GCP - https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
+# AWS - https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html
 serviceAccountName: ""
 
 # Volume to use as temporary storage for files before they are uploaded to cloud. For large databases local storage may not have sufficient space.
@@ -399,6 +490,21 @@ tolerations: []
 #    effect: "NoSchedule"
 ----
 
+=== Install the _neo4j-admin_ Helm chart
+
+. Install the _neo4j-admin_ Helm chart using the _backup-values.yaml_ file:
++
+[source, shell, role='noheader']
+----
+helm install backup-name neo4j-admin -f /path/to/your/backup-values.yaml
+----
++
+The _neo4j/neo4j-admin_ Helm chart installs a cronjob that launches a pod based on the job schedule.
+This pod backs up one or more databases, runs a consistency check on the backup file(s), and uploads the results to the cloud provider bucket.
+
+. Monitor the backup pod logs using `kubectl logs pod/<neo4j-backup-pod-name>` to check the progress of the backup.
+. Check that the backup files and the consistency check reports have been uploaded to the cloud provider bucket.
+
 [[kubernetes-neo4j-restore]]
 == Restore a single database