articles/azure-arc/data/storage-configuration.md

Kubernetes provides an infrastructure abstraction layer over the underlying virtualization tech stack (optional) and hardware. The way that Kubernetes abstracts away storage is through **[Storage Classes](https://kubernetes.io/docs/concepts/storage/storage-classes/)**. When you provision a pod, you can specify a storage class for each volume. When the pod is provisioned, the storage class **[provisioner](https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/)** is called to provision the storage, a **[persistent volume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)** is created on that provisioned storage, and the pod then mounts the persistent volume through a **[persistent volume claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims)**.
Kubernetes provides a way for storage infrastructure providers to plug in drivers (also called "Addons") that extend Kubernetes. Storage addons must comply with the **[Container Storage Interface standard](https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/)**. There are dozens of addons that can be found in this non-definitive **[list of CSI drivers](https://kubernetes-csi.github.io/docs/drivers.html)**. The specific CSI driver you use depends on factors such as whether you're running in a cloud-hosted, managed Kubernetes service or which OEM provider you use for your hardware.
To view the storage classes configured in your Kubernetes cluster, run this command:

```console
kubectl get storageclass
```
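
To inspect a specific class, for example its provisioner, reclaim policy, and volume binding mode, you can also describe it. The class name `managed-premium` below is just an example; substitute one of the names returned by the previous command:

```console
kubectl describe storageclass managed-premium
```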
Depending on the configuration of your NFS server and storage class provisioner, you may need to set `supplementalGroups` in the pod configuration for database instances, and you may need to change the NFS server configuration to use the group IDs passed in by the client (as opposed to looking up group IDs on the server using the passed-in user ID). Consult your NFS administrator to determine whether this is the case.
The `supplementalGroups` property takes an array of values you can set at deployment. Azure Arc data controller applies these to any database instances it creates.
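
At the Kubernetes level, this property maps to the standard pod `securityContext`. The following is purely an illustration of the underlying mechanism; the group ID `4321` is a placeholder, and database instance pods are configured for you by the data controller rather than by hand:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  securityContext:
    supplementalGroups: [4321]   # extra group IDs applied to processes in all containers
  containers:
  - name: app
    image: example-image
```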
To set this property, run the following command:
|**Controller API service**|`<namespace>/data-controller`|
At the time the data controller is provisioned, the storage class to be used for each of these persistent volumes is specified either by passing the `--storage-class`/`-sc` parameter to the `az arcdata dc create` command or by setting the storage classes in the `control.json` deployment template file that is used. If you're using the Azure portal to create the data controller in the directly connected mode, the deployment template that you choose either has a storage class predefined or, if it does not, the portal prompts you for one. If you use a custom deployment template, you can specify the storage class.
The deployment templates that are provided out of the box have a default storage class specified that is appropriate for the target environment, but it can be overridden during deployment. See the detailed steps to [create custom configuration templates](create-custom-configuration-template.md) to change the storage class configuration for the data controller pods at deployment time.
If you set the storage class using the `--storage-class` or `-sc` parameter, that storage class is used for both the log and data storage classes. If you set the storage classes in the deployment template file, you can specify different storage classes for logs and data.
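
As a sketch of setting the storage class at creation time (the data controller name, namespace, and profile below are placeholders, and the exact parameter set varies by CLI version, so check `az arcdata dc create --help`):

```console
az arcdata dc create --name arc-dc --k8s-namespace arc \
  --profile-name azure-arc-aks-premium-storage \
  --storage-class managed-premium \
  --connectivity-mode indirect --use-k8s
```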
Important factors to consider when choosing a storage class for the data controller pods:
- Changing the storage class post deployment is difficult, not documented, and not supported. Be sure to choose the storage class correctly at deployment time.
> [!NOTE]
> If no storage class is specified, the default storage class is used. There can be only one default storage class per Kubernetes cluster. You can [change the default storage class](https://kubernetes.io/docs/tasks/administer-cluster/change-default-storage-class/).
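
Per the linked Kubernetes documentation, the default is controlled by an annotation on the storage class object. A sketch of marking a class (here, a hypothetical `managed-premium`) as the default:

```console
kubectl patch storageclass managed-premium \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```

Remember to remove the same annotation from the previous default so that only one class carries it.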
### Database instance storage configuration
Each database instance has data, logs, and backup persistent volumes. The storage classes for these persistent volumes can be specified at deployment time. If no storage class is specified, the default storage class is used.
When creating an instance using either `az sql mi-arc create` or `az postgres server-arc create`, there are four parameters that can be used to set the storage classes:
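
As an illustration for a SQL managed instance (a sketch only: the parameter spellings `--storage-class-data`, `--storage-class-datalogs`, `--storage-class-logs`, and `--storage-class-backups`, and all names and class values, are assumptions drawn from the Azure CLI rather than from this article):

```console
az sql mi-arc create --name sqlmi1 --k8s-namespace arc --use-k8s \
  --storage-class-data managed-premium \
  --storage-class-datalogs managed-premium \
  --storage-class-logs managed-premium \
  --storage-class-backups azurefile-premium
```

Note that the storage class for backups must be RWX capable, as described later in this article.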
The table below lists the paths inside the PostgreSQL instance container that are mapped to persistent volumes:

|**Parameter**|**Path inside container**|**Description**|
|---|---|---|
|`--storage-class-data`, `-d`|/var/opt/postgresql|Contains data and log directories for the postgres installation|
|`--storage-class-logs`, `-g`|/var/log|Contains directories that store console output (stderr, stdout), other logging information of processes inside the container|
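
Using the short names from the table above, a PostgreSQL example might look like the following. The instance name, namespace, and storage class names are placeholders:

```console
az postgres server-arc create --name pg1 --k8s-namespace arc --use-k8s \
  -d managed-premium -g managed-premium
```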
Each database instance has a separate persistent volume for data files, logs, and backups. This separates the I/O for each of these file types, subject to how the volume provisioner provisions storage. Each database instance has its own persistent volume claims and persistent volumes.
If there are multiple databases on a given database instance, all of the databases use the same persistent volume claim, persistent volume, and storage class. All backups, both differential log backups and full backups, use the same persistent volume claim and persistent volume. The persistent volume claims for the database instance pods are shown below:
|**Instance**|**Persistent Volume Claims**|
|---|---|

Important factors to consider when choosing a storage class for the database instance pods:

- Starting with the February 2022 release of Azure Arc data services, you need to specify a **ReadWriteMany** (RWX) capable storage class for backups. Learn more about [access modes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes). If no storage class is specified for backups, the default storage class in Kubernetes is used; if it is not RWX capable, an Azure SQL managed instance deployment may not succeed.
- Database instances can be deployed in either a single pod pattern or a multiple pod pattern. An example of a single pod pattern is a General Purpose pricing tier Azure SQL managed instance. An example of a multiple pod pattern is a highly available Business Critical pricing tier Azure SQL managed instance. Database instances deployed with the single pod pattern **must** use a remote, shared storage class to ensure data durability, so that if a pod or node dies, the pod can reconnect to the persistent volume when it is brought back up. In contrast, a highly available Azure SQL managed instance uses Always On Availability Groups to replicate the data from one instance to another either synchronously or asynchronously. Especially in the case where the data is replicated synchronously, there are always multiple copies of the data, typically three. Because of this, it is possible to use local storage or remote, shared storage classes for data and log files. If you use local storage, the data is still preserved even in the case of a failed pod, node, or storage hardware because there are multiple copies of the data. Given this flexibility, you might choose to use local storage for better performance.
- Database performance is largely a function of the I/O throughput of a given storage device. If your database is heavy on reads or heavy on writes, then you should choose a storage class with hardware designed for that type of workload. For example, if your database is mostly used for writes, you might choose local storage with RAID 0. If your database is mostly used for reads of a small amount of "hot" data, but there is a large overall storage volume of cold data, then you might choose a SAN device capable of tiered storage. Choosing the right storage class is no different from choosing the type of storage you would use for any database.
- If you're using a local storage volume provisioner, ensure that the local volumes provisioned for data, logs, and backups each land on different underlying storage devices to avoid contention on disk I/O. The OS should also be on a volume that is mounted to a separate disk or disks. This is essentially the same guidance as would be followed for a database instance on physical hardware.
- Because all databases on a given instance share a persistent volume claim and persistent volume, be sure not to colocate busy databases on the same database instance. If possible, separate busy databases onto their own database instances to avoid I/O contention. Further, use node label targeting to land database instances on separate nodes to distribute overall I/O traffic across multiple nodes. If you're using virtualization, be sure to consider distributing I/O traffic not just at the node level but also the combined I/O activity of all the node VMs on a given physical host.
## Estimating storage requirements
Every pod that contains stateful data uses at least two persistent volumes: one persistent volume for data and another persistent volume for logs. The table below lists the number of persistent volumes required for a single Data Controller, Azure SQL Managed instance, Azure Database for PostgreSQL instance, and Azure PostgreSQL HyperScale instance:
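
As a back-of-the-envelope sketch of the resulting capacity calculation (the instance count and per-volume sizes below are placeholder assumptions for illustration, not Azure Arc defaults):

```shell
# Rough capacity estimate: total = instances * (data + logs + backups)
instances=3    # planned SQL managed instances (assumed)
data_gi=32     # data volume size per instance, in Gi (assumed)
logs_gi=8      # log volume size per instance, in Gi (assumed)
backups_gi=64  # backup volume size per instance, in Gi (assumed)
total_gi=$(( instances * (data_gi + logs_gi + backups_gi) ))
echo "Provision at least ${total_gi}Gi of persistent storage"
```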
### On-premises and edge sites
Microsoft and its OEM, OS, and Kubernetes partners have a validation program for Azure Arc data services. This program provides customers comparable test results from a certification testing toolkit. The tests evaluate feature compatibility, stress testing results, and performance and scalability. Each test result indicates the OS, Kubernetes distribution, hardware, CSI add-on, and storage classes used. This helps customers choose the best storage class, OS, Kubernetes distribution, and hardware for their requirements. More information on this program and test results can be found in the [validation program](validation-program.md) documentation.
#### Public cloud, managed Kubernetes services
For public cloud-based, managed Kubernetes services, we can make the following recommendations:
|Public cloud service|Recommendation|
|---|---|
|**Azure Kubernetes Service (AKS)**|Azure Kubernetes Service (AKS) has two types of storage - Azure Files and Azure Managed Disks. Each type of storage has two pricing/performance tiers - standard (HDD) and premium (SSD). Thus, the four storage classes provided in AKS are `azurefile` (Azure Files standard tier), `azurefile-premium` (Azure Files premium tier), `default` (Azure Disks standard tier), and `managed-premium` (Azure Disks premium tier). The default storage class is `default` (Azure Disks standard tier). There are substantial **[pricing differences](https://azure.microsoft.com/pricing/details/storage/)** between the types and tiers that you should consider. For production workloads with high-performance requirements, we recommend using `managed-premium` for all storage classes. For dev/test workloads, proofs of concept, and other cases where cost is a consideration, `azurefile` is the least expensive option. All four of the options can be used for situations requiring remote, shared storage, as they are all network-attached storage devices in Azure. Read more about [AKS Storage](../../aks/concepts-storage.md).|
|**AWS Elastic Kubernetes Service (EKS)**| Amazon's Elastic Kubernetes Service has one primary storage class, based on the [EBS CSI storage driver](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html), which is recommended for production workloads. A newer [EFS CSI storage driver](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html) can be added to an EKS cluster, but it is currently in beta and subject to change; although AWS says this storage driver is supported for production, we don't recommend using it for that reason. The EBS storage class is the default and is called `gp2`. Read more about [EKS Storage](https://docs.aws.amazon.com/eks/latest/userguide/storage-classes.html).|
|**Google Kubernetes Engine (GKE)**|Google Kubernetes Engine (GKE) has just one storage class called `standard`. This class is used for [GCE persistent disks](https://kubernetes.io/docs/concepts/storage/volumes/#gcepersistentdisk). Being the only one, it is also the default. Although there is a [local, static volume provisioner](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/local-ssd#run-local-volume-static-provisioner) for GKE that you can use with direct-attached SSDs, we don't recommend using it because it is not maintained or supported by Google. Read more about [GKE storage](https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes).|