Commit fe456fd

Author: Mike Ray (Microsoft)
Commit message: Some grammar gymnastics for the machine
1 parent 1ed1e16 commit fe456fd

File tree: 1 file changed (+14 −14 lines)


articles/azure-arc/data/storage-configuration.md

Lines changed: 14 additions & 14 deletions
@@ -19,9 +19,9 @@ ms.topic: conceptual
 Kubernetes provides an infrastructure abstraction layer over the underlying virtualization tech stack (optional) and hardware. The way that Kubernetes abstracts away storage is through **[Storage Classes](https://kubernetes.io/docs/concepts/storage/storage-classes/)**. When you provision a pod, you can specify a storage class for each volume. At the time the pod is provisioned, the storage class **[provisioner](https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/)** is called to provision the storage, and then a **[persistent volume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)** is created on that provisioned storage and then the pod is mounted to the persistent volume by a **[persistent volume claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims)**.

-Kubernetes provides a way for storage infrastructure providers to plug in drivers (also called "Addons") that extend Kubernetes. Storage addons must comply with the **[Container Storage Interface standard](https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/)**. There are dozens of addons that can be found in this non-definitive **[list of CSI drivers](https://kubernetes-csi.github.io/docs/drivers.html)**. Which CSI driver you use will depend on factors such as whether you are running in a cloud-hosted, managed Kubernetes service or which OEM provider you are using for your hardware.
+Kubernetes provides a way for storage infrastructure providers to plug in drivers (also called "Addons") that extend Kubernetes. Storage addons must comply with the **[Container Storage Interface standard](https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/)**. There are dozens of addons that can be found in this non-definitive **[list of CSI drivers](https://kubernetes-csi.github.io/docs/drivers.html)**. The specific CSI driver you use depends on factors such as whether you're running in a cloud-hosted, managed Kubernetes service or which OEM provider you use for your hardware.

-You can view which storage classes are configured in your Kubernetes cluster by running this command:
+To view the storage classes configured in your Kubernetes cluster, run this command:

 ```console
 kubectl get storageclass
 ```
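The `(default)` marker in the NAME column of that command's output identifies the cluster's default storage class. A small sketch of picking it out; the sample output below is hypothetical, standing in for a live cluster:

```shell
# Identify the default storage class from "kubectl get storageclass" output.
# The sample output is hypothetical; on a real cluster you would pipe the
# live command instead: kubectl get storageclass | awk '/\(default\)/ {print $1}'
sample_output='NAME                PROVISIONER          AGE
default (default)   disk.csi.azure.com   45d
managed-premium     disk.csi.azure.com   45d'

printf '%s\n' "$sample_output" | awk '/\(default\)/ {print $1}'   # prints: default
```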
@@ -126,7 +126,7 @@ There are generally two types of storage:
 Depending on the configuration of your NFS server and storage class provisioner, you may need to set the `supplementalGroups` in the pod configurations for database instances, and you may need to change the NFS server configuration to use the group IDs passed in by the client (as opposed to looking group IDs up on the server using the passed-in user ID). Consult your NFS administrator to determine if this is the case.

-The `supplementalGroups` property takes an array of values and can be set as part of the Azure Arc data controller deployment and will be used by any database instances configured by the Azure Arc data controller.
+The `supplementalGroups` property takes an array of values you can set at deployment. Azure Arc data controller applies these to any database instances it creates.

 To set this property, run the following command:
@@ -145,11 +145,11 @@ Some services in Azure Arc for data services depend upon being configured to use
 |**Controller SQL instance**|`<namespace>/logs-controldb`, `<namespace>/data-controldb`|
 |**Controller API service**|`<namespace>/data-controller`|

-At the time the data controller is provisioned, the storage class to be used for each of these persistent volumes is specified by either passing the --storage-class | -sc parameter to the `az arcdata dc create` command or by setting the storage classes in the control.json deployment template file that is used. If you are using the Azure portal to create the data controller in the directly connected mode, the deployment template that you choose will either have the storage class predefined in the template or if you select a template which does not have a predefined storage class then you will be prompted for one. If you use a custom deployment template, then you can specify the storage class.
+At the time the data controller is provisioned, the storage class to be used for each of these persistent volumes is specified by either passing the `--storage-class | -sc` parameter to the `az arcdata dc create` command or by setting the storage classes in the control.json deployment template file that is used. If you're using the Azure portal to create the data controller in the directly connected mode, the deployment template that you choose either has the storage class predefined in the template or you can select a template that does not have a predefined storage class. If your template does not define a storage class, the portal prompts you for one. If you use a custom deployment template, then you can specify the storage class.

 The deployment templates that are provided out of the box have a default storage class specified that is appropriate for the target environment, but it can be overridden during deployment. See the detailed steps to [create custom configuration templates](create-custom-configuration-template.md) to change the storage class configuration for the data controller pods at deployment time.

-If you set the storage class using the --storage-class | -sc parameter the storage class will be used for both log and data storage classes. If you set the storage classes in the deployment template file, you can specify different storage classes for logs and data.
+If you set the storage class using the `--storage-class` or `-sc` parameter, that storage class is used for both log and data storage classes. If you set the storage classes in the deployment template file, you can specify different storage classes for logs and data.

 Important factors to consider when choosing a storage class for the data controller pods:
@@ -159,11 +159,11 @@ Important factors to consider when choosing a storage class for the data control
 - Changing the storage class post deployment is difficult, not documented, and not supported. Be sure to choose the storage class correctly at deployment time.

 > [!NOTE]
-> If no storage class is specified, the default storage class will be used. There can be only one default storage class per Kubernetes cluster. You can [change the default storage class](https://kubernetes.io/docs/tasks/administer-cluster/change-default-storage-class/).
+> If no storage class is specified, the default storage class is used. There can be only one default storage class per Kubernetes cluster. You can [change the default storage class](https://kubernetes.io/docs/tasks/administer-cluster/change-default-storage-class/).

 ### Database instance storage configuration

-Each database instance has data, logs, and backup persistent volumes. The storage classes for these persistent volumes can be specified at deployment time. If no storage class is specified the default storage class will be used.
+Each database instance has data, logs, and backup persistent volumes. The storage classes for these persistent volumes can be specified at deployment time. If no storage class is specified, the default storage class is used.

 When creating an instance using either `az sql mi-arc create` or `az postgres server-arc create`, there are four parameters that can be used to set the storage classes:
@@ -191,9 +191,9 @@ The table below lists the paths inside the PostgreSQL instance container that is
 |`--storage-class-data`, `-d`|/var/opt/postgresql|Contains data and log directories for the postgres installation|
 |`--storage-class-logs`, `-g`|/var/log|Contains directories that store console output (stderr, stdout), other logging information of processes inside the container|

-Each database instance will have a separate persistent volume for data files, logs, and backups. This means that there will be separation of the I/O for each of these types of files subject to how the volume provisioner will provision storage. Each database instance has its own persistent volume claims and persistent volumes.
+Each database instance has a separate persistent volume for data files, logs, and backups. This means that there is separation of the I/O for each of these types of files, subject to how the volume provisioner provisions storage. Each database instance has its own persistent volume claims and persistent volumes.

-If there are multiple databases on a given database instance, all of the databases will use the same persistent volume claim, persistent volume, and storage class. All backups - both differential log backups and full backups will use the same persistent volume claim and persistent volume. The persistent volume claims for the database instance pods are shown below:
+If there are multiple databases on a given database instance, all of the databases use the same persistent volume claim, persistent volume, and storage class. All backups - both differential log backups and full backups - use the same persistent volume claim and persistent volume. The persistent volume claims for the database instance pods are shown below:

 |**Instance**|**Persistent Volume Claims**|
 |---|---|
@@ -206,8 +206,8 @@ Important factors to consider when choosing a storage class for the database ins
 - Starting with the February, 2022 release of Azure Arc data services, you need to specify a **ReadWriteMany** (RWX) capable storage class for backups. Learn more about [access modes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes). If no storage class is specified for backups, the default storage class in kubernetes is used and if this is not RWX capable, an Azure SQL managed instance deployment may not succeed.
 - Database instances can be deployed in either a single pod pattern or a multiple pod pattern. An example of a single pod pattern is a General Purpose pricing tier Azure SQL managed instance. An example of a multiple pod pattern is a highly available Business Critical pricing tier Azure SQL managed instance. Database instances deployed with the single pod pattern **must** use a remote, shared storage class in order to ensure data durability and so that if a pod or node dies that when the pod is brought back up it can connect again to the persistent volume. In contrast, a highly available Azure SQL managed instance uses Always On Availability Groups to replicate the data from one instance to another either synchronously or asynchronously. Especially in the case where the data is replicated synchronously, there is always multiple copies of the data - typically three copies. Because of this, it is possible to use local storage or remote, shared storage classes for data and log files. If utilizing local storage, the data is still preserved even in the case of a failed pod, node, or storage hardware because there are multiple copies of the data. Given this flexibility, you might choose to use local storage for better performance.
 - Database performance is largely a function of the I/O throughput of a given storage device. If your database is heavy on reads or heavy on writes, then you should choose a storage class with hardware designed for that type of workload. For example, if your database is mostly used for writes, you might choose local storage with RAID 0. If your database is mostly used for reads of a small amount of "hot data", but there is a large overall storage volume of cold data, then you might choose a SAN device capable of tiered storage. Choosing the right storage class is not any different than choosing the type of storage you would use for any database.
-- If you are using a local storage volume provisioner, ensure that the local volumes that are provisioned for data, logs, and backups are each landing on different underlying storage devices to avoid contention on disk I/O. The OS should also be on a volume that is mounted to a separate disk(s). This is essentially the same guidance as would be followed for a database instance on physical hardware.
-- Because all databases on a given instance share a persistent volume claim and persistent volume, be sure not to colocate busy database instances on the same database instance. If possible, separate busy databases on to their own database instances to avoid I/O contention. Further, use node label targeting to land database instances onto separate nodes so as to distribute overall I/O traffic across multiple nodes. If you are using virtualization, be sure to consider distributing I/O traffic not just at the node level but also the combined I/O activity happening by all the node VMs on a given physical host.
+- If you're using a local storage volume provisioner, ensure that the local volumes provisioned for data, logs, and backups each land on different underlying storage devices to avoid contention on disk I/O. The OS should also be on a volume that is mounted to a separate disk or disks. This is essentially the same guidance as would be followed for a database instance on physical hardware.
+- Because all databases on a given instance share a persistent volume claim and persistent volume, be sure not to colocate busy database instances on the same database instance. If possible, separate busy databases on to their own database instances to avoid I/O contention. Further, use node label targeting to land database instances onto separate nodes so as to distribute overall I/O traffic across multiple nodes. If you're using virtualization, be sure to consider distributing I/O traffic not just at the node level but also the combined I/O activity of all the node VMs on a given physical host.

 ## Estimating storage requirements
 Every pod that contains stateful data uses at least two persistent volumes - one persistent volume for data and another persistent volume for logs. The table below lists the number of persistent volumes required for a single Data Controller, Azure SQL Managed instance, Azure Database for PostgreSQL instance and Azure PostgreSQL HyperScale instance:
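The "at least two persistent volumes per stateful pod" rule lends itself to a quick back-of-the-envelope count. A sketch with a placeholder pod count (not the article's actual figures):

```shell
# Estimate persistent volume count: each stateful pod needs at least one data
# volume and one logs volume. The pod count below is a placeholder.
stateful_pods=4
pvs_per_pod=2   # data + logs, minimum
echo $(( stateful_pods * pvs_per_pod ))   # prints: 8
```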
@@ -233,14 +233,14 @@ This calculation can be used to plan the storage for your Kubernetes cluster bas

 ### On-premises and edge sites

-Microsoft and its OEM, OS, and Kubernetes partners have a validation program for Azure Arc data services. This program will provide customers comparable test results from a certification testing toolkit. The tests will evaluate feature compatibility, stress testing results, and performance and scalability. Each of these test results will indicate the OS used, Kubernetes distribution used, HW used, the CSI add-on used, and the storage classes used. This will help customers choose the best storage class, OS, Kubernetes distribution, and hardware for their requirements. More information on this program and test results can be found [here](validation-program.md).
+Microsoft and its OEM, OS, and Kubernetes partners have a validation program for Azure Arc data services. This program provides comparable test results from a certification testing toolkit. The tests evaluate feature compatibility, stress testing results, and performance and scalability. Each of these test results indicates the OS, Kubernetes distribution, hardware, CSI add-on, and storage classes used. This helps customers choose the best storage class, OS, Kubernetes distribution, and hardware for their requirements. For more information about this program and test results, see the [validation program](validation-program.md).

 #### Public cloud, managed Kubernetes services

 For public cloud-based, managed Kubernetes services we can make the following recommendations:

 |Public cloud service|Recommendation|
 |---|---|
-|**Azure Kubernetes Service (AKS)**|Azure Kubernetes Service (AKS) has two types of storage - Azure Files and Azure Managed Disks. Each type of storage has two pricing/performance tiers - standard (HDD) and premium (SSD). Thus, the four storage classes provided in AKS are `azurefile` (Azure Files standard tier), `azurefile-premium` (Azure Files premium tier), `default` (Azure Disks standard tier), and `managed-premium` (Azure Disks premium tier). The default storage class is `default` (Azure Disks standard tier). There are substantial **[pricing differences](https://azure.microsoft.com/pricing/details/storage/)** between the types and tiers which should be factored into your decision. For production workloads with high-performance requirements, we recommend using `managed-premium` for all storage classes. For dev/test workloads, proofs of concept, etc. where cost is a consideration, then `azurefile` is the least expensive option. All four of the options can be used for situations requiring remote, shared storage as they are all network-attached storage devices in Azure. Read more about [AKS Storage](../../aks/concepts-storage.md).|
+|**Azure Kubernetes Service (AKS)**|Azure Kubernetes Service (AKS) has two types of storage - Azure Files and Azure Managed Disks. Each type of storage has two pricing/performance tiers - standard (HDD) and premium (SSD). Thus, the four storage classes provided in AKS are `azurefile` (Azure Files standard tier), `azurefile-premium` (Azure Files premium tier), `default` (Azure Disks standard tier), and `managed-premium` (Azure Disks premium tier). The default storage class is `default` (Azure Disks standard tier). There are substantial **[pricing differences](https://azure.microsoft.com/pricing/details/storage/)** between the types and tiers that you should consider. For production workloads with high-performance requirements, we recommend using `managed-premium` for all storage classes. For dev/test workloads, proofs of concept, etc. where cost is a consideration, `azurefile` is the least expensive option. All four of the options can be used for situations requiring remote, shared storage as they are all network-attached storage devices in Azure. Read more about [AKS Storage](../../aks/concepts-storage.md).|
 |**AWS Elastic Kubernetes Service (EKS)**| Amazon's Elastic Kubernetes Service has one primary storage class - based on the [EBS CSI storage driver](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html). This is recommended for production workloads. There is a new storage driver - [EFS CSI storage driver](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html) - that can be added to an EKS cluster, but it is currently in a beta stage and subject to change. Although AWS says that this storage driver is supported for production, we don't recommend using it because it is still in beta and subject to change. The EBS storage class is the default and is called `gp2`. Read more about [EKS Storage](https://docs.aws.amazon.com/eks/latest/userguide/storage-classes.html).|
-|**Google Kubernetes Engine (GKE)**|Google Kubernetes Engine (GKE) has just one storage class called `standard`, which is used for [GCE persistent disks](https://kubernetes.io/docs/concepts/storage/volumes/#gcepersistentdisk). Being the only one, it is also the default. Although there is a [local, static volume provisioner](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/local-ssd#run-local-volume-static-provisioner) for GKE that you can use with direct-attached SSDs, we don't recommend using it as it is not maintained or supported by Google. Read more about [GKE storage](https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes).
+|**Google Kubernetes Engine (GKE)**|Google Kubernetes Engine (GKE) has just one storage class called `standard`. This class is used for [GCE persistent disks](https://kubernetes.io/docs/concepts/storage/volumes/#gcepersistentdisk). Being the only one, it is also the default. Although there is a [local, static volume provisioner](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/local-ssd#run-local-volume-static-provisioner) for GKE that you can use with direct-attached SSDs, we don't recommend using it as it is not maintained or supported by Google. Read more about [GKE storage](https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes).|
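Whichever cloud storage class you choose, it is ultimately consumed through a persistent volume claim's `storageClassName`. A minimal sketch (the class name and size are placeholders), which you would apply with `kubectl apply -f`:

```shell
# Emit a minimal PersistentVolumeClaim that requests a named storage class.
# "managed-premium" and the 10Gi size are placeholders for illustration.
cat <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: managed-premium
  resources:
    requests:
      storage: 10Gi
EOF
```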
