Commit 74d747e

Merge pull request #112538 from ekpgh/hpc-feedback-0423
changes (human + acrolinx)
2 parents 3619631 + 34101da commit 74d747e

8 files changed: +45 lines, -36 lines

articles/hpc-cache/configuration.md

Lines changed: 11 additions & 7 deletions
@@ -4,13 +4,13 @@ description: Explains how to configure additional settings for the cache like MT
author: ekpgh
ms.service: hpc-cache
ms.topic: conceptual
- ms.date: 04/15/2020
+ ms.date: 04/27/2020
ms.author: v-erkel
---

# Configure additional Azure HPC Cache settings

- The **Configuration** page in the Azure portal has options for customizing several settings. Most users do not need to change these from the default values.
+ The **Configuration** page in the Azure portal has options for customizing several settings. Most users don't need to change these settings from their default values.

This article also describes how to use the snapshot feature for Azure Blob storage targets. The snapshot feature has no configurable settings.

@@ -28,7 +28,7 @@ The default value is 1500 bytes, but you can change it to 1400.
> [!NOTE]
> If you lower the cache's MTU size, make sure that the clients and storage systems that communicate with the cache have the same MTU setting or a lower value.

- Lowering the cache MTU value can help you work around packet size restrictions in the rest of the cache's network. For example, some VPNs can't transmit full-size 1500 byte packets successfully. Reducing the size of packets sent over the VPN might eliminate that issue. However, note that a lower cache MTU setting means that any other component that communicates with the cache - including clients and storage systems - must also have a lower setting to avoid communication problems with the cache.
+ Lowering the cache MTU value can help you work around packet size restrictions in the rest of the cache's network. For example, some VPNs can't transmit full-size 1500-byte packets successfully. Reducing the size of packets sent over the VPN might eliminate that issue. However, note that a lower cache MTU setting means that any other component that communicates with the cache - including clients and storage systems - must also have a lower MTU setting to avoid communication problems.

If you don't want to change the MTU settings on other system components, you should not lower the cache's MTU setting. There are other solutions to work around VPN packet size restrictions. Read [Adjust VPN packet size restrictions](troubleshoot-nas.md#adjust-vpn-packet-size-restrictions) in the NAS troubleshooting article to learn more about diagnosing and addressing this problem.
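One common way to verify a lowered MTU end to end is to send non-fragmenting pings sized to the new packet limit. A minimal sketch, assuming an MTU of 1400 as in the text above; the cache address shown in the comment is hypothetical:

```shell
# A 1400-byte MTU leaves 1372 bytes of ICMP payload
# after the 20-byte IP header and 8-byte ICMP header.
payload=$((1400 - 20 - 8))
echo "$payload"

# Probe the path with fragmentation disallowed (hypothetical cache address);
# uncomment to test against a real cache mount address:
#   ping -M do -c 3 -s "$payload" 10.1.1.101
```

If the ping fails with a "message too long" error, some hop between the client and the cache has a smaller MTU than the value you configured.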

@@ -41,16 +41,20 @@ The **Enable root squash** setting controls how the Azure HPC Cache allows root

This setting lets users control root access at the cache level, which can help compensate for the required ``no_root_squash`` setting for NAS systems used as storage targets. (Read more about [NFS storage target prerequisites](hpc-cache-prereqs.md#nfs-storage-requirements).) It also can improve security when used with Azure Blob storage targets.

- The default setting is **Yes**. (Caches created before April 2020 might have the default setting **No**.) When enabled, this feature also prevents use of set-UID permission bits in client requests to the cache.
+ The default setting is **Yes**. (Caches created before April 2020 might have the default setting **No**.)
+
+ When enabled, this feature also prevents use of set-UID permission bits in client requests to the cache.

## View snapshots for blob storage targets

- Azure HPC Cache automatically saves storage snapshots for Azure Blob storage targets. Snapshots provide a quick reference point for the contents of the back-end storage container. Snapshots are not a replacement for data backups, and they don't include any information about the state of cached data.
+ Azure HPC Cache automatically saves storage snapshots for Azure Blob storage targets. Snapshots provide a quick reference point for the contents of the back-end storage container.
+
+ Snapshots are not a replacement for data backups, and they don't include any information about the state of cached data.

> [!NOTE]
- > This snapshot feature is different from the snapshot feature included in NetApp, Isilon, or ZFS storage software. Those snapshot implementations flush changes from the cache to the back-end storage system before taking the snapshot.
+ > This snapshot feature is different from the snapshot feature included in NetApp or Isilon storage software. Those snapshot implementations flush changes from the cache to the back-end storage system before taking the snapshot.
>
- > For efficiency, the Azure HPC Cache snapshot does not flush changes first, and only records data that has been written to the Blob container. This snapshot does not represent the state of cached data, so recent changes might be excluded.
+ > For efficiency, the Azure HPC Cache snapshot does not flush changes first, and only records data that has been written to the Blob container. This snapshot does not represent the state of cached data, so it might not include recent changes.

This feature is available for Azure Blob storage targets only, and its configuration can't be changed.
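For context on the ``no_root_squash`` requirement mentioned above: on a NAS used as a storage target, that option is typically set in the NFS server's export configuration. A minimal sketch of an `/etc/exports` entry; the export path and client subnet here are hypothetical, not values from the article:

```
/exports/data 10.0.0.0/24(rw,no_root_squash,no_subtree_check)
```

Because the storage target must allow root access, enabling root squash at the cache level restores a safeguard against client-side root users writing to back-end storage.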

articles/hpc-cache/customer-keys.md

Lines changed: 11 additions & 6 deletions
@@ -4,7 +4,7 @@ description: How to use Azure Key Vault with Azure HPC Cache to control encrypti
author: ekpgh
ms.service: hpc-cache
ms.topic: conceptual
- ms.date: 04/15/2020
+ ms.date: 04/23/2020
ms.author: v-erkel
---

@@ -29,7 +29,7 @@ There are three steps to enable customer-managed key encryption for Azure HPC Ca
Encryption is not completely set up until after you authorize it from the newly created cache (step 3). This is because you must pass the cache's identity to the key vault to make it an authorized user. You can't do this before creating the cache, because the identity does not exist until the cache is created.

- After you create the cache, you cannot change between customer-managed keys and Microsoft-managed keys. However, if your cache uses customer-managed keys you can [change](#update-key-settings) the encryption key, the key version, and the key vault as needed.
+ After you create the cache, you can't change between customer-managed keys and Microsoft-managed keys. However, if your cache uses customer-managed keys you can [change](#update-key-settings) the encryption key, the key version, and the key vault as needed.

## Understand key vault and key requirements

@@ -43,7 +43,7 @@ Key vault properties:
* **Soft delete** - Azure HPC Cache will enable soft delete if it is not already configured on the key vault.
* **Purge protection** - Purge protection must be enabled.
* **Access policy** - Default settings are sufficient.
- * **Network connectivity** - Azure HPC Cache must be able to access the key vault regardless of the endpoint settings you choose.
+ * **Network connectivity** - Azure HPC Cache must be able to access the key vault, regardless of the endpoint settings you choose.

Key properties:
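The vault properties listed above can also be set when creating the vault from the Azure CLI. A hedged sketch, not part of the article: the resource group and vault names are hypothetical, and the leading `echo` prints the command instead of running it (remove it to execute against a real subscription):

```shell
RG="hpc-cache-rg"          # hypothetical resource group name
KV="hpccache-keyvault"     # hypothetical key vault name

# Soft delete and purge protection must both be enabled for Azure HPC Cache:
echo az keyvault create --resource-group "$RG" --name "$KV" \
  --enable-soft-delete true --enable-purge-protection true
```

Note that purge protection, once enabled, can't be turned off, which is why it's worth deciding on a dedicated vault up front.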

@@ -96,7 +96,10 @@ Continue with the rest of the specifications and create the cache as described i
## 3. Authorize Azure Key Vault encryption from the cache
<!-- header is linked from create article, update if changed -->

- After a few minutes, the new Azure HPC Cache appears in your Azure portal. Go to the **Overview** page to authorize it to access your Azure Key Vault and enable customer-managed key encryption. (The cache might appear in the resources list before the "deployment underway" messages clear.)
+ After a few minutes, the new Azure HPC Cache appears in your Azure portal. Go to the **Overview** page to authorize it to access your Azure Key Vault and enable customer-managed key encryption.
+
+ > [!TIP]
+ > The cache might appear in the resources list before the "deployment underway" messages clear. Check your resources list after a minute or two instead of waiting for a success notification.

This two-step process is necessary because the Azure HPC Cache instance needs an identity to pass to the Azure Key Vault for authorization. The cache identity doesn't exist until after its initial creation steps are complete.

@@ -117,7 +120,9 @@ After you authorize encryption, Azure HPC Cache goes through several more minute
## Update key settings

- You can change the key vault, key, or key version for your cache from the Azure portal. Click the cache's **Encryption** settings link to open the **Customer key settings** page. (You cannot change a cache between customer-managed keys and system-managed keys.)
+ You can change the key vault, key, or key version for your cache from the Azure portal. Click the cache's **Encryption** settings link to open the **Customer key settings** page.
+
+ You cannot change a cache between customer-managed keys and system-managed keys.

![screenshot of "Customer keys settings" page, reached by clicking Settings > Encryption from the cache page in the Azure portal](media/change-key-click.png)

@@ -136,7 +141,7 @@ After you choose the new encryption key values, click **Select**. A confirmation
These articles explain more about using Azure Key Vault and customer-managed keys to encrypt data in Azure:

* [Azure storage encryption overview](../storage/common/storage-service-encryption.md)
- * [Disk encryption with customer-managed keys](../virtual-machines/linux/disk-encryption.md#customer-managed-keys) - Documentation for using Azure Key Vault and managed disks, which is similar to the process used with Azure HPC Cache
+ * [Disk encryption with customer-managed keys](../virtual-machines/linux/disk-encryption.md#customer-managed-keys) - Documentation for using Azure Key Vault with managed disks, which is a similar scenario to Azure HPC Cache

## Next steps

articles/hpc-cache/hpc-cache-add-storage.md

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@ description: How to define storage targets so that your Azure HPC Cache can use
author: ekpgh
ms.service: hpc-cache
ms.topic: conceptual
- ms.date: 04/03/2020
- ms.author: rohogue
+ ms.date: 04/23/2020
+ ms.author: v-erkel
---

# Add storage targets

articles/hpc-cache/hpc-cache-create.md

Lines changed: 5 additions & 5 deletions
@@ -4,7 +4,7 @@ description: How to create an Azure HPC Cache instance
author: ekpgh
ms.service: hpc-cache
ms.topic: how-to
- ms.date: 04/15/2020
+ ms.date: 04/23/2020
ms.author: v-erkel
---

@@ -24,7 +24,7 @@ In **Service Details**, set the cache name and these other attributes:
* Location - Select one of the [supported regions](hpc-cache-overview.md#region-availability).
* Virtual network - You can select an existing one or create a new virtual network.
- * Subnet - Choose or create a subnet with at least 64 IP addresses (/24) that will be used only for this Azure HPC Cache instance.
+ * Subnet - Choose or create a subnet with at least 64 IP addresses (/24). This subnet must be used only for this Azure HPC Cache instance.

## Set cache capacity
<!-- referenced from GUI - update aka.ms link if you change this header text -->
@@ -40,15 +40,15 @@ Choose the capacity by setting these two values:
Choose one of the available throughput values and cache storage sizes.

- Keep in mind that the actual data transfer rate depends on workload, network speeds, and the type of storage targets. The values you choose set the maximum throughput for the entire cache system, but some of that is used for overhead tasks. For example, if a client requests a file that isn't already stored in the cache, or if the file is marked as stale, your cache uses some of its throughput to fetch it from backend storage.
+ Keep in mind that the actual data transfer rate depends on workload, network speeds, and the type of storage targets. The values you choose set the maximum throughput for the entire cache system, but some of that is used for overhead tasks. For example, if a client requests a file that isn't already stored in the cache, or if the file is marked as stale, your cache uses some of its throughput to fetch it from back-end storage.

- Azure HPC Cache manages which files are cached and preloaded to maximize cache hit rates. The cache contents are continuously assessed and files are moved to long-term storage when they are less frequently accessed. Choose a cache storage size that can comfortably hold the active set of working files with additional space for metadata and other overhead.
+ Azure HPC Cache manages which files are cached and preloaded to maximize cache hit rates. Cache contents are continuously assessed, and files are moved to long-term storage when they're less frequently accessed. Choose a cache storage size that can comfortably hold the active set of working files, plus additional space for metadata and other overhead.

![screenshot of cache sizing page](media/hpc-cache-create-capacity.png)

## Enable Azure Key Vault encryption (optional)

- If your cache is in a region that supports customer-managed encryption keys, the **Disk encryption keys** page appears between the **Cache** and **Tags** tabs. As of publication time, this option is supported in East US, South Central US, and West US 2.
+ If your cache is in a region that supports customer-managed encryption keys, the **Disk encryption keys** page appears between the **Cache** and **Tags** tabs. At publication time, this option is supported in East US, South Central US, and West US 2.

If you want to manage the encryption keys used with your cache storage, supply your Azure Key Vault information on the **Disk encryption keys** page. The key vault must be in the same region and in the same subscription as the cache.

articles/hpc-cache/hpc-cache-ingest-manual.md

Lines changed: 13 additions & 13 deletions
@@ -30,9 +30,9 @@ After issuing this command, the `jobs` command will show that two threads are ru
## Copy data with predictable file names

If your file names are predictable, you can use expressions to create parallel copy threads.

- For example, if your directory contains 1000 files that are numbered sequentially from `0001` to `1000`, you can use the following expressions to create ten parallel threads that each copy 100 files:
+ For example, if your directory contains 1000 files that are numbered sequentially from `0001` to `1000`, you can use the following expressions to create 10 parallel threads that each copy 100 files:

```bash
cp /mnt/source/file0* /mnt/destination1/ & \
@@ -49,7 +49,7 @@ cp /mnt/source/file9* /mnt/destination1/
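The ten-way split described above can be sketched end to end with temporary directories standing in for `/mnt/source` and `/mnt/destination1`. This is an illustration only, not the article's exact layout: the files here are numbered `000`-`999` so that each leading digit matches exactly 100 names:

```shell
src=$(mktemp -d) && dst=$(mktemp -d)
for i in $(seq -w 0 999); do : > "$src/file$i"; done   # 1000 empty files

# One background cp per leading digit: ten parallel copy threads in all
for d in 0 1 2 3 4 5 6 7 8 9; do
  cp "$src"/file"$d"* "$dst"/ &
done
wait    # block until every copy thread finishes

ls "$dst" | wc -l    # 1000
```

The `wait` at the end is what the trailing `& \` continuation lines accomplish interactively: the shell doesn't move on until all the backgrounded copies complete.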
## Copy data with unstructured file names

If your file naming structure is not predictable, you can group files by directory names.

This example collects entire directories to send to ``cp`` commands run as background tasks:

@@ -67,16 +67,16 @@ After the files are collected, you can run parallel copy commands to recursively
```bash
cp /mnt/source/* /mnt/destination/
mkdir -p /mnt/destination/dir1 && cp /mnt/source/dir1/* /mnt/destination/dir1/ &
cp -R /mnt/source/dir1/dir1a /mnt/destination/dir1/ &
cp -R /mnt/source/dir1/dir1b /mnt/destination/dir1/ &
cp -R /mnt/source/dir1/dir1c /mnt/destination/dir1/ & # this command copies dir1c1 via recursion
cp -R /mnt/source/dir1/dir1d /mnt/destination/dir1/ &
```

## When to add mount points

After you have enough parallel threads going against a single destination file system mount point, there will be a point where adding more threads does not give more throughput. (Throughput will be measured in files/second or bytes/second, depending on your type of data.) Or worse, over-threading can sometimes cause a throughput degradation.

When this happens, you can add client-side mount points to other Azure HPC Cache mount addresses, using the same remote file system mount path:

@@ -87,7 +87,7 @@ When this happens, you can add client-side mount points to other Azure HPC Cache
10.1.1.103:/nfs on /mnt/destination3 type nfs (rw,vers=3,proto=tcp,addr=10.1.1.103)
```

Adding client-side mount points lets you fork off additional copy commands to the additional `/mnt/destination[1-3]` mount points, achieving further parallelism.

For example, if your files are very large, you might define the copy commands to use distinct destination paths, sending out more commands in parallel from the client performing the copy.

@@ -107,7 +107,7 @@ In the example above, all three destination mount points are being targeted by t
## When to add clients

Lastly, when you have reached the client's capabilities, adding more copy threads or additional mount points will not yield any additional files/sec or bytes/sec increases. In that situation, you can deploy another client with the same set of mount points that will be running its own sets of file copy processes.

Example:

@@ -153,7 +153,7 @@ Redirect this result to a file: `find . -mindepth 4 -maxdepth 4 -type d > /tmp/f
Then you can iterate through the manifest, using BASH commands to count files and determine the sizes of the subdirectories:

```bash
ben@xlcycl1:/sps/internal/atj5b5ab44b7f > for i in $(cat /tmp/foo); do echo " `find ${i} |wc -l` `du -sh ${i}`"; done
244 3.5M ./atj5b5ab44b7f-02/support/gsi/2018-07-18T00:07:03EDT
9 172K ./atj5b5ab44b7f-02/support/gsi/stats_2018-07-18T05:01:00UTC
124 5.8M ./atj5b5ab44b7f-02/support/gsi/stats_2018-07-19T01:01:01UTC
@@ -189,7 +189,7 @@ ben@xlcycl1:/sps/internal/atj5b5ab44b7f > for i in $(cat /tmp/foo); do echo " `f
33 2.8G ./atj5b5ab44b7f-03/support/trace/rolling
```

Lastly, you must craft the actual file copy commands to the clients.

If you have four clients, use this command:

@@ -209,14 +209,14 @@ And for six.... Extrapolate as needed.
for i in 1 2 3 4 5 6; do sed -n ${i}~6p /tmp/foo > /tmp/client${i}; done
```

You will get *N* resulting files, one for each of your *N* clients that has the path names to the level-four directories obtained as part of the output from the `find` command.

Use each file to build the copy command:

```bash
for i in 1 2 3 4 5 6; do for j in $(cat /tmp/client${i}); do echo "cp -p -R /mnt/source/${j} /mnt/destination/${j}" >> /tmp/client${i}_copy_commands ; done; done
```

The above will give you *N* files, each with a copy command per line, that can be run as a BASH script on the client.

The goal is to run multiple threads of these scripts concurrently per client in parallel on multiple clients.
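The `sed -n ${i}~6p` round-robin split above relies on GNU sed's `first~step` address form. It can be checked on a small manifest; a self-contained sketch with 12 hypothetical directory names dealt out to 3 clients:

```shell
tmp=$(mktemp)
printf 'dir%02d\n' $(seq 1 12) > "$tmp"   # 12-line stand-in for /tmp/foo

# Deal lines round-robin: client i gets lines i, i+3, i+6, ...
for i in 1 2 3; do sed -n "${i}~3p" "$tmp" > "/tmp/client${i}"; done

cat /tmp/client2    # dir02, dir05, dir08, dir11
```

Each client's list is disjoint from the others, so the copy scripts built from them never contend for the same source directories.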
