Commit 660f89f

Merge pull request #25 from johncdawson/patch-6
Update troubleshoot.md
2 parents 58d7fa5 + 81e021b commit 660f89f

1 file changed

+18
-18
lines changed

articles/dedicated-hsm/troubleshoot.md

Lines changed: 18 additions & 18 deletions
@@ -31,7 +31,7 @@ Dedicated HSM is not freely available for use as it is delivering hardware resou
3131

3232
### Getting access to Dedicated HSM
3333

34-
If you believe Dedicated HSM will fit your key storage requirements then please email [email protected] to request access. Please outline your application, the regions you would like HSMs and the volume of HSMs you are looking for. If you work with a Microsoft representative, such as an Account Executive or Cloud Solution Architect for example, then please include them in any request.
34+
If you believe Dedicated HSM will fit your key storage requirements, then email [email protected] to request access. Outline your application, the regions where you would like HSMs, and the number of HSMs you are looking for. If you work with a Microsoft representative, such as an Account Executive or Cloud Solution Architect, include them in any request.
3535

3636
## HSM Provisioning
3737

@@ -44,30 +44,30 @@ Dedicated HSM supports CLI and PowerShell for deployment so portal based error i
4444
![Failure Information](./media/troubleshoot/failure-information.png)
4545

4646
### HSM Subnet Delegation
47-
The number one reason for deployment failures is forgetting to set the appropriate delegation for the customer defined subnet on which the HSMs will be provisioned. This is part of the VNet and subnet prerequisites for deployment and more details can be found in the tutorials.
47+
The number one reason for deployment failures is forgetting to set the appropriate delegation for the customer defined subnet on which the HSMs will be provisioned. Setting that delegation is part of the VNet and subnet prerequisites for deployment and more details can be found in the tutorials.
4848

4949
![Subnet Delegation](./media/troubleshoot/subnet-delegation.png)
5050

5151
### HSM Deployment Race Condition
5252

53-
The standard ARM template provided for deployment has HSM and ExpressRoute gateway related resources. Networking resources are a dependency for successful HSM deployment and timing can be crucial. On some occasions we have seen deployment failures related to dependency issues and re-running the deployment is often successful. If not, deleting resources and then re-deploying is often successful. After attempting this and still finding issue, raise a support request in the Azure portal selecting the problem type of "Issues configuring the Azure setup".
53+
The standard ARM template provided for deployment has HSM and ExpressRoute gateway related resources. Networking resources are a dependency for successful HSM deployment and timing can be crucial. Occasionally, we have seen deployment failures related to dependency issues, and rerunning the deployment often solves the issue. If not, deleting the resources and then redeploying is often successful. If the issue persists after these attempts, raise a support request in the Azure portal, selecting the problem type "Issues configuring the Azure setup".
5454

5555
### HSM Deployment Using Terraform
5656

57-
A few customers have used Terraform as an automation environment instead of ARM templates as supplied when registering for this service. The HSMs themselves cannot be deployed this way but the dependent networking resources can. Terraform has a module to call out to a minimal ARM template that jut has the HSM deployment. In this situation, care should be taken to ensure networking resources such as the required ExpressRoute Gateway are fully deployed before deploying HSMs. The following CLI command can be used to test for completed deployment and integrated as required. Simply replace the angle bracket place holders for your specific naming. You will look for a result of "provisioningState is Succeeded"
57+
A few customers have used Terraform as an automation environment instead of ARM templates as supplied when registering for this service. The HSMs cannot be deployed this way but the dependent networking resources can. Terraform has a module to call out to a minimal ARM template that just has the HSM deployment. In this situation, care should be taken to ensure networking resources such as the required ExpressRoute Gateway are fully deployed before deploying HSMs. The following CLI command can be used to test for completed deployment and integrated as required. Replace the angle bracket placeholders with your specific naming. You should look for a result of "provisioningState is Succeeded".
5858

5959
```azurecli
6060
az resource show --ids /subscriptions/<subid>/resourceGroups/<myresourcegroup>/providers/Microsoft.Network/virtualNetworkGateways/<myergateway>
6161
```
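When an automation tool has to wait for the gateway before deploying HSMs, the check above can be wrapped in a polling loop. The following is a minimal Python sketch, not part of the supplied templates. It assumes the Azure CLI is installed and signed in, and that the JSON output reports the state under `properties.provisioningState`. The function names, attempt count, and delay are hypothetical.

```python
import json
import subprocess
import time


def gateway_resource_id(sub_id, resource_group, gateway_name):
    """Build the resource ID used by the az command shown in the text."""
    return (
        f"/subscriptions/{sub_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworkGateways/{gateway_name}"
    )


def provisioning_state(resource):
    """Read the state from parsed `az resource show` output.

    Assumption: the state sits under properties.provisioningState.
    """
    return resource.get("properties", {}).get("provisioningState")


def fetch_with_az(resource_id):
    """Run the documented CLI command and parse its JSON output."""
    result = subprocess.run(
        ["az", "resource", "show", "--ids", resource_id, "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def wait_for_gateway(resource_id, fetch=fetch_with_az, attempts=60,
                     delay_seconds=30, sleep=time.sleep):
    """Poll until the state is Succeeded; return False if attempts run out."""
    for attempt in range(attempts):
        if provisioning_state(fetch(resource_id)) == "Succeeded":
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

A caller would build the ID with `gateway_resource_id("<subid>", "<myresourcegroup>", "<myergateway>")` and start the HSM deployment only when `wait_for_gateway` returns `True`. The `fetch` and `sleep` parameters exist so the loop can be exercised without calling Azure.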
6262

6363
### Deployment failure based on quota
64-
Deployments can fail if you exceed 2 HSMs per stamp and 4 HSMs per region. To avoid this situation ensure you have deleted resources from previously failed deployments before deploying again. Refer to the "How do I see HSMs" item below to check resources. If you believe you need to exceed this quota, which is primarily there as a safeguard, then please email [email protected] with details.
64+
Deployments can fail if you exceed 2 HSMs per stamp and 4 HSMs per region. To avoid this situation, ensure you have deleted resources from previously failed deployments before deploying again. Refer to the "How do I see HSMs" item below to check resources. If you believe you need to exceed this quota, which is primarily there as a safeguard, then please email [email protected] with details.
6565

6666
### Deployment failure based on capacity
67-
When a particular stamp or region is becoming full, i.e. nearly all free HSMs provisioned, this can lead to deployment failures. Each stamp has 11 HSMs available for customers which means 22 per region. There are also 3 spares and 1 test device in each stamp. If you believe you may have hit a limit then please email [email protected] for information on fill-level of specific stamps.
67+
When a particular stamp or region is becoming full, that is, nearly all free HSMs are provisioned, this can lead to deployment failures. Each stamp has 11 HSMs available for customers, which means 22 per region. There are also 3 spares and 1 test device in each stamp. If you believe you may have hit a limit, then email [email protected] for information on fill-level of specific stamps.
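The quota and capacity figures quoted in these two sections can be combined into a quick pre-deployment check. This is an illustrative Python sketch only: the constants come from the text, while the function names and the idea of tracking counts per stamp yourself are assumptions, since the article does not describe how a customer learns which stamp an HSM landed in.

```python
# Figures taken from the article text.
QUOTA_PER_STAMP = 2            # customer quota per stamp
QUOTA_PER_REGION = 4           # customer quota per region
CUSTOMER_HSMS_PER_STAMP = 11   # HSMs available to customers in a stamp
CUSTOMER_HSMS_PER_REGION = 22  # HSMs available to customers in a region
SPARES_PER_STAMP = 3
TEST_DEVICES_PER_STAMP = 1


def stamps_per_region():
    """Stamps implied by 11 customer HSMs per stamp and 22 per region."""
    return CUSTOMER_HSMS_PER_REGION // CUSTOMER_HSMS_PER_STAMP


def devices_per_stamp():
    """Physical devices per stamp: customer HSMs, spares, and the test device."""
    return CUSTOMER_HSMS_PER_STAMP + SPARES_PER_STAMP + TEST_DEVICES_PER_STAMP


def quota_allows(in_stamp, in_region, requested=1):
    """Return True if the request stays within both quotas (hypothetical helper)."""
    return (in_stamp + requested <= QUOTA_PER_STAMP
            and in_region + requested <= QUOTA_PER_REGION)
```

Resources left behind by a failed deployment count toward `in_stamp` and `in_region`, which is why deleting them first matters.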
6868

6969
### How do I see HSMs when provisioned?
70-
Due to Dedicated HSM being a whitelisted service it is considered a "Hidden Type" in the portal. To see the HSM resources you must check to "Show hidden types" check box as shown below. The NIC resource always the follows the HSM and is a good place to find out the IP address of the HSM prior to using SSH to connect.
70+
Due to Dedicated HSM being a whitelisted service, it is considered a "Hidden Type" in the Azure portal. To see the HSM resources, you must check the "Show hidden types" check box as shown below. The NIC resource always follows the HSM and is a good place to find out the IP address of the HSM prior to using SSH to connect.
7171

7272
![Subnet Delegation](./media/troubleshoot/hsm-provisioned.png)
7373

@@ -77,7 +77,7 @@ Deployment of Dedicated HSM has a dependency on networking resources and some co
7777

7878
### Provisioning ExpressRoute
7979

80-
Dedicated HSM uses ExpressRoute Gateway as a "tunnel" for communication between the customer private IP address space and the physical HSM in an Azure datacenter. Considering there is a restriction on one gateway per Vnet, this means customers requiring connection to their on-premises resources via ExpressRoute, will have to use another Vnet for that connection.
80+
Dedicated HSM uses ExpressRoute Gateway as a "tunnel" for communication between the customer private IP address space and the physical HSM in an Azure datacenter. Because there is a restriction of one gateway per VNet, customers requiring a connection to their on-premises resources via ExpressRoute will have to use another VNet for that connection.
8181

8282
### HSM Private IP Address
8383

@@ -87,7 +87,7 @@ The sample templates provided for Dedicated HSM assume the HSM IP will be automa
8787

8888
## HSM Initialization
8989

90-
Initialization prepares a new HSM for use, or an existing HSM for reuse. You must initialize the HSM before you can generate or store objects, allow clients to connect, or perform cryptographic operations.
90+
Initialization prepares a new HSM for use, or an existing HSM for reuse. Initialization of the HSM must be complete before you can generate or store objects, allow clients to connect, or perform cryptographic operations.
9191

9292
### Lost Credentials
9393

@@ -103,22 +103,22 @@ Providing incorrect credentials to HSMs can have destructive consequences. The f
103103
| HSM SO | 3 | HSM is zeroized (all HSM objects identities, and all partitions are gone) | HSM must be reinitialized. Contents can be restored from backup(s). |
104104
| Partition SO | 10 | Partition is zeroized. | Partition must be reinitialized. Contents can be restored from backup. |
105105
| Audit | 10 | Lockout | Unlocked automatically after 10 minutes. |
106-
| Crypto Officer | 10 (can be decreased) | If HSM policy 15: Enable SO reset of partition PIN is set to 1 (enabled), the CO and CU roles are locked out.<br>If HSM policy 15: Enable SO reset of partition PIN is set to 0 (disabled), the CO and CU roles are permanently locked out and the partition contents are no longer accessible. This is the default setting. | CO role must be unlocked and the credential reset by the Partition SO, using `role resetpw -name co`.<br>The partition must be re-initialized, and key material restored from a backup device. |
106+
| Crypto Officer | 10 (can be decreased) | If HSM policy 15: Enable SO reset of partition PIN is set to 1 (enabled), the CO and CU roles are locked out.<br>If HSM policy 15: Enable SO reset of partition PIN is set to 0 (disabled), the CO and CU roles are permanently locked out and the partition contents are no longer accessible. This is the default setting. | CO role must be unlocked and the credential reset by the Partition SO, using `role resetpw -name co`.<br>The partition must be reinitialized, and key material restored from a backup device. |
107107

108108
## HSM Configuration
109109

110110
The following items are situations where configuration errors are either common or have an impact that is worthy of calling out:
111111

112112
### HSM Documentation and Software
113-
Software and documentation for the Thales SafeNet Luna 7 HSM devices is not available from Microsoft and must be downloaded from Thales directly. This requires registration using a Thales Customer ID you received during the registration process. The devices as provided by Microsoft have software version 7.2 and firmware version 7.0.3. Early in 2020 Thales made documentation public and it can be found [here](https://thalesdocs.com/gphsm/luna/7.2/docs/network/Content/Home_network.htm).
113+
Software and documentation for the Thales SafeNet Luna 7 HSM devices is not available from Microsoft and must be downloaded from Thales directly. Registration is required using the Thales Customer ID received during the registration process. The devices, as provided by Microsoft, have software version 7.2 and firmware version 7.0.3. Early in 2020 Thales made documentation public and it can be found [here](https://thalesdocs.com/gphsm/luna/7.2/docs/network/Content/Home_network.htm).
114114

115115
### HSM Networking Configuration
116116

117-
Be careful when configuring the networking within the HSM. The HSM has a connection via the ExpressRoute Gateway from a customer private IP address space directly to the HSM. This communication channel is for customer communication only and Microsoft has no access. If the HSM is configured in a such a way that this network path is impacted that means all communication with the HSM is removed. In this situation the only option is to raise a Microsoft support request via the Azure portal to have the device reset. This reset procedure sets the HSM back to its initial state and all configuration and key material is lost. Configuration must be recreated and when the device joins the HA group it will get key material replicated.
117+
Be careful when configuring the networking within the HSM. The HSM has a connection via the ExpressRoute Gateway from a customer private IP address space directly to the HSM. This communication channel is for customer communication only and Microsoft has no access. If the HSM is configured in such a way that this network path is impacted, all communication with the HSM is lost. In this situation, the only option is to raise a Microsoft support request via the Azure portal to have the device reset. This reset procedure sets the HSM back to its initial state and all configuration and key material is lost. Configuration must be recreated, and when the device joins the HA group it will get key material replicated.
118118

119119
### HSM Device Reboot
120120

121-
Some configuration changes require the HSM to be power cycled or rebooted. Microsoft testing of the HSM in Azure determined that on some occasions the reboot could hang. Considering this is in an Azure datacenter, the implication is that a support request must be created in the Azure portal requesting hard-reboot and that could take up to 48 hours to complete. To avoid this situation ensure you have deployed the reboot patch available from Thales directly. Please refer to [KB0019789](https://supportportal.gemalto.com/csm?sys_kb_id=d66911e2db4ffbc0d298728dae9619b0&id=kb_article_view&sysparm_rank=1&sysparm_tsqueryId=d568c35bdb9a4850d6b31f3b4b96199e&sysparm_article=KB0019789) in the Thales Luna Network HSM 7.2 Downloads for a recommended patch for a reboot hang issue (Note: you will need to have registered in the Thales support portal to download).
121+
Some configuration changes require the HSM to be power cycled or rebooted. Microsoft testing of the HSM in Azure determined that on some occasions the reboot could hang. The implication is that a support request must be created in the Azure portal requesting hard-reboot and that could take up to 48 hours to complete considering it's a manual process in an Azure datacenter. To avoid this situation, ensure you have deployed the reboot patch available from Thales directly. Refer to [KB0019789](https://supportportal.gemalto.com/csm?sys_kb_id=d66911e2db4ffbc0d298728dae9619b0&id=kb_article_view&sysparm_rank=1&sysparm_tsqueryId=d568c35bdb9a4850d6b31f3b4b96199e&sysparm_article=KB0019789) in the Thales Luna Network HSM 7.2 Downloads for a recommended patch for a reboot hang issue (Note: you will need to have registered in the Thales support portal to download).
122122

123123
### NTLS Certificates out of sync
124124
A client may lose connectivity to an HSM when a certificate expires or has been overwritten through configuration updates. The certificate exchange client configuration should be reapplied with each HSM.
@@ -132,7 +132,7 @@ Example NTLS logging with invalid certificate:
132132
133133
### Failed TCP Communication
134134

135-
Communication from the Luna Client installation to the HSM requires at a minimum TCP port 1792. This should be taken into consideration as any network configurations are changed in the environment.
135+
Communication from the Luna Client installation to the HSM requires at a minimum TCP port 1792. Consider this as any network configurations are changed in the environment.
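After a network change, a plain TCP connection attempt from the client machine is a quick way to confirm the port is still reachable. The following is a minimal Python sketch using only the standard library. The function name and the timeout are hypothetical, and a successful connection shows only that the network path is open, not that certificates or client registration are valid.

```python
import socket

LUNA_CLIENT_PORT = 1792  # TCP port named in the text


def can_reach(host, port=LUNA_CLIENT_PORT, timeout_seconds=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `can_reach("<hsm-private-ip>")` from the machine running the Luna Client gives a yes or no answer before any deeper troubleshooting.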
136136

137137
### Failed HA Group Member Doesn't Recover
138138

@@ -142,17 +142,17 @@ It is necessary to configure a retry count for an HA group to enable auto recove
142142
### HA Group Doesn't Sync
143143

144144
In the case where member partitions do not have the same cloning domain, the ha synchronize command will display the following:
145-
Warning: Synchronize may fail. The members in slot 0 and slot 1 have conflictingsettings for private key cloning.
145+
Warning: Synchronize may fail. The members in slot 0 and slot 1 have conflicting settings for private key cloning.
146146
A new partition with the correct cloning domain should be added to the HA group, followed by removing the incorrectly configured partition.
147147

148148
## HSM Deprovisioning
149149

150-
Only when fully finished with an HSM can it be deprovisioned and then Microsoft will reset it and return it to a freepool.
150+
Only when fully finished with an HSM can it be deprovisioned and then Microsoft will reset it and return it to a free pool.
151151

152152
### How to delete an HSM resource
153153

154-
HSMs cannot be deleted as Azure resources unless they are in a "zeroized" state. This means all key material must have been deleted prior to trying to delete it as a resource. The quickest way to zeroize is to get the HSM admin password wrong 3 times (note: this is HSM admin and not appliance level admin). The Luna shell does have a `hsm -factoryreset` command that zeroizes but this can only be executed via console on the serial port and customers do not have access to this.
154+
The Azure resource for an HSM cannot be deleted unless the HSM is in a "zeroized" state. Hence, all key material must have been deleted prior to trying to delete it as a resource. The quickest way to zeroize is to get the HSM admin password wrong 3 times (note: this refers to the HSM admin and not appliance level admin). The Luna shell does have a `hsm -factoryreset` command that zeroizes but it can only be executed via console on the serial port and customers do not have access to this.
155155

156156
## Next steps
157157

158-
This article has provided insight into areas across the HSM deployment lifecycle that may have issues or require troubleshooting or careful consideration. Hopefully this helps you avoids unnecessary delays and frustration, and if you have relevant additions or changes to this article then please raise a support request with Microsoft and let us know.
158+
This article has provided insight into areas across the HSM deployment lifecycle that may have issues or require troubleshooting or careful consideration. Hopefully this article helps you avoid unnecessary delays and frustration, and if you have relevant additions or changes, then raise a support request with Microsoft and let us know.
