Commit 39b8bce

Update troubleshoot.md
1 parent 844cab9 commit 39b8bce

1 file changed

+15
-19
lines changed

articles/dedicated-hsm/troubleshoot.md

Lines changed: 15 additions & 19 deletions
@@ -20,26 +20,26 @@ ms.author: mbaldwin
2020
---
2121
# Troubleshooting
2222

23-
The Azure Dedicated HSM service has two distinct facets. Firstly, the registration and deployment in Azure of the HSM devices and the underlying network components. Secondly, the configuration of the HSM devices in preparation for use/integration with a given workload or application. Although the Thales Luna Network HSM devices are the same in Azure as you would purchase directly from Thales, the fact they are a resource in Azure creates some unique considerations. These considerations and any resulting troubleshooting insights or best practices, are documented here to ensure high visibility and access to critical information. Once the service is in use, definitive information is available via support requests to either Microsoft or Thales directly.
23+
The Azure Dedicated HSM service has two distinct facets. Firstly, the registration and deployment in Azure of the HSM devices with their underlying network components. Secondly, the configuration of the HSM devices in preparation for use/integration with a given workload or application. Although the Thales Luna Network HSM devices are the same in Azure as you would purchase directly from Thales, the fact they are a resource in Azure creates some unique considerations. These considerations and any resulting troubleshooting insights or best practices, are documented here to ensure high visibility and access to critical information. Once the service is in use, definitive information is available via support requests to either Microsoft or Thales directly.
2424

2525
> [!NOTE]
2626
> It should be noted that prior to performing any configuration on a newly deployed HSM device, it should be updated with any relevant patches. A specific required patch is [KB0019789](https://supportportal.gemalto.com/csm?id=kb_article_view&sys_kb_id=19a81c8bdb9a1fc8d298728dae96197d&sysparm_article=KB0019789) in Thales support portal which addresses a reboot hang issue.
2727
2828
## HSM Registration
2929

30-
Dedicated HSM is not freely available for use as it is delivering hardware resources in the cloud and hence is a precious resource that need protecting. We therefore use a whitelisting process.
30+
Dedicated HSM is not freely available for use as it is delivering hardware resources in the cloud and hence is a precious resource that needs protecting. We therefore use a whitelisting process via email using [email protected].
3131

3232
### Getting access to Dedicated HSM
3333

34-
If you believe Dedicated HSM will fit you key storage requirements then please email [email protected] to request access. Please outline your application, the regions you would like HSMs and the volume of HSMs you are looking for. If you work with a Microsoft representative, such as an Account Executive or Cloud Solution Architect for example, then please include them in any request.
34+
If you believe Dedicated HSM will fit your key storage requirements then please email [email protected] to request access. Please outline your application, the regions you would like HSMs and the volume of HSMs you are looking for. If you work with a Microsoft representative, such as an Account Executive or Cloud Solution Architect for example, then please include them in any request.
3535

3636
## HSM Provisioning
3737

3838
Provisioning an HSM device in Azure can be done via either CLI or PowerShell. When registering for the service, a sample ARM template will be provided and assistance will be given for initial customization.
3939

4040
### HSM Deployment Failure Information
4141

42-
Dedicated HSM support CLI and PowerShell for deployment so portal based error information is limited and not verbose. Better information can be found by using the Resource Explorer. The home page has an icon for this and more detailed error information is available. This is information helps a lot if pasted in when creating a support request related to deployment issues.
42+
Dedicated HSM supports CLI and PowerShell for deployment so portal based error information is limited and not verbose. Better information can be found by using the Resource Explorer. The portal home page has an icon for this and more detailed error information is available. This information helps a lot if pasted in when creating a support request related to deployment issues.
4343

4444
![Failure Information](./media/troubleshoot/failure-information.png)
4545

@@ -50,11 +50,11 @@ The number one reason for deployment failures is forgetting to set the appropria
5050

5151
### HSM Deployment Race Condition
5252

53-
The standard ARM template provided for deployment has HSM and ExpressRoute gateway related resources. Networking resources are a dependency for successful HSM deployment and timing can be crucial. On some occasions we have seen deployment failures related to dependency issues and re-running the deployment is often successful. If not, deleting resources and then re-deploying is often successful. After attempting this and still finding issue then raise a support request in the Azure portal selecting the problem type of "Issues configuring the Azure setup".
53+
The standard ARM template provided for deployment has HSM and ExpressRoute gateway related resources. Networking resources are a dependency for successful HSM deployment and timing can be crucial. On some occasions we have seen deployment failures related to dependency issues and re-running the deployment is often successful. If not, deleting resources and then re-deploying is often successful. If the issue persists after these attempts, raise a support request in the Azure portal selecting the problem type of "Issues configuring the Azure setup".
5454

5555
### HSM Deployment Using Terraform
5656

57-
A few customers have used Terraform as an automation environment instead of ARM templates as supplied when registering for this service. The HSMs themselves cannot be deployed this way but the dependent networking resources can. Terraform has a module to call out to a minimal ARM template that jut has the HSM deployment. In this situation care should be taken to ensure networking resources such as the required ExpressRoute Gateway are fully deployed before deploying HSMs. The following CLI command can be used to test for completed deployment and integrated as required. Simply replace the angle bracket place holders for your specific naming. You are looking for a result of "provisioningState is Succeeded"
57+
A few customers have used Terraform as an automation environment instead of ARM templates as supplied when registering for this service. The HSMs themselves cannot be deployed this way but the dependent networking resources can. Terraform has a module to call out to a minimal ARM template that just has the HSM deployment. In this situation, care should be taken to ensure networking resources such as the required ExpressRoute Gateway are fully deployed before deploying HSMs. The following CLI command can be used to test for completed deployment and integrated as required. Simply replace the angle bracket placeholders with your specific naming. You will look for a result of "provisioningState is Succeeded".
5858

5959
```azurecli
6060
az resource show --ids /subscriptions/<subid>/resourceGroups/<myresourcegroup>/providers/Microsoft.Network/virtualNetworkGateways/<myergateway>
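# Sketch only, not part of the original article: a hypothetical helper that prints
# just the provisioning state of the resource whose full ID is passed as $1.
# Assumes the state is exposed at properties.provisioningState in the JSON output;
# --query and --output are global Azure CLI arguments.
gateway_state() {
  az resource show --ids "$1" --query properties.provisioningState --output tsv
}
# Usage (same placeholders as above); deploy HSMs only once this prints "Succeeded":
# gateway_state "/subscriptions/<subid>/resourceGroups/<myresourcegroup>/providers/Microsoft.Network/virtualNetworkGateways/<myergateway>"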
@@ -67,7 +67,7 @@ Deployments can fail if you exceed 2 HSMs per stamp and 4 HSMs per region. To a
6767
When a particular stamp or region is becoming full, i.e. nearly all free HSMs provisioned, this can lead to deployment failures. Each stamp has 11 HSMs available for customers which means 22 per region. There are also 3 spares and 1 test device in each stamp. If you believe you may have hit a limit then please email [email protected] for information on fill-level of specific stamps.
6868

6969
### How do I see HSMs when provisioned?
70-
This question has been asked a lot. Due to Dedicated HSM being a whitelisted service it is considered a "Hidden Type" in the portal. To see the HSM resources you must check to "Show hidden types" check box as shown below. The NIC resource always the follows the HSM and is a good place to find out the IP address of the HSM prior to using SSH to connect.
70+
Due to Dedicated HSM being a whitelisted service it is considered a "Hidden Type" in the portal. To see the HSM resources you must check the "Show hidden types" check box as shown below. The NIC resource always follows the HSM and is a good place to find out the IP address of the HSM prior to using SSH to connect.
7171

7272
![Subnet Delegation](./media/troubleshoot/hsm-provisioned.png)
7373

@@ -92,7 +92,6 @@ Initialization prepares a new HSM for use, or an existing HSM for reuse. You mus
9292
### Lost Credentials
9393

9494
Loss of the Shell administrator password will result in loss of HSM key material. A support request should be made to reset the HSM.
95-
9695
When initializing the HSM, securely store credentials. Shell and HSM credentials should be kept in accordance with your company's policies.
9796

9897
### Failed Logins
@@ -111,11 +110,11 @@ Providing incorrect credentials to HSMs can have destructive consequences. The f
111110
The following items are situations where configuration errors are either common or have an impact that is worthy of calling out:
112111

113112
### HSM Documentation and Software
114-
Software and documentation for the Thales SafeNet Luna 7 HSM devices is not available from Microsoft and must be downloaded from Thales directly. This requires registration using a Thales Customer ID you received during the registration process. The devices as delivered by Microsoft have software version 7.2 and firmware version 7.0.3. Early in 2020 Thales made documentation public and it can be found [here](https://thalesdocs.com/gphsm/luna/7.2/docs/network/Content/Home_network.htm).
113+
Software and documentation for the Thales SafeNet Luna 7 HSM devices is not available from Microsoft and must be downloaded from Thales directly. This requires registration using a Thales Customer ID you received during the registration process. The devices as provided by Microsoft have software version 7.2 and firmware version 7.0.3. Early in 2020 Thales made documentation public and it can be found [here](https://thalesdocs.com/gphsm/luna/7.2/docs/network/Content/Home_network.htm).
115114

116115
### HSM Networking Configuration
117116

118-
Be careful when configuring the networking within the HSM. The HSM has a connection via the ExpressRoute Gateway from a customer private IP address space directly to the HSM. This communication channel is for customer communication only and Microsoft has no access. If the HSM is configured in a such a way that this network path is impact that means all communication with the HSM is removed. In this situation the only option is to raise a Microsoft support request via the Azure portal to have the device reset. This reset procedure sets the HSM back to its initial state and all configuration and key material is lost. Configuration must be recreated and when the device joins the HA group it will get key material replicated.
117+
Be careful when configuring the networking within the HSM. The HSM has a connection via the ExpressRoute Gateway from a customer private IP address space directly to the HSM. This communication channel is for customer communication only and Microsoft has no access. If the HSM is configured in such a way that this network path is impacted, that means all communication with the HSM is removed. In this situation the only option is to raise a Microsoft support request via the Azure portal to have the device reset. This reset procedure sets the HSM back to its initial state and all configuration and key material is lost. Configuration must be recreated and when the device joins the HA group it will get key material replicated.
119118

120119
### HSM Device Reboot
121120

@@ -137,26 +136,23 @@ Communication from the Luna Client installation to the HSM requires at a minimum
137136

138137
### Failed HA Group Member Doesn't Recover
139138

140-
If a failed HS Group member doesn't recover, it must be manually recovered from the Luna client using the command hagroup recover.
141-
139+
If a failed HA Group member doesn't recover, it must be manually recovered from the Luna client using the command hagroup recover.
142140
It is necessary to configure a retry count for an HA group to enable auto recover. By default an HA group will not attempt to recover an HA member into the group when it recovers.
143141

144142
### HA Group Doesn't Sync
145143

146144
In the case where member partitions do not have the same cloning domain, the ha synchronize command will display the following:
147-
Warning: Synchronize may fail. The members in slot 0 and slot 1 have conflicting
148-
149-
settings for private key cloning.
150-
145+
Warning: Synchronize may fail. The members in slot 0 and slot 1 have conflicting settings for private key cloning.
151146
A new partition with the correct cloning domain should be added to the HA group, followed by removing the incorrectly configured partition.
152147

153148
## HSM Deprovisioning
154149

155150
Only when fully finished with an HSM can it be deprovisioned and then Microsoft will reset it and return it to a free pool.
156-
How to delete an HSM resource
157151

158-
HSMs cannot be deleted as Azure resources unless they in a "zeroized" state. This means all key material must have been deleted prior to trying to delete it as a resource. The quickest way to delete is to get the HSM admin password wrong 3 times (note: this is HSM admin and not appliance level admin). The Luna shell does have a `hsm -factoryreset` command that zeroizes but this can only be executed via console on the serial port.
152+
### How to delete an HSM resource
153+
154+
HSMs cannot be deleted as Azure resources unless they are in a "zeroized" state. This means all key material must have been deleted prior to trying to delete it as a resource. The quickest way to zeroize is to get the HSM admin password wrong 3 times (note: this is HSM admin and not appliance level admin). The Luna shell does have a `hsm -factoryreset` command that zeroizes but this can only be executed via console on the serial port and customers do not have access to this.
159155

160-
##Next steps
156+
## Next steps
161157

162158
This article has provided insight into areas across the HSM deployment lifecycle that may have issues or require troubleshooting or careful consideration. Hopefully this helps you avoid unnecessary delays and frustration, and if you have relevant additions or changes to this article then please raise a support request with Microsoft and let us know.
