articles/operator-nexus/howto-run-instance-readiness-testing.md
Lines changed: 69 additions & 90 deletions
Original file line number
Diff line number
Diff line change
@@ -1,14 +1,3 @@
1
-
---
2
-
title: "Azure Operator Nexus: How to run Instance Readiness Testing"
3
-
description: Learn how to run instance readiness testing.
4
-
author: lesage-oded
5
-
ms.author: odedlesage
6
-
ms.service: azure-operator-nexus
7
-
ms.topic: how-to
8
-
ms.date: 02/29/2024
9
-
ms.custom: template-how-to
10
-
---
11
-
12
1
# Azure Operator Nexus Instance Readiness Test (IRT)
13
2
14
3
The Instance Readiness Test (IRT) framework is an optional/add-on tool for the Nexus platform. It enables operators to verify the successful deployment and readiness of the Azure Operator Nexus instance for workload deployment. This verification applies to both initial deployment and subsequent upgrades of the Nexus. It runs a series of tests and provides the test results as an html report.
@@ -27,10 +16,25 @@ The Instance Readiness Test (IRT) framework is an optional/add-on tool for the N
27
16
## Tests executed with IRT
28
17
- Validate that l3 domains in the fabric subscription and resource group exist after all tests on the resources under test are done.
29
18
- Validate that there are l3 networks created in the testing resource group after all tests on the resources under test are done.
19
+
- Validate that ApiserverAuditRequestsRejectedTotal metric data is present within the last 10 minutes.
20
+
Every average metric should be greater than 0.
21
+
- Validate that ContainerMemoryUsageBytes metric data is present within the last 10 minutes.
22
+
Every average metric should be greater than 0.
23
+
- Validate that CorednsDnsRequestsTotal metric data is present within the last 10 minutes.
24
+
Every average metric should be greater than 0.
25
+
- Validate that EtcdServerIsLeader metric data is present within the last 10 minutes. Every count metric should be greater than 0.
30
26
- Validate that FelixClusterNumHosts metric data is present within the last 10 minutes.
31
27
Every average metric should be greater than 0.
28
+
- Validate that IdracPowerOn metric data is present within the last 10 minutes. Every count metric should be greater than 0.
29
+
- Validate that KubeDaemonsetStatusCurrentNumberScheduled metric data is present within the last 10 minutes. Every average metric should be greater than 0.
30
+
- Validate that KubeletRunningPods metric data is present within the last 10 minutes.
31
+
Every average metric should be greater than 0.
32
+
- Validate that KubevirtInfo metric data is present within the last 10 minutes.
33
+
Every average metric should be greater than 0.
32
34
- Validate that NodeOsInfo metric data for a baremetal machine is present within the last 10 minutes.
33
35
Every count metric should be greater than 0.
36
+
- Validate that TyphaConnectionsAccepted metric data is present within the last 10 minutes.
37
+
Every average metric should be greater than 0.
34
38
- Test the transmission of IPv4 TCP data between two virtual machines using iPerf3 and affinity settings in the ARM template.
35
39
The test ensures that the data throughput exceeds 60 Mbps.
36
40
- Test the transmission of IPv6 TCP data between two virtual machines using iPerf3 and affinity settings in the ARM template.
@@ -64,17 +68,17 @@ The Instance Readiness Test (IRT) framework is an optional/add-on tool for the N
64
68
Stderr should be empty and no packet loss should be observed.
65
69
- Test IPv6 ping between a NAKS cluster pod and a VM with jumbo frames enabled.
66
70
Stderr should be empty and no packet loss should be observed.
67
-
- Validate PVC has been created successfully.
68
-
- Validate PV has been created successfully.
69
-
- Test creating a PVC with volumeMode Block and accessMode RWO.
70
-
- Validate that all the nexus-shared and nexus-volume volumes that were added are mounted in sts 0
71
+
- Validate PersistentVolumeClaim is created successfully.
72
+
- Validate PersistentVolume is created successfully.
73
+
- Test creating a PVC with volumeMode Block and accessMode RWO.
74
+
- Validate that all the nexus-shared and nexus-volume volumes that were added are mounted in sts 0.
71
75
- Validate that all the nexus-shared and nexus-volume volumes that were added are mounted in sts 1.
72
76
- Validate that nfs storage mounted on sts 0 is writable.
73
77
- Validate that nfs storage file written to sts 0 can be read.
74
78
- Validate that shared nfs storage mounted on sts 0 is writable.
75
79
- Validate that shared nfs file written to sts 0 can be read.
76
80
- Validate that nfs storage mounted on sts 1 is writable.
77
-
- Validate that shared file written to sts 0 can be read in sts 1
81
+
- Validate that shared file written to sts 0 can be read in sts 1.
78
82
- Validate that shared nfs storage mounted on sts 1 is writable.
79
83
- Validate that shared file written to sts 0 and sts 1 can be read from sts 1.
80
84
- Validate that shared file written to sts 0 and sts 1 can be read from sts 0.
@@ -106,19 +110,19 @@ For access to the nexus-samples GitHub repository
106
110
107
111
## Environment Requirements
108
112
109
-
- A Linux environment (Ubuntu suggested) capable of calling Azure APIs
110
-
- Support for other Linux distros e.g. RedHat, Mariner, etc. depends on being able to install the necessary tooling. See [Install Dependencies](#install-dependencies) section.
113
+
- A Linux environment (Ubuntu suggested) capable of calling Azure APIs.
114
+
- Support for other Linux distros (for example, RedHat or Mariner) depends on being able to install the necessary tooling. See the [Install Dependencies](#install-dependencies) section.
111
115
- Any machine that has the required packages installed should be able to use the scripts.
112
-
- Knowledge of networks to use for the test
116
+
- Knowledge of networks to use for the test.
113
117
* Networks to use for the test are specified in a "networks-blueprint.yml" file, see [Input Configuration](#input-configuration).
114
-
- A way to download the IRT release package e.g. curl, wget, etc
115
-
- The ability to create a service principal with the correct roles
116
-
- The ability to read secrets from the KeyVault, see [Service Principal] (#create-service-principal-and-security-group) section for more details
117
-
- The ability to create security groups in your Active Directory tenant
118
+
- A way to download the IRT release package, for example, curl or wget.
119
+
- The ability to create a service principal with the correct roles.
120
+
- The ability to read secrets from the KeyVault. See the [Service Principal](#create-service-principal-and-security-group) section for more details.
121
+
- The ability to create security groups in your Active Directory tenant.
118
122
119
123
## Input Configuration
120
124
121
-
Start by building your input file. The IRT tarball provides `irt-input.example.yml` as an example. Follow the [instructions](#download-irt) to download the tarball. Please note that these values **will not work for your instances**. You need to manually change them and rename the file to `irt-input.yml`. We provide the example input file as a stub to help you configure new input files. The example outlines overridable values and their usage. The **[One Time Setup](#one-time-setup)** assists you in setting input values by writing key/value pairs to the config file as they execute.
125
+
Start by building your input file. The IRT tarball provides `irt-input.example.yml` as an example. Download the tarball by following the [instructions](#download-irt). Note that these values **will not work for your instances**. You need to manually change them and rename the file to `irt-input.yml`. We provide the example input file as a stub to help you configure new input files. The example outlines overridable values and their usage. The **[One Time Setup](#one-time-setup)** assists you in setting input values by writing key/value pairs to the config file as they execute.
122
126
123
127
You can provide the network information in a `networks-blueprint.yml` file, similar to the `networks-blueprint.example.yml` that we provide, or append it to the `irt-input.yml` file. The `networks-blueprint.example.yml` defines the schema for IRT. The test creates the networks, so provide network details that aren't in use. Currently, IRT has the following network requirements:
124
128
@@ -131,11 +135,11 @@ You can provide the network information in a `networks-blueprint.yml` file, simi
131
135
## One Time Setup
132
136
133
137
### Download IRT
134
-
IRT is distributed via tarball from the release section of the [nexus-samples](https://aka.ms/nexus-irt) GitHub repo
138
+
IRT is distributed via tarball from the release section of the [nexus-samples](https://aka.ms/nexus-irt) GitHub repo.
135
139
1. Find the release package marked with 'Latest'. Download it, extract it, and navigate to the `irt` directory.
136
-
1. Extract the tarball to the local file system: `mkdir -p irt && tar xf nexus-irt.tar.gz --directory ./irt`
137
-
1. Switch to the new directory `cd irt`
138
-
1. See RELEASE-CHANGELOG.md for any notable updates or changes
140
+
1. Extract the tarball to the local file system: `mkdir -p irt && tar xf nexus-irt.tar.gz --directory ./irt`.
141
+
1. Switch to the new directory: `cd irt`.
142
+
1. See RELEASE-CHANGELOG.md for any notable updates or changes.
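
The extraction steps above can be sketched as a small helper. This is a minimal sketch, not part of the IRT release: `extract_irt` is a hypothetical function name, and it assumes the release package was already downloaded and saved as `nexus-irt.tar.gz`, the file name used in the steps above.

```bash
#!/usr/bin/env bash
# Sketch: extract a downloaded IRT release tarball into a working directory.
# Assumption: extract_irt is a hypothetical helper, not shipped with IRT.

extract_irt() {
  local tarball="${1:-nexus-irt.tar.gz}"   # path to the downloaded release tarball
  local dest="${2:-irt}"                   # directory to extract into

  # Fail early with a clear message if the tarball is missing.
  if [ ! -f "$tarball" ]; then
    echo "tarball not found: $tarball" >&2
    return 1
  fi

  # Same commands as the documented step, parameterized.
  mkdir -p "$dest"
  tar xf "$tarball" --directory "$dest"

  # Print the extraction directory so callers can cd into it.
  echo "$dest"
}

# Usage (after downloading the release package):
#   cd "$(extract_irt nexus-irt.tar.gz irt)"
#   cat RELEASE-CHANGELOG.md
```

Checking for the tarball first avoids creating an empty `irt` directory when the download step was skipped.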
139
143
140
144
### Install Dependencies
141
145
There are multiple dependencies expected to be available during execution. Review this list:
@@ -185,7 +189,7 @@ The supplemental script, `create-service-principal.sh` creates a service princip
185
189
186
190
Additionally, the script creates the necessary security group, and adds the service principal to the security group. If the security group exists, it adds the service principal to the existing security group.
187
191
188
-
Executing `create-service-principal.sh` requires the input yaml to have the following properties, all of them can be overridden by the corresponding environment variables:
192
+
Executing `create-service-principal.sh` requires the input yaml to have the following values. All values can be overridden by setting the corresponding environment variables:
189
193
```yml
190
194
SERVICE_PRINCIPAL:
191
195
  NAME: "<name>" # env: SERVICE_PRINCIPAL_NAME
@@ -198,7 +202,7 @@ SERVICE_PRINCIPAL:
198
202
* `SERVICE_PRINCIPAL.AAD_GROUP_NAME` - The name of the security group.
199
203
* `SERVICE_PRINCIPAL.SUBSCRIPTION` - The subscription of the service principal.
200
204
* `SERVICE_PRINCIPAL.KV_NAME` - The KeyVault to store the service principal password.
201
-
* `SERVICE_PRINCIPAL.KV_ID` - The KeyVault secret where the service principal password is actually stored.
205
+
* `SERVICE_PRINCIPAL.KV_ID` - The KeyVault secret where the service principal password is stored.
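
The override behavior described above (an environment variable takes precedence over the value in the input yaml) can be illustrated with a short sketch. `resolve_value` is a hypothetical helper written for this illustration and is not how `create-service-principal.sh` is known to be implemented; only the `SERVICE_PRINCIPAL_NAME` variable name comes from the example input.

```bash
#!/usr/bin/env bash
# Sketch of "environment variable overrides the yaml value" precedence.
# Assumption: resolve_value is a hypothetical helper, not part of the IRT scripts.

resolve_value() {
  local env_name="$1"     # name of the overriding variable, e.g. SERVICE_PRINCIPAL_NAME
  local file_value="$2"   # value read from irt-input.yml

  # ${!env_name:-} is bash indirect expansion: the value of the variable
  # whose name is stored in env_name, or empty if it is unset.
  local env_value="${!env_name:-}"

  if [ -n "$env_value" ]; then
    printf '%s\n' "$env_value"    # environment variable wins
  else
    printf '%s\n' "$file_value"   # fall back to the yaml value
  fi
}

# Usage:
#   export SERVICE_PRINCIPAL_NAME="irt-sp-from-env"
#   resolve_value SERVICE_PRINCIPAL_NAME "irt-sp-from-yaml"
```

In practice this means you can keep one `irt-input.yml` and export a variable such as `SERVICE_PRINCIPAL_NAME` in the shell to change a single value for one run.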
202
206
203
207
> **_NOTE:_** Please ensure that you have already created a KeyVault (KV_NAME) and/or a Secret (KV_ID) with a dummy value prior to executing `create-service-principal.sh`.
204
208
> The `az login` user (person executing IRT) should also be granted access to this KeyVault so secrets can be pulled at runtime.
@@ -227,7 +231,7 @@ KV_ID: "<provided-key-valut-secret>" # If SP already exists please fill it in to
227
231
<details>
228
232
<summary>Expand to see details for using a custom role </summary>
229
233
230
-
If you have an existing service principal and would like the convenience of only having to assign one role for IRT execution, you can follow the steps below.
234
+
If you have an existing service principal and would like the convenience of only having to assign one role for IRT execution, you can follow the directions in this section.
231
235
232
236
##### Prerequisites
233
237
@@ -236,9 +240,9 @@ If you have an existing service principal and would like the convenience of only
236
240
237
241
##### Steps
238
242
239
-
1. Prepare Your Environment
243
+
1. Prepare Your Environment:
240
244
- Open a Bash Shell:
241
-
- You can use any terminal that supports Bash.
245
+
- You can use any terminal that supports Bash
242
246
243
247
1. Sign in to Azure:
244
248
- Execute the following command to sign in to your Azure account:
@@ -270,9 +274,9 @@ If you have an existing service principal and would like the convenience of only
270
274
--parameters roleName="$roleName"
271
275
```
272
276
273
-
1. Assign Role to Application Service Principal used for testing
277
+
1. Assign Role to Application Service Principal used for testing:
274
278
275
-
Weather created via the all-in-one setup or using your own, assign the newly created role to your identity, this single role provides all the necessary authorizations to run Instance Readiness Testing.
279
+
Whether the service principal was created via the all-in-one setup or you're using your own, assign the newly created role to your identity. This single role provides all the necessary authorizations to run Instance Readiness Testing.
276
280
277
281
```bash
278
282
# The Application ID of your Service Principal for your application
@@ -297,7 +301,7 @@ If you have an existing service principal and would like the convenience of only
297
301
<details>
298
302
<summary>Expand to see how to create l3 isolation. </summary>
299
303
300
-
The testing framework does not create, destroy, or manipulate isolation domains. Therefore, existing isolation domains can be used for execution. Each isolation domain requires at least one external network. The supplemental script, `create-l3-isolation-domains.sh`. Internal networks e.g. L3, trunked, etc. are created, manipulated, and destroyed through the course of testing.
304
+
The testing framework doesn't create, destroy, or manipulate isolation domains. Therefore, existing isolation domains can be used for execution. Each isolation domain requires at least one external network. The supplemental script, `create-l3-isolation-domains.sh`. Internal networks (for example, L3 and trunked) are created, manipulated, and destroyed through the course of testing.
301
305
302
306
Executing `create-l3-isolation-domains.sh` requires one **parameter**, a path to a file containing the network requirements. You can use either the standalone `networks-blueprint.yml` or the `irt-input.yml`, depending on your workflow; either can contain the information needed.
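
As an illustration of the single-parameter invocation, the following sketch checks the file before handing it to the script. `run_l3_setup` is a hypothetical wrapper name, and the sketch assumes `create-l3-isolation-domains.sh` is in the current directory (the extracted `irt` directory).

```bash
#!/usr/bin/env bash
# Sketch: validate the network requirements file, then call the supplemental script.
# Assumptions: run_l3_setup is a hypothetical wrapper; the script is in the current directory.

run_l3_setup() {
  local networks_file="${1:-}"   # path to networks-blueprint.yml or irt-input.yml

  # The script takes exactly one parameter, so require it here.
  if [ -z "$networks_file" ]; then
    echo "usage: run_l3_setup <path-to-networks-file>" >&2
    return 2
  fi

  # Refuse to continue if the file is missing or unreadable.
  if [ ! -r "$networks_file" ]; then
    echo "cannot read networks file: $networks_file" >&2
    return 1
  fi

  ./create-l3-isolation-domains.sh "$networks_file"
}

# Usage:
#   run_l3_setup networks-blueprint.yml
```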
303
307
@@ -324,11 +328,10 @@ Executing `create-l3-isolation-domains.sh` requires one **parameter**, a path to
324
328
325
329
## How to Read the IRT Summary Results
326
330
327
-
The IRT summary page is a html page that can be downloaded after the
328
-
execution of the IRT and can be viewed from any browser.
331
+
The IRT summary page is an HTML page that is generated after the
332
+
execution of IRT and can be viewed from any browser.
329
333
330
-
IRT Summary Page comprises three major sections, which drills further to
331
-
provide more details.
334
+
The IRT summary page comprises three major sections, each of which expands to provide more details.
332
335
333
336
- Test Results
334
337
@@ -345,47 +348,25 @@ executed, different prerequisite test commands, so totals may not always be the
This is an information only section that provides additional details of
438
-
the Nexus platform. There are no assertions/tests that represent this
422
+
This section is informational only and provides additional details about
423
+
the Nexus instance. There are no assertions/tests that represent this
439
424
section. It helps operators to check the status of underlying cluster
440
425
resources and tenant resources running on the cluster after IRT is
441
426
executed.
442
427
443
-
Extras section consists of results displayed by running two different
444
-
text files separately.
428
+
The Extras section consists of results displayed by running two different
429
+
script files separately.
445
430
446
-
- Platform Validation Results -- Displays the Nexus under cloud
447
-
deployed resources details and their current statuses, including
448
-
Cluster Manager details and its extensions, Fabric related details,
449
-
Nexus cluster and its extensions, BareMetal Machines, Arc related
450
-
and Storage appliances.
431
+
- Platform Validation Results:
432
+
- Displays details and current statuses of the Nexus undercloud deployed resources, including Cluster Manager details and its extensions, Fabric-related details, the Nexus cluster and its extensions, BareMetal Machines, Arc-related resources, and Storage appliances.
451
433
452
-
- Tenant workloads Validation Results -- Displays the Nexus tenant
453
-
resources details and their current statuses running on the Nexus
454
-
cluster, including displaying of L2 and L3 Isolation Domains, Cloud
455
-
Service networks, default cni networks, L2 and L3 networks, trunked
456
-
networks, available list of VMs and Nexus Kubernetes clusters.
434
+
- Tenant workloads Validation Results:
435
+
- Displays details and current statuses of the Nexus tenant resources running on the Nexus cluster, including L2 and L3 Isolation Domains, Cloud Service networks, default cni networks, L2 and L3 networks, trunked networks, available list of VMs…
457
436
458
437
## Troubleshooting
459
438
460
439
Asserters and debug sections with failures are effective troubleshooting
461
440
methods to address failures and technical problems.