Commit e2d247d

Cloudwatch monitoring V2 changes
alarms configuration, lun metrics, other username support, capacity pool metrics, report back
1 parent be61c2c commit e2d247d

2 files changed

+381
-94
lines changed

Monitoring/CloudWatch-FSx/README.md

Lines changed: 44 additions & 17 deletions
@@ -20,7 +20,7 @@ The template creates the following resources:
2020
4. Lambda Role - The IAM role that allows the Lambda function to run.
2121
5. Scheduler Role - The IAM role that allows the scheduler to trigger the Lambda function.
2222
6. SecretManager endpoint - The Lambda function runs inside a VPC, which by default lacks outgoing internet connectivity. To
23-
enable the function to securely access the fsxadmin passwords stored in AWS Secrets Manager, a VPC endpoint for the Secrets
23+
enable the function to securely access the fsx credentials stored in AWS Secrets Manager, a VPC endpoint for the Secrets
2424
Manager service is required. This endpoint allows the Lambda function to retrieve sensitive information from Secrets Manager
2525
without needing direct internet access, maintaining security while ensuring the function can access the necessary credentials.
2626
7. CloudWatch endpoint - The Lambda function runs inside a VPC, which by default lacks outgoing internet connectivity. To enable
@@ -64,14 +64,20 @@ function to send calls to FSxService to retrieve file systems information.
6464
* "scheduler:CreateSchedule"
6565
* "scheduler:DeleteSchedule"
6666
* "logs:PutRetentionPolicy"
67-
* "secretsmanager:GetSecretValue" (on specific secert)
68-
2. Optional: create a secret in AWS Secrets Manager with key-value pairs of file system IDs and their corresponding fsxadmin
69-
passwords. This secret is necessary for making direct ONTAP API calls to monitor resources, such as SnapMirror relations.
70-
Example secret structure:
67+
* "secretsmanager:GetSecretValue" (on specific secret)
68+
2. Optional: create a secret in AWS Secrets Manager with key-value pairs of file system IDs and their corresponding credentials.
69+
Each value can be provided in one of two formats: either just the password for the 'fsxadmin' user, or a username and password separated by a colon.
70+
This secret is necessary for making direct ONTAP API calls to monitor resources, such as SnapMirror relations.
71+
Example secret structures:
7172
```
7273
{
7374
"fs-111222333": "Password1",
7475
"fs-444555666": "Password2"
76+
}
77+
or
78+
{
79+
"fs-111222333": "myUserName:Password1",
80+
"fs-444555666": "Password2"
7581
}
7682
```
7783
When deploying the CloudFormation template, you will need to provide the ARN of this secret as a parameter. This allows the Lambda function to securely access the passwords for monitoring purposes.
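The two credential formats described above could be parsed along the following lines. This is a minimal sketch, not the Lambda function's actual code: the function name `parse_credentials` and the constant `DEFAULT_USER` are hypothetical, and it assumes that a value without a colon is a bare password for the 'fsxadmin' user.

```python
import json

# Assumption: a value without a colon is the password of the 'fsxadmin' user.
DEFAULT_USER = "fsxadmin"


def parse_credentials(secret_string):
    """Map each file system ID to a (username, password) tuple."""
    credentials = {}
    for fs_id, value in json.loads(secret_string).items():
        # Split on the first colon only, so a password that follows a
        # username may itself contain colons.
        username, separator, password = value.partition(":")
        if separator:
            credentials[fs_id] = (username, password)
        else:
            credentials[fs_id] = (DEFAULT_USER, value)
    return credentials


secret = '{"fs-111222333": "myUserName:Password1", "fs-444555666": "Password2"}'
creds = parse_credentials(secret)
```

Note that under this scheme a bare password containing a colon would be read as a username and password pair, so such passwords would need the explicit `username:password` form.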
@@ -90,21 +96,42 @@ systems.
9096
4. Security Group IDs - The IDs of the Security Groups that will be associated with the Lambda function when it runs. These Security
9197
Groups must allow connectivity to the file systems.
9298
5. Create FSx Service Endpoint - A boolean flag indicating whether you plan to create a FSxService VPC endpoint inside the VPC. Set
93-
this to true if you want to create the endpoint, or false if you don't. The decision to create this endpoint depends on whether you already have this type of endpoint. If you already have one, set this to false; otherwise, set it to true.
99+
this to true if you want to create the endpoint, or false if you don't. The decision to create this endpoint depends on whether you already have this type of endpoint in the subnet where the Lambda function is to run. If you already have one, set this to false; otherwise, set it to true.
94100
6. Create Secret Manager Endpoint - A boolean flag indicating whether you plan to create a SecretManager VPC endpoint inside the
95-
VPC. Set this to true if you want to create the endpoint, or false if you don't. The decision to create this endpoint depends on whether you already have this type of endpoint. If you already have one, set this to false; otherwise, set it to true.
101+
VPC. Set this to true if you want to create the endpoint, or false if you don't. The decision to create this endpoint depends on whether you already have this type of endpoint in the subnet where the Lambda function is to run. If you already have one, set this to false; otherwise, set it to true.
96102
7. Create CloudWatch Endpoint - A boolean flag indicating whether you plan to create a CloudWatch VPC endpoint inside the VPC. Set
97-
this to true if you want to create the endpoint, or false if you don't. The decision to create this endpoint depends on whether you already have this type of endpoint. If you already have one, set this to false; otherwise, set it to true.
98-
8. Secret Manager FSx Admin Passwords ARN - Optional - The ARN of the AWS Secrets Manager secret containing the fsxadmin passwords.
103+
this to true if you want to create the endpoint, or false if you don't. The decision to create this endpoint depends on whether you already have this type of endpoint in the subnet where the Lambda function is to run. If you already have one, set this to false; otherwise, set it to true.
104+
8. Secret Manager FSx Admin Passwords ARN - Optional - The ARN of the AWS Secrets Manager secret containing the FSx credentials.
99105
This ARN is required for certain functionalities, such as snapmirror metrics collection.
100-
If not provided, some features may not operate correctly. This secret should contain key-value pairs.
101-
The key is the File System ID, and the value is the fsxadmin password. For example:
102-
```
103-
{
104-
"fs-111222333":"Password1",
105-
"fs-444555666":"Password2"
106-
}
107-
```
106+
If not provided, some features may not operate correctly. This secret should contain key-value pairs as described in the Prerequisites section above.
107+
9. SNS Topic ARN for CloudWatch alarms - Optional - The ARN of the SNS topic to which CloudWatch alarm notifications will be sent. If not provided, alarm notifications will not be sent to any SNS topic.
108+
109+
## Alarms Configuration
110+
The Lambda function is responsible for creating alarms based on the thresholds set via environment variables. These environment variables can be set from the AWS console, under the Configuration tab of the dashboard Lambda function. You can find the specific Lambda function by its name, "FSxNDashboard-<CloudFormation-Stack-Name>".
111+
The following environment variables are used:
112+
1. CLIENT_THROUGHPUT_ALARM_THRESHOLD: This sets the threshold for the client throughput alarm. The default value is "90", but this can be customized as needed. When the client throughput exceeds this value (expressed as a percentage), an alarm will be triggered.
113+
1. DISK_PERFORMANCE_ALARM_THRESHOLD: This sets the threshold for the disk performance alarm. The default value is "90", but this can be customized as needed. When the disk performance exceeds this value (expressed as a percentage), an alarm will be triggered.
114+
1. DISK_THROUGHPUT_UTILIZATION_ALARM_THRESHOLD: This sets the threshold for the disk throughput utilization alarm. The default value is "90", but this can be customized as needed. When disk throughput utilization exceeds this value (expressed as a percentage), an alarm will be triggered.
115+
1. SNAPMIRROR_UNHEALTHY_ALARM_THRESHOLD: This sets the threshold for the SnapMirror unhealthy alarm. The default value is "0", but this can be customized as needed. When the number of unhealthy SnapMirror relationships exceeds this value, an alarm will be triggered.
116+
1. STORAGE_CAPACITY_UTILIZATION_ALARM_THRESHOLD: This sets the threshold for the storage capacity utilization alarm. The default value is "80", but this can be customized as needed. When storage capacity utilization exceeds this value (expressed as a percentage), an alarm will be triggered.
117+
1. VOLUME_STORAGE_CAPACITY_UTILIZATION_ALARM_THRESHOLD: This sets the threshold for the volume storage capacity utilization alarm. The default value is "80", but this can be customized as needed. When volume storage capacity utilization exceeds this value (expressed as a percentage), an alarm will be triggered.
118+
119+
In addition to the environment variables, you can use tags on the FSx and volume resources to override default thresholds or skip alarm management for specific resources. If a threshold is set to 100, the alarm will not be created. Similarly, if a skip tag is set to true, the alarm will be skipped.
120+
121+
The tag keys used for this purpose are:
122+
123+
1. client-throughput-alarm-threshold
124+
1. skip-client-throughput-alarm
125+
1. disk-performance-alarm-threshold
126+
1. skip-disk-performance-alarm
127+
1. disk-throughput-utilization-threshold
128+
1. skip-disk-throughput-utilization-alarm
129+
1. storage-capacity-utilization-alarm-threshold
130+
1. skip-storage-capacity-utilization-alarm
131+
1. volume-storage-capacity-utilization-alarm-threshold
132+
1. skip-volume-storage-capacity-utilization-alarm
133+
1. snapMirror-unhealthy-relations-alarm-threshold
134+
1. skip-snapmirror-unhealthy-relations-alarm
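The interplay between environment variables and resource tags can be sketched as follows. This is an illustration only, not the Lambda function's implementation: the helper `resolve_threshold` is hypothetical, and the precedence it uses (resource tag first, then environment variable, then the documented default) is an assumption based on the description above.

```python
import os

# Default values as documented for each environment variable.
DEFAULT_THRESHOLDS = {
    "CLIENT_THROUGHPUT_ALARM_THRESHOLD": "90",
    "DISK_PERFORMANCE_ALARM_THRESHOLD": "90",
    "DISK_THROUGHPUT_UTILIZATION_ALARM_THRESHOLD": "90",
    "SNAPMIRROR_UNHEALTHY_ALARM_THRESHOLD": "0",
    "STORAGE_CAPACITY_UTILIZATION_ALARM_THRESHOLD": "80",
    "VOLUME_STORAGE_CAPACITY_UTILIZATION_ALARM_THRESHOLD": "80",
}


def resolve_threshold(env_name, threshold_tag, skip_tag, tags, environ=None):
    """Return the threshold to use, or None when no alarm should be created."""
    environ = os.environ if environ is None else environ
    # A skip tag set to true disables the alarm for this resource.
    if tags.get(skip_tag, "").strip().lower() == "true":
        return None
    # Assumed precedence: resource tag, then environment variable, then default.
    raw = tags.get(threshold_tag) or environ.get(env_name) or DEFAULT_THRESHOLDS[env_name]
    threshold = float(raw)
    # A threshold of 100 means the alarm is not created.
    return None if threshold == 100 else threshold
```

For example, calling `resolve_threshold("CLIENT_THROUGHPUT_ALARM_THRESHOLD", "client-throughput-alarm-threshold", "skip-client-throughput-alarm", {"client-throughput-alarm-threshold": "75"})` would return the tag value rather than the default.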
108135

109136
## Important Disclaimer: CloudWatch Alarms Deletion
110137
Please note that when you delete the CloudFormation stack associated with this project, the CloudWatch Alarms created by the stack will not be automatically deleted.
