Monitoring/monitor-ontap-services/README.md
Lines changed: 38 additions & 14 deletions
@@ -43,6 +43,8 @@ that in the [Endpoints for AWS Services](#endpoints-for-aws-services) section be

 ## Prerequisites
 - An FSx for NetApp ONTAP file system you want to monitor.
+- An S3 bucket to store the configuration and event status files, as well as the Lambda layer zip file.
+- You will need to download the [Lambda layer zip file](https://raw.githubusercontent.com/NetApp/FSx-ONTAP-samples-scripts/main/Monitoring/monitor_onstap_services/lambda_layer.zip) from this repo and upload it to the S3 bucket. Be sure to preserve the name `lambda_layer.zip`.
 - The security group associated with the FSx for ONTAP file system must allow inbound traffic from the Lambda function over TCP port 443.
 - An SNS topic to send the alerts to.
 - An AWS Secrets Manager secret that holds the FSx for ONTAP file system credentials. There should be two keys in the secret, one for the username and one for the password.
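
The two-key secret described in the last prerequisite could be created along the following lines. This is a minimal sketch and not code from this repository: the key names `username` and `password`, the helper names, and the way the secret name and region are passed in are all illustrative assumptions. The README only requires one key for the username and one for the password. `boto3` is imported lazily, so the JSON helper can be tried without AWS access.

```python
import json


def build_secret_string(username: str, password: str,
                        username_key: str = "username",
                        password_key: str = "password") -> str:
    """Return the JSON document to store as the secret value.

    The default key names are assumptions; use whatever key names
    you later give to the monitoring program.
    """
    if username_key == password_key:
        raise ValueError("the username and password keys must differ")
    return json.dumps({username_key: username, password_key: password})


def create_fsxn_secret(secret_name: str, username: str, password: str,
                       region: str) -> str:
    """Create the secret in AWS Secrets Manager and return its ARN."""
    import boto3  # imported here so build_secret_string works without boto3

    client = boto3.client("secretsmanager", region_name=region)
    response = client.create_secret(
        Name=secret_name,
        SecretString=build_secret_string(username, password),
    )
    return response["ARN"]
```

Keeping the JSON construction separate from the AWS call makes it easy to check the secret layout before anything is written to your account.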
@@ -66,7 +68,6 @@ The CloudFormation template will do the following:
 The only permission that this role needs is to be able to invoke a Lambda function.
 **NOTE:** You can provide the ARN of an existing role to use instead of having it create a new one.
 - Create the Lambda function with the Python code provided in this repository.
-- Create an S3 bucket for the Lambda function to store the matching conditions file, and the event information, in.
 - Create an EventBridge Schedule to trigger the Lambda function. By default, it will trigger
 it to run every 15 minutes, although there is a parameter that will allow you to set it to whatever interval you want.
 - Optionally create a CloudWatch alarm that will alert you if the Lambda function fails.
@@ -84,8 +85,9 @@ To install the program using the CloudFormation template, you will need to do th

 |Parameter Name | Notes|
 |---|---|
-|Stackname|The name you want to assign to the CloudFormation stack. Note that this name is used as a base name for some of the resources it creates, so please keep it **under 25 characters**. Also, since it is used as part of the s3 bucket name that it creates it **must be in all lower case letters**.|
+|Stackname|The name you want to assign to the CloudFormation stack. Note that this name is used as a base name for some of the resources it creates, so please keep it **under 25 characters**.|
 |OntapAdminServer|The DNS name, or IP address, of the management endpoint of the FSxN file system you wish to monitor.|
+|S3BucketName|The name of the S3 bucket where you want the program to store event information. It should also have a copy of the `lambda_layer.zip` file. **NOTE:** This bucket must be in the same region where this CloudFormation stack is being created.|
 |SubnetIds|The subnet IDs that the Lambda function will be attached to. They must have connectivity to the FSxN file system management endpoint that you wish to monitor.|
 |SecurityGroupIds|The security group IDs that the Lambda function will be attached to. The security group must allow outbound traffic over port 443 to the SNS, Secrets Manager, CloudWatch, and S3 AWS service endpoints, as well as the FSxN file system you want to monitor.|
 |SnsTopicArn|The ARN of the SNS topic you want the program to publish alert messages to.|
@@ -111,9 +113,7 @@ so you probably won't have to change any of them. Note that if you enable EMS al
 send all EMS messages that have a severity of `Error`, `Alert` or `Emergency`. You can change the
 matching conditions at any time by updating the matching conditions file that is created in the S3 bucket.
 The name of the file will be \<OntapAdminServer\>-conditions where "\<OntapAdminServer\>" is the value you
-set for the OntapAdminServer parameter. To find the name of the S3 bucket, or any of the resources that were
-created, you can go to the CloudFormation service in the AWS console, click on the stack you created
-(based on the name you provided as the first parameter above), and then click on the "Resources" tab.
+set for the OntapAdminServer parameter.

 ### Post Installation Checks
 After the stack has been created, check the status of the Lambda function to make sure it is
@@ -158,8 +158,14 @@ Below is the specific list of permissions needed.
 |ec2:DescribeNetworkInterfaces| So it can check to see if a network interface already exists. |

 #### Create an S3 Bucket
-One of the goals of the program is to not send multiple messages for the same event. It does this by storing the event
-information in an s3 object so it can be compared against before sending a second message for the same event.
+The first use of the S3 bucket will be to store the Lambda layer zip file. This is required to include some dependencies that
+aren't included in the AWS Lambda environment. Currently the only dependency in the zip file is [cronsim](https://pypi.org/project/cronsim/).
+This is used to interpret the SnapMirror schedules to be able to report on lag issues. You can download the zip file from this repository by clicking on
+the [lambda_layer.zip](https://raw.githubusercontent.com/NetApp/FSx-ONTAP-samples-scripts/main/Monitoring/monitor_onstap_services/lambda_layer.zip) link.
+You will refer to this file, and bucket, when you create the Lambda function.
+
+Another use of the S3 bucket is to store events that have already been reported on so they can be compared against
+to ensure the program does not send multiple messages for the same event.
 Note that it doesn't keep every event indefinitely; it only stores them while the condition is true. So, say for
 example it sends an alert for a SnapMirror relationship that has a lag time that is too long. It will
 send the alert and store the event. Once a successful SnapMirror synchronization has happened, the event will be removed
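
The "alert once, forget when resolved" behavior described above can be illustrated with a small sketch. It is not the program's actual code: the function name, the string event IDs, and the use of plain Python lists in place of the S3 object are assumptions made for illustration.

```python
def reconcile_events(stored_events, current_events):
    """Return (events_to_alert_on, events_to_store).

    stored_events:  IDs of events already alerted on (in the real program
                    these would be loaded from the S3 object).
    current_events: IDs of events whose condition is true right now.
    """
    stored = set(stored_events)
    current = set(current_events)
    to_alert = sorted(current - stored)  # only conditions not yet reported
    to_store = sorted(current)           # resolved conditions drop out
    return to_alert, to_store


# Hypothetical event ID for a SnapMirror relationship that is lagging.
lag_event = "snapmirror-lag:svm1:vol1"

# Run 1: the condition is new, so it is alerted on and stored.
alerts, stored = reconcile_events([], [lag_event])
# Run 2: the condition is still true, so no second alert is sent.
alerts_again, stored = reconcile_events(stored, [lag_event])
# Run 3: the relationship has synchronized, so the event is removed.
alerts_final, stored = reconcile_events(stored, [])
```

Because a resolved event is dropped from the stored set, the same relationship triggers a fresh alert if it starts lagging again later.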
@@ -231,17 +237,34 @@ times out, even after adjusting the timeout to more than several minutes.

 #### Lambda Function
 There are a few things you need to do to properly configure the Lambda function.
--Give it the permissions listed above.
+- Assign it the role you created above.
 - Put it in a VPC and subnet that has access to the FSxN file system management endpoint.
+- Assign it the security group that allows outbound traffic over TCP port 443 to the FSxN file system management endpoint.
+
+Once you have created the function you will be able to:
+- Copy the Python code from the [monitor_ontap_services.py](monitor_ontap_services.py)
+file found in this repository into the code box and deploy it.
+- Add the Lambda layer to the function. You do this by first creating a Lambda layer and then adding it to your function.
+To create a Lambda layer, go to the Lambda service page on the AWS console and click on the "Layers"
+tab under the "Additional resources" section. Then, click on the "Create layer" button.
+From there you'll need to provide a name for the layer, and the path to the
 | s3BucketName | Yes | Yes | None | Set to the name of the S3 bucket where you want the program to store events. It will also read the matching configuration file from this bucket. |
-| s3BucketRegion | Yes | Yes | None | Set to the region the S3 bucket resides in. |
+| s3BucketRegion | Yes | Yes | None | Set to the region where the S3 bucket is located. |
 | OntapAdminServer | Yes | Yes | None | Set to the DNS name, or IP address, of the ONTAP server you wish to monitor. |
 | configFilename | No | No | OntapAdminServer + "-config" | Set to the filename (S3 object) that contains parameter assignments. It's okay if it doesn't exist, as long as there are environment variables for all the required parameters. |
 | emsEventsFilename | No | No | OntapAdminServer + "-emsEvents" | Set to the filename (S3 object) where you want the program to store the EMS events that it has alerted on. This file will be created as necessary. |
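
The defaults listed for `configFilename` and `emsEventsFilename` can be read as "use the value if it is set, otherwise derive a name from `OntapAdminServer`". The sketch below shows one way to express that, assuming the parameters arrive as environment variables with the same names as in the table. The `resolve_filenames` helper and the example host name are hypothetical and not part of the program.

```python
import os


def resolve_filenames(env=None):
    """Return the S3 object names to use, applying the documented defaults."""
    env = os.environ if env is None else env
    server = env["OntapAdminServer"]  # required, so a missing value raises KeyError
    return {
        "configFilename": env.get("configFilename", server + "-config"),
        "emsEventsFilename": env.get("emsEventsFilename", server + "-emsEvents"),
    }


# Example with a made-up management endpoint and one explicit override.
names = resolve_filenames({
    "OntapAdminServer": "fsxn.example.com",
    "emsEventsFilename": "my-ems-events",
})
```

Passing a plain dictionary instead of `os.environ` keeps the example easy to try outside of Lambda.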
@@ -317,7 +340,8 @@ Each rule should be an object with one, or more, of the following keys:

 |Key Name|Value Type|Notes|
 |---|---|---|
-|maxLagTime|Integer|Specifies the maximum allowable time, in seconds, since the last successful SnapMirror update before an alert will be sent.|
+|maxLagTime|Integer|Specifies the maximum allowable time, in seconds, since the last successful SnapMirror update before an alert will be sent. Only used if maxLagTimePercent hasn't been provided, or if the SnapMirror relationship, and the policy it is assigned to, don't have a schedule associated with them. Best practice is to provide both maxLagTime and maxLagTimePercent to ensure all relationships get monitored, in case a schedule gets accidentally removed.|
+|maxLagTimePercent|Integer|Specifies the maximum allowable time, in terms of percent of the amount of time since the last scheduled SnapMirror update, before an alert will be sent. Should be over 100. For example, a value of 200 means 2 times the period since the last scheduled update, so if that update was supposed to have happened 1 hour ago, it would alert if the relationship hasn't been updated within 2 hours.|
 |stalledTransferSeconds|Integer|Specifies the minimum number of seconds that have to transpire before a SnapMirror transfer will be considered stalled.|
 |healthy|Boolean|If `true` will alert when the relationship is healthy. If `false` will alert when the relationship is unhealthy.|
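
To make the interaction between `maxLagTime` and `maxLagTimePercent` concrete, here is a minimal sketch of the decision as the table describes it. It is one reading of those descriptions, not the program's actual implementation: the function and argument names are assumptions, and the time of the last scheduled update is simply passed in (the real program works it out from the SnapMirror schedule).

```python
from datetime import datetime, timedelta, timezone


def lag_exceeded(now, last_update, last_scheduled=None,
                 max_lag_time=None, max_lag_time_percent=None):
    """Return True if a SnapMirror lag alert should be sent.

    last_scheduled is None when the relationship (and its policy)
    has no schedule associated with it.
    """
    lag = now - last_update
    if max_lag_time_percent is not None and last_scheduled is not None:
        # Percent rule: the allowed lag is a multiple of the time that
        # has passed since the last scheduled update.
        allowed = (now - last_scheduled) * (max_lag_time_percent / 100)
        return lag > allowed
    if max_lag_time is not None:
        # Fallback rule: a fixed number of seconds.
        return lag > timedelta(seconds=max_lag_time)
    return False


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
scheduled = now - timedelta(hours=1)  # update was due 1 hour ago

# 200 percent of 1 hour allows a lag of up to 2 hours.
stale = lag_exceeded(now, now - timedelta(hours=2, minutes=30),
                     last_scheduled=scheduled, max_lag_time_percent=200)
fresh = lag_exceeded(now, now - timedelta(hours=1, minutes=30),
                     last_scheduled=scheduled, max_lag_time_percent=200)
# With no schedule, the maxLagTime value (here 3600 seconds) applies instead.
no_schedule = lag_exceeded(now, now - timedelta(hours=1, minutes=30),
                           max_lag_time=3600, max_lag_time_percent=200)
```

Supplying both keys, as the table recommends, means the fixed `maxLagTime` limit still applies to a relationship whose schedule has been removed.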