Monitoring/monitor-ontap-services/README.md
Lines changed: 17 additions & 16 deletions
@@ -47,7 +47,7 @@ that in the [Endpoints for AWS Services](#endpoints-for-aws-services) section be
47
47
## Prerequisites
48
48
- An FSx for NetApp ONTAP file system you want to monitor.
49
49
- An S3 bucket to store the configuration and event status files, as well as the Lambda layer zip file.
50
-
- You will need to download the [Lambda layer zip file](https://raw.githubusercontent.com/NetApp/FSx-ONTAP-samples-scripts/main/Monitoring/monitor_onstap_services/lambda_layer.zip) from this repo and upload it to the S3 bucket. Be sure to preserve the name `lambda_layer.zip`.
50
+
- You will need to download the [Lambda layer zip file](https://raw.githubusercontent.com/NetApp/FSx-ONTAP-samples-scripts/main/Monitoring/monitor-ontap-services/lambda_layer.zip) from this repo and upload it to the S3 bucket. Be sure to preserve the name `lambda_layer.zip`.
51
51
- The security group associated with the FSx for ONTAP file system must allow inbound traffic from the Lambda function over TCP port 443.
52
52
- An SNS topic to send the alerts to.
53
53
- An AWS Secrets Manager secret that holds the FSx for ONTAP file system credentials. There should be two keys in the secret, one for the username and one for the password.
@@ -74,6 +74,7 @@ The CloudFormation template will do the following:
74
74
- Create an EventBridge Schedule to trigger the Lambda function. By default, it will trigger
75
75
it to run every 15 minutes, although there is a parameter that will allow you to set it to whatever interval you want.
76
76
- Optionally create a CloudWatch alarm that will alert you if the Lambda function fails.
77
+
- Create a Lambda function to send the CloudWatch alarm alert to an SNS topic. This is done so the SNS topic can be in another region, since CloudWatch doesn't support doing that natively.
77
78
- Optionally create VPC endpoints for the SNS, Secrets Manager, CloudWatch and/or S3 AWS services.
78
79
79
80
To install the program using the CloudFormation template, you will need to do the following:
@@ -91,32 +92,32 @@ To install the program using the CloudFormation template, you will need to do th
91
92
|Stackname|The name you want to assign to the CloudFormation stack. Note that this name is used as a base name for some of the resources it creates, so please keep it **under 25 characters**.|
92
93
|OntapAdminServer|The DNS name, or IP address, of the management endpoint of the FSxN file system you wish to monitor.|
93
94
|S3BucketName|The name of the S3 bucket where you want the program to store event information. It should also have a copy of the `lambda_layer.zip` file. **NOTE** This bucket must be in the same region where this CloudFormation stack is being created.|
94
-
|SubnetIds|The subnet IDs that the Lambda function will be attached to. They must have connectivity to the FSxN file system management endpoint that you wish to monitor.|
95
+
|SubnetIds|The subnet IDs that the Lambda function will be attached to. They must have connectivity to the FSxN file system management endpoint that you wish to monitor. It is recommended to select at least two.|
95
96
|SecurityGroupIds|The security group IDs that the Lambda function will be attached to. The security group must allow outbound traffic over port 443 to the SNS, Secrets Manager, CloudWatch, and S3 AWS service endpoints, as well as the FSxN file system you want to monitor.|
96
97
|SnsTopicArn|The ARN of the SNS topic you want the program to publish alert messages to.|
97
98
|CloudWatchLogGroupName|The name of **an existing** CloudWatch Log Group that the Lambda function can send event messages to. It will create a new Log Stream within the Log Group every day that is unique to this file system so you can use the same Log Group for multiple instances of this program. If this field is left blank, alerts will not be sent to CloudWatch.|
98
99
|SecretArn|The ARN of the secret within the AWS Secrets Manager that holds the FSxN file system credentials.|
99
-
|SecretUsernameKey|The name of the key within the secret that holds the username portion of the FSxN file system credentials.|
100
-
|SecretPasswordKey|The name of the key within the secret that holds the password portion of the FSxN file system credentials.|
100
+
|SecretUsernameKey|The name of the key within the secret that holds the username portion of the FSxN file system credentials. The default is 'username'.|
101
+
|SecretPasswordKey|The name of the key within the secret that holds the password portion of the FSxN file system credentials. The default is 'password'.|
101
102
|CheckInterval|The interval, in minutes, that the EventBridge schedule will trigger the Lambda function. The default is 15 minutes.|
102
-
|CreateCloudWatchAlarm|Set to "true" if you want to create a CloudWatch alarm that will alert you if the Lambda function fails.|
103
-
|CreateSecretsManagerEndpoint|Set to "true" if you want to create a Secrets Manager endpoint. **NOTE:** If an SecretsManager Endpoint already exist for the specified Subnet the endpoint creation will fail, causing the entire CloudFormation stack to fail. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) for more information.|
104
-
|CreateLambdaEndpoint|Set to "true" if you want to create an Lambda endpoint. **NOTE:** If an Lambda Endpoint already exist for the specified Subnet the endpoint creation will fail, causing the entire CloudFormation stack to fail. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) for more information.|
105
-
|CreateCWEndpoint|Set to "true" if you want to create a CloudWatch endpoint. **NOTE:** If an CloudWatch Endpoint already exist for the specified Subnet the endpoint creation will fail, causing the entire CloudFormation stack to fail. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) for more information.|
106
-
|CreateS3Endpoint|Set to "true" if you want to create an S3 endpoint. **NOTE:** If an S3 Gateway Endpoint already exist for the specified VPC the endpoint creation will fail, causing the entire CloudFormation stack to fail. Note that this will be a "Gateway" type endpoint, since they are free to use. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) for more information.|
103
+
|CreateCloudWatchAlarm|Set to "true" if you want to create a CloudWatch alarm, and accompanying Lambda function, that will alert you if the monitoring Lambda function fails.|
104
+
|CreateSecretsManagerEndpoint|Set to "true" if you want to create a Secrets Manager endpoint. **NOTE:** If a SecretsManager Endpoint already exists for the specified Subnet, the endpoint creation will fail, causing the entire CloudFormation stack to fail. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) section for more information.|
105
+
|CreateCWEndpoint|Set to "true" if you want to create a CloudWatch endpoint. **NOTE:** If a CloudWatch Endpoint already exists for the specified Subnet, the endpoint creation will fail, causing the entire CloudFormation stack to fail. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) section for more information.|
106
+
|CreateS3Endpoint|Set to "true" if you want to create an S3 endpoint. **NOTE:** If an S3 Gateway Endpoint already exists for the specified VPC, the endpoint creation will fail, causing the entire CloudFormation stack to fail. Note that this will be a "Gateway" type endpoint, since they are free to use. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) section for more information.|
107
107
|RoutetableIds|The route table IDs to update to use the S3 endpoint. Since the S3 endpoint is of type `Gateway` route tables have to be updated to use it. This parameter is only needed if you are creating an S3 endpoint.|
108
108
|VpcId|The ID of a VPC where the subnets provided above are located. Required if you are creating an endpoint, not needed otherwise.|
109
109
|EndpointSecurityGroupIds|The security group IDs that the endpoint will be attached to. The security group must allow traffic over TCP port 443 from the Lambda function. This is required if you are creating a Lambda, CloudWatch or SecretsManager endpoint.|
110
110
|watchdogRoleArn|The ARN of the role that the Lambda function triggered by the Watchdog CloudWatch alarm will use to send SNS alerts if something goes wrong with the monitoring Lambda function. The only required permission is to publish to the SNS topic listed above, although it is highly recommended that you also add the AWS managed "AWSLambdaBasicExecutionRole" policy, which allows the Lambda function to create and write to a CloudWatch log stream so it can provide diagnostic output if something goes wrong. Only required if you are creating a CloudWatch alarm and want to provide your own role. If left blank, a role will be created for you if needed.|
111
111
|LambdaRoleArn|The ARN of the role that the Lambda function will use. This role must have the permissions listed in the [Create an AWS Role](#create-an-aws-role) section below. If left blank a role will be created for you.|
112
112
|SchedulerRoleArn|The ARN of the role that the EventBridge schedule will use to trigger the Lambda function. It just needs the permission to invoke a Lambda function. If left blank a role will be created for you.|
113
+
|CloudWatchRoleArn|The ARN of the role that will be assigned to the Lambda function that will publish messages to an SNS topic. It just needs the permission to publish to the SNS topic. If left blank, and CreateCloudWatchAlarm is set to "true", a role will be created for you.|
113
114
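The constraints in the table above can be checked before launching the stack. The following is only a minimal sketch, not part of the template: the function name `check_parameters` is hypothetical, list parameters are assumed to be passed as comma-separated strings, and requiring `CheckInterval` to be a positive whole number is an assumption. The defaults and dependency rules are taken from the table.

```python
from typing import Dict, List

# Defaults stated in the parameter table.
DEFAULTS = {
    "SecretUsernameKey": "username",
    "SecretPasswordKey": "password",
    "CheckInterval": "15",
}

# Parameters that ask the template to create a VPC endpoint.
INTERFACE_ENDPOINT_FLAGS = ("CreateSecretsManagerEndpoint", "CreateCWEndpoint")
ENDPOINT_FLAGS = INTERFACE_ENDPOINT_FLAGS + ("CreateS3Endpoint",)


def is_true(params: Dict[str, str], name: str) -> bool:
    """Treat a parameter as enabled only when it is the string "true"."""
    return params.get(name, "false").strip().lower() == "true"


def check_parameters(stack_name: str, params: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means none were detected."""
    merged = {**DEFAULTS, **params}
    problems = []

    # The stack name is used as a base name for other resources.
    if len(stack_name) >= 25:
        problems.append("Stackname should be under 25 characters")

    # Assumption: the interval must be a positive whole number of minutes.
    interval = merged["CheckInterval"]
    if not interval.isdigit() or int(interval) < 1:
        problems.append("CheckInterval should be a whole number of minutes")

    # Assumption: subnet IDs arrive as one comma-separated string.
    subnets = [s for s in merged.get("SubnetIds", "").split(",") if s.strip()]
    if len(subnets) < 2:
        problems.append("SubnetIds: selecting at least two is recommended")

    if any(is_true(merged, f) for f in ENDPOINT_FLAGS) and not merged.get("VpcId"):
        problems.append("VpcId is required when creating an endpoint")

    if is_true(merged, "CreateS3Endpoint") and not merged.get("RoutetableIds"):
        problems.append("RoutetableIds is needed when creating an S3 endpoint")

    if any(is_true(merged, f) for f in INTERFACE_ENDPOINT_FLAGS) and not merged.get(
        "EndpointSecurityGroupIds"
    ):
        problems.append(
            "EndpointSecurityGroupIds is required for CloudWatch or SecretsManager endpoints"
        )

    return problems
```

Calling `check_parameters("fsxn-monitor", {"SubnetIds": "subnet-a,subnet-b"})` returns an empty list, since the other values fall back to the defaults from the table.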
114
115
The remaining parameters are used to create the matching conditions configuration file, which specifies when the program will send an alert.
115
116
You can read more about it in the [Matching Conditions File](#matching-conditions-file) section below. All these parameters have reasonable default values
116
117
so you probably won't have to change any of them. Note that if you enable EMS alerts, then the default rule will
117
-
send all EMS messages that have a severity of `Error`, `Alert` or `Emergency`. You can change the
118
+
alert on all EMS messages that have a severity of `Error`, `Alert` or `Emergency`. You can change the
118
119
matching conditions at any time by updating the matching conditions file that is created in the S3 bucket.
119
-
The name of the file will be \<OntapAdminServer\>-conditions where "\<OntapAdminServer\>" is the value you
120
+
The name of the file will be `<OntapAdminServer>-conditions` where `<OntapAdminServer>` is the value you
120
121
set for the OntapAdminServer parameter.
121
122
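As a small illustration of the naming rule above, the sketch below builds the name of the matching conditions file from the `OntapAdminServer` value. The helper names are hypothetical, and the `s3://` URI assumes the file sits at the top level of the bucket, which is not confirmed here.

```python
def conditions_file_key(ontap_admin_server: str) -> str:
    """Name of the matching conditions file: <OntapAdminServer>-conditions."""
    return f"{ontap_admin_server}-conditions"


def conditions_file_uri(bucket: str, ontap_admin_server: str) -> str:
    """S3 URI of the file, assuming it is stored at the top level of the bucket."""
    return f"s3://{bucket}/{conditions_file_key(ontap_admin_server)}"
```

For example, with `OntapAdminServer` set to `10.0.0.5`, `conditions_file_key` gives `10.0.0.5-conditions`.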
122
123
### Post Installation Checks
@@ -125,9 +126,9 @@ not in an error state. To find the Lambda function go to the Resources tab of th
125
126
stack and click on the "Physical ID" of the Lambda function. This should bring you to the Lambda service in the AWS
126
127
console. Once there, click on the "Monitor" tab to see if the function has been invoked. Locate the
127
128
"Error count and success rate(%)" chart, which is usually found at the top right corner of the "Monitor" dashboard.
128
-
Within the "CheckInterval" number of minutes there should be at least one dot on that chart. Note that sometimes
129
-
the chart is initially slow to reflect any status so you might have to be patient. Continue to press the "refresh"
130
-
button (the icon with a circle on it) to update the status. Once you see a dot on the chart, when you hover your mouse
129
+
Within the "CheckInterval" number of minutes there should be at least one dot on that chart. Note that initially
130
+
the chart is slow to reflect any status so you might have to be patient. Continue to press the "refresh"
131
+
button (the icon with a circle on it) every minute or so to update the status. Once you see a dot on the chart, when you hover your mouse
131
132
over it, you should see the "success rate" and "number of errors." The success rate should be 100% and the number
132
133
of errors should be 0. If it is not, then scroll down to the CloudWatch Logs section and click on the most recent
133
134
log stream. This will show you the output of the Lambda function. If there are any errors, they will be displayed
@@ -165,7 +166,7 @@ Below is the specific list of permissions needed.
165
166
The first use of the S3 bucket will be to store the Lambda layer zip file. This is required to include some dependencies that
166
167
aren't included in the AWS Lambda environment. Currently the only dependency in the zip file is [cronsim](https://pypi.org/project/cronsim/).
167
168
This is used to interpret the SnapMirror schedules to be able to report on lag issues. You can download the zip file from this repository by clicking on
168
-
the [lambda_layer.zip](https://raw.githubusercontent.com/NetApp/FSx-ONTAP-samples-scripts/main/Monitoring/monitor_onstap_services/lambda_layer.zip) link.
169
+
the [lambda_layer.zip](https://raw.githubusercontent.com/NetApp/FSx-ONTAP-samples-scripts/main/Monitoring/monitor-ontap-services/lambda_layer.zip) link.
169
170
You will refer to this file, and bucket, when you create the Lambda function.
170
171
171
172
Another use of the S3 bucket is to store events that have already been reported on so they can be compared against
@@ -252,7 +253,7 @@ Once you have created the function you will be able to:
252
253
To create a Lambda layer go to the Lambda service page on the AWS console and click on the "Layers"
253
254
tab under the "Additional resources" section. Then, click on the "Create layer" button.
254
255
From there you'll need to provide a name for the layer, and the path to the