You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: Monitoring/monitor-ontap-services/README.md
+15-15Lines changed: 15 additions & 15 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -95,22 +95,22 @@ To install the program using the CloudFormation template, you will need to do th
95
95
|SubnetIds|The subnet IDs that the Lambda function will be attached to. They must have connectivity to the FSxN file system management endpoint that you wish to monitor. It is recommended to select at least two.|
96
96
|SecurityGroupIds|The security group IDs that the Lambda function will be attached to. The security group must allow outbound traffic over port 443 to the SNS, Secrets Manager, and CloudWatch and S3 AWS service endpoints, as well as the FSxN file system you want to monitor.|
97
97
|SnsTopicArn|The ARN of the SNS topic you want the program to publish alert messages to.|
98
-
|CloudWatchLogGroupName|The name of **an existing** CloudWatch Log Group that the Lambda function can send event messages to. It will create a new Log Stream within the Log Group every day that is unique to this file system so you can use the same Log Group for multiple instances of this program. If this field is left blank, alerts will not be sent to CloudWatch.|
98
+
|CloudWatchLogGroupARN|The ARN of **an existing** CloudWatch Log Group that the Lambda function can send event messages to. It will create a new Log Stream within the Log Group every day that is unique to this file system so you can use the same Log Group for multiple instances of this program. If this field is left blank, alerts will not be sent to CloudWatch.|
99
99
|SecretArn|The ARN of the secret within the AWS Secrets Manager that holds the FSxN file system credentials.|
100
100
|SecretUsernameKey|The name of the key within the secret that holds the username portion of the FSxN file system credentials. The default is 'username'.|
101
101
|SecretPasswordKey|The name of the key within the secret that holds the password portion of the FSxN file system credentials. The default is 'password'.|
102
102
|CheckInterval|The interval, in minutes, that the EventBridge schedule will trigger the Lambda function. The default is 15 minutes.|
103
-
|CreateCloudWatchAlarm|Set to "true" if you want to create a CloudWatch alarm, and accompanying Lambda function, that will alert you if the monitoring Lambda function fails.|
103
+
|CreateCloudWatchAlarm|Set to "true" if you want to create a CloudWatch alarm that will alert you if the monitoring Lambda function fails.|
104
+
|ImplementWatchdogAsLambda|If set to "true" a Lambda function will be created that will allow the CloudWatch alarm to publish an alert to an SNS topic in another region. Only necessary if the SNS topic is in another region since CloudWatch cannot send alerts across regions.|
105
+
|WatchdogRoleArn|The ARN of the role assigned to the Lambda function that the watchdog CloudWatch alarm will use to publish SNS alerts with. The only required permission is to publish to the SNS topic listed above, although highly recommended that you also add the AWS managed "AWSLambdaBasicExecutionRole" policy that allows the Lambda function to create and write to a CloudWatch log stream so it can provide diagnostic output of something goes wrong. Only required if creating a CloudWatch alert, implemented as a Lambda function, and you want to provide your own role. If left blank a role will be created for you if needed.|
106
+
|LambdaRoleArn|The ARN of the role that the Lambda function will use. This role must have the permissions listed in the [Create an AWS Role](#create-an-aws-role) section below. If left blank a role will be created for you.|
104
107
|CreateSecretsManagerEndpoint|Set to "true" if you want to create a Secrets Manager endpoint. **NOTE:** If a SecretsManager Endpoint already exist for the specified Subnet the endpoint creation will fail, causing the entire CloudFormation stack to fail. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) for more information.|
105
108
|CreateSNSEndpoint|Set to "true" if you want to create an SNS endpoint. **NOTE:** If a SNS Endpoint already exist for the specified Subnet the endpoint creation will fail, causing the entire CloudFormation stack to fail. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) for more information.|
106
109
|CreateCWEndpoint|Set to "true" if you want to create a CloudWatch endpoint. **NOTE:** If a CloudWatch Endpoint already exist for the specified Subnet the endpoint creation will fail, causing the entire CloudFormation stack to fail. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) for more information.|
107
110
|CreateS3Endpoint|Set to "true" if you want to create an S3 endpoint. **NOTE:** If a S3 Gateway Endpoint already exist for the specified VPC the endpoint creation will fail, causing the entire CloudFormation stack to fail. Note that this will be a "Gateway" type endpoint, since they are free to use. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) for more information.|
108
111
|RoutetableIds|The route table IDs to update to use the S3 endpoint. Since the S3 endpoint is of type `Gateway` route tables have to be updated to use it. This parameter is only needed if you are creating an S3 endpoint.|
109
112
|VpcId|The ID of a VPC where the subnets provided above are located. Required if you are creating an endpoint, not needed otherwise.|
110
113
|EndpointSecurityGroupIds|The security group IDs that the endpoint will be attached to. The security group must allow traffic over TCP port 443 from the Lambda function. This is required if you are creating an Lambda, CloudWatch or SecretsManager endpoint.|
111
-
|LambdaRoleArn|The ARN of the role that the Lambda function will use. This role must have the permissions listed in the [Create an AWS Role](#create-an-aws-role) section below. If left blank a role will be created for you.|
112
-
|SchedulerRoleArn|The ARN of the role that the EventBridge schedule will use to trigger the Lambda function. It just needs the permission to invoke a Lambda function. If left blank a role will be created for you.|
113
-
|watchdogRoleArn|The ARN of the role that the Lambda function that the Watchdog CloudWatch alert will use to send SNS alerts if something goes wrong with the monitoring Lambda function. The only required permission is to publish to the SNS topic listed above, although highly recommended that you also add the AWS managed "AWSLambdaBasicExecutionRole" policy that allows the Lambda function to create and write to a CloudWatch log stream so it can provide diagnostic output of something goes wrong. Only required if creating a CloudWatch alert and you want to provide your own role. If left blank a role will be created for you if needed.|
114
114
115
115
The remaining parameters are used to create the matching conditions configuration file, which specify when the program will send an alert.
116
116
You can read more about it in the [Matching Conditions File](#matching-conditions-file) section below. All these parameters have reasonable default values
@@ -124,16 +124,16 @@ set for the OntapAdminServer parameter.
124
124
After the stack has been created, check the status of the Lambda function to make sure it is
125
125
not in an error state. To find the Lambda function go to the Resources tab of the CloudFormation
126
126
stack and click on the "Physical ID" of the Lambda function. This should bring you to the Lambda service in the AWS
127
-
console. Once there, click on the "Monitor" tab to see if the function has been invoked. Locate the
127
+
console. Once there, click on the "Monitor" tab to see if the function has been invoked. Note that it will take
128
+
at least the configured iteration time before the function is invoked for the first time. Locate the
128
129
"Error count and success rate(%)" chart, which is usually found at the top right corner of the "Monitor" dashboard.
129
-
Within the "CheckInterval" number of minutes there should be at least one dot on that chart. Note that initially
130
-
the chart is slow to reflect any status so you might have to be patient. Continue to press the "refresh"
131
-
button (the icon with a circle on it) every minute or so to update the status. Once you see a dot on the chart, when you hover your mouse
132
-
over it, you should see the "success rate" and "number of errors." The success rate should be 100% and the number
133
-
of errors should be 0. If it is not, then scroll down to the CloudWatch Logs section and click on the most recent
134
-
log stream. This will show you the output of the Lambda function. If there are any errors, they will be displayed
135
-
there. If you can't figure out what is causing an error, then please create an issue in this repository and someone
136
-
will help you.
130
+
After the "CheckInterval" number of minutes there should be at least one dot on that chart.
131
+
Hover your mouse over the dot and you should see the "success rate" and "number of errors."
132
+
The success rate should be 100% and the number of errors should be 0. If it is not, then scroll up a little bit and
133
+
click on "View CloudWatch Logs" link. Once on this page, click on the first LogStream and review any output.
134
+
If there are any errors, they will be displayed there. If you can't figure out what is causing an error,
135
+
please create an issue on the [Issues](https://github.com/NetApp/FSx-ONTAP-samples-scripts/issues) section
136
+
in this repository and someone will help you.
137
137
138
138
---
139
139
@@ -324,7 +324,7 @@ Each rule should be an object with one, or more, of the following keys:
324
324
|failover|Boolean|If 'true' the program will send an alert if the FSxN cluster is running on its standby node. If it is set to `false`, it will not report on failover status.|
325
325
|networkInterfaces|Boolean|If 'true' the program will send an alert if any of the network interfaces are down. If it is set to `false`, it will not report on any network interfaces that are down.|
326
326
327
-
###### Matching condition schema for EMS Messages (ems)
327
+
###### Matching condition schema for EMS Events (ems)
328
328
Each rule should be an object with three keys, with an optional 4th key:
0 commit comments