Commit 7e0b7d4

Merge pull request #141 from NetApp/fix_monitoring_ontap_docs
Fixed some typos.
2 parents 0eae33c + 1beb18f commit 7e0b7d4

1 file changed: +16 -13 lines changed

Monitoring/monitor-ontap-services/README.md

Lines changed: 16 additions & 13 deletions
@@ -8,8 +8,8 @@ to obtain the required information to detect certain conditions, and when found,
 This program was initially created to forward EMS messages to an AWS service outside of the FSxN file system since
 there was no way to do that from the FSxN file system itself (i.e. the syslog forwarding didn't work at the time). As it turns out this is
 no longer the case, in that as of Data ONTAP 9.13.1 you can now forward EMS messages to a 'syslog' server. However, once this program was created,
-other funtionality was added to monitor other Data ONTAP services that AWS didn't provide a way to trigger an alert when
-something was outside of an expected realm. For example, if the lag time between SnapMirror synchroniation were more
+other functionality was added to monitor other Data ONTAP services that AWS didn't provide a way to trigger an alert when
+something was outside of an expected realm. For example, if the lag time between SnapMirror synchronization were more
 than a specified amount of time. Or, if a SnapMirror update was stalled. This program can alert on all these things and more.
 Here is an itemized list of the services that this program can monitor:
 - If the file system is available.
@@ -46,16 +46,18 @@ permissions and assigning that role to the Lambda function.
 |:-----------------------------|:----------------|
 |secretsmanager:GetSecretValue | Needs to be able to retrieve the FSxN administrator credentials. |
 |sns:Publish | Since it sends messages (alerts) via SNS, it needs to be able to do so. |
-|s3:PutObjecct | The program stores its state information in various s3 objects.|
+|s3:PutObject | The program stores its state information in various s3 objects.|
 |s3:GetObject | The program reads previous state information, as well as configuration from various s3 objects. |
 |s3:ListBucket | To allow the program to know if an object exist or not. |
+|ec2:CreateNetworkInterface | Since the program runs as a Lambda function within your VPC, it needs to be able to create a network interface in your VPC. |
+|ec2:DescribeNetworkInterfaces | So it can check to see if an network interface already exist. |
 
 ### Create an S3 Bucket
 One of the goals of the program is to not send multiple messages for the same event. It does this by storing the event
 information in an s3 object so it can be compared against before sending a second message for the same event.
 Note that it doesn't keep every event indefinitely, it only stores them while the condition is true. So, say for
 example it sends an alert for a SnapMirror relationship that has a lag time that is too long. It will
-send the alert and store the event. Once a successful SnapMirror synchronization has happen, the event will be removed
+send the alert and store the event. Once a successful SnapMirror synchronization has happened, the event will be removed
 from the s3 object allowing for a new event to be created and alarmed on.
 
 So, for the program to function, you will need to provide an S3 bucket for it to store event history. It is recommended to
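
Note on this hunk (illustrative, not part of the commit): the de-duplication behaviour described under "Create an S3 Bucket" could be sketched roughly as below. The `update_events` helper and the string event keys are hypothetical, and the S3 object that the real program reads and writes is replaced here by a plain in-memory list.

```python
def update_events(stored, active):
    """Return (alerts_to_send, new_stored).

    stored: event keys already alerted on (stand-in for the S3 object).
    active: event keys whose condition is true on this run.
    """
    stored_set = set(stored)
    # Drop duplicate keys while preserving order.
    active_unique = list(dict.fromkeys(active))
    # Alert only on conditions that were not already recorded.
    alerts = [key for key in active_unique if key not in stored_set]
    # Keep an event only while its condition is still true, so a condition
    # that clears and later recurs raises a fresh alert.
    return alerts, active_unique
```

Calling it on consecutive runs with the same active key yields an alert only the first time; once the key drops out of `active`, it is forgotten and can alert again.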
@@ -111,10 +113,11 @@ filename, then set the configFilename environment variable to the name of your c
 **NOTE:** Parameter names are case sensitive.
 
 |Parameter Name | Required | Required as an Environment Variable | Default Value | Description |
-|:--------------|:---------|:------------------------------------|:--------------|:------------|
-|s3BucketName | Yes | Yes | None | Set to the name of the S3 bucket you want the program to store events to. It will also read the matching configuration file from this bucket. |
-|s3BucketRegion | Yes | Yes | None | Set to the region the S3 bucket resides in. |
-|configFilename | No | Yes | OntapAdminServer + "-config" | Set to the filename (S3 object) that contains parameter assignments. It's okay if it doesn't exist, as long as there are environment variables for all the required parameters. |
+|:--------------|:--------:|:-----------------------------------:|:--------------|:------------|
+| s3BucketName | Yes | Yes | None | Set to the name of the S3 bucket you want the program to store events to. It will also read the matching configuration file from this bucket. |
+| s3BucketRegion | Yes | Yes | None | Set to the region the S3 bucket resides in. |
+| OntapAdminServer | Yes | Yes | None | Set to the DNS name,or IP address of the ONTAP server you wish to monitor. |
+| configFilename | No | No | OntapAdminServer + "-config" | Set to the filename (S3 object) that contains parameter assignments. It's okay if it doesn't exist, as long as there are environment variables for all the required parameters. |
 | emsEventsFilename | No | No | OntapAdminServer + "-emsEvents" | Set to the filename (S3 object) that you want the program to store the EMS events that it alerts on into. This file will be created as necessary. |
 | smEventsFilesname | No | No | OntapAdminServer + "-smEvents" | Set to the filename (S3 object) that you want the program to store the SnapMirror alerts into. This file will be created as necessary. |
 | smRelationshipsFilename | No | No | OntapAdminServer + "-smRelationships" | Set to the filename (S3 object) that you want the program to store the SnapMirror relationships into. This file will be created as necessary. |
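
Note on this hunk (illustrative, not part of the commit): the "Default Value" column can be read as "OntapAdminServer plus a fixed suffix unless the parameter is set". A minimal sketch of that rule follows. The `resolve_filenames` helper is hypothetical, it only looks at environment-style variables (the real program can also read these from the config file), and the parameter names are copied exactly as spelled in the table because they are case sensitive.

```python
import os


def resolve_filenames(env=None):
    """Resolve the optional filename parameters, applying the table's defaults."""
    env = os.environ if env is None else env
    # OntapAdminServer is required; a missing value raises KeyError here.
    server = env["OntapAdminServer"]
    defaults = {
        "configFilename": server + "-config",
        "emsEventsFilename": server + "-emsEvents",
        "smEventsFilesname": server + "-smEvents",
        "smRelationshipsFilename": server + "-smRelationships",
    }
    # An explicitly set parameter wins over its default.
    return {name: env.get(name, default) for name, default in defaults.items()}
```

Passing a plain dict instead of relying on `os.environ` keeps the sketch easy to try outside Lambda.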
@@ -142,37 +145,37 @@ matching conditions (rules) for. The second key is "rules" which is an array of
 matching conditions. Note that each service's rules has its own unique schema. The following is the unique schema
 for each of the service's rules.
 
-##### Matching condition schema for System Health
+##### Matching condition schema for System Health (systemHealth)
 Each rule should be an object with one, or more, of the following keys:
 
 - versionChange - Is a Boolean (true, false) and if 'true' will send an alert when the ONTAP version changes. If it is set to false, it will not report on version changes.
 - failover - Is a Boolean (true, false) and if 'true' will send an alert if the FSxN cluster is running on its standby node. If it is set to false, it will not report on failover status.
 - networkInterfaces - Is a Boolean (true, false) and if 'true' will send an alert if any of the network interfaces are down. If it is set to false, it will not report on any network interfaces that are down.
 
-##### Matching condition schema for EMS Messages
+##### Matching condition schema for EMS Messages (ems)
 Each rule should be an object with three keys:
 
 - "name" - Which will match on the EMS event name.
 - "message" - Which will match on the EMS event message text.
 - "severity" - Which will match on the severity of the EMS event (debug, informational, notice, error, alert or emergency).
 Note that all values to each of the keys are used as a regular expressions against the associated EMS component. So, for example, if you want to match on any event message text that starts with “snapmirror” then you would put “^snapmirror”. The “^” character matches the beginning on the string. If you want to match on a specific EMS event name, then you should anchor it with an regular express that starts with “^” for the beginning of the string and ends with “$” for the end of the string. For example, “^arw.volume.state$’. For a complete explanation of the regular expression syntax and special characters, please see the Python documentation found here Regular expression operations.
 
-##### Matching condition schema for SnapMirror relationships
+##### Matching condition schema for SnapMirror relationships (snapmirror)
 Each rule should be an object with one, or more, of the following keys:
 
 - maxLagTime - Specifies the maximum allowable time, in seconds, since the last successful SnapMirror update before an alert will be sent.
 - stalledTransferSeconds - Specifies the minimum number of seconds that have to transpire before a SnapMirror transfer will be considered stalled.
 - health - Is a Boolean (true, false) which specifies if you want to alert on a healthy relationship (true) or an unhealthy relationship (false).
 
-##### Matching condition schema for Storage
+##### Matching condition schema for Storage (storage)
 Each rule should be an object with one, or more, of the following keys:
 
 - aggrWarnPercentUsed - Specifies the maximum allowable physical storage (aggregate) utilization (between 0 and 100) before an alert is sent.
 - aggrCriticalPercentUsed - Specifies the maximum allowable physical storage (aggregate) utilization (between 0 and 100) before an alert is sent.
 - volumeWarnPercentUsed - Specifies the maximum allowable volume utilization (between 0 and 100) before an alert is sent.
 - volumeCriticalPercentUsed - Specifies the maximum allowable volume utilization (between 0 and 100) before an alert is sent.
 
-##### Matching condition schema for Quota
+##### Matching condition schema for Quota (quota)
 Each rule should be an object with one, or more, of the following keys:
 
 - maxHardQuotaSpacePercentUsed - Specifies the maximum allowable storage utilization (between 0 and 100) against the hard quota limit before an alert is sent.
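
Note on this hunk (illustrative, not part of the commit): the EMS rule description says each of "name", "message" and "severity" is applied as a regular expression to the corresponding part of the event. The sketch below shows one way that could work. It assumes unanchored matching via Python's `re.search` (consistent with the README's advice to add "^" and "$" yourself), assumes the parenthesised names added to the headings are the service "name" values, and uses a hypothetical event dictionary shape and example threshold values.

```python
import re

# Hypothetical matching-conditions fragment: each entry names a service and
# lists its rules. Service names are assumed from the headings in this hunk.
conditions = [
    {
        "name": "ems",
        "rules": [
            {"name": "^arw.volume.state$", "message": "", "severity": "^(alert|emergency)$"},
        ],
    },
    {
        "name": "snapmirror",
        # Example values only: one day of lag, ten minutes without progress.
        "rules": [{"maxLagTime": 86400, "stalledTransferSeconds": 600}],
    },
]

EMS_KEYS = ("name", "message", "severity")


def ems_rule_matches(rule, event):
    """True when every key's pattern is found in the matching event field."""
    return all(re.search(rule[key], event[key]) for key in EMS_KEYS)
```

In this sketch an empty pattern matches any text, so `"message": ""` leaves the message text unconstrained while the anchored name and severity patterns must match exactly.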
