Commit d55a06d

Made changes after testing. Had to force all AWS services (SNS, SecretManager, S3, Lambda) to reside in the same region as the FSxN file system.
1 parent 5cd2b99 commit d55a06d
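
The same-region requirement described in the commit message can be checked before deploying. The following is a minimal sketch, assuming the standard ARN layout (`arn:partition:service:region:account-id:resource`). The helper names `arn_region` and `find_region_mismatches` and the example ARNs are hypothetical and are not part of this repository:

```python
from typing import Dict


def arn_region(arn: str) -> str:
    """Return the region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return parts[3]


def find_region_mismatches(fsxn_region: str, arns: Dict[str, str]) -> Dict[str, str]:
    """Map each resource name to its region when it differs from the FSxN region.

    Only useful for ARNs that carry a region, such as SNS topics and Secrets
    Manager secrets. S3 bucket ARNs have an empty region field.
    """
    mismatches = {}
    for name, arn in arns.items():
        region = arn_region(arn)
        if region != fsxn_region:
            mismatches[name] = region
    return mismatches


if __name__ == "__main__":
    # Hypothetical ARNs, for illustration only.
    resources = {
        "SnsTopicArn": "arn:aws:sns:us-east-1:123456789012:alerts",
        "SecretArn": "arn:aws:secretsmanager:us-west-2:123456789012:secret:fsxn-AbCdEf",
    }
    print(find_region_mismatches("us-west-2", resources))
```

Any entry in the returned dictionary points at a resource that would need to be recreated in, or replaced by one from, the FSxN file system's region.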

3 files changed

+71
-59
lines changed

Monitoring/monitor-ontap-services/README.md

Lines changed: 25 additions & 23 deletions
@@ -54,20 +54,18 @@ To install the program using the CloudFormation template, you will need to do th
5454
|Parameter Name | Notes|
5555
|---|---|
5656
|Stackname|The name you want to assign to the CloudFormation stack. Note that this name is used as a base name for the resources it creates, so please keep it under 25 characters.|
57-
|BucketRegion|The region where you want the S3 bucket, that is used to store event information and the matching conditions file, to reside.|
5857
|OntapAdminServer|The DNS name, or IP address, of the management endpoint of the FSxN file system you wish to monitor.|
59-
|VpcId|The VPC ID that the Lambda function will run in. Note that since the Lambda function has to communicate with the FSxN file server, it has to run in a VPC that can communicate with FSxN file server you want to monitor.|
60-
|SubnetIds|The subnet IDs that the Lambda function will be attached to. These must be in the VPC specified above.|
61-
|SecurityGroupIds|The security group IDs that the Lambda function will be attached to. These must be in the VPC specified above.|
58+
|SubnetIds|The subnet IDs that the Lambda function will be attached to. Must have connectivity to the FSxN file system you wish to monitor.|
59+
|SecurityGroupIds|The security group IDs that the Lambda function will be attached to.|
6260
|SnsTopicArn|The ARN of the SNS topic you want the program to publish alert messages to.|
63-
|SnsRegion|The region where the SNS topic resides.|
64-
|SecretArn|The ARN of the secret within the AWS Secrets Manager that holds the FSxN file system credentials.|
65-
|SecretRegion|The region where the secret is stored.|
61+
|SecretArn|The ARN of the secret within the AWS Secrets Manager that holds the FSxN file system credentials. **NOTE:** The secret must be in the same region as the FSxN file system.|
6662
|SecretUsernameKey|The key name within the secret that holds the username portion of the FSxN file system credentials.|
6763
|SecretPasswordKey|The key name within the secret that holds the password portion of the FSxN file system credentials.|
6864
|CreateSNSEndpoint|Set to "true" if you want to create an SNS endpoint. Since the Lambda function will be running within your VPC, it will most likely not have access to the Internet; therefore an endpoint will need to be created if you don't already have one. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) section for more information.|
6965
|CreateSecretsManagerEndpoint|Set to "true" if you want to create a Secrets Manager endpoint. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) section for more information.|
7066
|CreateS3Endpoint|Set to "true" if you want to create an S3 endpoint. Note that this will be a "Gateway" type endpoint, since they are free to use. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) section for more information.|
67+
|RouteTableIds|The route table IDs to update to use the S3 endpoint. Since the S3 endpoint is of type 'Gateway', route tables have to be updated to use it. This parameter is only needed if CreateS3Endpoint is set to 'true'.|
68+
|VpcId|The VPC ID where the FSxN file system is located. This is only needed if you are creating an endpoint.|
7169
|CheckInterval|The interval, in minutes, that the EventBridge schedule will trigger the Lambda function. The default is 15 minutes.|
7270

7371
The remaining parameters are used to create the matching conditions file, which specifies when the program will send an SNS alert.
@@ -76,24 +74,24 @@ so you don't have to set them if you don't want to. Note that if you enable EMS
7674
send all EMS messages that have a severity of `Error`, `Alert` or `Emergency`. You can change the
7775
matching conditions at any time by updating the matching conditions file that is created in the S3 bucket.
7876
The name of the file will be \<OntapAdminServer\>-conditions where "\<OntapAdminServer\>" is the value you
79-
set for the OntapAdminServer parameter.
80-
81-
To find the name of the bucket, or any of the resources that were created, you can go to the CloudFormation service
82-
in the AWS console, click on the stack you created (based on the name you provided as the first parameter above),
83-
and then click on the "Resources" tab.
77+
set for the OntapAdminServer parameter. To find the name of the S3 bucket, or any of the resources that were
78+
created, you can go to the CloudFormation service in the AWS console, click on the stack you created
79+
(based on the name you provided as the first parameter above), and then click on the "Resources" tab.
8480

81+
### Post Installation Checks
8582
After the stack has been created, I would recommend checking the status of the Lambda function to make sure it is
86-
not in an error state. To find the Lambda function, as mentioned above, go to the Resources tab of the CloudFormation
83+
not in an error state. To find the Lambda function, go to the Resources tab of the CloudFormation
8784
stack and click on the "Physical ID" of the Lambda function. This should bring you to the Lambda service in the AWS
8885
console. Once there, you can click on the "Monitoring" tab to see if the function has been invoked. Locate the
89-
"Error count and success rate(%)" chart, which is usually found at the top right corner of the dashboard. Within the "CheckInterval" number
90-
of minutes there should be at least one dot on that chart. Note that sometimes the chart is initially slow to reflect any
91-
status so you might have to be patient, and continue to press the "refresh" button (the icon with
92-
a circle on it) to see an status. Once you see a dot on the chart, when you hover you mouse over it, you should see the "success
93-
rate" and "number of errors." The success rate should be 100% and the number of errors should be 0. If it is not,
94-
then scroll down to the CloudWatch Logs section and click on the most recent log stream. This will show you the
95-
output of the Lambda function. If there are any errors, they will be displayed there. If you can't figure out
96-
what the error is, then please create an issue in this repository and someone will help you.
86+
"Error count and success rate(%)" chart, which is usually found at the top right corner of the monitoring dashboard.
87+
Within the "CheckInterval" number of minutes there should be at least one dot on that chart. Note that sometimes
88+
the chart is initially slow to reflect any status so you might have to be patient, and continue to press the "refresh"
89+
button (the icon with a circle on it) to see a status. Once you see a dot on the chart, when you hover your mouse
90+
over it, you should see the "success rate" and "number of errors." The success rate should be 100% and the number
91+
of errors should be 0. If it is not, then scroll down to the CloudWatch Logs section and click on the most recent
92+
log stream. This will show you the output of the Lambda function. If there are any errors, they will be displayed
93+
there. If you can't figure out what the error is, then please create an issue in this repository and someone will
94+
help you.
9795

9896
### Manual Installation
9997
If you want more control over the installation then you can install it manually by following the steps below. Note that these
@@ -133,10 +131,14 @@ overwrite the event files of another instance.
133131

134132
This bucket is also used to store the Matching Condition file. You can read more about it in the [Matching Conditions File](#matching-conditions-file) below.
135133

134+
**Note:** This bucket must be in the same region as the FSxN file system.
135+
136136
#### Create an SNS Topic
137137
Since the way this program sends alerts is via an SNS topic, you need to either create an SNS topic, or use an
138138
existing one.
139139

140+
**Note:** This SNS topic must be in the same region as the FSxN file system.
141+
140142
#### Endpoints for AWS Services
141143
If you deploy this as a Lambda function, you will have to attach it to the VPC that your FSx file system resides
142144
in so it can run ONTAP APIs against it. When you do that, it is likely that the Lambda function will not have access to the
@@ -163,8 +165,8 @@ them to the "local" DNS name of the respective endpoints.
163165
#### Lambda Function
164166
There are a few things you need to do to properly configure the Lambda function.
165167
- Give it the permissions listed above.
166-
- Put it in a VPC and subnet that has access to the FSxN file system management endpoint.
167-
- Increase the total run time to at least 10 seconds. You might have to raise that if you have a lot of components in your FSxN file system. However, if you have to raise it to more than a minute, it could be an issue with the endpoint causing the calls to the AWS services to hang. See the [Endpoints for AWS Services](#endpoints-for-aws-services) section above for more information.
168+
- Put it in a VPC and subnet that has access to the FSxN file system management endpoint. **NOTE:** It must be in the same region as the FSxN file system.
169+
- Increase the total run time to at least 20 seconds. You might have to raise that if you have a lot of components in your FSxN file system. However, if you have to raise it to more than a minute, it could be an issue with the endpoint causing the calls to the AWS services to hang. See the [Endpoints for AWS Services](#endpoints-for-aws-services) section above for more information.
168170
- Provide for the base configuration via environment variables and/or a configuration file. See the [Configuration Parameters](#configuration-parameters) section below for more information.
169171
- Create the "Matching Conditions" file, that specifies when the Lambda function should send alerts. See the [Matching Conditions File](#matching-conditions-file) section below for more information.
170172
- Set up an EventBridge Schedule rule to trigger the function on a regular basis.
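
For the last step above, an EventBridge rate expression has the form `rate(value unit)`, and the unit is singular when the value is 1. The following is a minimal sketch of turning a check interval in minutes into such an expression. The helper name `rate_expression` is hypothetical, and the CloudFormation template may build its schedule differently:

```python
def rate_expression(check_interval_minutes: int) -> str:
    """Build an EventBridge rate expression from an interval in whole minutes."""
    if check_interval_minutes < 1:
        raise ValueError("the interval must be at least 1 minute")
    # EventBridge expects a singular unit for a value of 1 and a plural unit otherwise.
    unit = "minute" if check_interval_minutes == 1 else "minutes"
    return f"rate({check_interval_minutes} {unit})"


if __name__ == "__main__":
    # 15 minutes is the default CheckInterval described in the README.
    print(rate_expression(15))
```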

Monitoring/monitor-ontap-services/cloudformation.yaml

Lines changed: 22 additions & 35 deletions
@@ -7,20 +7,18 @@ Metadata:
77
- Label:
88
default: "Configuration Parameters"
99
Parameters:
10-
- s3BucketRegion
1110
- OntapAdminSever
12-
- vpcId
1311
- subNetIds
1412
- securityGroupIds
1513
- snsTopicArn
16-
- snsTopicRegion
1714
- secretArn
18-
- secretRegion
1915
- secretUsernameKey
2016
- secretPasswordKey
2117
- createSecretManagerEndpoint
2218
- createSNSEndpoint
2319
- createS3Endpoint
20+
- routeTableIds
21+
- vpcId
2422
- checkInterval
2523
- Label:
2624
default: "Alert Parameters"
@@ -40,21 +38,11 @@ Metadata:
4038
- inodeQuotaUtilizationAlert
4139

4240
Parameters:
43-
s3BucketRegion:
44-
Description: "The region where you want the S3 bucket to be created."
45-
Type: String
46-
Default: ""
47-
4841
OntapAdminSever:
4942
Description: "The DNS name, or IP address, of the management endpoint of the FSxN file system to be monitored."
5043
Type: String
5144
Default: ""
5245

53-
vpcId:
54-
Description: "The VPC ID where the FSxN file system is located."
55-
Type: "AWS::EC2::VPC::Id"
56-
Default: ""
57-
5846
subNetIds:
5947
Description: "The subnet IDs where the FSxN file system is located."
6048
Type: "List<AWS::EC2::Subnet::Id>"
@@ -70,21 +58,11 @@ Parameters:
7058
Type: String
7159
Default: ""
7260

73-
snsTopicRegion:
74-
Description: "The region where SNS topic resides."
75-
Type: String
76-
Default: ""
77-
7861
secretArn:
7962
Description: "The ARN of the secret that holds the FSxN credentials to use."
8063
Type: String
8164
Default: ""
8265

83-
secretRegion:
84-
Description: "The region where the secret resides."
85-
Type: String
86-
Default: ""
87-
8866
secretUsernameKey:
8967
Description: "The key in the secret that holds the username."
9068
Type: String
@@ -113,6 +91,16 @@ Parameters:
11391
Default: "false"
11492
AllowedValues: ["true", "false"]
11593

94+
routeTableIds:
95+
Description: "The route table IDs to update to use the S3 endpoint. Since the S3 endpoint is of type 'Gateway' route tables have to be updated to use it. This parameter is only needed if createS3Endpoint is set to 'true'."
96+
Type: CommaDelimitedList
97+
Default: ""
98+
99+
vpcId:
100+
Description: "The VPC ID where the FSxN file system is located. This is only needed if you are creating an endpoint."
101+
Type: "AWS::EC2::VPC::Id"
102+
Default: ""
103+
116104
checkInterval:
117105
Description: "The interval, in minutes, between checks."
118106
Type: Number
@@ -200,8 +188,8 @@ Resources:
200188
Condition: CreateSecretManagerEndpoint
201189
Properties:
202190
VpcId: !Ref vpcId
203-
ServiceName: !Sub "com.amazonaws.${secretRegion}.secretsmanager"
204-
VpcEndpointType: Interface
191+
ServiceName: !Sub "com.amazonaws.${AWS::Region}.secretsmanager"
192+
VpcEndpointType: 'Interface'
205193
PrivateDnsEnabled: true
206194
SubnetIds: !Ref subNetIds
207195

@@ -210,8 +198,8 @@ Resources:
210198
Condition: CreateSNSEndpoint
211199
Properties:
212200
VpcId: !Ref vpcId
213-
ServiceName: !Sub "com.amazonaws.${snsTopicRegion}.sns"
214-
VpcEndpointType: Interface
201+
ServiceName: !Sub "com.amazonaws.${AWS::Region}.sns"
202+
VpcEndpointType: 'Interface'
215203
PrivateDnsEnabled: true
216204
SubnetIds: !Ref subNetIds
217205

@@ -220,10 +208,9 @@ Resources:
220208
Condition: CreateS3Endpoint
221209
Properties:
222210
VpcId: !Ref vpcId
223-
ServiceName: !Sub "com.amazonaws.${s3BucketRegion}.s3"
224-
VpcEndpointType: Gateway
225-
PrivateDnsEnabled: true
226-
SubnetIds: !Ref subNetIds
211+
ServiceName: !Sub "com.amazonaws.${AWS::Region}.s3"
212+
VpcEndpointType: 'Gateway'
213+
RouteTableIds: !Ref routeTableIds
227214

228215
s3Bucket:
229216
Type: "AWS::S3::Bucket"
@@ -320,7 +307,7 @@ Resources:
320307
Timeout: 60
321308
Environment:
322309
Variables:
323-
s3BucketRegion: !Ref s3BucketRegion
310+
s3BucketRegion: !Ref AWS::Region
324311
s3BucketArn: !GetAtt s3Bucket.Arn
325312
OntapAdminServer: !Ref OntapAdminSever
326313
secretArn: !Ref secretArn
@@ -363,8 +350,8 @@ Resources:
363350
# "matching conditions." It is intended to be run as a Lambda function, but
364351
# can be run as a standalone program.
365352
#
366-
# Version: %%VERSION%%
367-
# Date: %%DATE%%
353+
# Version: v2.5
354+
# Date: 2024-08-20-17:54:55
368355
################################################################################
369356
370357
import json
Lines changed: 24 additions & 1 deletion
@@ -1,12 +1,35 @@
11
#!/bin/bash
2+
#
3+
# This script is used to update the CloudFormation template with the latest
4+
# version of the Lambda function. It will also update the version number in
5+
# the template as well as create a git tag for the version.
6+
#################################################################################
7+
8+
majorVersionNum=2
9+
file="monitor_ontap_services.py"
10+
#
11+
# Get the number of commits in the git history for the file to calculate the minor version number.
12+
minorVersionNum="$(git log "$file" | egrep '^commit' | wc -l)"
13+
if [ -z "$minorVersionNum" ]; then
14+
echo "Failed to calculate version number." 1>&2
15+
exit 1
16+
fi
17+
18+
version="v${majorVersionNum}.${minorVersionNum}"
219

320
sed -e '/ZipFile/,$d' cloudformation.yaml > cloudformation.yaml.tmp
421
echo " ZipFile: |" >> cloudformation.yaml.tmp
5-
cat monitor_ontap_services.py | sed 's/^/ /' >> cloudformation.yaml.tmp
22+
cat "$file" | sed -e 's/^/ /' -e "s/%%VERSION%%/${version}/" -e "s/%%DATE%%/$(date +%Y-%m-%d-%H:%M:%S)/" >> cloudformation.yaml.tmp
623
if diff cloudformation.yaml cloudformation.yaml.tmp; then
724
rm -f cloudformation.yaml.tmp
825
echo "No changes detected"
926
exit 0
1027
fi
28+
#
29+
# Create a tag in git.
30+
latestHash=$(git log $file | head -1 | awk '{print $2}')
31+
echo "Creating tag ${file}-${version} for commit $latestHash"
32+
git tag -a "${file}-${version}" -m "${file}-${version}" $latestHash
33+
1134
echo "Updating cloudformation.yaml"
1235
mv cloudformation.yaml.tmp cloudformation.yaml
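
The version stamping that the script performs with `sed` can be sketched in Python. This is an illustration only and not part of the repository: the function names `format_version` and `stamp_placeholders` are hypothetical, and the commit count is passed in as an argument rather than read from git.

```python
from datetime import datetime


def format_version(major: int, commit_count: int) -> str:
    """Combine the major version and the file's commit count, e.g. v2.5."""
    return f"v{major}.{commit_count}"


def stamp_placeholders(source: str, version: str, when: datetime) -> str:
    """Replace the %%VERSION%% and %%DATE%% placeholders in the source text.

    Unlike sed's s/// without the g flag, str.replace substitutes every
    occurrence, not only the first one on each line.
    """
    stamp = when.strftime("%Y-%m-%d-%H:%M:%S")
    return source.replace("%%VERSION%%", version).replace("%%DATE%%", stamp)


if __name__ == "__main__":
    header = "# Version: %%VERSION%%\n# Date: %%DATE%%\n"
    # Values chosen to match the header that appears in this commit.
    print(stamp_placeholders(header, format_version(2, 5), datetime(2024, 8, 20, 17, 54, 55)))
```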
