Commit 40e8438

Merge pull request #223 from NetApp/add_multiple_vserver_support

Added support for multiple vservers

2 parents 85ef120 + d368a14

File tree: 2 files changed, +91 -55 lines

Monitoring/ingest_nas_audit_logs_into_cloudwatch/README.md

Lines changed: 9 additions & 10 deletions
@@ -3,21 +3,21 @@
 ## Overview
 This sample demonstrates a way to ingest the NAS audit logs from an FSx for Data ONTAP file system into a CloudWatch log group
 without having to NFS or CIFS mount a volume to access them.
-It will attempt to gather the audit logs from all the FSx for Data ONTAP file systems that are within a specified region.
+It will attempt to gather the audit logs from all the SVMs within all the FSx for Data ONTAP file systems that are within a specified region.
 It will skip any file systems where the credentials aren't provided in the supplied AWS Secrets Manager secret, or that do not have
 the appropriate NAS auditing configuration enabled.
 It will maintain a "stats" file in an S3 bucket that will keep track of the last time it successfully ingested audit logs from each
-file system to try to ensure it doesn't process an audit file more than once.
+SVM to try to ensure it doesn't process an audit file more than once.
 You can run this script as a standalone program or as a Lambda function. These directions assume you are going to run it as a Lambda function.
 
 ## Prerequisites
 - An FSx for Data ONTAP file system.
 - An S3 bucket to store the "stats" file. The "stats" file is used to keep track of the last time the Lambda function successfully
-ingested audit logs from each file system. Its size will be small (i.e. less than a few megabytes).
-- Have NAS auditing configured and enabled on the FSx for Data ONTAP file system. **Ensure you have selected the XML format for the audit logs.** Also,
+ingested audit logs from each SVM. Its size will be small (i.e. less than a few megabytes).
+- Have NAS auditing configured and enabled on the SVM within an FSx for Data ONTAP file system. **Ensure you have selected the XML format for the audit logs.** Also,
 ensure you have set up a rotation schedule. The program will only act on audit log files that have been finalized, and not the "active" one. You can read this
 [knowledge base article](https://kb.netapp.com/on-prem/ontap/da/NAS/NAS-KBs/How_to_set_up_NAS_auditing_in_ONTAP_9) for instructions on how to set up NAS auditing.
-- Have the NAS auditing configured to store the audit logs in a volume with the same name on all FSx for Data ONTAP file
+- Have the NAS auditing configured to store the audit logs in a volume of the same name in all SVMs on all the FSx for Data ONTAP file
 systems that you want to ingest the audit logs from.
 - A CloudWatch log group.
 - An AWS Secrets Manager secret that contains the passwords for the fsxadmin account for all the FSx for Data ONTAP file systems you want to gather audit logs from.
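
With tracking now per SVM, the shape of the "stats" file changes from one timestamp per file system to a per-SVM map. A minimal sketch of the two layouts implied by the migration code in ingest_audit_log.py (the endpoint key and SVM names here are illustrative):

```python
# Old format: one epoch timestamp (a float) per file-system management endpoint.
old_stats = {
    "management.fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com": 1727182803.0
}

# New format: the timestamp moves into a dictionary keyed by SVM (vserver) name,
# so each SVM's ingestion progress is tracked independently.
new_stats = {
    "management.fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com": {
        "fsx": 1727182803.0,
        "svm2": 1727269203.0
    }
}
```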
@@ -29,8 +29,8 @@ systems that you want to ingest the audit logs from.
   }
   ```
 - You have applied the necessary SACLs to the files you want to audit. The knowledge base article linked above provides guidance on how to do this.
-- Since the Lambda function runs within your VPC it will not have access to the Internet, even if you can access the Internet from the Subnet it run from.
-Therefore, there needs to be a VPC endpoint for all the AWS services that the Lambda function uses. Specifically, the Lambda function needs to be able to access the following services:
+- Since the Lambda function runs within your VPC it will not have access to the Internet, even if you can access the Internet from the Subnet it runs from.
+Therefore, there needs to be a VPC endpoint for all the AWS services that the Lambda function uses. Specifically, the Lambda function needs to be able to access the following AWS services:
   - FSx.
   - Secrets Manager.
   - CloudWatch Logs.
@@ -82,7 +82,7 @@ and `DeleteNetworkInterface` actions. The correct resource line is `arn:aws:ec2:
 file system management endpoints that you want to gather audit logs from. Also, select a Security Group that allows TCP port 443 outbound.
 Inbound rules don't matter since the Lambda function is not accessible from a network.
 1. Click `Create Function` and on the next page, under the `Code` tab, select `Upload From -> .zip file.` Provide the .zip file created by the steps above.
-1. From the `Configuration -> General` tab set the timeout to at least 30 seconds. You may need to increase that if it has to process a lot of audit entries and/or process a lot of FSx for ONTAP file systems.
+1. From the `Configuration -> General` tab set the timeout to at least 30 seconds. You may need to increase that if it has to process a lot of audit entries and/or process a lot of SVMs.
 
 3. Configure the Lambda function by setting the following environment variables. For a Lambda function you do this by clicking on the `Configuration` tab and then the `Environment variables` sub tab.
 
@@ -96,13 +96,12 @@ Inbound rules don't matter since the Lambda function is not accessible from a ne
 | statsName | The name you want to use as the stats file. |
 | logGroupName | The name of the CloudWatch log group to ingest the audit logs into. |
 | volumeName | The name of the volume, on all the FSx for ONTAP file systems, where the audit logs are stored. |
-| vserverName | The name of the vserver, on all the FSx for ONTAP file systems, where the audit logs are stored. |
 
 4. Test the Lambda function by clicking on the `Test` tab and then clicking on the `Test` button. You should see "Executing function: succeeded".
 If not, click on the "Details" button to see what errors there are.
 
 5. After you have tested that the Lambda function is running correctly, add an EventBridge trigger to have it run periodically.
-You can do this by clicking on the `Add Trigger` button within the AWS console and selecting `EventBridge (CloudWatch Events)`
+You can do this by clicking on the `Add Trigger` button within the AWS console on the Lambda page and selecting `EventBridge (CloudWatch Events)`
 from the dropdown. You can then configure the schedule to run as often as you want. How often depends on how often you have
 set up your FSx for ONTAP file systems to generate audit logs, and how up-to-date you want the CloudWatch logs to be.
 
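Step 5's console flow simply creates an EventBridge rule targeting the Lambda function. For reference, a hedged boto3 sketch of the equivalent wiring; the rule name, schedule, and function ARN are placeholders, and the console additionally grants EventBridge permission to invoke the function:

```python
import boto3

events = boto3.client('events')

# Rate-based schedule; align it with your audit-log rotation interval.
events.put_rule(
    Name='ingest-nas-audit-logs',       # placeholder rule name
    ScheduleExpression='rate(1 hour)',
)
events.put_targets(
    Rule='ingest-nas-audit-logs',
    Targets=[{
        'Id': 'ingest-audit-logs-target',
        'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:ingest_audit_logs',  # placeholder ARN
    }],
)
```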
Monitoring/ingest_nas_audit_logs_into_cloudwatch/ingest_audit_log.py

Lines changed: 82 additions & 45 deletions
@@ -12,12 +12,10 @@
 # system that doesn't have the specified volume.
 #
 # It assumes:
-# - That there is only one data vserver per FSxN file system and that it
-#   is named 'fsx'.
 # - That the administrator username is 'fsxadmin'.
 # - That the audit log files will be named in the following format:
-#     audit_fsx_D2024-09-24-T13-00-03_0000000000.xml
-#   Where 'fsx' is the vserver name.
+#     audit_vserver_D2024-09-24-T13-00-03_0000000000.xml
+#   Where 'vserver' is the vserver name.
 #
 ################################################################################
 #
@@ -54,8 +52,12 @@
 # all FSxNs.
 #volumeName = "audit_logs"
 #
-# The name of the vserver that holds the audit logs. Assumed to be the same on
+# The name of the vserver that holds the audit logs. Assumed to be the same on
 # all FSxNs.
+# *NOTE*: The program has been updated to loop on all the vservers within an FSxN
+#         filesystem and not just the one set here. This variable is now used
+#         so it can update the lastFileRead stats file to conform to the new format
+#         that includes the vserver as part of the structure.
 #vserverName = "fsx"
 #
 # The CloudWatch log group to store the audit logs in.
@@ -118,7 +120,7 @@ def processFile(ontapAdminServer, headers, volumeUUID, filePath):
             else:
                 f.write(part.content)
         else:
-            print(f'API call to {endpoint} failed. HTTP status code: {response.status}.')
+            print(f'Warning: API call to {endpoint} failed. HTTP status code: {response.status}.')
             break
 
     f.close()
@@ -204,7 +206,7 @@ def ingestAuditFile(auditLogPath, auditLogName):
     dictData = xmltodict.parse(data)
 
     if dictData.get('Events') == None or dictData['Events'].get('Event') == None:
-        print(f"No events found in {auditLogName}")
+        print(f"Info: No events found in {auditLogName}.")
         return
     #
     # Ensure the logstream exists.
@@ -214,7 +216,7 @@ def ingestAuditFile(auditLogPath, auditLogName):
         #
         # This really shouldn't happen, since we should only be processing
         # each file once, but during testing it happens all the time.
-        print(f"Log stream {auditLogName} already exists")
+        print(f"Info: Log stream {auditLogName} already exists.")
     #
     # If there is only one event, then the dict['Events']['Event'] will be a
     # dictionary, otherwise it will be a list of dictionaries.
@@ -223,25 +225,25 @@ def ingestAuditFile(auditLogPath, auditLogName):
         for event in dictData['Events']['Event']:
             cwEvents.append(createCWEvent(event))
             if len(cwEvents) == 5000: # The real maximum is 10000 events, but there is also a size limit, so we will use 5000.
-                print("Putting 5000 events")
+                print("Info: Putting 5000 events")
                 response = cwLogsClient.put_log_events(logGroupName=config['logGroupName'], logStreamName=auditLogName, logEvents=cwEvents)
                 if response.get('rejectedLogEventsInfo') != None:
-                    if response['rejectedLogEventsInfo'].get('tooNewLogEventStartIndex') > 0:
+                    if response['rejectedLogEventsInfo'].get('tooNewLogEventStartIndex') is not None:
                         print(f"Warning: Too new log event start index: {response['rejectedLogEventsInfo']['tooNewLogEventStartIndex']}")
-                    if response['rejectedLogEventsInfo'].get('tooOldLogEventStartIndex') > 0:
-                        print(f"Warning: Too old log event start index: {response['rejectedLogEventsInfo']['tooOldLogEventStartIndex']}")
+                    if response['rejectedLogEventsInfo'].get('tooOldLogEventEndIndex') is not None:
+                        print(f"Warning: Too old log event end index: {response['rejectedLogEventsInfo']['tooOldLogEventEndIndex']}")
                 cwEvents = []
     else:
         cwEvents = [createCWEvent(dictData['Events']['Event'])]
 
     if len(cwEvents) > 0:
-        print(f"Putting {len(cwEvents)} events")
+        print(f"Info: Putting {len(cwEvents)} events")
         response = cwLogsClient.put_log_events(logGroupName=config['logGroupName'], logStreamName=auditLogName, logEvents=cwEvents)
         if response.get('rejectedLogEventsInfo') != None:
-            if response['rejectedLogEventsInfo'].get('tooNewLogEventStartIndex') > 0:
+            if response['rejectedLogEventsInfo'].get('tooNewLogEventStartIndex') is not None:
                 print(f"Warning: Too new log event start index: {response['rejectedLogEventsInfo']['tooNewLogEventStartIndex']}")
-            if response['rejectedLogEventsInfo'].get('tooOldLogEventStartIndex') > 0:
-                print(f"Warning: Too old log event start index: {response['rejectedLogEventsInfo']['tooOldLogEventStartIndex']}")
+            if response['rejectedLogEventsInfo'].get('tooOldLogEventEndIndex') is not None:
+                print(f"Warning: Too old log event end index: {response['rejectedLogEventsInfo']['tooOldLogEventEndIndex']}")
 
 ################################################################################
 # This function checks that all the required configuration variables are set.
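
A note on the `> 0` to `is not None` change in the hunk above: `put_log_events` typically omits `rejectedLogEventsInfo` (and its individual indexes) when nothing was rejected, and a returned index of 0 (the first event) is legitimate. Comparing with `> 0` therefore both misses index 0 and raises a `TypeError` in Python 3 when the key is absent, since `.get()` returns `None` and `None > 0` is not allowed. A minimal sketch of the corrected check against a hypothetical response:

```python
# Hypothetical put_log_events response; the key is absent when nothing was rejected.
response = {'rejectedLogEventsInfo': {'tooNewLogEventStartIndex': 0}}

info = response.get('rejectedLogEventsInfo')
if info is not None:
    # 'is not None' accepts index 0 and never compares None with an int.
    if info.get('tooNewLogEventStartIndex') is not None:
        print(f"Warning: Too new log event start index: {info['tooNewLogEventStartIndex']}")
    if info.get('tooOldLogEventEndIndex') is not None:
        print(f"Warning: Too old log event end index: {info['tooOldLogEventEndIndex']}")
```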
@@ -257,15 +259,17 @@ def checkConfig():
         'secretArn': secretArn if 'secretArn' in globals() else None, # pylint: disable=E0602
         's3BucketRegion': s3BucketRegion if 's3BucketRegion' in globals() else None, # pylint: disable=E0602
         's3BucketName': s3BucketName if 's3BucketName' in globals() else None, # pylint: disable=E0602
-        'statsName': statsName if 'statsName' in globals() else None, # pylint: disable=E0602
-        'vserverName': vserverName if 'vserverName' in globals() else None # pylint: disable=E0602
+        'statsName': statsName if 'statsName' in globals() else None # pylint: disable=E0602
     }
 
     for item in config:
         if config[item] == None:
             config[item] = os.environ.get(item)
         if config[item] == None:
             raise Exception(f"{item} is not set.")
+    #
+    # To be backwards compatible, load the vserverName.
+    config['vserverName'] = vserverName if 'vserverName' in globals() else os.environ.get('vserverName') # pylint: disable=E0602
 
 ################################################################################
 # This is the main function that checks that everything is configured correctly
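
The pattern in `checkConfig()` resolves each setting from a module-level variable first, then from the matching environment variable; `vserverName` now follows the same lookup but is allowed to be unset, since it is only needed to migrate old stats files. A compact sketch of that lookup order (the `resolve` helper is illustrative, not part of the script):

```python
import os

def resolve(name, required=True):
    # A module-level assignment wins; otherwise fall back to the environment.
    value = globals().get(name)
    if value is None:
        value = os.environ.get(name)
    if required and value is None:
        raise Exception(f"{name} is not set.")
    return value

volumeName = resolve('volumeName')                    # required, as before
vserverName = resolve('vserverName', required=False)  # optional: only used for stats-file migration
```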
@@ -330,6 +334,11 @@ def lambda_handler(event, context): # pylint: disable=W0613
     for fsxn in fsxNs:
         fsId = fsxn.split('.')[1]
         #
+        # Since the format of the lastFileRead structure has changed, we need to update it.
+        if lastFileRead.get(fsxn) is not None and config['vserverName'] is not None:
+            if type(lastFileRead[fsxn]) is float: # Old format
+                lastFileRead[fsxn] = {config['vserverName']: lastFileRead[fsxn]} # New format
+        #
         # Get the password
         password = secrets.get(fsId)
         if password == None:
@@ -341,39 +350,67 @@ def lambda_handler(event, context): # pylint: disable=W0613
         headersDownload = { **auth, 'Accept': 'multipart/form-data' }
         headersQuery = { **auth }
         #
-        # Get the volume UUID for the audit_logs volume.
-        volumeUUID = None
-        endpoint = f"https://{fsxn}/api/storage/volumes?name={config['volumeName']}&svm={config['vserverName']}"
+        # Get the list of SVMs on the FSxN.
+        endpoint = f"https://{fsxn}/api/svm/svms?return_timeout=4"
         response = http.request('GET', endpoint, headers=headersQuery, timeout=5.0)
         if response.status == 200:
-            data = json.loads(response.data.decode('utf-8'))
-            if data['num_records'] > 0:
-                volumeUUID = data['records'][0]['uuid'] # Since we specified the volume, and vserver name, there should only be one record.
+            svmsData = json.loads(response.data.decode('utf-8'))
+            numSvms = svmsData['num_records']
+            #
+            # Loop over all the SVMs.
+            while numSvms > 0:
+                for record in svmsData['records']:
+                    vserverName = record['name']
+                    #
+                    # Get the volume UUID for the audit_logs volume.
+                    volumeUUID = None
+                    endpoint = f"https://{fsxn}/api/storage/volumes?name={config['volumeName']}&svm={vserverName}"
+                    response = http.request('GET', endpoint, headers=headersQuery, timeout=5.0)
+                    if response.status == 200:
+                        data = json.loads(response.data.decode('utf-8'))
+                        if data['num_records'] > 0:
+                            volumeUUID = data['records'][0]['uuid'] # Since we specified the volume, and vserver name, there should only be one record.
 
-            if volumeUUID == None:
-                print(f"Warning: Volume {config['volumeName']} not found for {fsId} under SVM: {config['vserverName']}.")
-                continue
-            #
-            # Get all the files in the volume that match the audit file pattern.
-            endpoint = f"https://{fsxn}/api/storage/volumes/{volumeUUID}/files?name=audit_{config['vserverName']}_D*.xml&order_by=name%20asc&fields=name"
-            response = http.request('GET', endpoint, headers=headersQuery, timeout=5.0)
-            data = json.loads(response.data.decode('utf-8'))
-            if data.get('num_records') == 0:
-                print(f"Warning: No XML audit log files found on FsID: {fsId}; SvmID: {config['vserverName']}; Volume: {config['volumeName']}.")
-                continue
+                    if volumeUUID == None:
+                        print(f"Warning: Volume {config['volumeName']} not found for {fsId} under SVM: {vserverName}.")
+                        continue
+                    #
+                    # Get all the files in the volume that match the audit file pattern.
+                    endpoint = f"https://{fsxn}/api/storage/volumes/{volumeUUID}/files?name=audit_{vserverName}_D*.xml&order_by=name%20asc&fields=name"
+                    response = http.request('GET', endpoint, headers=headersQuery, timeout=5.0)
+                    data = json.loads(response.data.decode('utf-8'))
+                    if data.get('num_records') == 0:
+                        print(f"Warning: No XML audit log files found on FsID: {fsId}; SvmID: {vserverName}; Volume: {config['volumeName']}.")
+                        continue
 
-            for file in data['records']:
-                filePath = file['name']
-                if lastFileRead.get(fsxn) == None or getEpoch(filePath) > lastFileRead[fsxn]:
+                    for file in data['records']:
+                        filePath = file['name']
+                        if lastFileRead.get(fsxn) is None or lastFileRead[fsxn].get(vserverName) is None or getEpoch(filePath) > lastFileRead[fsxn][vserverName]:
+                            #
+                            # Process the file.
+                            processFile(fsxn, headersDownload, volumeUUID, filePath)
+                            if lastFileRead.get(fsxn) is None:
+                                lastFileRead[fsxn] = {vserverName: getEpoch(filePath)}
+                            else:
+                                lastFileRead[fsxn][vserverName] = getEpoch(filePath)
+                            s3Client.put_object(Key=config['statsName'], Bucket=config['s3BucketName'], Body=json.dumps(lastFileRead).encode('UTF-8'))
                 #
-                    # Process the file.
-                    processFile(fsxn, headersDownload, volumeUUID, filePath)
-                    lastFileRead[fsxn] = getEpoch(filePath)
-                    s3Client.put_object(Key=config['statsName'], Bucket=config['s3BucketName'], Body=json.dumps(lastFileRead).encode('UTF-8'))
+                # Get the next set of SVMs.
+                if svmsData['_links'].get('next') != None:
+                    endpoint = f"https://{fsxn}{svmsData['_links']['next']['href']}"
+                    response = http.request('GET', endpoint, headers=headersQuery, timeout=5.0)
+                    if response.status == 200:
+                        svmsData = json.loads(response.data.decode('utf-8'))
+                        numSvms = svmsData['num_records']
+                    else:
+                        print(f"Warning: API call to {endpoint} failed. HTTP status code: {response.status}.")
+                        break # Break out of the for all SVMs loop. Maybe the call to the next FSxN will work.
+                else:
+                    numSvms = 0
+        else:
+            print(f"Warning: API call to {endpoint} failed. HTTP status code: {response.status}.")
+            break # Break out of the for all FSxNs loop.
 #
 # If this script is not running as a Lambda function, then call the lambda_handler function.
 if os.environ.get('AWS_LAMBDA_FUNCTION_NAME') == None:
-    lambdaFunction = False
     lambda_handler(None, None)
-else:
-    lambdaFunction = True
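
The new SVM loop pages through the ONTAP REST API by following the `_links.next.href` entry that each response carries. A standalone sketch of that pattern under the same API (host and credentials are placeholders; certificate handling is omitted):

```python
import json
import urllib3

http = urllib3.PoolManager()
host = "management.fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com"  # placeholder endpoint
headers = urllib3.make_headers(basic_auth="fsxadmin:password")        # placeholder credentials

endpoint = f"https://{host}/api/svm/svms?return_timeout=4"
while endpoint is not None:
    response = http.request('GET', endpoint, headers=headers, timeout=5.0)
    if response.status != 200:
        print(f"Warning: API call to {endpoint} failed. HTTP status code: {response.status}.")
        break
    page = json.loads(response.data.decode('utf-8'))
    for record in page['records']:
        print(record['name'])  # one SVM per record
    # ONTAP includes '_links.next.href' only when more records remain.
    nextLink = page.get('_links', {}).get('next')
    endpoint = f"https://{host}{nextLink['href']}" if nextLink else None
```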
