Commit 97b9371

Crawler settings (#157)
* typos
* fix crawler
* fix crawler
* fix crawler
* minor fixes and adding test option
* lint
1 parent e35d7ff commit 97b9371

5 files changed, +40 -9 lines changed

data-collection/CONTRIBUTING.md

Lines changed: 2 additions & 5 deletions
@@ -69,13 +69,10 @@ python3 ./data-collection/utils/pylint.py
 3. Upload the code to a bucket and run integration tests in your testing environment

 ```bash
-export account_id=$(aws sts get-caller-identity --query "Account" --output text )
-export bucket=cid-$account_id-test
-./data-collection/utils/upload.sh "$bucket"
-python3 ./data-collection/test/test_from_scratch.py
+./data-collection/test/run-test-from-scratch.sh --no-teardown
 ```

-The test will install stacks from scratch in a single account, then it will check the presence of Athena tables. After running tests, it will delete the stacks and all artefacts that are not deleted by CFN.
+The test will install stacks from scratch in a single account, then it will check the presence of Athena tables. After running tests, it will delete the stacks and all artifacts that are not deleted by CFN. You can avoid teardown by providing a flag `--no-teardown`.

 4. Create a merge request.

data-collection/deploy/module-inventory.yaml

Lines changed: 17 additions & 4 deletions
@@ -65,6 +65,7 @@ Mappings:
 path: opensearch-domains
 table:
   - Name: inventory_opensearch_domains_data
+    Parameters: { "classification" : "json", "compressionType": "none" }
     PartitionKeys:
       - Name: payer_id
         Type: string
@@ -105,6 +106,7 @@ Mappings:
 path: elasticache-clusters
 table:
   - Name: inventory_elasticache_clusters_data
+    Parameters: { "classification" : "json", "compressionType": "none" }
     PartitionKeys:
       - Name: payer_id
         Type: string
@@ -192,7 +194,7 @@ Mappings:
 table:
   - Name: inventory_rds_db_instances_data
     TableType: EXTERNAL_TABLE
-
+    Parameters: { "classification" : "json", "compressionType": "none" }
     PartitionKeys:
       - { Name: payer_id, Type: string }
       - { Name: year, Type: string }
@@ -334,6 +336,7 @@ Mappings:
 path: rds-db-snapshots
 table:
   - Name: inventory_rds_db_snapshots_data
+    Parameters: { "classification" : "json", "compressionType": "none" }
     PartitionKeys:
       - Name: payer_id
         Type: string
@@ -422,6 +425,7 @@ Mappings:
 path: ebs
 table:
   - Name: inventory_ebs_data
+    Parameters: { "classification" : "json", "compressionType": "none" }
     PartitionKeys:
       - Name: payer_id
         Type: string
@@ -478,6 +482,7 @@ Mappings:
 path: ami
 table:
   - Name: inventory_ami_data
+    Parameters: { "classification" : "json", "compressionType": "none" }
     PartitionKeys:
       - Name: payer_id
         Type: string
@@ -550,6 +555,7 @@ Mappings:
 path: snapshot
 table:
   - Name: inventory_snapshot_data
+    Parameters: { "classification" : "json", "compressionType": "none" }
     PartitionKeys:
       - Name: payer_id
         Type: string
@@ -602,6 +608,7 @@ Mappings:
 path: ec2-instances
 table:
   - Name: inventory_ec2_instances_data
+    Parameters: { "classification" : "json", "compressionType": "none" }
     PartitionKeys:
       - Name: payer_id
         Type: string
@@ -711,6 +718,7 @@ Mappings:
 path: vpc
 table:
   - Name: inventory_vpc_data
+    Parameters: { "classification" : "json", "compressionType": "none" }
     PartitionKeys:
       - Name: payer_id
         Type: string
@@ -756,6 +764,7 @@ Mappings:
 path: eks
 table:
   - Name: inventory_eks_data
+    Parameters: { "classification" : "json", "compressionType": "none" }
     PartitionKeys:
       - Name: payer_id
         Type: string
@@ -793,6 +802,7 @@ Mappings:
 path: lambda
 table:
   - Name: inventory_lambda_data
+    Parameters: { "classification" : "json", "compressionType": "none" }
     PartitionKeys:
       - Name: payer_id
         Type: string
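
The hunks above add the same `Parameters` entry to one table definition after another, so a missed table is easy to overlook. As a rough sketch only, a check like the following could flag definitions that still lack a `classification` parameter. The function name and the sample dictionaries are hypothetical; the sketch assumes the table definitions have already been loaded into Python dictionaries shaped like the YAML entries.

```python
def tables_missing_classification(tables):
    """Return the names of table definitions without a 'classification' parameter."""
    return [
        table["Name"]
        for table in tables
        # A table with no Parameters key at all counts as missing too
        if "classification" not in table.get("Parameters", {})
    ]


# Hypothetical sample input mirroring the structure of the YAML entries
sample_tables = [
    {
        "Name": "inventory_lambda_data",
        "Parameters": {"classification": "json", "compressionType": "none"},
    },
    {"Name": "table_without_parameters"},
]
print(tables_missing_classification(sample_tables))
```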
@@ -849,7 +859,7 @@ Mappings:
   - Name: region
     Type: string
   - Name: layers
-    Type: array<struct<arn:string,codesize:int>>
+    Type: array<struct<arn:string,codesize:int>> # will be updated
   - Name: vpcconfig
     Type: struct<subnetids:array<string>,securitygroupids:array<string>,vpcid:string,ipv6allowedfordualstack:boolean>
 InputFormat: org.apache.hadoop.mapred.TextInputFormat
@@ -1161,9 +1171,12 @@ Resources:
 Configuration: |
   {
     "Version": 1.0,
+    "Grouping": {
+      "TableGroupingPolicy": "CombineCompatibleSchemas"
+    },
     "CrawlerOutput": {
-      "Partitions": {
-        "AddOrUpdateBehavior": "InheritFromTable"
+      "Tables": {
+        "TableThreshold": 1
       }
     }
   }
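
For readers who want to see the resulting crawler configuration as a whole, here is a minimal sketch that builds the same JSON document in Python. The function name and the `table_threshold` parameter are illustrative assumptions, not part of the template; only the keys and values come from the hunk above.

```python
import json


def build_crawler_configuration(table_threshold=1):
    """Build the crawler Configuration string with the keys used in this commit."""
    config = {
        "Version": 1.0,
        # Ask the crawler to combine compatible schemas into one table
        "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
        # Maximum number of tables the crawler is allowed to create
        "CrawlerOutput": {"Tables": {"TableThreshold": table_threshold}},
    }
    return json.dumps(config, indent=2)


print(build_crawler_configuration())
```

Generating the string with `json.dumps` keeps it valid JSON, which is easy to get wrong when editing a literal block by hand.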

data-collection/test/conftest.py

Lines changed: 2 additions & 0 deletions
@@ -89,5 +89,7 @@ def mode(request):
 @pytest.fixture(scope='session', autouse=True)
 def prepare_setup(athena, cloudformation, s3, s3client, account_id, org_unit_id, bucket, start_time, mode, glue):
     yield prepare_stacks(cloudformation=cloudformation, account_id=account_id, org_unit_id=org_unit_id, bucket=bucket, s3=s3, s3client=s3client)
+
+    mode = pytest.params.get('mode', mode)
     if mode != "no-teardown":
         cleanup_stacks(cloudformation=cloudformation, account_id=account_id, s3=s3, s3client=s3client, athena=athena, glue=glue)
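
One thing to watch in this hunk: `pytest.params` is only assigned in the `__main__` block of `test_from_scratch.py`, so the lookup would presumably raise `AttributeError` when the suite is started through the plain `pytest` command. A hedged sketch of a more defensive lookup follows; `resolve_mode` is a hypothetical helper, and the `SimpleNamespace` objects merely stand in for the real `pytest` module.

```python
from types import SimpleNamespace


def resolve_mode(pytest_module, fixture_mode):
    """Return the command-line override when present, else the fixture value."""
    # Fall back to an empty dict when no 'params' attribute was attached
    params = getattr(pytest_module, 'params', {})
    return params.get('mode', fixture_mode)


# Stand-ins for the pytest module, for illustration only
started_via_script = SimpleNamespace(params={'mode': 'no-teardown'})
started_via_pytest_cli = SimpleNamespace()

print(resolve_mode(started_via_script, 'some-fixture-mode'))
print(resolve_mode(started_via_pytest_cli, 'some-fixture-mode'))
```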
data-collection/test/run-test-from-scratch.sh

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+#!/bin/bash
+# see ../CONTRIBUTING.md
+
+# vars
+account_id=$(aws sts get-caller-identity --query "Account" --output text )
+bucket=cid-$account_id-test
+
+# upload files
+./data-collection/utils/upload.sh "$bucket"
+
+# run test
+python3 ./data-collection/test/test_from_scratch.py "$@"

data-collection/test/test_from_scratch.py

Lines changed: 7 additions & 0 deletions
@@ -31,6 +31,7 @@
 """
 import logging
+import sys

 import pytest
@@ -180,4 +181,10 @@ def test_license_manager_licenses(athena):
 if __name__ == '__main__':
+    pytest.params = {}
+    if '--no-teardown' in sys.argv:
+        sys.argv.remove('--no-teardown')
+        pytest.params['mode'] = 'no-teardown'
+
+    sys.argv = sys.argv[:1]
     pytest.main()
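
Note that `sys.argv = sys.argv[:1]` discards every remaining argument, not just `--no-teardown`. If forwarding other options to pytest were ever wanted, one possible alternative (a sketch, not what this commit does) is `argparse.ArgumentParser.parse_known_args`, which separates known flags from everything else. The helper name `split_cli_args` is hypothetical.

```python
import argparse


def split_cli_args(argv):
    """Separate the custom --no-teardown flag from arguments meant for pytest."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('--no-teardown', action='store_true')
    # Unknown arguments are returned untouched, in their original order
    known, remaining = parser.parse_known_args(argv[1:])
    params = {'mode': 'no-teardown'} if known.no_teardown else {}
    return params, remaining


params, remaining = split_cli_args(['test_from_scratch.py', '--no-teardown', '-k', 'athena'])
print(params, remaining)
```

The `remaining` list could then be passed on as `pytest.main(remaining)`, so the custom flag never reaches pytest's own parser.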
