Changes from 18 commits
6 changes: 5 additions & 1 deletion .gitignore
@@ -4,4 +4,8 @@ assets/cloudwatch-dashboard.rendered.json
samconfig.toml
.aws-sam
.env.local.json
events/my.event.json
events/my.event.json
lambda/tests/.pytest_cache
lambda/tests/test_db
lambda/tests/__pycache__
lambda/__pycache__
33 changes: 25 additions & 8 deletions README.md
@@ -18,7 +18,6 @@ This repository provides you with a sample solution that collects metrics of exi
### Solution Tenets
* Solution is designed to provide time-series metrics for Apache Iceberg to monitor Apache Iceberg tables over-time to recognize trends and anomalies.
* Solution is designed to be lightweight and collect metrics exclusively from the Apache Iceberg metadata layer without scanning the data layer, hence without the need for heavy compute capacity.
* In the future we strive to reduce the dependency on AWS Glue in favor of using AWS Lambda compute when required features are available in [PyIceberg](https://py.iceberg.apache.org) library.

### Technical implementation

@@ -90,7 +89,7 @@ https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/i

### Build and Deploy

> ! Important - The guidance below uses AWS Serverless Application Model (SAM) for easier packaging and deployment of AWS Lambda. However if you use your own packaging tool or if you want to deploy AWS Lambda manually you can explore following files:
> ! Important - The guidance below uses AWS Serverless Application Model (SAM) and Amazon ECR for easier packaging and deployment of AWS Lambda. However if you use your own packaging tool or if you want to deploy AWS Lambda manually you can explore following files:

Using ECR for the Lambda packaging block looks good, but I would make it easier for developers by suggesting bash env vars before the commands, like so:

export CLOUDWATCH_NAMESPACE={{ cw_namespace }}
export AWS_REGION={{ aws_region }}
export aws_account_id={{ aws_account_id }}
export ecr_repository_name={{ repository_name }}
export STACK_NAME={{ your stack name }}
export S3_ARTIFACTS_BUCKET_NAME={{ s3_bucket_name }}
export S3_ARTIFACTS_PATH={{ s3_bucket_path }}
export ecr_repository_uri=${aws_account_id}.dkr.ecr.$AWS_REGION.amazonaws.com/${ecr_repository_name}

Once those are defined, let them just run the code:

docker build -f Dockerfile --platform linux/amd64 -t ${ecr_repository_name}:main --build-arg CLOUDWATCH_NAMESPACE=$CLOUDWATCH_NAMESPACE .
sam build --use-container
aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin ${aws_account_id}.dkr.ecr.${AWS_REGION}.amazonaws.com
aws ecr create-repository --repository-name $ecr_repository_name --region $AWS_REGION --image-scanning-configuration scanOnPush=true --image-tag-mutability MUTABLE
docker tag ${ecr_repository_name}:main ${ecr_repository_uri}:latest
docker push ${ecr_repository_uri}:latest

sam deploy --debug --region $AWS_REGION \
        --parameter-overrides ImageURL=${ecr_repository_uri}:latest \
        --image-repository $ecr_repository_uri \
        --stack-name $STACK_NAME --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND \
        --s3-bucket $S3_ARTIFACTS_BUCKET_NAME --s3-prefix $S3_ARTIFACTS_PATH

@moryachok - done I think :)
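
The commands proposed in the comment above only work when every exported variable is set. A possible pre-flight check is sketched below; `require_vars` is a hypothetical helper that is not part of this PR, and it assumes the variable names passed to it are trusted literals, because it uses `eval` for portable indirect lookup.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): reports every named variable
# that is unset or empty, and returns non-zero if any is missing.
require_vars() {
  local missing=0 name value
  for name in "$@"; do
    # Indirect lookup of the variable called $name; names must be trusted.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called once before the build and deploy commands, for example `require_vars CLOUDWATCH_NAMESPACE AWS_REGION aws_account_id ecr_repository_name STACK_NAME || exit 1`.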

> - template.yaml
> - lambda/requirements.txt
> - lambda/app.py
@@ -100,22 +99,30 @@ https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/i
Once you've installed [Docker](#install-docker) and [SAM CLI](#install-sam-cli) you are ready to build the AWS Lambda. Open your terminal and run the commands below.

```bash
cd lambda
docker build -f Dockerfile --platform linux/amd64 -t iceberg-monitoring:main --build-arg CLOUDWATCH_NAMESPACE={{ cw_namespace }} .
sam build --use-container
```

#### 2. Deploy AWS Lambda using AWS SAM CLI
#### 2. Deploy AWS Lambda using AWS SAM CLI and Amazon ECR

Once build is finished you can deploy your AWS Lambda. SAM will upload packaged code and deploy AWS Lambda resource using AWS CloudFormation. Run below command using your terminal.
Once the build is finished you can deploy your AWS Lambda. The container image is pushed to Amazon ECR, and SAM deploys the AWS Lambda resource using AWS CloudFormation. Run the commands below in your terminal.

```bash
sam deploy --guided
aws ecr get-login-password --region {{ aws_region }} | docker login --username AWS --password-stdin {{ aws_account_id }}.dkr.ecr.{{ aws_region }}.amazonaws.com
aws ecr create-repository --repository-name iceberg-monitoring --region {{ aws_region }} --image-scanning-configuration scanOnPush=true --image-tag-mutability MUTABLE
docker tag iceberg-monitoring:main {{ ecr_repository_uri }}:latest
docker push {{ aws_account_id }}.dkr.ecr.{{ aws_region }}.amazonaws.com/iceberg-monitoring:latest
sam deploy --debug --region {{ aws_region }} \
@moryachok moryachok Aug 30, 2024

I think that for the very first deploy you have to add the --guided flag

--parameter-overrides ImageURL={{ aws_account_id }}.dkr.ecr.{{ aws_region }}.amazonaws.com/iceberg-monitoring:latest \
--image-repository {{ aws_account_id }}.dkr.ecr.{{ aws_region }}.amazonaws.com/iceberg-monitoring \
--stack-name iceberg-monitoring --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND \
--s3-bucket {{ s3_bucket }} --s3-prefix iceberg-monitoring
```
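
The registry host and image URI used in the deploy commands follow the pattern `<account id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a rough sketch, the two values could be composed once and reused; the function names `ecr_registry_host` and `ecr_image_uri` and the account ID and region below are illustrative assumptions, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository) that compose the ECR
# registry host and the full image URI from their parts.
ecr_registry_host() {
  local account_id="$1" region="$2"
  printf '%s.dkr.ecr.%s.amazonaws.com' "$account_id" "$region"
}

ecr_image_uri() {
  local account_id="$1" region="$2" repository="$3" tag="${4:-latest}"
  printf '%s/%s:%s' "$(ecr_registry_host "$account_id" "$region")" "$repository" "$tag"
}

# Example with placeholder values; the tag defaults to "latest".
echo "$(ecr_image_uri 123456789012 eu-west-1 iceberg-monitoring)"
# prints 123456789012.dkr.ecr.eu-west-1.amazonaws.com/iceberg-monitoring:latest
```

Deriving both values from the same account ID and region keeps the `docker login` host and the pushed image URI in the same region.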

##### Parameters

- `CWNamespace` - A namespace is a container for CloudWatch metrics.
- `GlueServiceRole` - AWS Glue Role arn you created [earlier](#configuring-iam-permissions-for-aws-glue).
- `Warehouse` - Required catalog property to determine the root path of the data warehouse on S3. This can be any path on your S3 bucket. Not critical for the solution.
- `CLOUDWATCH_NAMESPACE` - A namespace is a container for CloudWatch metrics.


#### 3. Configure EventBridge Trigger
@@ -235,6 +242,16 @@ sam local invoke IcebergMetricsLambda --env-vars .env.local.json
`.env.local.json` - The JSON file that contains values for the Lambda function's environment variables. Lambda code is dependent on env vars that you are passing in the deploy section. You need to create the file and include relevant [parameters](#parameters) before calling `sam local invoke`.


### Unit Tests

You can test the metrics generation locally through unit tests, run from the `lambda` folder:

```bash
cd lambda
docker build -f tests/Dockerfile -t iceberg-metrics-tests .
docker run --rm iceberg-metrics-tests
```

## Dependencies

PyIceberg is a Python implementation for accessing Iceberg tables, without the need of a JVM. \
Binary file modified assets/arch.png
12 changes: 12 additions & 0 deletions lambda/Dockerfile
@@ -0,0 +1,12 @@
FROM public.ecr.aws/lambda/python:3.10

COPY . ${LAMBDA_TASK_ROOT}

# Install the function's dependencies
RUN pip install --upgrade pip && \
    pip install -r requirements.txt

ARG CLOUDWATCH_NAMESPACE
ENV CW_NAMESPACE=$CLOUDWATCH_NAMESPACE

CMD [ "app.lambda_handler" ]