Serverless-Data-Engineering-Pipeline

Serverless data engineering pipeline on AWS, using S3, DynamoDB, SQS and Lambda functions

Architecture of the serverless Data Engineering Pipeline

Workflow explained: YouTube Link

  • We will start with a CSV file, which we will upload to an S3 bucket. The first Lambda function, s3_to_dynaomdb_lambda.py, is triggered by the upload. It reads the CSV file and writes the data to an already-created table in DynamoDB.
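A minimal sketch of this first step, assuming a pre-created DynamoDB table named `reviews` (the actual table name is not given in this README) and a CSV file whose header row names the item attributes:

```python
import csv


def parse_csv_rows(body_text):
    """Parse CSV text into a list of row dicts keyed by the header row."""
    return list(csv.DictReader(body_text.splitlines()))


def lambda_handler(event, context):
    # boto3 is imported inside the handler so the pure helper above
    # can be used without any AWS dependency.
    import boto3

    # The S3 put event carries the bucket and key of the uploaded object.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    body = (
        boto3.client("s3")
        .get_object(Bucket=bucket, Key=key)["Body"]
        .read()
        .decode("utf-8")
    )

    # "reviews" is an assumed table name; substitute the table created beforehand.
    table = boto3.resource("dynamodb").Table("reviews")

    # batch_writer buffers put_item calls into efficient BatchWriteItem requests.
    with table.batch_writer() as batch:
        for row in parse_csv_rows(body):
            batch.put_item(Item=row)
```

The handler assumes one record per event, which holds for a single S3 upload trigger; a production version would loop over `event["Records"]`.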

  • Once the data is uploaded, we move on to the next step. Here a second Lambda function, dynamodb_to_sqs_lambda.py, reads the data from the DynamoDB table and sends it to SQS, a messaging service that stores messages/events for consumption by other AWS services. This Lambda is triggered by a CloudWatch Events rule that fires at a fixed interval (in this case, one minute).
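A sketch of the scheduled DynamoDB-to-SQS step. The table name `reviews` and queue name `reviews-queue` are assumptions, as is scanning the full table each run (the README does not say how the function selects items):

```python
import json


def item_to_message(item):
    """Serialize a DynamoDB item as a JSON SQS message body."""
    # default=str covers non-JSON types such as the Decimal values
    # the DynamoDB resource returns for numbers.
    return json.dumps(item, default=str)


def lambda_handler(event, context):
    # boto3 imported here so the helper above stays AWS-free.
    import boto3

    table = boto3.resource("dynamodb").Table("reviews")  # assumed name
    sqs = boto3.client("sqs")
    queue_url = sqs.get_queue_url(QueueName="reviews-queue")["QueueUrl"]  # assumed name

    # Scan the whole table, following pagination via LastEvaluatedKey.
    resp = table.scan()
    items = resp["Items"]
    while "LastEvaluatedKey" in resp:
        resp = table.scan(ExclusiveStartKey=resp["LastEvaluatedKey"])
        items.extend(resp["Items"])

    for item in items:
        sqs.send_message(QueueUrl=queue_url, MessageBody=item_to_message(item))
```

Because the rule fires every minute, a real implementation would also need to mark or delete items already sent, or the same rows would be enqueued repeatedly.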

  • Once the data is in SQS, we move on to the third and final part. In this step, the Lambda function sqs_sentiment_to_s3_lambda.py receives the messages from the SQS queue, reads them, and sends the text to Amazon Comprehend for sentiment analysis. The analyzed sentiment, along with the original review and ID, is saved as a CSV file in an S3 bucket.
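A sketch of the final step, assuming the Lambda is wired to the queue as an SQS event source (so messages arrive in `event["Records"]`), that each message body is JSON with `id` and `review` fields, and that the output bucket `sentiment-results` and key `sentiment/results.csv` are placeholders:

```python
import csv
import io
import json


def rows_to_csv(rows, fieldnames):
    """Render result rows as CSV text for the S3 object."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def lambda_handler(event, context):
    # boto3 imported here so the helper above stays AWS-free.
    import boto3

    comprehend = boto3.client("comprehend")
    s3 = boto3.client("s3")

    rows = []
    # An SQS trigger delivers a batch of messages in event["Records"].
    for record in event["Records"]:
        msg = json.loads(record["body"])
        result = comprehend.detect_sentiment(Text=msg["review"], LanguageCode="en")
        rows.append(
            {
                "id": msg["id"],
                "review": msg["review"],
                "sentiment": result["Sentiment"],  # e.g. POSITIVE / NEGATIVE / NEUTRAL / MIXED
            }
        )

    # Bucket and key names are assumptions for illustration.
    s3.put_object(
        Bucket="sentiment-results",
        Key="sentiment/results.csv",
        Body=rows_to_csv(rows, ["id", "review", "sentiment"]),
    )
```

With an SQS event source, Lambda deletes successfully processed messages from the queue automatically, so no explicit `delete_message` call is needed here.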

Reference Links:

Amazon Web Services (AWS)
Amazon S3
AWS Lambda
Amazon DynamoDB
Amazon CloudWatch
Amazon Simple Queue Service (SQS)
Amazon Comprehend
