processing big data
abk edited this page Jul 27, 2020 · 1 revision
Processing big data
- AWS Lambda
- Serverless data processing tool that you can use
- Way to run code snippets in the cloud.
- Serverless
- Continuous scaling
- Often used to process data as it moves around.
- Lambda is typically used as glue between a data stream and DynamoDB.
- Examples
- Transaction rate alarm: Lambda can transform the data in any way and send a notification, acting as glue between services.
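The transaction rate alarm can be sketched as a small handler. This is a minimal illustration, not a reference implementation: the `transactions` event key, the threshold value, and the `notify` callback are all assumptions made for the example.

```python
from typing import Callable, Iterable

# Hypothetical threshold: alarm when a batch holds more transactions than this.
MAX_TRANSACTIONS_PER_BATCH = 3


def check_transaction_rate(
    transactions: Iterable[dict],
    notify: Callable[[str], None],
    limit: int = MAX_TRANSACTIONS_PER_BATCH,
) -> bool:
    """Count the transactions in one batch and notify if the count exceeds the limit."""
    count = sum(1 for _ in transactions)
    if count > limit:
        notify(f"Transaction rate alarm: {count} transactions in one batch (limit {limit})")
        return True
    return False


def lambda_handler(event, context):
    # "transactions" is an assumed key in the incoming event, not a fixed AWS format.
    alarmed = check_transaction_rate(event.get("transactions", []), notify=print)
    return {"alarm": alarmed}
```

In a real deployment the `notify` callback would typically publish to a notification service rather than print.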
- Lambda Integration (part 1).
- Why not just run a server
- Server management (patches, monitoring etc)
- Servers can be cheap but scaling gets expensive.
- You don't pay for processing time you don't use
- Easier to split up development between front end and back end.
- Main uses of Lambda
- Real time file processing
- Real time stream processing
- ETL
- Cron replacement: use time as the trigger to invoke a Lambda function periodically.
- Process AWS events.
- There are many Lambda triggers; noteworthy ones are:
- S3, Kinesis, DynamoDB, IoT
- A Kinesis stream does NOT push data into Lambda. Lambda polls the stream and PULLs the records in.
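For the real-time file processing use case with an S3 trigger, a handler might look like the sketch below. It assumes the standard S3 notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`); the helper name `extract_s3_objects` and the print placeholder are illustrative only.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for every record in an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes "+"), so decode them.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        # Placeholder for the real work (transform, load, index, ...).
        print(f"processing s3://{bucket}/{key}")
    return {"processed": len(objects)}
```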
- Lambda integration (part 2).
- Lambda and Elasticsearch Service
- S3 > AWS Lambda > Elasticsearch Service (process and analyze).
- S3 > AWS Lambda > AWS Data Pipeline (Data Pipeline processes the data further after Lambda activates it). You can schedule activities in Data Pipeline, but with Lambda you can invoke it at any time instead of on a fixed schedule.
- S3 > AWS Lambda > Redshift, with Lambda reading and writing its state in DynamoDB. Lambda has to be stateless, so store the state in DynamoDB.
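Keeping the function stateless while storing state externally can be sketched as follows. The `InMemoryTable` class is a stand-in written for this example so the code runs locally; it mimics the `get_item(Key=...)` and `put_item(Item=...)` calls of a boto3 DynamoDB `Table` resource. The `pk` and `files_seen` attribute names are assumptions.

```python
class InMemoryTable:
    """Stand-in for a DynamoDB table, used only so the sketch runs locally."""

    def __init__(self):
        self._items = {}

    def get_item(self, Key):
        item = self._items.get(Key["pk"])
        # Like DynamoDB, omit "Item" from the response when nothing is found.
        return {"Item": dict(item)} if item is not None else {}

    def put_item(self, Item):
        self._items[Item["pk"]] = dict(Item)


def record_file(table, bucket: str) -> int:
    """Increment and return the number of files seen for a bucket.

    All state is read from and written back to the table, so nothing
    has to survive inside the function between invocations.
    """
    response = table.get_item(Key={"pk": bucket})
    count = response.get("Item", {}).get("files_seen", 0) + 1
    table.put_item(Item={"pk": bucket, "files_seen": count})
    return count
```

Note that this read-then-write pattern is not atomic; with concurrent invocations, an atomic counter via `update_item` would be the safer choice.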
- Lambda + Kinesis
- Lambda receives an event with batch of stream records.
- You specify the batch size when setting up the trigger.
- Too large a batch size can cause timeouts.
- Batches that exceed Lambda's payload limit (6 MB) are split.
- If the Lambda invocation fails or times out, the batch is retried.
- This can stall the shard if you don't handle errors properly.
- Use more shards to ensure processing is NOT held up by errors.
- Lambda processes shard data synchronously.
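The error-handling point above can be illustrated with a handler that decodes each record in the batch and records failures instead of raising, so one bad record does not cause the whole batch to be retried. This is a sketch under assumptions: it relies on the Kinesis event layout (`Records[].kinesis.data`, base64-encoded) and assumes JSON payloads; `process_batch` is a hypothetical helper name.

```python
import base64
import json


def process_batch(event: dict) -> dict:
    """Decode each Kinesis record; collect failures instead of raising."""
    processed, failed = [], []
    for record in event.get("Records", []):
        kinesis = record.get("kinesis", {})
        seq = kinesis.get("sequenceNumber")
        try:
            # Kinesis record payloads arrive base64-encoded.
            payload = json.loads(base64.b64decode(kinesis["data"]))
            processed.append(payload)
        except (ValueError, KeyError) as exc:
            # Bad base64 and bad JSON both raise ValueError subclasses.
            # Catching them keeps one bad record from failing the batch.
            failed.append({"sequenceNumber": seq, "error": str(exc)})
    return {"processed": processed, "failed": failed}


def lambda_handler(event, context):
    result = process_batch(event)
    # Failed records could be sent elsewhere for inspection; here they are only counted.
    return {"processed": len(result["processed"]), "failed": len(result["failed"])}
```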
- Lambda anti-patterns (where you don't want to use Lambda).
- Long running applications
- Dynamic Websites
- Stateful applications.