Analytics
The analytics component
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
Applications/year=2019/month=10/day=11/hour=16/realworld-serverless-application-analytic-Firehose-12J7YC29T8FAY-1-2019-10-11-16-58-58-c0068baf-ab5b-4a61-a9a7-1e100983c696.parquet
Data is partitioned by time.
The year=2019
style of prefixes can be
Partitioning reduces the amount of data that has to be scanned to execute Athena queries, thus reducing the cost.
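As a rough illustration of the layout, the sketch below builds and parses `name=value` prefixes like the one in the object key shown earlier. The helper names `partition_prefix` and `parse_partitions` are hypothetical, and the use of UTC timestamps is an assumption; this is not code from the project.

```python
from datetime import datetime, timezone


def partition_prefix(table: str, ts: datetime) -> str:
    """Build a time-partitioned prefix: table/year=YYYY/month=MM/day=DD/hour=HH/.

    Assumes `ts` is timezone-aware; it is normalized to UTC (an assumption).
    """
    ts = ts.astimezone(timezone.utc)
    return f"{table}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"


def parse_partitions(key: str) -> dict:
    """Collect the name=value path segments of an object key into a dict."""
    partitions = {}
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        if sep:  # only segments that contain '=' are partition segments
            partitions[name] = value
    return partitions


if __name__ == "__main__":
    ts = datetime(2019, 10, 11, 16, 58, 58, tzinfo=timezone.utc)
    prefix = partition_prefix("Applications", ts)
    print(prefix)
    # "example.parquet" is a placeholder file name, not a real object.
    print(parse_partitions(prefix + "example.parquet"))
```

A query that filters on these partition columns (for example, one year and month) only needs to read objects under the matching prefixes, which is where the reduction in scanned data comes from.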
Data is stored in .parquet files. Parquet is a columnar data storage format from the Apache Hadoop ecosystem. It provides efficient data compression and improved performance when handling complex data in bulk. Parquet files are typically much smaller than the equivalent JSON text files, so using Parquet reduces both storage and query costs.
Run the following query first to load new partitions:
MSCK REPAIR TABLE applications;
Run a sample query:
SELECT detail.eventname,
       detail.dynamodb.keys.applicationid.s AS applicationid,
       detail.dynamodb.keys.userid.s AS userid,
       detail.dynamodb.newimage.author.s AS author,
       detail.dynamodb.newimage.description.s AS description
FROM applications;
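To make the dotted column paths in the sample query easier to read, the sketch below applies the same projection to one record held as a Python dictionary. The record shape is inferred only from the paths in the query, the sample values are invented for illustration, and `project_record` is a hypothetical helper, not part of the project.

```python
from typing import Optional


def _string_value(attribute: Optional[dict]) -> Optional[str]:
    """Return the 's' (string) member of an attribute value, or None if absent."""
    return attribute.get("s") if attribute else None


def project_record(record: dict) -> dict:
    """Mirror the SELECT list of the sample query for a single record."""
    detail = record["detail"]
    dynamodb = detail["dynamodb"]
    keys = dynamodb.get("keys") or {}
    # "newimage" may be missing (for example, on delete events); the fields
    # that depend on it then come back as None, much like NULL in SQL.
    new_image = dynamodb.get("newimage") or {}
    return {
        "eventname": detail.get("eventname"),
        "applicationid": _string_value(keys.get("applicationid")),
        "userid": _string_value(keys.get("userid")),
        "author": _string_value(new_image.get("author")),
        "description": _string_value(new_image.get("description")),
    }


if __name__ == "__main__":
    # Invented sample record, for illustration only.
    sample = {
        "detail": {
            "eventname": "INSERT",
            "dynamodb": {
                "keys": {
                    "applicationid": {"s": "my-app"},
                    "userid": {"s": "user-1"},
                },
                "newimage": {
                    "author": {"s": "Jane"},
                    "description": {"s": "A sample application"},
                },
            },
        }
    }
    print(project_record(sample))
```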