Analytics

The analytics component

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.

Data

Applications/year=2019/month=10/day=11/hour=16/realworld-serverless-application-analytic-Firehose-12J7YC29T8FAY-1-2019-10-11-16-58-58-c0068baf-ab5b-4a61-a9a7-1e100983c696.parquet

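Data is partitioned by time. The year=2019 style of prefixes follows the Hive partitioning convention (key=value path segments), which Athena can load as table partitions. Partitioning reduces the amount of data that has to be scanned to execute Athena queries, which in turn reduces query cost.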
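As a rough illustration of the key layout above, the following Python sketch builds the time-based prefix for a timestamp and reads the partition values back out of an object key. The helper names partition_prefix and parse_partitions are hypothetical and not part of the project. The sketch also assumes that month, day, and hour are zero-padded to two digits, which the sample key does not confirm.

```python
from datetime import datetime, timezone


def partition_prefix(ts: datetime, root: str = "Applications") -> str:
    """Build the Hive-style prefix (key=value segments) for a timestamp.

    Assumption: month, day and hour are zero-padded to two digits.
    """
    return (
        f"{root}/year={ts.year:04d}/month={ts.month:02d}/"
        f"day={ts.day:02d}/hour={ts.hour:02d}/"
    )


def parse_partitions(key: str) -> dict[str, str]:
    """Extract key=value partition segments from an S3 object key."""
    partitions = {}
    # The last path segment is the file name, so skip it.
    for segment in key.split("/")[:-1]:
        name, sep, value = segment.partition("=")
        if sep:  # only segments of the form name=value are partitions
            partitions[name] = value
    return partitions


ts = datetime(2019, 10, 11, 16, 58, 58, tzinfo=timezone.utc)
prefix = partition_prefix(ts)
print(prefix)
print(parse_partitions(prefix + "example.parquet"))
```

A query that filters on year, month, day, or hour only needs to read objects under the matching prefixes, which is where the scan savings come from.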

Data is stored in .parquet files. Parquet is a columnar data storage format from the Apache Hadoop ecosystem. It provides efficient compression and good performance when handling complex data in bulk. Parquet files are typically much smaller than the equivalent JSON text files, and Athena only needs to read the columns a query references, so using Parquet reduces both storage and query costs.

Athena

Run the following query first to load new partitions:

MSCK REPAIR TABLE applications;

Run a sample query:

SELECT detail.eventname,
       detail.dynamodb.keys.applicationid.s AS applicationid,
       detail.dynamodb.keys.userid.s AS userid,
       detail.dynamodb.newimage.author.s AS author,
       detail.dynamodb.newimage.description.s AS description
FROM applications;
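The sample query scans every partition. To benefit from partitioning, a query can filter on the partition columns. The Python sketch below builds such a query string for a single day. The helper partition_filter is hypothetical, and the sketch assumes that the partition columns are named year, month, and day and are declared as strings in the table definition, which this page does not state.

```python
def partition_filter(year: int, month: int, day: int) -> str:
    """Return a WHERE clause restricting a query to one day's partitions.

    Assumption: the partition columns are declared as strings; drop the
    quotes if the table declares them as integers.
    """
    return (
        f"WHERE year = '{year:04d}' "
        f"AND month = '{month:02d}' "
        f"AND day = '{day:02d}'"
    )


# Count events per event name for a single day.
query = (
    "SELECT detail.eventname, count(*) AS events\n"
    "FROM applications\n"
    f"{partition_filter(2019, 10, 11)}\n"
    "GROUP BY detail.eventname;"
)
print(query)
```

The printed statement can be pasted into the Athena query editor. Because the predicate is on partition columns, Athena can skip objects outside the matching prefixes.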
