Commit c4bc91f

revamp athena docs
1 parent 71867c4 commit c4bc91f

1 file changed: 67 additions & 55 deletions

````diff
@@ -1,18 +1,19 @@
 ---
 title: "Athena"
-linkTitle: "Athena"
 description: Get started with Athena on LocalStack
 tags: ["Ultimate"]
 ---
 
+import { Tabs, TabItem } from '@astrojs/starlight/components';
+
 ## Introduction
 
 Athena is an interactive query service provided by Amazon Web Services (AWS) that enables you to analyze data stored in S3 using standard SQL queries.
 Athena allows users to create ad-hoc queries to perform data analysis, filter, aggregate, and join datasets stored in S3.
 It supports various file formats, such as JSON, Parquet, and CSV, making it compatible with a wide range of data sources.
 
 LocalStack allows you to configure the Athena APIs with a Hive metastore that can connect to the S3 API and query your data directly in your local environment.
-The supported APIs are available on our [API coverage page]({{< ref "coverage_athena" >}}), which provides information on the extent of Athena's integration with LocalStack.
+The supported APIs are available on our [API coverage page](), which provides information on the extent of Athena's integration with LocalStack.
 
 ## Getting started
 
````

````diff
@@ -21,44 +22,44 @@ This guide is designed for users new to Athena and assumes basic knowledge of th
 Start your LocalStack container using your preferred method.
 We will demonstrate how to create an Athena table and run a query against it in addition to reading the results with the AWS CLI.
 
-{{< callout >}}
+:::note
 To utilize the Athena API, LocalStack will download additional dependencies.
 This involves getting a Docker image of around 1.5GB, containing Presto, Hive, and other tools.
 These components are retrieved automatically when you initiate the service.
 To ensure a smooth initial setup, ensure you're connected to a stable internet connection while fetching these components for the first time.
-{{< /callout >}}
+:::
 
 ### Create an S3 bucket
 
 You can create an S3 bucket using the [`mb`](https://docs.aws.amazon.com/cli/latest/reference/s3/mb.html) command.
 Run the following command to create a bucket named `athena-bucket`:
 
-{{< command >}}
-$ awslocal s3 mb s3://athena-bucket
-{{< / command >}}
+```bash
+awslocal s3 mb s3://athena-bucket
+```
 
 You can create some sample data using the following commands:
 
-{{< command >}}
-$ echo "Name,Service" > data.csv
-$ echo "LocalStack,Athena" >> data.csv
-{{< / command >}}
+```bash
+echo "Name,Service" > data.csv
+echo "LocalStack,Athena" >> data.csv
+```
 
 You can upload the data to your bucket using the [`cp`](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html) command:
 
-{{< command >}}
-$ awslocal s3 cp data.csv s3://athena-bucket/data/
-{{< / command >}}
+```bash
+awslocal s3 cp data.csv s3://athena-bucket/data/
+```
 
 ### Create an Athena table
 
 You can create an Athena table using the [`CreateTable`](https://docs.aws.amazon.com/athena/latest/APIReference/API_CreateTable.html) API
 Run the following command to create a table named `athena_table`:
 
-{{< command >}}
-$ awslocal athena start-query-execution \
+```bash
+awslocal athena start-query-execution \
 --query-string "create external table tbl01 (name STRING, surname STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3://athena-bucket/data/';" --result-configuration "OutputLocation=s3://athena-bucket/output/"
-{{< / command >}}
+```
 
 The following output would be retrieved:
 
````

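The table in this hunk is created by sending a DDL statement through `StartQueryExecution`, with `--result-configuration` telling Athena where to write results. As a rough sketch, the same request can be expressed from Python by mapping those CLI flags onto the `QueryString` and `ResultConfiguration` parameters of boto3's `start_query_execution`. The helper name `build_start_query_kwargs` is hypothetical, and the commented-out client setup assumes LocalStack's default endpoint `http://localhost:4566`, region `us-east-1`, and dummy credentials. The snippet itself only builds the request and contacts no service.

```python
from typing import Any, Dict


# Hypothetical helper: maps the CLI flags --query-string and
# --result-configuration "OutputLocation=..." onto boto3 parameter names.
def build_start_query_kwargs(query: str, output_location: str) -> Dict[str, Any]:
    """Return keyword arguments for an Athena start_query_execution call."""
    if not output_location.startswith("s3://"):
        raise ValueError(f"OutputLocation must be an s3:// URI, got {output_location!r}")
    return {
        "QueryString": query,
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# The same DDL statement as the CLI example in this hunk.
CREATE_TABLE = (
    "create external table tbl01 (name STRING, surname STRING) "
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
    "LOCATION 's3://athena-bucket/data/';"
)

kwargs = build_start_query_kwargs(CREATE_TABLE, "s3://athena-bucket/output/")
print(kwargs["ResultConfiguration"])

# Not executed here. Against a running LocalStack (endpoint, region, and
# dummy credentials are assumptions), the request could be sent with:
#   import boto3
#   athena = boto3.client("athena", endpoint_url="http://localhost:4566",
#                         region_name="us-east-1",
#                         aws_access_key_id="test", aws_secret_access_key="test")
#   query_id = athena.start_query_execution(**kwargs)["QueryExecutionId"]
```

Building the keyword arguments separately keeps the request inspectable before any call is made; the returned `QueryExecutionId` plays the same role as `593acab7` in the CLI walkthrough.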
````diff
@@ -71,9 +72,9 @@ The following output would be retrieved:
 You can retrieve information about the query execution using the [`GetQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryExecution.html) API.
 Run the following command:
 
-{{< command >}}
-$ awslocal athena get-query-execution --query-execution-id 593acab7
-{{< / command >}}
+```bash
+awslocal athena get-query-execution --query-execution-id 593acab7
+```
 
 Replace `593acab7` with the `QueryExecutionId` returned by the [`StartQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_StartQueryExecution.html) API.
 
````

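`StartQueryExecution` returns before the query has finished (the docs note that a query "may take some time to finish executing"), so scripts usually call `GetQueryExecution` repeatedly until `QueryExecution.Status.State` reaches a terminal value: `SUCCEEDED`, `FAILED`, or `CANCELLED`. Below is a minimal polling sketch. It assumes a boto3-style `get_query_execution` callable is passed in, which lets the loop run against a fake without LocalStack. `wait_for_query` and the fake client are hypothetical names.

```python
import time
from typing import Any, Callable, Dict

# States after which Athena does not change QueryExecution.Status.State again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(
    get_query_execution: Callable[..., Dict[str, Any]],
    query_execution_id: str,
    poll_seconds: float = 1.0,
    timeout_seconds: float = 300.0,
) -> str:
    """Poll GetQueryExecution until the query reaches a terminal state.

    `get_query_execution` is expected to behave like boto3's
    athena_client.get_query_execution (keyword argument QueryExecutionId).
    Returns the terminal state; raises TimeoutError if none is reached in time.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = get_query_execution(QueryExecutionId=query_execution_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query {query_execution_id} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Hypothetical fake client: reports QUEUED, then RUNNING, then SUCCEEDED.
_states = iter(["QUEUED", "RUNNING", "SUCCEEDED"])


def fake_get_query_execution(QueryExecutionId: str) -> Dict[str, Any]:
    return {"QueryExecution": {"QueryExecutionId": QueryExecutionId,
                               "Status": {"State": next(_states)}}}


print(wait_for_query(fake_get_query_execution, "593acab7", poll_seconds=0))  # → SUCCEEDED
```

With a real client you would pass `athena.get_query_execution` instead of the fake. A `FAILED` result is returned rather than raised, so the caller can inspect `Status.StateChangeReason` before fetching results.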
````diff
@@ -82,27 +83,27 @@ Replace `593acab7` with the `QueryExecutionId` returned by the [`StartQueryExecu
 You can get the output of the query using the [`GetQueryResults`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryResults.html) API.
 Run the following command:
 
-{{< command >}}
-$ awslocal athena get-query-results --query-execution-id 593acab7
-{{< / command >}}
+```bash
+awslocal athena get-query-results --query-execution-id 593acab7
+```
 
 You can now read the data from the `tbl01` table and retrieve the data from S3 that was mentioned in your table creation statement.
 Run the following command:
 
-{{< command >}}
-$ awslocal athena start-query-execution \
+```bash
+awslocal athena start-query-execution \
 --query-string "select * from tbl01;" --result-configuration "OutputLocation=s3://athena-bucket/output/"
-{{< / command >}}
+```
 
 You can retrieve the execution details similarly using the [`GetQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryExecution.html) API using the `QueryExecutionId` returned by the previous step.
 
 You can copy the `ResultConfiguration` from the output and use it to retrieve the results of the query.
 Run the following command:
 
-{{< command >}}
-$ awslocal cp s3://athena-bucket/output/593acab7.csv .
-$ cat 593acab7.csv
-{{< / command >}}
+```bash
+awslocal cp s3://athena-bucket/output/593acab7.csv .
+cat 593acab7.csv
+```
 
 Replace `593acab7.csv` with the path to the file that was present in the `ResultConfiguration` of the previous step.
 You can also use the [`GetQueryResults`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryResults.html) API to retrieve the results of the query.
````

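This hunk reads results in two ways: with `GetQueryResults`, or by copying the CSV file written under the output location. The sketch below covers both and rests on two assumptions. First, the `GetQueryResults` response has the usual `ResultSet.Rows[].Data[].VarCharValue` shape, with the first row carrying column names as Athena does for `SELECT` queries. Second, the results file is a plain `s3://bucket/key` URI, such as the value in `ResultConfiguration.OutputLocation`. `result_rows`, `split_s3_uri`, and the hand-written `sample` response are hypothetical and are not actual LocalStack output. Large results are paginated through `NextToken`, which this sketch ignores.

```python
from typing import Any, Dict, List, Optional, Tuple


def result_rows(response: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Turn one page of a GetQueryResults response into a list of dicts.

    Assumes the first row holds column names (Athena's behavior for SELECT
    queries). Cells without "VarCharValue" become None. Pagination via
    NextToken is not handled.
    """
    rows = response["ResultSet"]["Rows"]
    if not rows:
        return []
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split "s3://bucket/key" into (bucket, key), e.g. for an S3 download call."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


# Hand-written sample in the GetQueryResults shape (not real LocalStack output).
sample = {
    "ResultSet": {
        "Rows": [
            {"Data": [{"VarCharValue": "name"}, {"VarCharValue": "surname"}]},
            {"Data": [{"VarCharValue": "LocalStack"}, {"VarCharValue": "Athena"}]},
        ]
    }
}

print(result_rows(sample))  # → [{'name': 'LocalStack', 'surname': 'Athena'}]
print(split_s3_uri("s3://athena-bucket/output/593acab7.csv"))  # → ('athena-bucket', 'output/593acab7.csv')
```

The `(bucket, key)` pair matches the first two arguments of a boto3 S3 client's `download_file(Bucket, Key, Filename)`, which is one way to fetch the result file without the CLI.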
````diff
@@ -117,34 +118,37 @@ The Delta Lake files used in this sample are available in a public S3 bucket und
 For your convenience, we have prepared the test files in a downloadable ZIP file [here](https://localstack-assets.s3.amazonaws.com/aws-sample-athena-delta-lake.zip).
 We start by downloading and extracting this ZIP file:
 
-{{< command >}}
-$ mkdir /tmp/delta-lake-sample; cd /tmp/delta-lake-sample
-$ wget https://localstack-assets.s3.amazonaws.com/aws-sample-athena-delta-lake.zip
-$ unzip aws-sample-athena-delta-lake.zip; rm aws-sample-athena-delta-lake.zip
-{{< / command >}}
+```bash
+mkdir /tmp/delta-lake-sample; cd /tmp/delta-lake-sample
+wget https://localstack-assets.s3.amazonaws.com/aws-sample-athena-delta-lake.zip
+unzip aws-sample-athena-delta-lake.zip; rm aws-sample-athena-delta-lake.zip
+```
 
 We can then create an S3 bucket in LocalStack using the [`awslocal`](https://github.com/localstack/awscli-local) command line, and upload the files to the bucket:
-{{< command >}}
-$ awslocal s3 mb s3://test
-$ awslocal s3 sync /tmp/delta-lake-sample s3://test
-{{< / command >}}
+
+```bash
+awslocal s3 mb s3://test
+awslocal s3 sync /tmp/delta-lake-sample s3://test
+```
 
 Next, we create the table definitions in Athena:
-{{< command >}}
-$ awslocal athena start-query-execution \
+
+```bash
+awslocal athena start-query-execution \
 --query-string "CREATE EXTERNAL TABLE test (product_id string, product_name string, \
 price bigint, currency string, category string, updated_at double) \
 LOCATION 's3://test/' TBLPROPERTIES ('table_type'='DELTA')"
-{{< / command >}}
+```
 
 Please note that this query may take some time to finish executing.
 You can observe the output in the LocalStack container (ideally with `DEBUG=1` enabled) to follow the steps of the query execution.
 
 Finally, we can now run a `SELECT` query to extract data from the Delta Lake table we've just created:
-{{< command >}}
-$ queryId=$(awslocal athena start-query-execution --query-string "SELECT * from deltalake.default.test" | jq -r .QueryExecutionId)
-$ awslocal athena get-query-results --query-execution-id $queryId
-{{< / command >}}
+
+```bash
+queryId=$(awslocal athena start-query-execution --query-string "SELECT * from deltalake.default.test" | jq -r .QueryExecutionId)
+awslocal athena get-query-results --query-execution-id $queryId
+```
 
 The query should yield a result similar to the output below:
 
````

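The Delta Lake table in this hunk is defined by a `CREATE EXTERNAL TABLE` statement whose column list, `LOCATION`, and `TBLPROPERTIES ('table_type'='DELTA')` follow a fixed pattern. As an illustration only, the hypothetical helper `delta_table_ddl` below assembles that statement from `(name, type)` pairs. It reproduces the query string from the CLI example once the shell has joined its backslash-continued lines. The helper does no quoting or escaping, so it assumes trusted identifiers and paths.

```python
from typing import Sequence, Tuple


def delta_table_ddl(table: str, columns: Sequence[Tuple[str, str]], location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for a Delta Lake table.

    Mirrors the docs example: column list, LOCATION, and
    TBLPROPERTIES ('table_type'='DELTA'). No identifier quoting is done.
    """
    cols = ", ".join(f"{name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols}) "
        f"LOCATION '{location}' TBLPROPERTIES ('table_type'='DELTA')"
    )


# Column definitions taken from the CLI example in this hunk.
COLUMNS = [
    ("product_id", "string"),
    ("product_name", "string"),
    ("price", "bigint"),
    ("currency", "string"),
    ("category", "string"),
    ("updated_at", "double"),
]

print(delta_table_ddl("test", COLUMNS, "s3://test/"))
```

The resulting string can be passed as `QueryString` to `StartQueryExecution`. The table is then queried with the `deltalake.default.` prefix, as the docs' `SELECT` example does.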
````diff
@@ -175,9 +179,9 @@ The query should yield a result similar to the output below:
 ...
 ```
 
-{{< callout >}}
+:::note
 The `SELECT` statement above currently requires us to prefix the database/table name with `deltalake.` - this will be further improved in a future iteration, for better parity with AWS.
-{{< /callout >}}
+:::
 
 ## Iceberg Tables
 
````

````diff
@@ -210,8 +214,10 @@ s3://mybucket/prefix/temp/
 You can configure the Athena service in LocalStack with various clients, such as [PyAthena](https://github.com/laughingman7743/PyAthena/), [awswrangler](https://github.com/aws/aws-sdk-pandas), among others!
 Here are small snippets to get you started:
 
-{{< tabpane lang="python" >}}
-{{< tab header="PyAthena" lang="python" >}}
+<Tabs>
+<TabItem label="PyAthena">
+
+```python
 from pyathena import connect
 
 conn = connect(
````

223229

224230
cursor.execute("SELECT 1,2,3 AS test")
225231
print(cursor.fetchall())
226-
{{< /tab >}}
227-
{{< tab header="awswrangler" lang="python" >}}
232+
```
233+
234+
</TabItem>
235+
236+
<TabItem label="awswrangler">
237+
238+
```python
228239
import awswrangler as wr
229240
import pandas as pd
230241

````diff
@@ -238,15 +249,16 @@ wr.config.s3_endpoint_url = ENDPOINT
 wr.catalog.create_database(DATABASE)
 df = wr.athena.read_sql_query("SELECT 1 AS col1, 2 AS col2, 3 AS col3", database=DATABASE)
 print(df)
-{{< /tab >}}
-{{< /tabpane >}}
+```
+</TabItem>
+</Tabs>
 
 ## Resource Browser
 
 The LocalStack Web Application provides a Resource Browser for Athena query execution, writing SQL queries, and visualizing query results.
 You can access the Resource Browser by opening the LocalStack Web Application in your browser, navigating to the **Resources** section, and then clicking on **Athena** under the **Analytics** section.
 
-<img src="athena-resource-browser.png" alt="Athena Resource Browser" title="Athena Resource Browser" width="900" />
+![Athena Resource Browser](/images/aws/athena-resource-browser.png)
 
 The Resource Browser allows you to perform the following actions:
 
````
