Athena is an interactive query service provided by Amazon Web Services (AWS) that enables you to analyze data stored in S3 using standard SQL queries.
Athena allows users to create ad-hoc queries to perform data analysis, filter, aggregate, and join datasets stored in S3.
It supports various file formats, such as JSON, Parquet, and CSV, making it compatible with a wide range of data sources.

LocalStack allows you to configure the Athena APIs with a Hive metastore that can connect to the S3 API and query your data directly in your local environment.
-The supported APIs are available on our [API coverage page]({{< ref "coverage_athena" >}}), which provides information on the extent of Athena's integration with LocalStack.
+The supported APIs are available on our [API coverage page](), which provides information on the extent of Athena's integration with LocalStack.

## Getting started
@@ -21,44 +22,44 @@ This guide is designed for users new to Athena and assumes basic knowledge of th
Start your LocalStack container using your preferred method.
We will demonstrate how to create an Athena table and run a query against it in addition to reading the results with the AWS CLI.

-{{< callout >}}
+:::note
To utilize the Athena API, LocalStack will download additional dependencies.
This involves getting a Docker image of around 1.5GB, containing Presto, Hive, and other tools.
These components are retrieved automatically when you initiate the service.
To ensure a smooth initial setup, ensure you're connected to a stable internet connection while fetching these components for the first time.
-{{< /callout >}}
+:::

### Create an S3 bucket

You can create an S3 bucket using the [`mb`](https://docs.aws.amazon.com/cli/latest/reference/s3/mb.html) command.
Run the following command to create a bucket named `athena-bucket`:

-{{< command >}}
-$ awslocal s3 mb s3://athena-bucket
-{{< / command >}}
+```bash
+awslocal s3 mb s3://athena-bucket
+```

You can create some sample data using the following commands:

-{{< command >}}
-$ echo "Name,Service" > data.csv
-$ echo "LocalStack,Athena" >> data.csv
-{{< / command >}}
+```bash
+echo "Name,Service" > data.csv
+echo "LocalStack,Athena" >> data.csv
+```

You can upload the data to your bucket using the [`cp`](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html) command:
You can create an Athena table by running a `CREATE EXTERNAL TABLE` statement through the [`StartQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_StartQueryExecution.html) API.
Run the following command to create a table named `tbl01`:

-{{< command >}}
-$ awslocal athena start-query-execution \
+```bash
+awslocal athena start-query-execution \
    --query-string "create external table tbl01 (name STRING, surname STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3://athena-bucket/data/';" --result-configuration "OutputLocation=s3://athena-bucket/output/"
-{{< / command >}}
+```

The following output would be retrieved:
@@ -71,9 +72,9 @@ The following output would be retrieved:
You can retrieve information about the query execution using the [`GetQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryExecution.html) API.
Replace `593acab7` with the `QueryExecutionId` returned by the [`StartQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_StartQueryExecution.html) API.
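
As a rough sketch of the same check done programmatically, the Python snippet below polls `GetQueryExecution` until the query reaches a terminal state. It assumes a boto3-style Athena client (any object exposing `get_query_execution`). The helper name `wait_for_query`, the timeout values, and the `http://localhost:4566` endpoint in the usage comment are illustrative assumptions rather than part of this guide.

```python
import time

# Terminal states reported by Athena in QueryExecution.Status.State.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(client, query_execution_id, timeout=60.0, interval=1.0):
    """Poll GetQueryExecution until the query finishes or the timeout expires.

    `client` is assumed to be a boto3 Athena client (or any object with a
    compatible `get_query_execution` method). Returns the final state string.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = client.get_query_execution(QueryExecutionId=query_execution_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"query {query_execution_id} still {state} after {timeout}s"
            )
        time.sleep(interval)


# Hypothetical usage against LocalStack (requires boto3 and a running container):
#   import boto3
#   athena = boto3.client("athena", endpoint_url="http://localhost:4566",
#                         region_name="us-east-1")
#   print(wait_for_query(athena, "593acab7"))
```

Passing the client in as an argument keeps the helper independent of how the client is created, so the same function can be pointed at LocalStack or at AWS.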
@@ -82,27 +83,27 @@ Replace `593acab7` with the `QueryExecutionId` returned by the [`StartQueryExecu
You can get the output of the query using the [`GetQueryResults`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryResults.html) API.
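
For readers consuming the results from code, the following sketch converts a `GetQueryResults` response into a list of dictionaries. It assumes the documented response shape (`ResultSet.Rows[].Data[].VarCharValue`) and that the first row holds the column names, as Athena does for `SELECT` queries on AWS. The function name `rows_to_dicts` and the sample response are hand-written for illustration and are not captured output.

```python
def rows_to_dicts(response):
    """Convert a GetQueryResults response into a list of {column: value} dicts.

    Assumes the first row contains the column headers. Cells without a
    `VarCharValue` key (SQL NULL) are mapped to None.
    """
    rows = response["ResultSet"]["Rows"]
    if not rows:
        return []
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]


# Hand-written sample in the shape of a GetQueryResults response (illustrative only).
SAMPLE_RESPONSE = {
    "ResultSet": {
        "Rows": [
            {"Data": [{"VarCharValue": "name"}, {"VarCharValue": "surname"}]},
            {"Data": [{"VarCharValue": "LocalStack"}, {"VarCharValue": "Athena"}]},
        ]
    }
}

print(rows_to_dicts(SAMPLE_RESPONSE))
```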
You can now read the data from the `tbl01` table and retrieve the data from S3 that was mentioned in your table creation statement.
Run the following command:

-{{< command >}}
-$ awslocal athena start-query-execution \
+```bash
+awslocal athena start-query-execution \
    --query-string "select * from tbl01;" --result-configuration "OutputLocation=s3://athena-bucket/output/"
-{{< / command >}}
+```

You can retrieve the execution details in the same way with the [`GetQueryExecution`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryExecution.html) API, passing the `QueryExecutionId` returned by the previous step.

You can copy the `ResultConfiguration` from the output and use it to retrieve the results of the query.
Replace `593acab7.csv` with the path to the file that was present in the `ResultConfiguration` of the previous step.
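
The output location in the `ResultConfiguration` is an S3 URI, while SDK calls such as boto3's `get_object` take the bucket and key as separate arguments. The sketch below splits such a URI using only the standard library. The helper name `split_s3_uri` is hypothetical, and the example URI reuses the placeholder file name from this guide.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into a (bucket, key) tuple."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The path component starts with "/", which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://athena-bucket/output/593acab7.csv")
print(bucket, key)
```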
You can also use the [`GetQueryResults`](https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryResults.html) API to retrieve the results of the query.
@@ -117,34 +118,37 @@ The Delta Lake files used in this sample are available in a public S3 bucket und
For your convenience, we have prepared the test files in a downloadable ZIP file [here](https://localstack-assets.s3.amazonaws.com/aws-sample-athena-delta-lake.zip).
We start by downloading and extracting this ZIP file:

-{{< command >}}
-$ mkdir /tmp/delta-lake-sample; cd /tmp/delta-lake-sample
We can then create an S3 bucket in LocalStack using the [`awslocal`](https://github.com/localstack/awscli-local) command line, and upload the files to the bucket:
The query should yield a result similar to the output below:
@@ -175,9 +179,9 @@ The query should yield a result similar to the output below:
...
```

-{{< callout >}}
+:::note
The `SELECT` statement above currently requires us to prefix the database/table name with `deltalake.` - this will be further improved in a future iteration, for better parity with AWS.
-{{< /callout >}}
+:::

## Iceberg Tables
@@ -210,8 +214,10 @@ s3://mybucket/prefix/temp/
You can connect to the Athena service in LocalStack with various clients, such as [PyAthena](https://github.com/laughingman7743/PyAthena/) and [awswrangler](https://github.com/aws/aws-sdk-pandas), among others.
df = wr.athena.read_sql_query("SELECT 1 AS col1, 2 AS col2, 3 AS col3", database=DATABASE)
print(df)
-{{< /tab >}}
-{{< /tabpane >}}
+```
+</TabItem>
+</Tabs>

## Resource Browser

The LocalStack Web Application provides a Resource Browser for Athena query execution, writing SQL queries, and visualizing query results.
You can access the Resource Browser by opening the LocalStack Web Application in your browser, navigating to the **Resources** section, and then clicking on **Athena** under the **Analytics** section.