|
8 | 8 | "\n", |
9 | 9 | "# 19 - Amazon Athena Cache\n", |
10 | 10 | "\n", |
11 | | - "[awswrangler](https://github.com/aws/aws-sdk-pandas) has a cache strategy that is disabled by default and can be enabled passing `max_cache_seconds` biggier than 0. This cache strategy for Amazon Athena can help you to **decrease query times and costs**.\n", |
| 11 | + "[awswrangler](https://github.com/aws/aws-sdk-pandas) has a cache strategy that is disabled by default and can be enabled by passing `max_cache_seconds` bigger than 0. This cache strategy for Amazon Athena can help you to **decrease query times and costs**.\n", |
12 | 12 | "\n", |
13 | 13 | "When calling `read_sql_query`, instead of just running the query, we now can verify if the query has been run before. If so, and this last run was within `max_cache_seconds` (a new parameter to `read_sql_query`), we return the same results as last time if they are still available in S3. We have seen this increase performance more than 100x, but the potential is pretty much infinite.\n", |
14 | 14 | "\n", |
15 | 15 | "The detailed approach is:\n", |
16 | 16 | "- When `read_sql_query` is called with `max_cache_seconds > 0` (it defaults to 0), we check for the last queries run by the same workgroup (the most we can get without pagination).\n", |
17 | | - "- By default it will check the last 50 queries, but you can customize it throught the `max_cache_query_inspections` argument.\n", |
| 17 | + "- By default it will check the last 50 queries, but you can customize it through the `max_cache_query_inspections` argument.\n", |
18 | 18 | "- We then sort those queries based on CompletionDateTime, descending\n", |
19 | 19 | "- For each of those queries, we check if their CompletionDateTime is still within the `max_cache_seconds` window. If so, we check if the query string is the same as now (with some smart heuristics to guarantee coverage over both `ctas_approach`es). If they are the same, we check if the last one's results are still on S3, and then return them instead of re-running the query.\n", |
20 | 20 | "- During the whole cache resolution phase, if there is anything wrong, the logic falls back to the usual `read_sql_query` path.\n", |
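
The cache-resolution steps listed in this cell can be sketched in plain Python. This is only an illustration of the described flow, not awswrangler's actual implementation: `PastQuery`, `normalize`, and `find_cached_query` are hypothetical names, the `results_available` flag stands in for the "results are still on S3" check, and the whitespace and trailing-semicolon normalization is an assumed simplification of the real query-comparison heuristics.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class PastQuery:
    """Hypothetical record of a previously executed query."""
    query_id: str
    sql: str
    completed_at: datetime
    results_available: bool  # stand-in for "results are still on S3"


def normalize(sql: str) -> str:
    # Assumed comparison heuristic: collapse whitespace and drop a trailing semicolon.
    return " ".join(sql.strip().rstrip(";").split())


def find_cached_query(
    sql: str,
    history: List[PastQuery],
    max_cache_seconds: int,
    max_cache_query_inspections: int = 50,
    now: Optional[datetime] = None,
) -> Optional[PastQuery]:
    """Return a reusable past query, or None to fall back to running the query."""
    if max_cache_seconds <= 0:
        return None  # cache disabled (the default)
    now = now or datetime.now(timezone.utc)
    oldest_allowed = now - timedelta(seconds=max_cache_seconds)
    # Most recent first, limited to the number of queries we are willing to inspect.
    candidates = sorted(history, key=lambda q: q.completed_at, reverse=True)
    candidates = candidates[:max_cache_query_inspections]
    target = normalize(sql)
    for past in candidates:
        if past.completed_at < oldest_allowed:
            break  # everything after this point is older than the cache window
        if normalize(past.sql) == target and past.results_available:
            return past
    return None
```

A caller would reuse the stored results when `find_cached_query` returns a record and run the query as usual when it returns `None`, which mirrors the fallback behaviour described in the last bullet.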
|
292 | 292 | " mode=\"overwrite\",\n", |
293 | 293 | " database=\"awswrangler_test\",\n", |
294 | 294 | " table=\"noaa\"\n", |
295 | | - ");" |
| 295 | + ")" |
296 | 296 | ] |
297 | 297 | }, |
298 | 298 | { |
|