feat(athena): add start_query_executions for parallel query execution #3190

ggiallo28 · 2025-08-28T22:59:38Z

Feature or Bugfix

Feature

Detail

- Added start_query_executions to submit multiple Athena queries in one call.
- Enabled parallel query submission and wait, significantly reducing end-to-end execution time.
- Introduced configurable concurrency to adapt performance to available system resources.

Relates

Improves efficiency and responsiveness for workflows requiring multiple Athena queries.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Introduce `wr.athena.start_query_executions` as a parallelized variant of `start_query_execution`. It allows submitting multiple queries in one call, with support for:

- Sequential or threaded submission (`use_threads`)
- Lazy or eager consumption of results (`as_iterator`)
- Per-query `client_request_token` (string or list)
- Optional workgroup checks (`check_workgroup`, `enforce_workgroup`)
- Full Athena cache integration

This improves performance when dispatching batches of queries by reducing workgroup lookups and enabling concurrent execution.
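As a rough illustration of the sequential/threaded and lazy/eager modes listed above, here is a sketch that uses only the standard library. The `submit_queries` helper is hypothetical and is not the PR's implementation; `submit` stands in for any function that starts a single Athena query and returns its execution ID (for example, a thin wrapper around `wr.athena.start_query_execution`).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Union


def submit_queries(
    sqls: Iterable[str],
    submit: Callable[[str], str],
    use_threads: Union[bool, int] = True,
    as_iterator: bool = False,
) -> Union[List[str], Iterator[str]]:
    """Submit each SQL string via `submit`; return query IDs in input order.

    Hypothetical helper for illustration only.
    """
    sqls = list(sqls)

    def _generate() -> Iterator[str]:
        if not use_threads:
            # Sequential path: one submission at a time.
            for sql in sqls:
                yield submit(sql)
            return
        # Threaded path: True lets the executor choose the worker count,
        # an int sets it explicitly.
        workers = None if use_threads is True else int(use_threads)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Executor.map yields results in the order of the inputs.
            yield from pool.map(submit, sqls)

    ids = _generate()
    # Lazy mode hands back the generator, so nothing is submitted until
    # the caller starts iterating; eager mode drains it immediately.
    return ids if as_iterator else list(ids)
```

With `as_iterator=True` the caller decides when submission starts, which can be handy when the results feed straight into another loop.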

…nd parallel wait

- Simplified client_request_token handling:
  - Removed manual padding/truncation.
  - Let Athena enforce length constraints.
  - Tokens generated as `<base_token>-<index>` or provided as list.
- Improved wait logic:
  - Added optional wait handling directly inside _submit.
  - Queries can now be waited in parallel with submission (reduced overhead).
- Configurable default threads:
  - Replaced hardcoded defaults with os.cpu_count().
  - Added support for AWSWRANGLER_THREADS_DEFAULT env var override.
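The token and thread-default rules in this commit message could look roughly like the sketch below. Both function names are hypothetical, and several details are assumptions not stated in the PR: that the index starts at 0, that a token list must have one entry per query, and that a non-positive override is clamped to 1.

```python
import os
from typing import List, Optional, Sequence, Union


def resolve_tokens(
    token: Optional[Union[str, Sequence[str]]], n_queries: int
) -> List[Optional[str]]:
    """Return one client request token per query.

    A single string is expanded to `<base_token>-<index>`; a list is used
    as given. No padding or truncation is applied, so any length limit is
    left for Athena to enforce.
    """
    if token is None:
        return [None] * n_queries
    if isinstance(token, str):
        # Assumption: indices start at 0.
        return [f"{token}-{i}" for i in range(n_queries)]
    tokens = list(token)
    if len(tokens) != n_queries:
        # Assumption: a list must supply exactly one token per query.
        raise ValueError("Expected one client_request_token per query.")
    return tokens


def default_threads() -> int:
    """Worker count: environment override first, then os.cpu_count()."""
    override = os.environ.get("AWSWRANGLER_THREADS_DEFAULT")
    if override:
        return max(1, int(override))
    # os.cpu_count() may return None, so fall back to a single worker.
    return os.cpu_count() or 1
```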

- Removed unused `reduce` import from Athena module.
- Applied ruff formatting to `start_query_executions`.
- Fixed static check issues to pass CI.
- Added ruff check on Athena tests file.

kukushking

Hi, thank you for opening this PR!

What is the reason to build this into the SDK? Is submitting x queries in parallel a common use case? It feels specific to your application logic.

ggiallo28 · 2025-09-03T17:33:38Z

Hi @kukushking, thanks for the review!

I see that this pattern shows up in data workflows where many short Athena queries must run together. A few common examples:

- Dashboard refresh/precompute: populate multiple queries that feed BI tiles concurrently. Running them one by one slows down the process and forces repeated checks for each query even when I make small changes in the queries, like checking the workgroup every time, and so on.
- Data quality checks: run the same validation across dozens of tables/prefixes in parallel.

API symmetry & ergonomics:
Wrangler already provides athena.get_query_executions for fetching many query details in parallel. start_query_executions is the natural counterpart for submitting many queries at once.

Parallel submission and coordinated wait help improve performance while respecting quotas via a configurable concurrency. The implementation is opt-in, non-breaking, and uses the same guardrails already present for single-query flows.

Why I contributed this:
I took inspiration from awswrangler.athena.get_query_executions(...), found myself repeatedly re-implementing the batch submit and wait pattern for data use cases, and decided to contribute a reusable, documented version back to the community.

Happy to adjust naming, docs, or move it behind a helper/recipe if you prefer, but I believe the symmetry and the prevalence of these data workflows justify having this in the SDK. In the future, it would be great to support retrieving results in the same parallel manner.

jaidisido · 2025-09-05T23:29:27Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: b8a607f
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

jaidisido · 2025-09-05T23:45:59Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 89449cb
Result: SUCCEEDED
Build Logs (available for 30 days)

jaidisido · 2025-09-05T23:53:07Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: f20d694
Result: SUCCEEDED
Build Logs (available for 30 days)

jaidisido · 2025-09-06T00:29:56Z

AWS CodeBuild CI Report

CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
Commit ID: b8a607f
Result: FAILED
Build Logs (available for 30 days)

jaidisido · 2025-09-06T00:49:11Z

AWS CodeBuild CI Report

CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
Commit ID: 89449cb
Result: FAILED
Build Logs (available for 30 days)

jaidisido · 2025-09-06T00:54:22Z

AWS CodeBuild CI Report

CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
Commit ID: f20d694
Result: FAILED
Build Logs (available for 30 days)

jaidisido · 2025-09-09T09:48:56Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 281f5c4
Result: SUCCEEDED
Build Logs (available for 30 days)

jaidisido · 2025-09-09T10:48:58Z

AWS CodeBuild CI Report

CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
Commit ID: 281f5c4
Result: FAILED
Build Logs (available for 30 days)

kukushking · 2025-10-21T17:20:22Z

Thanks for the comment, @ggiallo28.

> Dashboard refresh/precompute: populate multiple queries that feed BI tiles concurrently. Running them one by one slows down the process and forces repeated checks for each query even when I make small changes in the queries, like checking the workgroup every time, and so on.
>
> Data quality checks: run the same validation across dozens of tables/prefixes in parallel.

I agree that running queries one by one slows down the process; however, that does not explain why this should be handled in the SDK rather than in your BI application logic. I am cautious about adding this to the SDK, because the same reasoning could be extended to any other call the library provides, adding overhead and hurting maintainability.

> API symmetry & ergonomics: Wrangler already provides athena.get_query_executions for fetching many query details in parallel. start_query_executions is the natural counterpart for submitting many queries at once.

`athena.get_query_executions` is a batch API: it uses BatchGetQueryExecution under the hood. We maintain API symmetry between the SDK and the AWS API.
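To make the batch distinction concrete: Athena's BatchGetQueryExecution accepts a list of query execution IDs in one request (up to 50 per call), whereas StartQueryExecution takes a single query string per request. A minimal sketch of a chunked lookup follows. The `get_executions` helper is hypothetical and is not awswrangler's implementation; `client` is assumed to be a boto3 Athena client.

```python
from typing import Any, Dict, Iterator, List, Sequence

MAX_IDS_PER_CALL = 50  # per-request limit of BatchGetQueryExecution


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


def get_executions(client: Any, query_ids: Sequence[str]) -> List[Dict[str, Any]]:
    """Fetch execution details for many query IDs, one request per chunk."""
    executions: List[Dict[str, Any]] = []
    for chunk in chunked(query_ids, MAX_IDS_PER_CALL):
        response = client.batch_get_query_execution(QueryExecutionIds=list(chunk))
        executions.extend(response.get("QueryExecutions", []))
    return executions
```

Here the fan-out is handled by the service in a single call, which is the symmetry with the AWS API that the comment refers to; no equivalent batch endpoint exists for starting queries.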

ggiallo28 added 3 commits August 28, 2025 23:49

chore: cleanup and CI adjustments

5e1c7f0

kukushking requested changes Sep 3, 2025

Merge branch 'main' into feat/athena-start-query-executions

b8a607f

ggiallo28 added 2 commits September 6, 2025 01:32

feat(athena): support named & qmark parameters; use generators; updat…

89449cb

…e docstring

chore(athena): ruff/black style cleanups in _executions.py

f20d694

Merge branch 'main' into feat/athena-start-query-executions

281f5c4

kukushking closed this Oct 21, 2025
