-
Notifications
You must be signed in to change notification settings - Fork 749
YQ-5013 fixed S3 async decompression #31655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YQ-5013 fixed S3 async decompression #31655
Conversation
|
🟢 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical S3 async decompression bug (YQ-5013) along with extensive code style improvements. The main fixes address deadlock conditions in concurrent directory listing and improper decompressor lifecycle management that could cause data loss or hangs during async decompression.
Key changes:
- Fixed deadlock in concurrent BFS directory listing when futures resolve synchronously
- Corrected async decompressor shutdown logic to properly handle remaining data buffers
- Added validation to prevent unsupported file:// protocol usage with async decompression and certain formats
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| ydb/library/yql/providers/s3/actors/yql_s3_read_actor.cpp | Critical async decompression fixes: proper decompressor finishing, buffer clearing, and event processing prioritization |
| ydb/library/yql/providers/s3/provider/yql_s3_listing_strategy.cpp | Fixed deadlock when listing futures resolve synchronously; extensive formatting improvements |
| ydb/library/yql/providers/s3/provider/yql_s3_datasource_type_ann.cpp | Added validation to prevent file:// reading with incompatible pragmas (async decompression, raw/json_list formats) |
| ydb/core/kqp/ut/federated_query/s3/kqp_federated_query_ut.cpp | Comprehensive test coverage for file validation and async decompression error handling |
| ydb/library/yql/providers/s3/actors/yql_s3_raw_read_actor.cpp | Added file:// protocol check to prevent unsupported usage in raw read actor |
| ydb/library/yql/providers/s3/actors/yql_s3_write_actor.cpp | Improved error handling for type parsing with proper error stream capture |
| ydb/library/yql/providers/s3/provider/yql_s3_dq_integration.cpp | Style improvements and minor logic adjustments for path processing |
| ydb/library/yql/providers/s3/actors/yql_s3_decompressor_actor.cpp | Code style improvements and const correctness |
| ydb/library/yql/providers/s3/events/events.h | Constructor formatting and move semantics improvements |
| ydb/library/yql/providers/s3/actors_factory/yql_s3_actors_factory.cpp | Fixed variable naming from "s3ReadActoryConfig" to "s3ReadFactoryConfig" |
| ydb/library/yql/providers/s3/object_listers/yql_s3_list.cpp | Improved error message clarity for delimiter usage with local files |
| ydb/library/yql/providers/pq/async_io/dq_pq_rd_read_actor.cpp | Enhanced error reporting for type parsing failures |
| ydb/library/yql/providers/s3/credentials/credentials.cpp | Fixed spacing in debug output strings |
| ydb/library/yql/providers/s3/compressors/lz4io.cpp | Added missing braces for if statement |
| ydb/library/yql/providers/s3/common/source_context.cpp | Removed unnecessary parentheses in condition |
| ydb/library/yql/providers/s3/common/source_context.h | Constructor formatting cleanup |
| ydb/library/yql/providers/s3/range_helpers/path_list_reader.h | Constructor formatting and spacing improvements |
| ydb/tests/tools/kqprun/runlib/actors.h | Added missing Endl to error output for proper line termination |
| ydb/core/kqp/ut/federated_query/common/common.cpp | Enabled local file support in test configuration |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ydb/library/yql/providers/s3/provider/yql_s3_listing_strategy.cpp
Outdated
Show resolved
Hide resolved
Co-authored-by: Copilot <[email protected]>
|
⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
|
⚪ ⚪ Ya make output | Test bloat | Test bloat
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Changelog entry
Fixed S3 async decompression, bug ticket: https://st.yandex-team.ru/YQ-5013
Changelog category
Description for reviewers
Main fixes:
https://github.com/ydb-platform/ydb/pull/31655/changes#diff-ec0dce4b9d614a34fc221710d42054a532a2d9c349b46a7d541c36d8c885ee26R890-R892
https://github.com/ydb-platform/ydb/pull/31655/changes#diff-ec0dce4b9d614a34fc221710d42054a532a2d9c349b46a7d541c36d8c885ee26L1022-R1033
https://github.com/ydb-platform/ydb/pull/31655/changes#diff-ec0dce4b9d614a34fc221710d42054a532a2d9c349b46a7d541c36d8c885ee26L917-R924
Other fixes:
file://url schemafile://schema used wih S3 raw read actor or with async decompression