-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(plugin-hive): Session property to control file size for presto writer #27054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideAdds a new Hive session property to control the maximum target file size for native Presto writers, allowing writers to roll over to a new file when the size limit is reached. Class diagram for new Hive session property native_max_target_file_sizeclassDiagram
class HiveSessionProperties {
<<final>>
+static String NATIVE_STATS_BASED_FILTER_REORDER_DISABLED
+static String NATIVE_MAX_TARGET_FILE_SIZE
-List sessionProperties
+HiveSessionProperties(HiveClientConfig hiveClientConfig, OrcFileWriterConfig orcFileWriterConfig, ParquetFileWriterConfig parquetFileWriterConfig, RcFileWriterConfig rcFileWriterConfig)
+List getSessionProperties()
}
class PropertyMetadata {
+String name
+String description
+Object defaultValue
+boolean hidden
}
HiveSessionProperties "1" --> "*" PropertyMetadata : defines
Flow diagram for writer behavior with native_max_target_file_size session propertyflowchart TD
A[Start write operation] --> B[Initialize writer and open first file]
B --> C[Fetch session property native_max_target_file_size]
C --> D[Write data page to current file]
D --> E[Update current file size]
E --> F{Is native_max_target_file_size > 0?}
F -- No --> D
F -- Yes --> G{Has current file size exceeded native_max_target_file_size?}
G -- No --> D
G -- Yes --> H[Close current file]
H --> I[Open new file]
I --> D
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 1 issue, and left some high level feedback:
- Consider validating that the configured
NATIVE_MAX_TARGET_FILE_SIZEis non-negative at session/property parsing time so that unexpected negative values are rejected early rather than interpreted as zero or causing downstream issues. - It may be useful to reference or align this session property’s default with any existing configuration for target file size (if one exists) to avoid divergent defaults between session-level and config-level behavior.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider validating that the configured `NATIVE_MAX_TARGET_FILE_SIZE` is non-negative at session/property parsing time so that unexpected negative values are rejected early rather than interpreted as zero or causing downstream issues.
- It may be useful to reference or align this session property’s default with any existing configuration for target file size (if one exists) to avoid divergent defaults between session-level and config-level behavior.
## Individual Comments
### Comment 1
<location> `presto-hive/src/main/java/com/facebook/presto/hive/HiveSessionProperties.java:142` </location>
<code_context>
public static final String NATIVE_STATS_BASED_FILTER_REORDER_DISABLED = "native_stats_based_filter_reorder_disabled";
+ public static final String NATIVE_MAX_TARGET_FILE_SIZE = "native_max_target_file_size";
+
private final List<PropertyMetadata<?>> sessionProperties;
</code_context>
<issue_to_address>
**suggestion:** Consider making it explicit in the description that this property is specific to native execution, to align with the naming convention.
The name is prefixed with `NATIVE_`, but the user-facing description does not mention native execution while the adjacent property’s description does. If this setting is only used by native execution, please state that explicitly in the description string to avoid user confusion when configuring it.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-hive/src/main/java/com/facebook/presto/hive/HiveSessionProperties.java
Show resolved
Hide resolved
…db#27054) Summary: Session property to control the file size for presto writers Differential Revision: D91361183
2f15a5b to
bb7784e
Compare
| * Add support for custom schemas in native sidecar function registry. `#26236 <https://github.com/prestodb/presto/pull/26236>`_ | ||
| * Add support for the TPC-DS connector. `#24751 <https://github.com/prestodb/presto/pull/24751>`_ | ||
| * Add support for REST API for remote functions. `#23568 <https://github.com/prestodb/presto/pull/23568>`_ | ||
| * Add ``native_max_target_file_size`` session property to control the maximum target file size for writers. When a file exceeds this size during writing, the writer will close the current file and start writing to a new file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * Add ``native_max_target_file_size`` session property to control the maximum target file size for writers. When a file exceeds this size during writing, the writer will close the current file and start writing to a new file. |
The session property added in this new PR was not part of the 0.296 release in December 2025. Please add documentation for the new session property to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties-session.rst.
Use the text you wrote here for the release note entry in the PR and it will be picked up by the automation for the release notes documentation of the Presto release this PR will be part of after it is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I have added the release notes. Let me know if this is enough. Also I removed the wrong file that I added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I have added the release notes. Let me know if this is enough. Also I removed the wrong file that I added
Thanks! Please use this text for the release note. Put a row of three ` above and below this so that when you save your change, the text displays in a gray box like this does.
== RELEASE NOTES ==
Prestissimo (Native Execution) Changes
* Add ``native_max_target_file_size`` session property to control the maximum target file size for writers. When a file exceeds this size during writing, the writer will close the current file and start writing to a new file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you can, please add documentation for the new session property to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties-session.rst.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steveburnett done on release notes. I think the property file you mentioned is for system session properties only. I think we don't have any connector specific properties file? cc @spershin as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As described in Designing Your Code in CONTRIBUTING.md, a PR should:
"All new language features, new functions, session and config properties, and major features have documentation added"
There are many Presto configuration and session properties that are not documented. I'm trying to address the existing gaps and in the meantime I'm asking that we not enlarge the technical debt of missing documentation.
I didn't realize this new session property is Hive-specific, and there are several Hive session properties mentioned throughout https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/hive.rst. I see two possibilities:
-
Add a new heading to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/hive.rst for the topic the new session property is about, maybe something like "Controlling File Size", and below that write a paragraph similar to the release note entry that names the session property. Mention if it has a default and if so what it is, what type of property it is, and so on. It doesn't have to be long, the idea is to have the session property in the Presto documentation for a user to find without having to read the source code.
-
After the heading and table "Hive Configuration Properties" create a new heading "Hive Session Properties" and below it add a table with one row, using the same format as the table in Hive Configuration Properties. This would give a location for other Hive session properties to be added to the documentation in future, by someone else.
I'm okay with either of these, and I'm open to your ideas for a third solution. Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steveburnett , @prashantgolash
I like the second option, where we start a section in hive.rst with session props table for props not mentioned in other tables in this doc.
We can put a new session prop into this table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also like 2nd option. Let me work on this and update this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @prashantgolash! This gives people a place to add documentation for the other Hive session properties, and I appreciate you creating a landing place for that content that we did not have before.
…db#27054) Summary: Session property to control the file size for presto writers Differential Revision: D91361183
bb7784e to
b2d441f
Compare
…db#27054) Summary: Session property to control the file size for presto writers Differential Revision: D91361183
b2d441f to
504bc7c
Compare
…db#27054) Summary: Session property to control the file size for presto writers Reviewed By: spershin Differential Revision: D91361183
504bc7c to
5588f47
Compare
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local doc build, everything looks good.
Thank you for the documentation!
hantangwangd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @prashantgolash
imjalpreet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to introduce a hive catalog property mapped to this session property, too?
Summary: Session property to control the file size for presto writers
Differential Revision: D91361183
Summary by Sourcery
New Features:
Release Notes