Skip to content

Conversation

@prashantgolash
Copy link
Contributor

@prashantgolash prashantgolash commented Jan 29, 2026

Summary: Session property to control the file size for presto writers

Differential Revision: D91361183

Summary by Sourcery

New Features:

  • Introduce a NATIVE_MAX_TARGET_FILE_SIZE session property to control when writers roll over to a new output file based on size.

Release Notes

== RELEASE NOTES ==

Prestissimo (Native Execution) Changes
* Add ``native_max_target_file_size`` session property to control the maximum target file size for writers. When a file exceeds this size during writing, the writer will close the current file and start writing to a new file.

@prashantgolash prashantgolash requested a review from a team as a code owner January 29, 2026 18:49
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Jan 29, 2026
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Jan 29, 2026

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Adds a new Hive session property to control the maximum target file size for native Presto writers, allowing writers to roll over to a new file when the size limit is reached.

Class diagram for new Hive session property native_max_target_file_size

classDiagram
    class HiveSessionProperties {
        <<final>>
        +static String NATIVE_STATS_BASED_FILTER_REORDER_DISABLED
        +static String NATIVE_MAX_TARGET_FILE_SIZE
        -List sessionProperties
        +HiveSessionProperties(HiveClientConfig hiveClientConfig, OrcFileWriterConfig orcFileWriterConfig, ParquetFileWriterConfig parquetFileWriterConfig, RcFileWriterConfig rcFileWriterConfig)
        +List getSessionProperties()
    }

    class PropertyMetadata {
        +String name
        +String description
        +Object defaultValue
        +boolean hidden
    }

    HiveSessionProperties "1" --> "*" PropertyMetadata : defines
Loading

Flow diagram for writer behavior with native_max_target_file_size session property

flowchart TD
    A[Start write operation] --> B[Initialize writer and open first file]
    B --> C[Fetch session property native_max_target_file_size]
    C --> D[Write data page to current file]
    D --> E[Update current file size]
    E --> F{Is native_max_target_file_size > 0?}
    F -- No --> D
    F -- Yes --> G{Has current file size exceeded native_max_target_file_size?}
    G -- No --> D
    G -- Yes --> H[Close current file]
    H --> I[Open new file]
    I --> D
Loading

File-Level Changes

Change Details Files
Introduce a configurable session property for maximum target file size in native writers.
  • Add NATIVE_MAX_TARGET_FILE_SIZE constant to Hive session properties
  • Register a new dataSizeSessionProperty with default value 0 bytes to represent no file size limit
  • Document the property semantics: when a file exceeds this size, the writer closes the current file and starts a new one
presto-hive/src/main/java/com/facebook/presto/hive/HiveSessionProperties.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • Consider validating that the configured NATIVE_MAX_TARGET_FILE_SIZE is non-negative at session/property parsing time so that unexpected negative values are rejected early rather than interpreted as zero or causing downstream issues.
  • It may be useful to reference or align this session property’s default with any existing configuration for target file size (if one exists) to avoid divergent defaults between session-level and config-level behavior.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider validating that the configured `NATIVE_MAX_TARGET_FILE_SIZE` is non-negative at session/property parsing time so that unexpected negative values are rejected early rather than interpreted as zero or causing downstream issues.
- It may be useful to reference or align this session property’s default with any existing configuration for target file size (if one exists) to avoid divergent defaults between session-level and config-level behavior.

## Individual Comments

### Comment 1
<location> `presto-hive/src/main/java/com/facebook/presto/hive/HiveSessionProperties.java:142` </location>
<code_context>

     public static final String NATIVE_STATS_BASED_FILTER_REORDER_DISABLED = "native_stats_based_filter_reorder_disabled";

+    public static final String NATIVE_MAX_TARGET_FILE_SIZE = "native_max_target_file_size";
+
     private final List<PropertyMetadata<?>> sessionProperties;
</code_context>

<issue_to_address>
**suggestion:** Consider making it explicit in the description that this property is specific to native execution, to align with the naming convention.

The name is prefixed with `NATIVE_`, but the user-facing description does not mention native execution while the adjacent property’s description does. If this setting is only used by native execution, please state that explicitly in the description string to avoid user confusion when configuring it.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

prashantgolash added a commit to prashantgolash/presto that referenced this pull request Jan 29, 2026
…db#27054)

Summary:

Session property to control the file size for presto writers

Differential Revision: D91361183
@prashantgolash prashantgolash changed the title [OSS] Session property to control file size for presto writer feat: Session property to control file size for presto writer Jan 29, 2026
* Add support for custom schemas in native sidecar function registry. `#26236 <https://github.com/prestodb/presto/pull/26236>`_
* Add support for the TPC-DS connector. `#24751 <https://github.com/prestodb/presto/pull/24751>`_
* Add support for REST API for remote functions. `#23568 <https://github.com/prestodb/presto/pull/23568>`_
* Add ``native_max_target_file_size`` session property to control the maximum target file size for writers. When a file exceeds this size during writing, the writer will close the current file and start writing to a new file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Add ``native_max_target_file_size`` session property to control the maximum target file size for writers. When a file exceeds this size during writing, the writer will close the current file and start writing to a new file.

The session property added in this new PR was not part of the 0.296 release in December 2025. Please add documentation for the new session property to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties-session.rst.

Use the text you wrote here for the release note entry in the PR and it will be picked up by the automation for the release notes documentation of the Presto release this PR will be part of after it is merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I have added the release notes. Let me know if this is enough. Also I removed the wrong file that I added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I have added the release notes. Let me know if this is enough. Also I removed the wrong file that I added

Thanks! Please use this text for the release note. Put a row of three ` above and below this so that when you save your change, the text displays in a gray box like this does.

== RELEASE NOTES ==

Prestissimo (Native Execution) Changes

* Add ``native_max_target_file_size`` session property to control the maximum target file size for writers. When a file exceeds this size during writing, the writer will close the current file and start writing to a new file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you can, please add documentation for the new session property to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties-session.rst.

Copy link
Contributor Author

@prashantgolash prashantgolash Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveburnett done on release notes. I think the property file you mentioned is for system session properties only. I think we don't have any connector specific properties file? cc @spershin as well

Copy link
Contributor

@steveburnett steveburnett Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As described in Designing Your Code in CONTRIBUTING.md, a PR should:

"All new language features, new functions, session and config properties, and major features have documentation added"

There are many Presto configuration and session properties that are not documented. I'm trying to address the existing gaps and in the meantime I'm asking that we not enlarge the technical debt of missing documentation.

I didn't realize this new session property is Hive-specific, and there are several Hive session properties mentioned throughout https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/hive.rst. I see two possibilities:

  • Add a new heading to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/hive.rst for the topic the new session property is about, maybe something like "Controlling File Size", and below that write a paragraph similar to the release note entry that names the session property. Mention if it has a default and if so what it is, what type of property it is, and so on. It doesn't have to be long, the idea is to have the session property in the Presto documentation for a user to find without having to read the source code.

  • After the heading and table "Hive Configuration Properties" create a new heading "Hive Session Properties" and below it add a table with one row, using the same format as the table in Hive Configuration Properties. This would give a location for other Hive session properties to be added to the documentation in future, by someone else.

I'm okay with either of these, and I'm open to your ideas for a third solution. Let me know what you think!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveburnett , @prashantgolash
I like the second option, where we start a section in hive.rst with session props table for props not mentioned in other tables in this doc.
We can put a new session prop into this table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like 2nd option. Let me work on this and update this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @prashantgolash! This gives people a place to add documentation for the other Hive session properties, and I appreciate you creating a landing place for that content that we did not have before.

prashantgolash added a commit to prashantgolash/presto that referenced this pull request Jan 29, 2026
…db#27054)

Summary:

Session property to control the file size for presto writers

Differential Revision: D91361183
prashantgolash added a commit to prashantgolash/presto that referenced this pull request Jan 29, 2026
…db#27054)

Summary:

Session property to control the file size for presto writers

Differential Revision: D91361183
…db#27054)

Summary:

Session property to control the file size for presto writers

Reviewed By: spershin

Differential Revision: D91361183
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local doc build, everything looks good.

Thank you for the documentation!

@tdcmeehan tdcmeehan changed the title feat: Session property to control file size for presto writer feat(plugin-hive): Session property to control file size for presto writer Jan 30, 2026
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@imjalpreet imjalpreet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to introduce a hive catalog property mapped to this session property, too?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants