Skip to content

Conversation

@shunping
Copy link
Collaborator

@shunping shunping commented Oct 29, 2025

  • Use the same output format in streaming as in batch
    • Unkeyed output element is ((window_start, window_end), [(Timestamp, element) ...])
    • The expected test results are changed accordingly based on that
  • Re-calculate the start sliding boundary so it won't miss some windows.
  • When we fill the beginning of the window with the last value, also replace the timestamp value in the tuple.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @shunping, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to standardize the behavior and output format of windowed elements within the OrderedWindowElements transform, ensuring consistency between streaming and batch processing. The changes involve a more precise calculation of sliding window starts and a richer output structure that explicitly includes timestamps for all elements within a window. This enhancement improves the predictability and utility of the transform's output.

Highlights

  • Refined Sliding Window Calculation: The logic for calculating the start of sliding windows has been updated for more accurate and consistent behavior, introducing intermediate variables for clarity and precision.
  • Explicit Timestamping of Windowed Elements: Elements within windows now explicitly include their timestamps in the output, ensuring a more complete representation of data within windows, especially when filling empty windows or gaps.
  • Standardized Output Format: The output structure of OrderedWindowElements has been modified to consistently include the key, window boundaries, and timestamped elements, improving interoperability and clarity across different processing modes.
  • Test Case Updates: Existing test cases have been commented out to reflect the significant change in output format, and a new helper function _convert_timestamp_to_int was added to facilitate easier assertion comparisons for the updated test outputs.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@shunping shunping force-pushed the ordered-batch-elements-4 branch from 911cf70 to 51ae038 Compare October 29, 2025 04:14
@shunping shunping force-pushed the ordered-batch-elements-4 branch from 51ae038 to c372dbe Compare October 29, 2025 04:23
@github-actions
Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@shunping
Copy link
Collaborator Author

R: @liferoad

@shunping shunping self-assigned this Oct 29, 2025
@shunping shunping requested a review from liferoad October 29, 2025 14:03
@shunping
Copy link
Collaborator Author

/gemini review

@github-actions
Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request brings several good improvements to OrderedWindowElements. It aligns the streaming output format with the batch version, fixes an issue with calculating the start of sliding windows to prevent missing some windows, and correctly updates timestamps when filling gaps at the beginning of windows. The changes look solid and the tests have been updated accordingly.

I've added two comments:

  1. A suggestion to refactor the timestamp calculation logic in streaming.py to use Timestamp and Duration objects for better precision and clarity.
  2. A critical fix for test_slide_interval in streaming_test.py, which is missing a necessary transform to convert Timestamp objects to integers before assertion, causing the test to fail.

duration: DurationTypes,
slide_interval: DurationTypes,
offset: TimestampTypes,
offset: DurationTypes,
Copy link
Contributor

@liferoad liferoad Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a break change to me. Could we keep both like offset: Union[TimestampTypes, DurationTypes]?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transform is part of an example, so it won't matter if it is a breaking change.

This is to fix the offset type, as it should be a Duration.

@shunping shunping merged commit 190f21e into apache:master Oct 29, 2025
121 of 123 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants