Skip to content

Fix flaky Netty4Http3IT test suite#20847

Merged
andrross merged 1 commit intoopensearch-project:mainfrom
reta:issue-20654
Mar 12, 2026
Merged

Fix flaky Netty4Http3IT test suite#20847
andrross merged 1 commit intoopensearch-project:mainfrom
reta:issue-20654

Conversation

@reta
Copy link
Contributor

@reta reta commented Mar 11, 2026

Description

Fixing flaky Netty4Http3IT test suite

Related Issues

Closes #20654

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@reta reta requested a review from peternied as a code owner March 11, 2026 22:51
@reta reta added the flaky-test Random test failure that succeeds on second run label Mar 11, 2026
@reta reta requested a review from a team as a code owner March 11, 2026 22:51
@github-actions github-actions bot added >test-failure Test failure from CI, local build, etc. autocut labels Mar 11, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 11, 2026

PR Reviewer Guide 🔍

(Review updated until commit 82632f8)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Timeout Mismatch

The maxIdleTimeout is set to 60 seconds while CONNECT_TIMEOUT_MILLIS is set to 30 seconds. The connect timeout is half the idle timeout, which could cause confusion. If a connection takes longer than 30 seconds to establish but less than 60 seconds, it will fail at connect time rather than idle timeout. Verify that these two timeout values are intentionally different and correctly reflect the desired behavior.

        .maxIdleTimeout(60, TimeUnit.SECONDS)
        .initialMaxData(SETTING_HTTP_MAX_CONTENT_LENGTH.get(Settings.EMPTY).getBytes())
        .initialMaxStreamDataBidirectionalLocal(SETTING_H3_MAX_STREAM_LOCAL_LENGTH.get(Settings.EMPTY).getBytes())
        .initialMaxStreamDataBidirectionalRemote(SETTING_H3_MAX_STREAM_REMOTE_LENGTH.get(Settings.EMPTY).getBytes())
        .initialMaxStreamsBidirectional(SETTING_H3_MAX_STREAMS.get(Settings.EMPTY).longValue())
        .build();

    ch.pipeline().addLast(quicClientCodec);
}

@Override
Channel prepare(Bootstrap clientBootstrap, Channel channel) throws InterruptedException {
    final QuicChannel quicChannel = QuicChannel.newBootstrap(channel)
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 30000) // 30 seconds

@github-actions
Copy link
Contributor

github-actions bot commented Mar 11, 2026

PR Code Suggestions ✨

Latest suggestions up to 82632f8
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Align idle and connect timeout values

The maxIdleTimeout of 60 seconds on the QUIC codec and the CONNECT_TIMEOUT_MILLIS of
30 seconds are inconsistent — the connect timeout is half the idle timeout. If a
connection attempt takes longer than 30 seconds but less than 60 seconds, the idle
timeout may fire before the connect timeout, leading to confusing failures. Consider
aligning both values or ensuring the idle timeout is strictly greater than the
connect timeout with a clear margin.

modules/transport-netty4/src/test/java/org/opensearch/http/netty4/Netty4HttpClient.java [509]

-.maxIdleTimeout(60, TimeUnit.SECONDS)
+.maxIdleTimeout(90, TimeUnit.SECONDS)
Suggestion importance[1-10]: 3

__

Why: The suggestion raises a valid concern about the relationship between maxIdleTimeout (60s) and CONNECT_TIMEOUT_MILLIS (30s), but these are fundamentally different concepts — idle timeout applies to an established connection's inactivity, while connect timeout limits the connection establishment phase. They don't need to be aligned, and the proposed change to 90 seconds is arbitrary without clear justification.

Low

Previous suggestions

Suggestions up to commit e954473
CategorySuggestion                                                                                                                                    Impact
General
Align idle timeout with connect timeout value

A 60-second max idle timeout on the QUIC codec may be too long for a test client and
could cause tests to hang if connections are not properly closed. Consider using a
shorter timeout (e.g., 10-15 seconds) consistent with the connect timeout to keep
test execution time bounded.

modules/transport-netty4/src/test/java/org/opensearch/http/netty4/Netty4HttpClient.java [509]

-.maxIdleTimeout(60, TimeUnit.SECONDS)
+.maxIdleTimeout(10, TimeUnit.SECONDS)
Suggestion importance[1-10]: 3

__

Why: While the suggestion to reduce the idle timeout from 60 to 10 seconds is reasonable for test consistency, the 60-second timeout is a common default and not necessarily problematic. The connect timeout and idle timeout serve different purposes, so aligning them isn't strictly necessary.

Low

Signed-off-by: Andriy Redko <drreta@gmail.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 82632f8

@github-actions
Copy link
Contributor

❌ Gradle check result for 82632f8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 82632f8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

✅ Gradle check result for 82632f8: SUCCESS

@codecov
Copy link

codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.24%. Comparing base (db61a88) to head (82632f8).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20847      +/-   ##
============================================
- Coverage     73.30%   73.24%   -0.07%     
+ Complexity    72252    72154      -98     
============================================
  Files          5795     5795              
  Lines        330056   330056              
  Branches      47643    47643              
============================================
- Hits         241947   241736     -211     
- Misses        68695    68946     +251     
+ Partials      19414    19374      -40     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andrross andrross merged commit 9ffedcf into opensearch-project:main Mar 12, 2026
41 of 45 checks passed
@reta reta added the backport 3.5 Backport to 3.5 branch label Mar 12, 2026
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 12, 2026
(cherry picked from commit 9ffedcf)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
reta pushed a commit that referenced this pull request Mar 12, 2026
(cherry picked from commit 9ffedcf)

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

autocut backport 3.5 Backport to 3.5 branch flaky-test Random test failure that succeeds on second run skip-changelog >test-failure Test failure from CI, local build, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for Netty4Http3IT

2 participants