[🐛 Bug]: serverErrorPolicy does retry on Exception / Error #14917

@joerg1985

Description

What happened?

The serverErrorPolicy in org.openqa.selenium.remote.http.RetryRequest does not handle only server errors.
It effectively handles all Exceptions / Errors; I guess this is due to a misunderstanding of the API of the library used here.
As a result, a ConnectException is retried 5 times instead of only 3.

In general I think the use of this library is problematic, because of:

  • A mismatch between the naming of methods / things in the javadoc and what actually happens (this is also the reason for this issue...).
    e.g. ExecutionAttemptedEvent.getLastException returns a Throwable, so we have a potential ClassCastException in
    Exception exception = (Exception) executionAttemptedEvent.getLastException();

    e.g. the javadoc of RetryPolicy.handleIf speaks about an "exception", but what is meant is a "throwable"
  • It implies that handling failures is super easy: just put the code inside a retry and don't think about it.
    This will probably lead to leaks, e.g. leaking sub in case pub fails in
    Failsafe.with(retryPolicy)
        .run(
            () -> {
              sub = context.createSocket(SocketType.SUB);
              sub.setIPv6(isSubAddressIPv6(publishConnection));
              sub.connect(publishConnection);
              sub.subscribe(new byte[0]);
              pub = context.createSocket(SocketType.PUB);
              pub.setIPv6(isSubAddressIPv6(subscribeConnection));
              pub.connect(subscribeConnection);
            });
  • I have debugged into the code, and it makes things complex compared to a simple loop. Complexity brings bugs with it, e.g. before PR "[java] Ensure retry mechanism does not swallow an exception" #12838, the responses of concurrent failed requests could get mixed up.
  • No response to most issues raised in the issue tracker, including one related to Selenium: "java.lang.NoClassDefFoundError: dev/failsafe/Policy in java modul" failsafe-lib/failsafe#386
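
To make the leak concern from the second bullet concrete, here is a minimal sketch of how the two sockets could be connected as one unit, so that a failed attempt closes whatever it created before the exception propagates. This is not the Selenium code: FakeSocket, newSocket, connectPair and the created list are hypothetical stand-ins I made up so the sketch runs without ZeroMQ.

```java
import java.util.ArrayList;
import java.util.List;

public class SocketPairSketch {

  /** Hypothetical stand-in for a socket; it only records whether it was closed. */
  static final class FakeSocket implements AutoCloseable {
    final String name;
    boolean closed;

    FakeSocket(String name) {
      this.name = name;
    }

    void connect(boolean fail) {
      if (fail) {
        throw new IllegalStateException("connect failed: " + name);
      }
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Every socket ever created, kept only so the demo can check for leaks. */
  static final List<FakeSocket> created = new ArrayList<>();

  static FakeSocket newSocket(String name) {
    FakeSocket socket = new FakeSocket(name);
    created.add(socket);
    return socket;
  }

  /** Connects sub and pub as one unit; on failure both are closed before rethrowing. */
  static FakeSocket[] connectPair(boolean failPub) {
    FakeSocket sub = null;
    FakeSocket pub = null;
    try {
      sub = newSocket("sub");
      sub.connect(false);
      pub = newSocket("pub");
      pub.connect(failPub);
      return new FakeSocket[] {sub, pub};
    } catch (RuntimeException e) {
      // Close whatever this attempt created, so a retry starts from a clean state.
      if (pub != null) pub.close();
      if (sub != null) sub.close();
      throw e;
    }
  }

  public static void main(String[] args) {
    try {
      connectPair(true); // simulate pub failing after sub is already connected
    } catch (IllegalStateException e) {
      for (FakeSocket s : created) {
        System.out.println(s.name + " closed: " + s.closed);
      }
    }
  }
}
```

The point is only that the cleanup has to live inside the retried block; wrapping the block in a retry does not provide it.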

So I would like to ask: should we glue some patches onto the code and hope it does what it should, or replace the library with a custom implementation, i.e. a simple loop?
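
For discussion, this is roughly what I mean by a simple loop. It is only a sketch under my own assumptions, not a proposed patch: the names SimpleRetrySketch, callWithRetry and isConnectFailure are made up, it covers only the exception path (no retry on 5xx responses), and it has no backoff between attempts. It retries only when the failure is, or is caused by, a ConnectException; any other Exception fails fast, and an Error is never caught at all.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

public class SimpleRetrySketch {

  /**
   * Calls the action up to maxAttempts times. Only failures that are (or are
   * caused by) a ConnectException are retried; everything else is rethrown
   * immediately. Errors are not caught and propagate untouched.
   */
  static <T> T callWithRetry(Callable<T> action, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        if (!isConnectFailure(e)) {
          throw e; // unexpected failure: do not retry
        }
        last = e; // remember it and try again, if attempts are left
      }
    }
    throw last; // all attempts failed with a connect failure
  }

  /** Walks the cause chain looking for a ConnectException. */
  static boolean isConnectFailure(Throwable t) {
    for (Throwable cause = t; cause != null; cause = cause.getCause()) {
      if (cause instanceof ConnectException) {
        return true;
      }
    }
    return false;
  }
}
```

With maxAttempts set to 3, an action that always fails with a wrapped ConnectException is invoked exactly 3 times, and the attempt count is visible in one place instead of being spread over several composed policies.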

How can we reproduce the issue?

Run `RetryRequestTest.canThrowUnexpectedException` in debug mode to see the unexpected retries in the logs.

Or run the code below to see the 5 retries instead of 3.
HttpHandler handler =
        new RetryRequest()
            .andFinally(
                (HttpRequest request) -> {
                  throw new WebDriverException("oops", new ConnectException("Testing"));
                });

Relevant log output

Dez. 19, 2024 11:50:53 AM org.openqa.selenium.remote.http.RetryRequest lambda$static$4
INFORMATION: Failure due to server error #1. Retrying.
Dez. 19, 2024 11:50:56 AM org.openqa.selenium.remote.http.RetryRequest lambda$static$4
INFORMATION: Failure due to server error #2. Retrying.

Operating System

Win 10 x64

Selenium version

4.27.0

What are the browser(s) and version(s) where you see this issue?

N/A

What are the browser driver(s) and version(s) where you see this issue?

N/A

Are you using Selenium Grid?

No response

Labels

B-grid (Everything grid and server related), C-java (Java Bindings), I-defect (Something is not working as intended)
