Skip to content

Conversation

aliehsaeedii
Copy link
Contributor

@aliehsaeedii aliehsaeedii commented Aug 7, 2025

With a 1ms timeout, the streams group description occasionally
succeeds. To check if this resolves the flakiness, I created the streams
group within the test itself. If this approach doesn't stabilize the
test, we should consider removing it altogether, since the original
intent behind the test may no longer be valid.

Reviewers: Bill Bejeck[email protected]

@github-actions github-actions bot added triage PRs from the community tools tests Test fixes (including flaky tests) small Small PRs labels Aug 7, 2025
Copy link
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @aliehsaeedii - overall LGTM - I'll kick off the build a few times to see if the flakiness is fixed, and either merge or discuss if we need to keep the test.

@mjsax mjsax added ci-approved streams and removed triage PRs from the community labels Aug 11, 2025
Copy link
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the original purpose of the test? -- It's unclear to me, why creating a second KS instance help to stabilize the test. Can you elaborate?

If I read the test correctly, the idea is to get a timeout exception for a describe call -- what is the conditions when we would get a timeout exception? Group does not exist? Or something else?

List<String> args = List.of("--bootstrap-server", bootstrapServers, "--describe", "--members", "--verbose", "--group", APP_ID, "--timeout", "1");
Throwable e = assertThrows(ExecutionException.class, () -> getStreamsGroupService(args.toArray(new String[0])).describeGroups());
assertEquals(TimeoutException.class, e.getCause().getClass());

streams2.close();
streams2.cleanUp();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use try-with-resource, we can remove close() and put cleanUp() into finally-block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The finally block does not know the variable defined in the try clause

@aliehsaeedii
Copy link
Contributor Author

@mjsax

What was the original purpose of the test? -- It's unclear to me, why creating a second KS instance help to stabilize the test. Can you elaborate?
If I read the test correctly, the idea is to get a timeout exception for a describe call -- what is the conditions when we would get a timeout exception? Group does not exist? Or something else?

I assumed in real-world scenarios, 1ms is too little for describing a group, but the flakiness proved that 1ms could be enough sometimes. I thought the group is existing, and we had a couple of other describe group calls inside the same test, that's why it does not throw in some runs. Thus, with a newly created group, with no former describe group call on it, it must always throw an exception if the timeout is 1ms. Although as I explained in the description, I am not sure if it really fixes the issue. If 1ms is enough, it is simply enough. We can either remove this test or reduce it to a lower timeout?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-approved small Small PRs streams tests Test fixes (including flaky tests) tools
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants