@2ZZ 2ZZ commented Sep 22, 2025

Which issue(s) this PR fixes:
Fixes #4396

What this PR does / why we need it:
Adds a timeout mechanism to the `establish_connection` method to prevent an infinite loop when the handshake protocol gets stuck. In unstable network environments with proxy components, if the connection drops during the handshake after TLS has been established, Fluentd gets stuck in an infinite loop and logs stop being flushed. This fix uses the existing `hard_timeout` configuration to break the loop, disable the problematic node, and keep logs flowing through healthy nodes.
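
The idea described above can be sketched in Ruby roughly as follows. This is a minimal illustration and not the code merged in this PR: `establish_connection_sketch`, `HandshakeTimeout`, `poll_interval`, and the use of `IO.pipe` as a stand-in for a stalled node are assumptions made for the example. Only the `hard_timeout` name is taken from the description.

```ruby
# Hypothetical error type for this sketch; it is not a Fluentd class.
class HandshakeTimeout < StandardError; end

# Read from `io` without blocking until the given block accepts a chunk,
# giving up once `hard_timeout` seconds have elapsed.
def establish_connection_sketch(io, hard_timeout:, poll_interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + hard_timeout
  loop do
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise HandshakeTimeout, "handshake not completed within #{hard_timeout}s"
    end
    begin
      data = io.read_nonblock(512)
      return data if yield(data)
    rescue IO::WaitReadable
      # Nothing to read yet. Without the deadline check above, a peer that
      # never answers would keep this loop retrying forever.
      sleep poll_interval
    end
  end
end

# Usage: a pipe whose writer stays silent stands in for a stalled node.
reader, _writer = IO.pipe
begin
  establish_connection_sketch(reader, hard_timeout: 0.2) { |data| data.include?("PONG") }
rescue HandshakeTimeout => e
  puts "node disabled: #{e.message}"
end
```

A monotonic clock is used for the deadline so that wall-clock adjustments cannot shorten or extend the wait. In this sketch the caller decides what to do on timeout, for example marking the node as unavailable and moving on to another one.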

Docs Changes:
None required - uses existing hard_timeout configuration parameter.

Release Note:
out_forward: fix an issue that could cause output to stop when the <security> and TLS settings are used together in unstable network environments

@2ZZ 2ZZ force-pushed the fix-issue-2969-handshake-timeout branch from 0989d87 to ec945bf Compare September 22, 2025 09:48
@2ZZ 2ZZ marked this pull request as ready for review September 22, 2025 09:49
@2ZZ 2ZZ changed the title from "Fix #2969: Add timeout to establish_connection to prevent infinite loop" to "Fix #4396: Add timeout to establish_connection to prevent infinite loop" Sep 22, 2025
@2ZZ 2ZZ force-pushed the fix-issue-2969-handshake-timeout branch from ec945bf to cbc2ea8 Compare September 22, 2025 10:08
@daipom daipom left a comment

Sorry for my late response.
Thanks so much for considering a fix for this critical issue!

I don’t fully understand the cause and haven’t been able to reproduce it, but at least, it would be good to address the risk of entering an infinite loop.
So, this fix looks good to me.

I left one comment. Could you please check it?

@daipom daipom added this to the v1.20.0 milestone Sep 29, 2025
@daipom daipom added the "backport to v1.19" and "backport to v1.16" labels (both described as "We will backport this fix to the LTS branch") Sep 29, 2025
daipom commented Sep 30, 2025

I want to clarify the conditions for #4396, if possible.
I think it happens only when both of the following are true. Is that right?

  • Use <security> setting
  • Use TLS

It appears that the establish_connection method is only called when <security> setting is used.
In addition, a TLS-related error (OpenSSL::SSL::SSLErrorWaitReadable) causes the infinite loop.

I’m concerned that <security> doesn’t seem to be used in the configurations in #4396.
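
As a small aside on the error class mentioned above: in Ruby's openssl library, `OpenSSL::SSL::SSLErrorWaitReadable` is an `SSLError` that also includes `IO::WaitReadable`, so a retry loop that rescues `IO::WaitReadable` will also retry on it. The snippet below only demonstrates that relationship. The retry loop is a hypothetical stand-in and is not taken from out_forward.

```ruby
require 'openssl'

err = OpenSSL::SSL::SSLErrorWaitReadable.new("read would block")
puts err.is_a?(IO::WaitReadable)        # true
puts err.is_a?(OpenSSL::SSL::SSLError)  # true

# Hypothetical stand-in for a non-blocking TLS read that is not ready twice.
attempts = 0
begin
  attempts += 1
  raise err if attempts < 3
rescue IO::WaitReadable
  # The TLS-specific error is caught here as well, so the read is retried.
  retry
end
puts attempts  # 3
```

If the condition that raises the error never clears, a loop of this shape never exits on its own, which is consistent with the need for an upper time bound.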

…ite loop

- Add timeout check in establish_connection method using send_timeout
- Prevents infinite loop when connection drops during handshake protocol
- Disables problematic nodes and logs timeout warnings
- Add test case to verify timeout functionality works correctly

Fixes issue where logs stop being flushed when handshake gets stuck
in unstable network environments with proxy components.

Signed-off-by: Ian Driver <[email protected]>
@2ZZ 2ZZ force-pushed the fix-issue-2969-handshake-timeout branch from cbc2ea8 to 179bf0f Compare September 30, 2025 14:08
2ZZ commented Sep 30, 2025

> I want to clarify the conditions for #4396, if possible. I think it happens only when both of the following are true. Is that right?
>
>   • Use <security> setting
>   • Use TLS
>
> It appears that the establish_connection method is only called when <security> setting is used. In addition, a TLS-related error (OpenSSL::SSL::SSLErrorWaitReadable) causes the infinite loop.
>
> I’m concerned that <security> doesn’t seem to be used in the configurations in #4396.

I agree, I wonder if the original reporter redacted their security config. I do use the security setting in the environment where I'm seeing this behaviour.

daipom commented Oct 1, 2025

> I agree, I wonder if the original reporter redacted their security config. I do use the security setting in the environment where I'm seeing this behaviour.

Thanks!
I’m relieved that our assumptions match.

@daipom daipom left a comment

LGTM! Thanks!

2ZZ commented Oct 1, 2025

> LGTM! Thanks!

Is there anything I need to do from here to get this merged?

daipom commented Oct 2, 2025

> Is there anything I need to do from here to get this merged?

No. I’m only waiting for the CI to pass. 😄

@daipom daipom merged commit 90f5ec9 into fluent:master Oct 2, 2025
28 of 32 checks passed
daipom commented Oct 2, 2025

Thanks!

Watson1978 pushed a commit that referenced this pull request Nov 4, 2025
…op (#5104)

**Which issue(s) this PR fixes**:
Fixes #4396

**What this PR does / why we need it**:
Adds timeout mechanism to `establish_connection` method to prevent
infinite loop when handshake protocol gets stuck. In unstable network
environments with proxy components, if connection drops during handshake
after TLS establishment, Fluentd gets stuck in infinite loop causing
logs to stop being flushed. This fix uses existing `hard_timeout`
configuration to break the loop, disable problematic nodes, and maintain
log flow through healthy nodes.

**Docs Changes**:
None required - uses existing `hard_timeout` configuration parameter.

**Release Note**:
Fix infinite loop in out_forward handshake protocol that could cause
logs to stop being flushed in unstable network environments.

Signed-off-by: Ian Driver <[email protected]>
Co-authored-by: Ian Driver <[email protected]>
Signed-off-by: Shizuo Fujita <[email protected]>
@Watson1978 Watson1978 added the "backported" label (described as "backport to LTS" is done) Nov 4, 2025
Watson1978 pushed a commit that referenced this pull request Nov 6, 2025
…op (#5104)

Watson1978 pushed a commit that referenced this pull request Dec 5, 2025
…op (#5104)

kenhys pushed a commit that referenced this pull request Dec 5, 2025
…event infinite loop (#5104) (#5137)

Backport #5104

Signed-off-by: Ian Driver <[email protected]>
Signed-off-by: Shizuo Fujita <[email protected]>
Co-authored-by: Ian Driver <[email protected]>
Co-authored-by: Ian Driver <[email protected]>
Watson1978 added a commit to Watson1978/fluentd that referenced this pull request Dec 8, 2025
…nnection to prevent infinite loop (fluent#5104) (fluent#5137)"

This reverts commit 18fe509.
daipom pushed a commit that referenced this pull request Dec 9, 2025
…event infinite loop (#5104) (#5138)

Backport #5104

Signed-off-by: Ian Driver <[email protected]>
Signed-off-by: Shizuo Fujita <[email protected]>
Co-authored-by: Ian Driver <[email protected]>
Co-authored-by: Ian Driver <[email protected]>
Development

Successfully merging this pull request may close these issues.

Out forward stuck establishing connection
