
I/O Timeouts can cause error and audit ingestion to silently stop#4780

Merged: ramonsmits merged 1 commit into master from hanging-ingestion on Feb 11, 2025



@ramonsmits
Member

ramonsmits commented Feb 6, 2025

Description

I/O Timeouts can cause error and audit ingestion to silently stop.

Symptoms

No error or audit messages are ingested.

Who's affected

All customers running versions 5 and 6 are affected by this issue.

NOTE: Versions before 4.32.0 are not affected.

Root cause

Incorrect handling of OperationCanceledException can cause ingestion to hang, tasks to fail unnoticed, and timeouts to go unlogged.
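To illustrate the failure mode described above, here is a minimal sketch, not the actual code touched by this pull request. It assumes the timeout surfaces as an `OperationCanceledException` (simulated here with a linked `CancellationTokenSource`); `SimulatedIoAsync` and `BlanketCatchAsync` are hypothetical names introduced only for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an I/O call that enforces its own timeout.
static async Task SimulatedIoAsync(CancellationToken callerToken)
{
    using var timeout = new CancellationTokenSource(TimeSpan.FromMilliseconds(50));
    using var linked = CancellationTokenSource.CreateLinkedTokenSource(callerToken, timeout.Token);

    // Never completes on its own; throws TaskCanceledException once the timeout fires.
    await Task.Delay(Timeout.Infinite, linked.Token);
}

// Blanket catch: every cancellation is assumed to be a requested shutdown.
static async Task<string> BlanketCatchAsync(CancellationToken callerToken)
{
    try
    {
        await SimulatedIoAsync(callerToken);
        return "completed";
    }
    catch (OperationCanceledException)
    {
        // The caller never cancelled, yet the timeout is swallowed here:
        // nothing is logged and nothing is reported as failed.
        return "ignored";
    }
}

var result = await BlanketCatchAsync(CancellationToken.None);
Console.WriteLine(result);
```

In this sketch the caller's token is never cancelled, so the swallowed exception can only be the timeout, which is exactly the case that goes unnoticed.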

Backported to

ramonsmits self-assigned this Feb 6, 2025
ramonsmits added the Bug label Feb 6, 2025
ramonsmits marked this pull request as ready for review February 7, 2025 14:48
@ramonsmits
Member Author

@andreasohlund I load tested this build for 12+ hours, running multiple instances of both this version and 6.3.0. All 6.3.0 instances eventually halted processing.

ramonsmits marked this pull request as draft February 8, 2025 17:32
andreasohlund added this to the 6.1.3 milestone Feb 10, 2025
andreasohlund changed the title from "Incorrect handling of OperationCancelledException can causes hanging ingestion, unnoticed failing tasks, and no logging of timeouts" to "OperationCancelledException can causes hanging ingestion, unnoticed failing tasks, and no logging of timeouts" Feb 10, 2025
andreasohlund modified the milestones: 6.1.3, 6.3.1 Feb 10, 2025
ramonsmits marked this pull request as ready for review February 10, 2025 12:38
andreasohlund removed this from the 6.3.1 milestone Feb 10, 2025
…ich can cause the ingestion to never used incoming context tasks and hang.

- Refactored to use BackgroundService
- Fix `catch (OperationCanceledException)` blocks by adding `when` guards so that only cancellations requested by the caller are ignored
- Override Start and Stop to support graceful shutdown and make intent clearer
- Ensure TrySetException is always called when an exception occurs
- Improve logging around cancellation so it is visible when tasks get cancelled. This matters during shutdown, to diagnose where teardown "hangs".
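The `when` guard and the TrySetException points in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this commit: `SimulatedIoAsync` and `ProcessAsync` are hypothetical names, and the timeout is simulated with a linked `CancellationTokenSource`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an I/O call that enforces its own timeout.
static async Task SimulatedIoAsync(CancellationToken callerToken)
{
    using var timeout = new CancellationTokenSource(TimeSpan.FromMilliseconds(50));
    using var linked = CancellationTokenSource.CreateLinkedTokenSource(callerToken, timeout.Token);
    await Task.Delay(Timeout.Infinite, linked.Token);
}

// Hypothetical processing step. Whatever happens, `completion` is completed,
// so anything awaiting completion.Task cannot hang.
static async Task ProcessAsync(TaskCompletionSource<bool> completion, CancellationToken callerToken)
{
    try
    {
        await SimulatedIoAsync(callerToken);
        completion.TrySetResult(true);
    }
    catch (OperationCanceledException) when (callerToken.IsCancellationRequested)
    {
        // Only cancellations requested by the caller are treated as benign.
        Console.WriteLine("Cancelled by caller");
        completion.TrySetCanceled(callerToken);
    }
    catch (Exception e)
    {
        // A timeout ends up here because the caller's token was not cancelled.
        Console.WriteLine($"Failed: {e.GetType().Name}");
        completion.TrySetException(e);
    }
}

var completion = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
await ProcessAsync(completion, CancellationToken.None);
Console.WriteLine(completion.Task.Status);
```

The guard compares against the caller's token rather than inspecting the exception, so a timeout raised by any inner token falls through to the general handler, where it is logged and propagated to the awaiting side.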
ramonsmits added a commit that referenced this pull request Feb 10, 2025
…ich can cause the ingestion to never used incoming context tasks and hang. (#4791)

Backport of #4780
ramonsmits added a commit that referenced this pull request Feb 10, 2025
…ich can cause the ingestion to never used incoming context tasks and hang. (#4792)

Backport of #4780
andreasohlund changed the title from "OperationCancelledException can causes hanging ingestion, unnoticed failing tasks, and no logging of timeouts" to "I/O Timeouts can cause error and audit ingestion to silently stop" Feb 11, 2025
ramonsmits merged commit 92c6864 into master Feb 11, 2025
30 checks passed
ramonsmits deleted the hanging-ingestion branch February 11, 2025 08:49

2 participants