Fix worker logs not getting flushed when Core log level set to ERROR #1777
Conversation
I see that this is an improvement over an unbounded buffer and has more forceful flushing mechanisms with the threshold flush and the unconditional flush.
I'm a bit concerned with hardcoded values that configure the buffer size and flush timers (minBufferTimeMs, flushPassIntervalMs, and maxBufferSize). The only potential issue I can see (aside from incorrectly ordered logs) is if the log emission rate is greater than the flushing rate. Not sure if that's a realistic concern.
In any case, it would be nice to benchmark the flushing functions. That would give a clear indication of the log emission rate at which these buffer/flush config values stop keeping up.
I generally think this is an improvement, depending on how strenuous the general logging case is.
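To make the benchmarking suggestion concrete, here is a hypothetical microbenchmark sketch. The `BenchEntry` shape, the sort-and-drain flush, and `benchmarkBufferThroughput` are illustrative assumptions, not the SDK's actual implementation:

```typescript
// Hypothetical microbenchmark sketch: NOT the SDK's actual code.
// Measures how many entries per second a sort-and-drain buffer can absorb
// when an unconditional flush triggers at maxBufferSize.
type BenchEntry = { timestampMs: number; message: string };

function benchmarkBufferThroughput(totalEntries: number, maxBufferSize: number): number {
  const buffer: BenchEntry[] = [];
  let flushedCount = 0;

  const flush = () => {
    // Reorder by absolute timestamp before draining, as the collector does
    buffer.sort((a, b) => a.timestampMs - b.timestampMs);
    flushedCount += buffer.length;
    buffer.length = 0;
  };

  const start = Date.now();
  for (let i = 0; i < totalEntries; i++) {
    // Simulate slightly out-of-order arrival from multiple sources
    buffer.push({ timestampMs: i + (i % 7), message: `entry ${i}` });
    if (buffer.length >= maxBufferSize) flush(); // unconditional threshold flush
  }
  flush(); // final drain
  if (flushedCount !== totalEntries) throw new Error('lost entries');
  const elapsedMs = Math.max(1, Date.now() - start);
  return (totalEntries / elapsedMs) * 1000; // entries per second
}
```

Running this across a range of `maxBufferSize` values would show the emission rate at which a given buffer/flush configuration no longer keeps up.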
`private appendOne(entry: LogEntry): void {`
Bit of a nit, but I would prefer a better name here (e.g. `appendOneAndFlush`); it's not clear unless you read the method body that it also flushes.
Approving, given that we expect the unconditional flush to handle the case where the timer-based flush is overwhelmed.
some nits above
Split code, renamed methods, and rephrased comments to clarify the points raised by @THardy98.

Not a concern. That's what
What was changed

Considerably improved the `Runtime`'s `NativeLogCollector` logic:

- `NativeLogCollector` now starts its own interval timer to trigger log flushing.
- `NativeLogCollector` now ensures that the number of buffered log entries can't exceed 2000.
- `NativeLogCollector` now ensures that each message gets throttled by at least 100 ms. This is required to ensure correct global ordering of messages.
- Tests confirm that `NativeLogCollector` is indeed flushing as expected.

I designed this so that we can make it user-tunable in the future, but I don't want to commit to that exact algorithm (and therefore its parameters) just yet. Arguably, it should even be possible for users to completely opt out of log buffering/reordering, e.g. if that's already done anyway by their downstream log collector. These options will be considered in future work.
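The buffering strategy described above could be sketched roughly as follows. This is an illustrative approximation under assumed names and shapes (`BufferedLogCollector`, `LogEntry`, `flushPass`), not the SDK's actual `NativeLogCollector` code:

```typescript
// Illustrative sketch of the described strategy; names and shapes are assumed,
// not the SDK's actual NativeLogCollector implementation.
interface LogEntry {
  timestampMs: number;
  message: string;
}

class BufferedLogCollector {
  private buffer: LogEntry[] = [];
  private timer?: ReturnType<typeof setInterval>;

  constructor(
    private readonly emit: (entry: LogEntry) => void,
    private readonly minBufferTimeMs = 100, // hold entries at least this long, for ordering
    private readonly flushPassIntervalMs = 100, // period of the flush timer
    private readonly maxBufferSize = 2000 // unconditional flush threshold
  ) {}

  start(): void {
    // The collector drives its own flushing, independently of Core's flushes
    this.timer = setInterval(() => this.flushPass(Date.now()), this.flushPassIntervalMs);
    // Don't keep the Node.js process alive solely for this timer
    (this.timer as unknown as { unref?: () => void }).unref?.();
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.flushAll();
  }

  append(entry: LogEntry): void {
    this.buffer.push(entry);
    // Unconditional flush: never let the buffer grow past maxBufferSize
    if (this.buffer.length >= this.maxBufferSize) this.flushAll();
  }

  // Timer-driven flush: emit entries old enough that no earlier-timestamped
  // entry should still be in flight, preserving global timestamp order
  flushPass(nowMs: number): void {
    this.buffer.sort((a, b) => a.timestampMs - b.timestampMs);
    const cutoff = nowMs - this.minBufferTimeMs;
    while (this.buffer.length > 0 && this.buffer[0].timestampMs <= cutoff) {
      this.emit(this.buffer.shift()!);
    }
  }

  private flushAll(): void {
    this.buffer.sort((a, b) => a.timestampMs - b.timestampMs);
    for (const entry of this.buffer) this.emit(entry);
    this.buffer = [];
  }
}
```

The 100 ms hold-back is what makes reordering possible: an entry is only emitted once it is old enough that any earlier-timestamped entry from another source should already have arrived, while the size cap bounds memory if emission outpaces the timer.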
Why?

When the `Runtime` is configured to forward logs from Core to the lang side, we use a lang-side buffer (`NativeLogCollector`) to reorder log entries coming from different sources (notably Workflow log APIs through Sinks, and Core) based on their absolute timestamps. This helps reduce incoherencies in log ordering.

Until now, `NativeLogCollector` was relying on Core's periodic flushes to trigger its own flushing to the downstream logger. However, it turns out that Core won't flush unless there has been at least one log message in the meantime, which can be a rare occurrence if Core's log level is set to WARN or ERROR.