feat: Apply upload criteria to memory buffering; Upload the memory buffer to S3 on exit. (#17)
Conversation
Co-authored-by: davidlion <davidlion2@protonmail.com>
davidlion
left a comment
The title seems to not really match the changes or the description. We probably need to either change the title or add more context to the description.
internal/irzstd/memory.go
// Checks if writer is empty. True if no events are buffered. Try to avoid calling this, as it
// will flush the Zstd writer, potentially creating unnecessary frames.
//
// Returns:
//   - empty: Boolean value that is true if buffer is empty
//   - err: nil error to comply with interface
func (w *memoryWriter) CheckEmpty() (bool, error) {
	w.zstdWriter.Flush()
The name of this method doesn't match what it does and to some degree doesn't really match its use case either.
the plugin now attempts to flush buffered data to S3 on shutdown. This is not needed for disk buffer mode since those files are recovered on restart (however, it may make sense to make this a configurable option in the future).
I feel it is confusing that during a graceful exit we don't upload to S3 when using disk buffering. Instead we wait until restart for recovery to handle this. If I stop fluentbit, I'd expect all the logs available to be uploaded.
I feel adding a flush method would fit the use case more.
What do you think?
Just to clarify — are you suggesting introducing a separate Flush() method on the writer, in addition to CheckEmpty()?
I understand the expectation, but I was hesitant to upload on graceful shutdown in disk buffer mode because it can introduce duplication or corruption issues.
If we upload but the process is forcefully exited before the disk files are properly truncated, recovery could upload them again and create duplicates. Forcing additional flushes from the IR file to the Zstd file during shutdown could also be slow and, if interrupted mid-write, leave files in an inconsistent state.
One of the reasons to select disk buffering mode is fault tolerance; exiting quickly and relying on upload on restart is the safe path.
If we want “upload everything on stop” behavior for disk buffering, we could consider making it a configurable option if that’s something users would expect.
Just to clarify — are you suggesting introducing a separate Flush() method on the writer, in addition to CheckEmpty()?
Yup
For the behaviour, I think it is fair as long as we document everything.
I feel the current title reflects the high-level change from a user perspective. Previously, memory mode sent each event directly to S3 as it was received. With this change, users can configure a specific upload size before data is flushed. Because the old mode did not buffer data, graceful shutdown handling wasn't necessary. Introducing buffering made that logic required. That said, I agree the description could be updated to better align with the title. We could also consider referencing S3 explicitly in the title if that would make the scope clearer. Let me know if you still feel the title doesn't capture the change.
// - readyToUpload: Boolean if upload criteria met or not
// - err: Error getting Zstd buffer size
func checkUploadCriteriaMet(eventManager *outctx.EventManager, uploadSizeMb int) (bool, error) {
	if !eventManager.Writer.GetUseDiskBuffer() {
this single change alters the buffering behaviour so that events are no longer uploaded immediately
davidlion
left a comment
Sorry for the slow review.
I think renaming CheckEmpty to Empty and adding a Flush method to the interface (and calling both in ToS3) should be fine.
I still feel it is confusing that in one mode it will upload on exit and in another it won't, but as long as it is documented I don't think it is a blocking problem (and I was too caught up on it).
You can update the readme in this PR or another, up to you.
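The rename-and-split suggested above (an `Empty` that only reports state, plus an explicit `Flush`) could be sketched roughly as follows. This is an illustrative toy, not the plugin's actual code: the real writer wraps a Zstd encoder, while here a second `bytes.Buffer` stands in for the compressed output.

```go
package main

import (
	"bytes"
	"fmt"
)

// Writer sketches the irzstd writer interface after the suggested change:
// Empty reports whether any events are buffered, and Flush forces pending
// data through the compressor. Both would be called from ToS3.
type Writer interface {
	Empty() (bool, error)
	Flush() error
}

// memoryWriter is a toy stand-in for the real type, which wraps a Zstd writer.
type memoryWriter struct {
	pending bytes.Buffer // events not yet flushed
	flushed bytes.Buffer // stand-in for the compressed output buffer
}

// Empty reports buffer state without flushing as a side effect,
// unlike the original CheckEmpty.
func (w *memoryWriter) Empty() (bool, error) {
	return w.pending.Len() == 0 && w.flushed.Len() == 0, nil
}

// Flush moves pending bytes to the output buffer; the real writer
// would flush the current Zstd frame here instead.
func (w *memoryWriter) Flush() error {
	_, err := w.pending.WriteTo(&w.flushed)
	return err
}

func main() {
	var w Writer = &memoryWriter{}
	empty, _ := w.Empty()
	fmt.Println(empty) // true: nothing buffered yet

	mw := w.(*memoryWriter)
	mw.pending.WriteString("log event")
	_ = w.Flush()
	empty, _ = w.Empty()
	fmt.Println(empty) // false: flushed data awaits upload
}
```

Keeping `Empty` side-effect free and flushing explicitly avoids the surprise frames the original `CheckEmpty` comment warns about.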
Description
PR adds support to buffer logs in memory rather than immediately flushing them when not using disk buffer. Note the functionality gain here is not really the point of this PR. In reality, this is a refactoring PR to address interface issues in a different PR - see comment. Once this change is complete, the behaviour of disk and memory modes will be more similar, and we can remove separate diskUploadListener and memoryUploadListener (unify into a single upload listener) in #8.
Memory buffering
Changed memory buffer behavior to buffer logs before uploading to S3. Previously, memory buffer mode would immediately send logs to S3 on each flush. Now it accumulates logs until the configured upload size is reached, making it consistent with disk buffer behavior.
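The size criterion described above can be sketched as a simple threshold comparison. This is a simplified, hypothetical version of the plugin's `checkUploadCriteriaMet`; the real function reads the Zstd buffer size from the event manager.

```go
package main

import "fmt"

// uploadCriteriaMet reports whether the accumulated compressed buffer has
// reached the configured upload size. bufferedBytes would come from the
// Zstd buffer; uploadSizeMb is the user-configured threshold in MiB.
func uploadCriteriaMet(bufferedBytes int, uploadSizeMb int) bool {
	return bufferedBytes >= uploadSizeMb<<20 // MiB -> bytes
}

func main() {
	fmt.Println(uploadCriteriaMet(1024, 16))   // false: 1 KiB < 16 MiB
	fmt.Println(uploadCriteriaMet(16<<20, 16)) // true: threshold reached
}
```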
Graceful exit to S3
Added graceful exit handling for memory buffer mode. Since memory buffer logs are not persisted to disk, the plugin now attempts to flush buffered data to S3 on shutdown. This is not needed for disk buffer mode since those files are recovered on restart (however, it may make sense to make this a configurable option in the future).
Checklist
breaking change.
Validation performed
Tested that logs are buffered and that the buffered file sent to S3 is readable by the log viewer. Tested that logs are sent to S3 when a kill signal is received, and that the resulting file is readable by the log viewer.