Description
Bug Report
Describe the bug
When using the in_tail plugin with a database (DB) configured, data loss occurs if Fluent Bit is restarted while there is unprocessed data in the internal buffer. This typically happens when the log file ends with a partial line (no newline character) at the moment of shutdown.
The in_tail plugin advances the file offset in the database immediately upon reading data. However, if that data is buffered but not fully processed when Fluent Bit exits, the buffer is discarded.
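The failure mode can be illustrated with a short, self-contained Python simulation (illustrative only; the real plugin is C and the names below are invented):

```python
# Simulation of in_tail's read-vs-processed gap (invented names, for illustration).
def read_and_commit(data, db_offset, chunk_size):
    """Read a chunk and, like in_tail, commit the raw read position to the 'DB'
    immediately, even though the trailing partial line is only buffered."""
    chunk = data[db_offset:db_offset + chunk_size]
    db_offset += len(chunk)          # offset advanced as soon as bytes are read
    lines = chunk.split(b"\n")
    buf = lines.pop()                # trailing partial line stays in the buffer
    return [l for l in lines if l], buf, db_offset

data = b"line-1\nline-2\nline-3 partial"
processed, buf, db_offset = read_and_commit(data, 0, len(data))
# Simulated restart: the buffer is discarded and reading resumes at db_offset.
remaining = data[db_offset:]
print(processed)   # [b'line-1', b'line-2']
print(buf)         # b'line-3 partial' -- discarded on shutdown
print(remaining)   # b''               -- nothing left to re-read: data lost
```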
To Reproduce
- Patch the logging for diagnosis and rebuild
plugins/in_tail/tail_file.c
void flb_tail_file_remove(struct flb_tail_file *file)
...
Before:
flb_plg_debug(ctx->ins, "inode=%"PRIu64" removing file name %s",
              file->inode, file->name);
After:
flb_plg_info(ctx->ins, "inode=%"PRIu64" removing file=%s, buf_len=%lu, offset=%"PRId64,
             file->inode, file->name, (unsigned long)file->buf_len, file->offset);
- Create input log file
#!/usr/bin/env python3
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%d/%b/%Y:%H:%M:%S +0000")
PATH = "/api/v1/" + "a" * 1000

def generate_large_log_line(line_number):
    log_line = (
        f"192.168.1.100 - - [{TIMESTAMP}] "
        f'"GET {PATH} HTTP/1.1" 200 {line_number} '
        f'"-" "Mozilla/5.0"\n'
    )
    return log_line

def main():
    output_file = "/tmp/testing.input"
    target_size = 2 * 1024 * 1024 * 1024  # 2 GB
    line_size = 1024
    total_lines = target_size // line_size
    print(f"Starting log generation: {output_file}")
    print(f"Target size: {target_size / (1024**3):.2f} GB")
    print(f"Expected lines: {total_lines:,}")
    written_size = 0
    line_count = 0
    try:
        with open(output_file, "w") as f:
            while written_size < target_size:
                log_line = generate_large_log_line(line_count)
                f.write(log_line)
                written_size += len(log_line.encode("utf-8"))
                line_count += 1
                if line_count % 100000 == 0:
                    progress = (written_size / target_size) * 100
                    size_mb = written_size / (1024 * 1024)
                    print(
                        f"Progress: {progress:.1f}% - {size_mb:.1f} MB - {line_count:,} lines"
                    )
    except KeyboardInterrupt:
        print("\nLog generation interrupted")
    except Exception as e:
        print(f"Error occurred: {e}")
    final_size_gb = written_size / (1024**3)
    print("\nLog generation completed!")
    print(f"Generated size: {final_size_gb:.2f} GB")
    print(f"Generated lines: {line_count:,}")
    print(f"Average line size: {written_size / line_count:.0f} bytes")

if __name__ == "__main__":
    main()
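To make the buffered partial line easier to reproduce, one optional addition (not part of the original report) is to leave the file ending without a trailing newline, so in_tail is guaranteed to be holding unprocessed bytes in its buffer at EOF:

```python
# Optional helper (my addition, not in the original script): append a final
# record with no trailing newline, so the file ends mid-line and in_tail's
# buffer holds a partial line when Fluent Bit is stopped.
with open("/tmp/testing.input", "a") as f:
    f.write('192.168.1.100 - - [01/Jan/2025:00:00:00 +0000] "GET /api/v1/partial HTTP/1.1" 200 0')
```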
- Run Fluent Bit
fluent-bit -v -c ./fluentbit.conf
- After about 3 seconds, stop Fluent Bit
- Check Fluent Bit log
[2025/12/08 13:46:41.22042158] [ info] [input:tail:input_log] inode=50630975 removing file=/tmp/testing.input, buf_len=291, offset=1933548765
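From this log line, the start of the unprocessed data can be computed by rewinding the committed offset by the buffered length (simple arithmetic on the reported values):

```python
# Values taken from the shutdown log line above.
offset = 1933548765   # raw read position committed to the DB
buf_len = 291         # bytes read but still unprocessed in the buffer
rewound = offset - buf_len
print(rewound)  # 1933548474 -- where the next start should resume reading
```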
Expected behavior
When Fluent Bit shuts down with data in its buffer, it should update (rewind) the offset in the database to point to the start of the unprocessed data. This ensures that on the next startup, the data is re-read and processed correctly.
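A minimal sketch of that rewind logic, in Python purely for illustration (the actual fix would live in the C teardown path around flb_tail_file_remove; the function and dictionary names here are invented):

```python
# Hypothetical model of the proposed teardown behavior (names invented).
def on_shutdown(db, file):
    """On exit, rewind the stored offset past any bytes that were read
    into the buffer but never processed."""
    if file["buf_len"] > 0:
        # The committed offset points past unprocessed bytes; move it back
        # so the next start re-reads exactly the unprocessed tail.
        db[file["inode"]] = file["offset"] - file["buf_len"]
    else:
        db[file["inode"]] = file["offset"]

db = {}
file = {"inode": 50630975, "offset": 1933548765, "buf_len": 291}
on_shutdown(db, file)
print(db[50630975])  # 1933548474
```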
Your Environment
- Version used: 4.2.0
- Operating System and version: Ubuntu 24.04
- Configuration:
[SERVICE]
flush 2
grace 60
log_level info
log_file /tmp/testing/logs/testing.log
parsers_file /tmp/testing/parsers.conf
plugins_file /tmp/testing/plugins.conf
http_server on
http_listen 0.0.0.0
http_port 22002
storage.path /tmp/testing/storage
storage.metrics on
storage.max_chunks_up 512
storage.sync full
storage.checksum off
storage.backlog.mem_limit 100M
[INPUT]
Name tail
Path /tmp/testing.input
Exclude_Path *.gz,*.zip
Tag testing
Key message
Offset_Key log_offset
Read_from_Head true
Refresh_Interval 3
Rotate_Wait 31557600
Buffer_Chunk_Size 1MB
Buffer_Max_Size 16MB
Inotify_Watcher false
storage.type filesystem
storage.pause_on_chunks_overlimit true
DB /tmp/testing/storage/testing.db
DB.sync normal
DB.locking false
Alias input_log
[OUTPUT]
Name file
Match *
File /tmp/testing.out
Additional context
Analysis of plugins/in_tail/tail_file.c:
- flb_tail_file_remove (called on exit) destroys file->buf_data without checking for remaining content.
- Since file->offset tracks the raw read position and is blindly trusted on restart, the discrepancy between "read" and "processed" leads to data loss.