
in_tail: Data loss on exit/restart due to unhandled buffer (partial lines) #11265

@jinyongchoi

Description

Bug Report

Describe the bug
When using the in_tail plugin with a database (DB) configured, data loss occurs if Fluent Bit is restarted while there is unprocessed data in the internal buffer. This typically happens when the log file ends with a partial line (no newline character) at the moment of shutdown.

The in_tail plugin advances the file offset in the database immediately upon reading data. If some of that data is still sitting in the buffer (e.g. an incomplete last line) when Fluent Bit exits, the buffer is discarded even though the stored offset already points past it, so those bytes are never re-read.
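
To make the failure mode concrete, below is a minimal conceptual sketch of the read path (not the actual plugin code; process_full_lines() is a hypothetical stand-in for the plugin's record emission):

/* read a chunk and advance the offset by everything read */
bytes = read(file->fd, file->buf_data + file->buf_len,
             file->buf_size - file->buf_len);
if (bytes > 0) {
    file->buf_len += bytes;
    file->offset  += bytes;       /* DB offset now covers all bytes read */
    process_full_lines(file);     /* hypothetical: emits up to the last '\n' */
    /* anything after the last '\n' stays in buf_data; if Fluent Bit
     * exits here, that remainder is freed while the stored offset
     * still points past it */
}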

To Reproduce

  1. Patch the plugin for extra logging and rebuild
    plugins/in_tail/tail_file.c
void flb_tail_file_remove(struct flb_tail_file *file)
...
/* before */
flb_plg_debug(ctx->ins, "inode=%"PRIu64" removing file name %s",
              file->inode, file->name);
/* after */
flb_plg_info(ctx->ins, "inode=%"PRIu64" removing file=%s, buf_len=%lu, offset=%"PRId64,
             file->inode, file->name, (unsigned long)file->buf_len, file->offset);
  2. Create the input log file
#!/usr/bin/env python3

from datetime import datetime

TIMESTAMP = datetime.now().strftime("%d/%b/%Y:%H:%M:%S +0000")
PATH = "/api/v1/" + "a" * 1000


def generate_large_log_line(line_number):
    log_line = (
        f"192.168.1.100 - - [{TIMESTAMP}] "
        f'"GET {PATH} HTTP/1.1" 200 {line_number} '
        f'"-" "Mozilla/5.0"\n'
    )

    return log_line


def main():
    output_file = "/tmp/testing.input"
    target_size = 2 * 1024 * 1024 * 1024
    line_size = 1024
    total_lines = target_size // line_size

    print(f"Starting log generation: {output_file}")
    print(f"Target size: {target_size / (1024**3):.2f} GB")
    print(f"Expected lines: {total_lines:,}")

    written_size = 0
    line_count = 0

    try:
        with open(output_file, "w") as f:
            while written_size < target_size:
                log_line = generate_large_log_line(line_count)
                f.write(log_line)

                written_size += len(log_line.encode("utf-8"))
                line_count += 1

                if line_count % 100000 == 0:
                    progress = (written_size / target_size) * 100
                    size_mb = written_size / (1024 * 1024)
                    print(
                        f"Progress: {progress:.1f}% - {size_mb:.1f} MB - {line_count:,} lines"
                    )

    except KeyboardInterrupt:
        print("\nLog generation interrupted")
    except Exception as e:
        print(f"Error occurred: {e}")

    final_size_gb = written_size / (1024**3)
    print("\nLog generation completed!")
    print(f"Generated size: {final_size_gb:.2f} GB")
    print(f"Generated lines: {line_count:,}")
    if line_count:  # avoid division by zero if interrupted before any line was written
        print(f"Average line size: {written_size / line_count:.0f} bytes")


if __name__ == "__main__":
    main()

  3. Run Fluent Bit
fluent-bit -v -c ./fluentbit.conf
  4. Wait about 3 seconds, then stop Fluent Bit
  5. Check the Fluent Bit log
[2025/12/08 13:46:41.22042158] [ info] [input:tail:input_log] inode=50630975 removing file=/tmp/testing.input, buf_len=291, offset=1933548765

Expected behavior
When Fluent Bit shuts down with data in its buffer, it should update (rewind) the offset in the database to point to the start of the unprocessed data. This ensures that on the next startup, the data is re-read and processed correctly.
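
A minimal sketch of what that could look like in flb_tail_file_remove, assuming the flb_tail_db_file_offset() helper in plugins/in_tail/tail_db.c persists file->offset (the exact helper and call order are assumptions, not a tested patch):

void flb_tail_file_remove(struct flb_tail_file *file)
{
    struct flb_tail_config *ctx = file->config;

    if (file->buf_len > 0 && ctx->db) {
        /* rewind the stored offset to the first unprocessed byte so the
         * partial line is re-read on the next start (assumes
         * flb_tail_db_file_offset() writes file->offset back to the DB) */
        file->offset -= file->buf_len;
        flb_tail_db_file_offset(file, ctx);
    }

    /* ... existing removal / cleanup logic follows ... */
}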

Your Environment

  • Version used:
    4.2.0
  • Configuration:
[SERVICE]
    flush 2
    grace 60
    log_level info
    log_file /tmp/testing/logs/testing.log
    parsers_file /tmp/testing/parsers.conf
    plugins_file /tmp/testing/plugins.conf
    http_server on
    http_listen 0.0.0.0
    http_port 22002

    storage.path /tmp/testing/storage
    storage.metrics on
    storage.max_chunks_up 512
    storage.sync full
    storage.checksum off
    storage.backlog.mem_limit 100M

[INPUT]
    Name tail
    Path /tmp/testing.input
    Exclude_Path *.gz,*.zip
    Tag testing
    Key message
    Offset_Key   log_offset

    Read_from_Head true
    Refresh_Interval 3
    Rotate_Wait 31557600

    Buffer_Chunk_Size 1MB
    Buffer_Max_Size 16MB
    Inotify_Watcher false

    storage.type filesystem
    storage.pause_on_chunks_overlimit true

    DB /tmp/testing/storage/testing.db
    DB.sync normal
    DB.locking false

    Alias input_log

[OUTPUT]
    Name file
    Match *
    File /tmp/testing.out
  • Operating System and version:
    Ubuntu 24.04

Additional context
Analysis of plugins/in_tail/tail_file.c:

  • flb_tail_file_remove (called on exit) destroys file->buf_data without checking for remaining content.
  • Since file->offset tracks the raw read position and is trusted as-is on restart, the discrepancy between "read" and "processed" leads to data loss (see the sketch below).
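
For illustration, the restart side of the problem looks roughly like this (hypothetical helper names; the real code lives in tail_file.c / tail_db.c):

/* restart path, conceptually: the stored offset is applied verbatim,
 * so bytes that were read but only buffered before shutdown are skipped */
offset = db_get_stored_offset(db, file->inode);  /* hypothetical helper */
lseek(file->fd, offset, SEEK_SET);               /* trusts "read" == "processed" */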
