Merged
5 changes: 3 additions & 2 deletions .gitbook.yaml
@@ -2,8 +2,8 @@ redirects:
# Installation
installation/upgrade_notes: ./installation/upgrade-notes.md
installation/supported_platforms: ./installation/downloads.md
installation/docker.md: ./installation/downloads/docker.md
installation/windows.md: ./installation/downloads/windows.md
installation/docker: ./installation/downloads/docker.md
installation/windows: ./installation/downloads/windows.md

# Inputs
input/collectd: ./pipeline/inputs/
@@ -103,3 +103,4 @@ redirects:
administration/configuring-fluent-bit/yaml/configuration-file: ./administration/configuring-fluent-bit/yaml.md
administration/configuring-fluent-bit/unit-sizes: ./administration/configuring-fluent-bit.md
administration/configuring-fluent-bit/multiline-parsing: ./pipeline/parsers/multiline-parsing.md
administration/buffering-and-storage: ./pipeline/buffering.md
2 changes: 1 addition & 1 deletion README.md
@@ -16,7 +16,7 @@ description: High Performance Telemetry Agent for Logs, Metrics and Traces
- Metrics support: Prometheus and OpenTelemetry compatible
- Reliability and data integrity
- [Backpressure](administration/backpressure.md) handling
- [Data buffering](administration/buffering-and-storage.md) in memory and file system
- [Data buffering](./pipeline/buffering.md) in memory and file system
- Networking
- Security: Built-in TLS/SSL support
- Asynchronous I/O
2 changes: 1 addition & 1 deletion SUMMARY.md
@@ -56,7 +56,7 @@
* [Variables](administration/configuring-fluent-bit/classic-mode/variables.md)
* [AWS credentials](administration/aws-credentials.md)
* [Backpressure](administration/backpressure.md)
* [Buffering and storage](administration/buffering-and-storage.md)
* [Dead letter queue](administration/dead-letter-queue.md)
* [Hot reload](administration/hot-reload.md)
* [HTTP proxy](administration/http-proxy.md)
* [Memory management](administration/memory-management.md)
74 changes: 42 additions & 32 deletions administration/backpressure.md
@@ -2,35 +2,47 @@

<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=63e37cfe-9ce3-4a18-933a-76b9198958c1" />

It's possible for logs or data to be ingested or created faster than the ability to flush it to some destinations. A common scenario is when reading from big log files, especially with a large backlog, and dispatching the logs to a backend over the network, which takes time to respond. This generates _backpressure_, leading to high memory consumption in the service.
It's possible for Fluent Bit to ingest or create data faster than it can flush that data to the intended destinations. This creates a condition known as _backpressure_.

To avoid backpressure, Fluent Bit implements a mechanism in the engine that restricts the amount of data an input plugin can ingest. Restriction is done through the configuration parameters `Mem_Buf_Limit` and `storage.Max_Chunks_Up`.
Fluent Bit can accommodate a certain amount of backpressure by [buffering](../pipeline/buffering.md) that data until it can be processed and routed. However, if Fluent Bit continues buffering new data to temporary storage faster than it can flush old data, that storage will eventually reach capacity.

As described in [Buffering and storage](../administration/buffering-and-storage.md) , Fluent Bit offers two modes for data handling: in-memory only (default) and in-memory and filesystem (optional).
Strategies for managing backpressure vary depending on the [buffering mode](../pipeline/buffering.md#buffering-modes) for each active input plugin. Because of this, choosing the right buffering mode is also a key part of managing backpressure.

The default `storage.type memory` buffer can be restricted with `Mem_Buf_Limit`. If memory reaches this limit and you reach a backpressure scenario, you won't be able to ingest more data until the data chunks that are in memory can be flushed. The input pauses and Fluent Bit [emits](https://github.com/fluent/fluent-bit/blob/v2.0.0/src/flb_input_chunk.c#L1334) a `[warn] [input] {input name or alias} paused (mem buf overlimit)` log message.
## Manage backpressure for memory-only buffering

Depending on the input plugin in use, this might cause incoming data to be discarded (for example, TCP input plugin). The tail plugin can handle pauses without data loss, storing its current file offset and resuming reading later. When buffer memory is available, the input resumes accepting logs. Fluent Bit [emits](https://github.com/fluent/fluent-bit/blob/v2.0.0/src/flb_input_chunk.c#L1277) a `[info] [input] {input name or alias} resume (mem buf overlimit)` message.
If one or more active input plugins use [memory-only buffering](../pipeline/buffering.md#memory-only-buffering), use the following settings to manage backpressure.

Mitigate the risk of data loss by configuring secondary storage on the filesystem using the `storage.type` of `filesystem` (as described in [Buffering and storage](../administration/buffering-and-storage.md)). Initially, logs will be buffered to both memory and the filesystem. When the `storage.max_chunks_up` limit is reached, all new data will be stored in the filesystem. Fluent Bit stops queueing new data in memory and buffers only to the filesystem. When `storage.type filesystem` is set, the `Mem_Buf_Limit` setting no longer has any effect. Instead, the `[SERVICE]` level `storage.max_chunks_up` setting controls the size of the memory buffer.
{% hint style="warning" %}
Some input plugins are prone to data loss after `mem_buf_limit` capacity is reached during memory-only buffering. If you need to avoid data loss, consider using [filesystem buffering](../pipeline/buffering.md#filesystem-buffering-hybrid) instead.
{% endhint %}

## `Mem_Buf_Limit`
### Set `mem_buf_limit` for input plugins

`Mem_Buf_Limit` applies only with the default `storage.type memory`. This option is disabled by default and can be applied to all input plugins.
For input plugins that use memory-only buffering, you can configure the `mem_buf_limit` setting to enforce a limit for how much data that plugin can buffer to memory.

As an example situation:
{% hint style="info" %}
This setting doesn't affect how much data can be buffered to memory by plugins that use filesystem buffering.
{% endhint %}

- `Mem_Buf_Limit` is set to `1MB`.
When the specified `mem_buf_limit` capacity is reached, Fluent Bit will stop buffering data from that source plugin until enough buffered chunks are flushed. Most plugins emit a log message that says `[warn] [input] <PLUGIN NAME> paused (mem buf overlimit)` when buffering pauses.

After more memory becomes available, Fluent Bit will resume buffering data from that source plugin. Most plugins emit a log message that says `[info] [input] <PLUGIN NAME> resume (mem buf overlimit)` when buffering resumes.

#### Behavior when capacity is reached

The following example demonstrates what happens when an input plugin with memory-only buffering reaches its `mem_buf_limit` capacity:

- The input plugin's `mem_buf_limit` is set to `1MB`.
- The input plugin tries to append 700&nbsp;KB.
- The engine routes the data to an output plugin.
- The output plugin backend (HTTP Server) is down.
- The output plugin's backend is down, which means it won't accept the data.
- Engine scheduler retries the flush after 10 seconds.
- The input plugin tries to append 500&nbsp;KB.

In this situation, the engine allows appending those 500&nbsp;KB of data into the memory, with a total of 1.2&nbsp;MB of data buffered. The limit is permissive and will allow a single write past the limit. When the limit is exceeded, the following actions are taken:
In this situation, the engine allows appending those 500&nbsp;KB of data into the memory, with a total of 1.2&nbsp;MB of data buffered. The limit is permissive and will allow a single write past the capacity of `mem_buf_limit`. When the limit is exceeded, Fluent Bit takes the following actions:

- Block local buffers for the input plugin (can't append more data).
- Notify the input plugin, invoking a `pause` callback.
- It blocks local buffers for the input plugin (can't append more data).
- It notifies the input plugin, invoking a `pause` callback.

The engine protects itself and won't append more data coming from the input plugin in question. It's the responsibility of the plugin to keep state and decide what to do in a `paused` state.

@@ -42,32 +54,30 @@ In a few seconds, if the scheduler was able to flush the initial 700&nbsp;KB of
- If the plugin is paused, it invokes a `resume` callback.
- The input plugin can continue appending more data.
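
The pause and resume sequence described above can be sketched as a small toy model. This is only an illustration of the permissive limit, not Fluent Bit's actual implementation: the `InputBuffer` class, its fields, and the `append` and `flush` methods are hypothetical names, and sizes use decimal units (1 MB = 1000 KB) to keep the arithmetic simple.

```python
from dataclasses import dataclass

KB = 1000  # decimal units, assumed here to keep the arithmetic easy to follow


@dataclass
class InputBuffer:
    """Toy model of one input plugin's memory buffer (hypothetical, for illustration)."""
    limit: int            # stands in for mem_buf_limit, in bytes
    used: int = 0         # bytes currently buffered
    paused: bool = False  # True after the pause callback would have fired

    def append(self, size: int) -> bool:
        """Try to buffer `size` bytes; return True if the data was accepted."""
        if self.paused:
            return False  # local buffers are blocked until enough data is flushed
        self.used += size  # permissive check: a single write may cross the limit
        if self.used > self.limit:
            self.paused = True  # the engine would invoke the plugin's pause callback
        return True

    def flush(self, size: int) -> None:
        """Release `size` bytes after a successful flush; resume if back under the limit."""
        self.used = max(0, self.used - size)
        if self.paused and self.used < self.limit:
            self.paused = False  # the engine would invoke the plugin's resume callback


buf = InputBuffer(limit=1000 * KB)
buf.append(700 * KB)  # accepted: 700 KB buffered, under the limit
buf.append(500 * KB)  # accepted: 1.2 MB buffered, the single write past the limit
buf.append(100 * KB)  # rejected: the input is paused
buf.flush(700 * KB)   # the retried flush succeeds, leaving 500 KB buffered
```

In this model the third `append` is refused rather than queued, which mirrors the point that the plugin itself must decide what to do with data that arrives while it's paused.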

## `storage.max_chunks_up`
## Manage backpressure for filesystem buffering

If one or more active input plugins use [filesystem buffering](../pipeline/buffering.md#filesystem-buffering-hybrid), use the following settings to manage backpressure.

The `[SERVICE]` level `storage.max_chunks_up` setting controls the size of the memory buffer. When `storage.type filesystem` is set, the `Mem_Buf_Limit` setting no longer has an effect.
### Set `storage.max_chunks_up` and `storage.backlog.mem_limit` in global settings

The setting behaves similar to the `Mem_Buf_Limit` scenario when the non-default `storage.pause_on_chunks_overlimit` is enabled.
In the [`service` section](../administration/configuring-fluent-bit/yaml/service-section.md) of your Fluent Bit configuration file, you can configure the `storage.max_chunks_up` and `storage.backlog.mem_limit` settings. Both settings dictate how much data can be buffered to memory by input plugins that use filesystem buffering, and are combined limits shared by all applicable input plugins.

When (default) `storage.pause_on_chunks_overlimit` is disabled, the input won't pause when the memory limit is reached. Instead, it switches to buffering logs only in the filesystem. Limit the disk spaced used for filesystem buffering with `storage.total_limit_size`.
{% hint style="info" %}
These settings don't affect how much data can be buffered to memory by plugins that use memory-only buffering.
{% endhint %}

See [Buffering and Storage](buffering-and-storage.md) docs for more information.
When either the specified `storage.max_chunks_up` or `storage.backlog.mem_limit` capacity is reached, all input plugins that use filesystem buffering will stop buffering data to memory until more memory becomes available. Whether these input plugins continue buffering data to the filesystem depends on each plugin's specified `storage.pause_on_chunks_overlimit` value.

## About pause and resume callbacks
### Set `storage.pause_on_chunks_overlimit` for input plugins

Each plugin is independent and not all of them implement `pause` and `resume` callbacks. These callbacks are a notification mechanism for the plugin.
For input plugins that use filesystem buffering, you can configure the `storage.pause_on_chunks_overlimit` setting to specify how each plugin should behave after the global `storage.max_chunks_up` or `storage.backlog.mem_limit` capacity is reached.

One example of a plugin that implements these callbacks and keeps state correctly is the [Tail Input](../pipeline/inputs/tail.md) plugin. When the `pause` callback triggers, it pauses its collectors and stops appending data. Upon `resume`, it resumes the collectors and continues ingesting data. Tail tracks the current file offset when it pauses, and resumes at the same position. If the file hasn't been deleted or moved, it can still be read.
If `storage.pause_on_chunks_overlimit` is set to `off` for an input plugin, the input plugin will stop buffering data to memory but continue buffering data to the filesystem.

With the default `storage.type memory` and `Mem_Buf_Limit`, the following log messages emit for `pause` and `resume`:
If `storage.pause_on_chunks_overlimit` is set to `on` for an input plugin, the input plugin will stop both memory buffering and filesystem buffering until more memory becomes available.

```text
[warn] [input] {input name or alias} paused (mem buf overlimit)
[info] [input] {input name or alias} resume (mem buf overlimit)
```
### Set `storage.total_limit_size` for output plugins

With `storage.type filesystem` and `storage.max_chunks_up`, the following log messages emit for `pause` and `resume`:
Fluent Bit implements the concept of logical queues for buffered chunks. Based on its tag, a chunk can be routed to multiple destinations. Fluent Bit keeps an internal reference from where each chunk was created and where it needs to go. To limit the number of queued chunks, set the `storage.total_limit_size` for any active output plugins that route data ingested by input plugins that use filesystem buffering.

```text
[input] {input name or alias} paused (storage buf overlimit)
[input] {input name or alias} resume (storage buf overlimit)
```
Network failures or latency in third-party services is common for output destinations. In some cases, a chunk is tagged for multiple destinations with varying response times, or one destination is generating more backpressure than others. If an output plugin reaches its configured `storage.total_limit_size` capacity, the oldest chunk from its queue will be discarded to make room for new data.
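
The discard behavior can be pictured with a minimal sketch of a size-capped queue. This is a toy model under stated assumptions, not Fluent Bit's internal data structure: the `OutputQueue` class and its `enqueue` method are hypothetical, and the model assumes that as many of the oldest chunks as necessary are discarded until the new chunk fits.

```python
from collections import deque


class OutputQueue:
    """Toy model of one output plugin's queue of buffered chunks (hypothetical)."""

    def __init__(self, total_limit_size: int) -> None:
        self.total_limit_size = total_limit_size  # stands in for storage.total_limit_size
        self.chunks = deque()  # (chunk name, size in bytes) pairs, oldest first
        self.size = 0          # total bytes currently queued

    def enqueue(self, name: str, size: int) -> list:
        """Queue a chunk, discarding the oldest chunks if needed; return discarded names."""
        dropped = []
        # Assumption: keep discarding from the front until the new chunk fits.
        while self.chunks and self.size + size > self.total_limit_size:
            old_name, old_size = self.chunks.popleft()
            self.size -= old_size
            dropped.append(old_name)
        self.chunks.append((name, size))
        self.size += size
        return dropped


queue = OutputQueue(total_limit_size=5_000_000)
queue.enqueue("chunk-1", 2_000_000)  # fits: 2 MB queued
queue.enqueue("chunk-2", 2_000_000)  # fits: 4 MB queued
queue.enqueue("chunk-3", 2_000_000)  # would reach 6 MB, so chunk-1 is discarded first
```

Because each output plugin gets its own limit in this model, a slow destination loses only its own oldest chunks and doesn't affect the queues of faster destinations.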