Commit 3e533aa

Merge branch 'master' into out_azure_kusto_workload_identity
Signed-off-by: Tanmaya Panda <[email protected]>
2 parents: 2f336ba + 7d86f57

16 files changed: +108 −130 lines


CONTRIBUTING.md

Lines changed: 15 additions & 1 deletion

```diff
@@ -123,4 +123,18 @@ possible. For example:
 ### Vale
 
 The Fluent Bit maintainers use the [Vale](https://vale.sh/docs/) plugin, which lints
-pull requests and adds suggestions to improve style and clarity.
+pull requests and adds suggestions to improve style and clarity. Most Vale tests are
+at the `suggestion` level and won't block merging.
+
+The following tests are at an `error` level and will prevent merging:
+
+- [NonStandardQuotes](https://github.com/fluent/fluent-bit-docs/blob/master/vale-styles/FluentBit/NonStandardQuotes.yml):
+  [Use standard quotes](https://developers.google.com/style/quotation-marks#straight-and-curly-quotation-marks).
+  By default, Google Docs and Microsoft Word turn standard straight quotes into "smart"
+  curly quotes. If you copy-paste from one of these tools, you must correct the quotes
+  back to straight quotes. You can also turn off smart quotes
+  in [Google Docs](https://support.google.com/docs/thread/217182974/can-i-turn-smart-quotes-off-in-a-google-doc?hl=en)
+  or [Microsoft Word](https://support.microsoft.com/en-us/office/smart-quotes-in-word-and-powerpoint-702fc92e-b723-4e3d-b2cc-71dedaf2f343)
+  to prevent this problem.
+- [Repetition](https://github.com/errata-ai/vale/blob/v3/testdata/styles/Markup/Repetition.yml):
+  Checks for the same word used twice in succession.
```
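
As an illustration of the quote cleanup the NonStandardQuotes rule asks for, a small helper script can straighten curly quotes before committing. This is a hypothetical sketch, not part of the Fluent Bit docs tooling:

```python
# Hypothetical helper (not part of the Fluent Bit repo): replaces the four
# common "smart" quote characters with their straight ASCII equivalents.
SMART_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",  # left single curly quote
    "\u2019": "'",  # right single curly quote
    "\u201c": '"',  # left double curly quote
    "\u201d": '"',  # right double curly quote
})

def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight quotes."""
    return text.translate(SMART_TO_STRAIGHT)

print(straighten_quotes("\u201csmart\u201d quotes and \u2018plain\u2019 ones"))
```

Running a file through such a helper before opening a pull request avoids the `error`-level Vale failure entirely.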

SUMMARY.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -146,8 +146,8 @@
 * [Labels](pipeline/processors/labels.md)
 * [Metrics Selector](pipeline/processors/metrics-selector.md)
 * [OpenTelemetry Envelope](pipeline/processors/opentelemetry-envelope.md)
+* [Sampling](pipeline/processors/sampling.md)
 * [SQL](pipeline/processors/sql.md)
-* [Traces](pipeline/processors/traces.md)
 * [Filters as processors](pipeline/processors/filters.md)
 * [Conditional processing](pipeline/processors/conditional-processing.md)
 * [Filters](pipeline/filters/README.md)
```

pipeline/inputs/http.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -12,7 +12,7 @@ description: The HTTP input plugin allows you to send custom records to an HTTP
 | port | The port for Fluent Bit to listen on | 9880 |
 | tag_key | Specify the key name to overwrite a tag. If set, the tag will be overwritten by a value of the key. | |
 | buffer_max_size | Specify the maximum buffer size in KB to receive a JSON message. | 4M |
-| buffer_chunk_size | This sets the chunk size for incoming incoming JSON messages. These chunks are then stored/managed in the space available by buffer_max_size. | 512K |
+| buffer_chunk_size | This sets the chunk size for incoming JSON messages. These chunks are then stored/managed in the space available by buffer_max_size. | 512K |
 | successful_response_code | It allows to set successful response code. `200`, `201` and `204` are supported. | 201 |
 | success_header | Add an HTTP header key/value pair on success. Multiple headers can be set. Example: `X-Custom custom-answer` | |
 | threaded | Indicates whether to run this input in its own [thread](../../administration/multithreading.md#inputs). | `false` |
@@ -34,7 +34,7 @@ The http input plugin allows Fluent Bit to open up an HTTP port that you can the
 #### How to set tag
 
 The tag for the HTTP input plugin is set by adding the tag to the end of the request URL. This tag is then used to route the event through the system.
-For example, in the following curl message below the tag set is `app.log**. **` because the end end path is `/app_log`:
+For example, in the following curl message below the tag set is `app.log**. **` because the end path is `/app_log`:
 
 ### Curl request
 
```
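
For context, a minimal HTTP input configuration combining the buffer parameters from the table above might look like the following. Only `port`, `buffer_max_size`, and `buffer_chunk_size` come from the documented table; the `listen` key and the `stdout` output block are illustrative assumptions, not part of this commit:

```yaml
service:
  flush: 1

pipeline:
  inputs:
    - name: http
      listen: 0.0.0.0
      port: 9880
      buffer_max_size: 4M      # upper bound for a single incoming JSON message
      buffer_chunk_size: 512K  # allocation step within buffer_max_size
  outputs:
    - name: stdout
      match: app.log
```

With a sketch like this, posting JSON to `http://localhost:9880/app.log` would produce records tagged `app.log`, per the "How to set tag" section.
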
pipeline/inputs/prometheus-remote-write.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -13,7 +13,7 @@ This input plugin allows you to ingest a payload in the Prometheus remote-write
 | listen | The address to listen on | 0.0.0.0 |
 | port | The port for Fluent Bit to listen on | 8080 |
 | buffer\_max\_size | Specify the maximum buffer size in KB to receive a JSON message. | 4M |
-| buffer\_chunk\_size | This sets the chunk size for incoming incoming JSON messages. These chunks are then stored/managed in the space available by buffer_max_size. | 512K |
+| buffer\_chunk\_size | This sets the chunk size for incoming JSON messages. These chunks are then stored/managed in the space available by buffer_max_size. | 512K |
 |successful\_response\_code | It allows to set successful response code. `200`, `201` and `204` are supported.| 201 |
 | tag\_from\_uri | If true, tag will be created from uri, e.g. api\_prom\_push from /api/prom/push, and any tag specified in the config will be ignored. If false then a tag must be provided in the config for this input. | true |
 | uri | Specify an optional HTTP URI for the target web server listening for prometheus remote write payloads, e.g: /api/prom/push | |
```

pipeline/inputs/splunk.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -10,7 +10,7 @@ The **splunk** input plugin handles [Splunk HTTP HEC](https://docs.splunk.com/Do
 | port | The port for Fluent Bit to listen on | 9880 |
 | tag_key | Specify the key name to overwrite a tag. If set, the tag will be overwritten by a value of the key. | |
 | buffer_max_size | Specify the maximum buffer size in KB to receive a JSON message. | 4M |
-| buffer_chunk_size | This sets the chunk size for incoming incoming JSON messages. These chunks are then stored/managed in the space available by buffer_max_size. | 512K |
+| buffer_chunk_size | This sets the chunk size for incoming JSON messages. These chunks are then stored/managed in the space available by buffer_max_size. | 512K |
 | successful_response_code | It allows to set successful response code. `200`, `201` and `204` are supported. | 201 |
 | splunk\_token | Specify a Splunk token for HTTP HEC authentication. If multiple tokens are specified (with commas and no spaces), usage will be divided across each of the tokens. | |
 | store\_token\_in\_metadata | Store Splunk HEC tokens in the Fluent Bit metadata. If set false, they will be stored as normal key-value pairs in the record data. | true |
```

pipeline/inputs/standard-input.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -17,7 +17,7 @@ If no parser is configured for the stdin plugin, it expects *valid JSON* input d
 1. A JSON object with one or more key-value pairs: `{ "key": "value", "key2": "value2" }`
 3. A 2-element JSON array in [Fluent Bit Event](../../concepts/key-concepts.md#event-or-record) format, which may be:
    * `[TIMESTAMP, { "key": "value" }]` where TIMESTAMP is a floating point value representing a timestamp in seconds; or
-   * from Fluent Bit v2.1.0, `[[TIMESTAMP, METADATA], { "key": "value" }]` where TIMESTAMP has the same meaning as above and and METADATA is a JSON object.
+   * from Fluent Bit v2.1.0, `[[TIMESTAMP, METADATA], { "key": "value" }]` where TIMESTAMP has the same meaning as above and METADATA is a JSON object.
 
 Multi-line input JSON is supported.
 
```
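
The accepted input shapes listed in that diff can be sketched as a small classifier. This is an illustrative model of the documented formats, not Fluent Bit's actual parser:

```python
import json

def classify_stdin_event(line: str) -> str:
    """Classify a line of input into the shapes the stdin plugin documents.

    Illustrative sketch only -- not Fluent Bit's real parsing code.
    """
    value = json.loads(line)
    if isinstance(value, dict):
        return "object"                        # { "key": "value" }
    if isinstance(value, list) and len(value) == 2 and isinstance(value[1], dict):
        ts, _record = value
        if isinstance(ts, (int, float)):
            return "timestamp-event"           # [TIMESTAMP, { ... }]
        if (isinstance(ts, list) and len(ts) == 2
                and isinstance(ts[0], (int, float)) and isinstance(ts[1], dict)):
            return "timestamp-metadata-event"  # [[TIMESTAMP, METADATA], { ... }], v2.1.0+
    return "invalid"

print(classify_stdin_event('[1700000000.0, {"key": "value"}]'))
```
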
pipeline/outputs/postgresql.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -12,7 +12,7 @@ According to the parameters you have set in the configuration file, the plugin w
 
 > **NOTE:** If you are not familiar with how PostgreSQL's users and grants system works, you might find useful reading the recommended links in the "References" section at the bottom.
 
-A typical installation normally consists of a self-contained database for Fluent Bit in which you can store the output of one or more pipelines. Ultimately, it is your choice to to store them in the same table, or in separate tables, or even in separate databases based on several factors, including workload, scalability, data protection and security.
+A typical installation normally consists of a self-contained database for Fluent Bit in which you can store the output of one or more pipelines. Ultimately, it is your choice to store them in the same table, or in separate tables, or even in separate databases based on several factors, including workload, scalability, data protection and security.
 
 In this example, for the sake of simplicity, we use a single table called `fluentbit` in a database called `fluentbit` that is owned by the user `fluentbit`. Feel free to use different names. Preferably, for security reasons, do not use the `postgres` user \(which has `SUPERUSER` privileges\).
 
```

pipeline/processors/README.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -18,9 +18,9 @@ Fluent Bit offers the following processors:
 - [Metrics Selector](metrics-selector.md): Choose which metrics to keep or discard.
 - [OpenTelemetry Envelope](opentelemetry-envelope.md): Transform logs into an
   OpenTelemetry-compatible format.
-- [SQL](sql.md): Use SQL queries to extract log content.
-- [Traces](traces.md): Trace sampling designed with a pluggable architecture,
+- [Sampling](sampling.md): Trace sampling designed with a pluggable architecture,
   allowing easy extension to support multiple sampling strategies and backends.
+- [SQL](sql.md): Use SQL queries to extract log content.
 - [Filters](filters.md): Any filter can be used as a processor.
 
 ## Features
```

pipeline/processors/metrics-selector.md

Lines changed: 32 additions & 25 deletions

````diff
@@ -1,27 +1,45 @@
 # Metrics Selector
 
-The **metric_selector** processor allows you to select metrics to include or exclude (similar to the `grep` filter for logs).
+The _Metrics Selector_ processor lets you choose which metrics to include or exclude, similar to the [Grep](../pipeline/filters/grep) filter for logs.
 
 <img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=326269f3-cfea-472d-9169-1de32c142b90" />
 
-## Configuration Parameters <a id="config"></a>
+## Configuration parameters
 
-The native processor plugin supports the following configuration parameters:
+The Metrics Selector processor supports the following configuration parameters:
 
 | Key | Description | Default |
 | :---------- | :--- | :--- |
-| Metric\_Name | Keep metrics in which the metric of name matches with the actual name or the regular expression. | |
-| Context | Specify matching context. Currently, metric\_name and delete\_label\_value are only supported. | `Metrics_Name` |
-| Action | Specify the action for specified metrics. INCLUDE and EXCLUDE are allowed. | |
-| Operation\_Type | Specify the operation type of action for metrics payloads. PREFIX and SUBSTRING are allowed. | |
-| Label | Specify a label key and value pair. | |
+| `metric_name` | The string that determines which metrics are affected by this processor, depending on the active [matching operation](#matching-operations). | |
+| `context` | Specifies matching context. Possible values: `metric_name` or `delete_label`. | `metrics_name` |
+| `action` | Specifies whether to include or exclude matching metrics. Possible values: `INCLUDE` or `EXCLUDE`. | |
+| `operation_type` | Specifies the [matching operation](#matching-operations) to apply to the value of `metric_name`. Possible values: `PREFIX` or `SUBSTRING`. | |
+| `label` | Specifies a label key and value pair. | |
 
-## Configuration Examples <a id="config_example"></a>
+## Matching operations
 
-Here is a basic configuration example.
+The Metrics Selector processor has two matching operations: prefix matching and substring matching.
+
+### Prefix matching
+
+Prefix matching compares the value of `metric_name` to the beginning of each incoming metric name. For example, `metric_name: fluentbit_input` results in a match for metrics named `fluentbit_input_records`, but not for metrics named `total_fluentbit_input`.
+
+If no `operation_type` value is specified and the value of `metric_name` is a standard string, the Metrics Selector processor defaults to prefix matching.
+
+### Substring matching
+
+Substring matching treats the value of `metric_name` as a regex pattern, and compares this pattern against each incoming metric name accordingly. This pattern can appear anywhere within the name of the incoming metric. For example, `metric_name: bytes` results in a match for metrics named `bytes_total` and metrics named `input_bytes_count`.
+
+If the value of `metric_name` is a string wrapped in forward slashes (for example, `metric_name: /storage..*/`), the Metrics Selector processor defaults to substring matching, regardless of whether an `operation_type` value is specified. This means that a `metric_name` value wrapped in forward slashes will always use substring matching, even if `operation_type` is set to `PREFIX`.
+
+However, if `operation_type` is explicitly set to `SUBSTRING`, you don't need to wrap the value of `metric_name` in forward slashes.
+
+## Configuration examples
+
+The following examples show possible configurations of the Metrics Selector processor.
+
+### Without `context`
 
-{% tabs %}
-{% tab title="fluent-bit.yaml" %}
 ```yaml
 service:
   flush: 5
@@ -51,9 +69,9 @@ pipeline:
     - name: stdout
       match: '*'
 ```
-{% endtab %}
 
-{% tab title="context-delete\_label\_value.yaml" %}
+### With `context`
+
 ```yaml
 service:
   flush: 5
@@ -80,14 +98,3 @@ pipeline:
     - name: stdout
       match: '*'
 ```
-{% endtab %}
-{% endtabs %}
-
-
-All processors are only valid with the YAML configuration format.
-Processor configuration should be located under the relevant input or output plugin configuration.
-
-Metric\_Name parameter will translate the strings which is quoted with backslashes `/.../` as Regular expressions.
-Without them, users need to specify Operation\_Type whether prefix matching or substring matching.
-The default operation is prefix matching.
-For example, `/chunks/` will be translated as a regular expression.
````
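The matching rules this diff documents (prefix by default, regex substring when the value is `/.../`-wrapped or `operation_type` is `SUBSTRING`) can be sketched roughly as follows. This is an illustrative model of the documented behavior, not the processor's actual C implementation:

```python
import re

def metric_matches(metric_name_setting, incoming, operation_type=None):
    """Rough model of the Metrics Selector matching rules.

    A `/.../`-wrapped value always behaves as a regex (substring matching),
    even when operation_type is PREFIX; otherwise prefix matching is the
    default unless operation_type is SUBSTRING.
    """
    wrapped = (len(metric_name_setting) > 1
               and metric_name_setting.startswith("/")
               and metric_name_setting.endswith("/"))
    if wrapped:
        return re.search(metric_name_setting[1:-1], incoming) is not None
    if operation_type == "SUBSTRING":
        return re.search(metric_name_setting, incoming) is not None
    return incoming.startswith(metric_name_setting)  # PREFIX (the default)

print(metric_matches("fluentbit_input", "fluentbit_input_records"))
```
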

pipeline/processors/traces.md renamed to pipeline/processors/sampling.md

Lines changed: 4 additions & 4 deletions

```diff
@@ -1,6 +1,6 @@
-# Traces
+# Sampling
 
-The _Traces_ sampling processor is designed with a pluggable architecture, allowing easy extension to support multiple sampling strategies and backends. It provides you with the ability to apply head or tail sampling to incoming trace telemetry data.
+The _Sampling_ processor is designed with a pluggable architecture, allowing easy extension to support multiple trace sampling strategies and backends. It provides you with the ability to apply head or tail sampling to incoming trace telemetry data.
 
 Available samplers:
 
@@ -21,9 +21,9 @@ Conditions:
 
 The processor does not provide any extra configuration parameter, it can be used directly in your _processors_ Yaml directive.
 
-## Traces types
+## Sampling types
 
-Traces have both a name and a type with the following possible settings:
+Sampling has both a name and a type with the following possible settings:
 
 | Key | Possible values |
 | :----- | :---------------------: |
```
