diff --git a/pipeline/processors/sampling.md b/pipeline/processors/sampling.md index 82b575b27..698e3d377 100644 --- a/pipeline/processors/sampling.md +++ b/pipeline/processors/sampling.md @@ -1,53 +1,36 @@ # Sampling -The _Sampling_ processor is designed with a pluggable architecture, allowing easy extension to support multiple trace sampling strategies and backends. It provides you with the ability to apply head or tail sampling to incoming trace telemetry data. +The _sampling_ processor uses an extensible architecture that supports multiple trace sampling strategies and backends. It lets you apply head or tail sampling to incoming trace telemetry data. {% hint style="info" %} -**Note:** Both processors and this specific component can be enabled only by using -the YAML configuration format. Classic mode configuration format doesn't support processors. +Only [YAML configuration files](../../administration/configuring-fluent-bit/yaml/README.md) support processors. {% endhint %} -Available samplers: +## Configuration parameters -- `probabilistic` (head sampling) -- `tail` (tail sampling) +This processor uses the following configuration parameters: -Conditions: - -- `latency` -- `span_count` -- `status_code` -- `string_attribute` -- `numeric_attribute` -- `boolean_attribute` -- `trace_state` - -## Configuration Parameters - -The processor does not provide any extra configuration parameter, it can be used directly in your _processors_ Yaml directive. - -## Sampling types - -Sampling has both a name and a type with the following possible settings: - -| Key | Possible values | -| :----- | :---------------------: | -| `name` | `sampling` | -| `type` | `probabilistic`, `tail` | +| Key | Description | +| --- | ----------- | +| `type` | The type of sampling to perform. Possible values: `probabilistic` ([head sampling](#head-sampling)) or `tail` ([tail sampling](#tail-sampling)).
| +| `sampling_settings` | Contains key/value pairs for different sampling settings. These settings vary by `type`. | +| `conditions` | An array of objects where each object specifies a different [condition](#conditions) for `tail` sampling. The possible items in each object vary by `conditions.type`. | ## Head sampling -In this example, head sampling will be used to process a smaller percentage of the overall ingested traces and spans. This is done by setting up the pipeline to ingest on the OpenTelemetry defined port as shown below using the OpenTelemetry Protocol (OTLP). The processor section defines traces for head sampling and the sampling percentage defining the total ingested traces and spans to be forwarded to the defined output plugins. +Head sampling decides whether to keep a trace at the very beginning of its ingestion, when the root span is created and before the request is fulfilled. -![](../../.gitbook/assets/traces_head_sampling.png) +![Head sampling diagram](../../.gitbook/assets/traces_head_sampling.png) -| Sampling settings | Description | -| :-------------------- | :------------------------------------------------------------------------------------------------------------------ | -| `sampling_percentage` | This sets the probability of sampling trace, can be between 0-100%. For example, 40 samples 40% of traces randomly. | +Head sampling uses the following `sampling_settings` configuration parameters: -Example configuration: +| Key | Description | +| --- | :---------- | +| `sampling_percentage` | Sets the probability of sampling a trace. Must be a value between `0` and `100`. For example, `40` samples 40% of traces randomly. | + +This example uses head sampling to process a smaller percentage of the overall ingested traces and spans. It does this by configuring the pipeline to ingest traces on the OpenTelemetry-defined port using the OpenTelemetry Protocol (OTLP).
The `processors` section applies head sampling to traces and sets the percentage of traces and spans to forward to the specified output plugins. {% tabs %} {% tab title="fluent-bit.yaml" %} @@ -69,7 +52,7 @@ pipeline: - name: sampling type: probabilistic sampling_settings: - sampling_percentage: 40 + sampling_percentage: 40 outputs: - name: stdout @@ -79,33 +62,48 @@ pipeline: {% endtab %} {% endtabs %} -With this head sampling configuration, a sample set of ingested traces will randomly send 40% of the total traces to the standard output. +With this head sampling configuration, Fluent Bit randomly forwards 40% of the ingested traces to standard output. ## Tail sampling -Tail sampling is used to obtain a more selective and fine grained control over the collection of traces and spans without collecting everything. Below is an example showing the process is a combination of waiting on making a sampling decision together followed by configuration defined conditions to determine the spans to be sampled. +Tail sampling offers more selective and fine-grained control over the collection of traces and spans. It evaluates the entire trace when making a sampling decision and can inspect trace metadata and status to inform that decision. + +![Tail sampling diagram](../../.gitbook/assets/traces_tail_sampling.png) + +Tail sampling uses the following `sampling_settings` configuration parameters: -![](../../.gitbook/assets/traces_tail_sampling.png) +| Key | Description | Default | +| --- | :---------- | ------- | +| `decision_wait` | Specifies how long to buffer spans before making a sampling decision, allowing full trace evaluation. | `30s` | +| `max_traces` | Specifies the maximum number of traces that can be held in memory. When the limit is reached, the oldest trace is deleted.
| _none_ | -The following samplings settings are available with their default values: +### Conditions -| Sampling settings | Description | Default value | -| :---------------- | :------------------------------------------------------------------------------------------------------------------------- | :-----------: | -| `decision_wait` | Specifies how long to buffer spans before making a sampling decision, allowing full trace evaluation. | 30s | -| `max_traces` | Specifies the maximum number of traces that can be held in memory. When the limit is reached, the oldest trace is deleted. | | +Tail sampling supports several conditions. Each condition defines criteria that determine whether a trace is sampled. -The tail-based sampler supports various conditionals to sample traces if their spans meet a specific condition. +The following conditions are available: -### Condition: latency +- [Latency](#latency) +- [Span count](#span-count) +- [Status code](#status-code) +- [String attribute](#string-attribute) +- [Numeric attribute](#numeric-attribute) +- [Boolean attribute](#boolean-attribute) +- [Trace state](#trace-state) + +#### Latency This condition samples traces based on span duration. It uses `threshold_ms_low` to capture short traces and `threshold_ms_high` for long traces. -| Condition settings | Description | Default value | -| :------------------ | :------------------------------------------------------------------------------------------- | :-----------: | -| `threshold_ms_low` | Specifies the lower latency threshold. Traces with a duration <= this value will be sampled. | 0 | -| `threshold_ms_high` | Specifies the upper latency threshold. Traces with a duration >= this value will be sampled. | 0 | +The latency condition uses the following `conditions` configuration parameters: -Example configuration: +| Key | Description | Default | +| --- | ----------- | --------| +| `type` | Sets the condition type. For the latency condition, this value must be `latency`.
| _none_ | +| `threshold_ms_low` | Specifies the lower latency threshold. Traces with a duration less than or equal to this value will be sampled. | `0` | +| `threshold_ms_high` | Specifies the upper latency threshold. Traces with a duration greater than or equal to this value will be sampled. | `0` | + +The following example waits five seconds before making a decision. It then samples traces based on latency, capturing short traces of 200 ms or less and long traces of 3000 ms or more. Traces between 200 ms and 3000 ms are not sampled unless another condition applies. {% tabs %} {% tab title="fluent-bit.yaml" %} @@ -141,18 +139,19 @@ pipeline: {% endtab %} {% endtabs %} -This tail-based sampling configuration waits 5 seconds before making a decision. It samples traces based on latency, capturing short traces of 200ms or less and long traces of 3000ms or more. Traces between 200ms and 3000ms are not sampled unless another condition applies. - -### Condition: span_count +#### Span count This condition samples traces that have specific span counts defined in a configurable range. It uses `min_spans` and `max_spans` to specify the number of spans a trace can have to be sampled. -| Condition settings | Description | Default value | -| :----------------- | :--------------------------------------------------------------------- | :-----------: | -| `max_spans` | Specifies the minimum number of spans a trace must have to be sampled. | | -| `min_spans` | Specifies the maximum number of spans a trace can have to be sampled. | | +The span count condition uses the following `conditions` configuration parameters: -Example configuration: +| Key | Description | Default | +| --- | ----------- | --------| +| `type` | Sets the condition type. For the span count condition, this value must be `span_count`. | _none_ | +| `min_spans` | Specifies the minimum number of spans a trace must have to be sampled.
| _none_ | +| `max_spans` | Specifies the maximum number of spans a trace can have to be sampled. | _none_ | + +The following example configuration waits five seconds before making a decision. It then samples traces with at least three spans but no more than five spans. Traces with fewer than three spans or more than five spans are not sampled unless another condition applies. {% tabs %} {% tab title="fluent-bit.yaml" %} @@ -188,17 +187,18 @@ pipeline: {% endtab %} {% endtabs %} -This tail-based sampling configuration waits 5 seconds before making a decision. It samples traces based on having a minimum of 3 spans and a maximum of 5 spans. Traces with less than 3 and more than 5 spans are not sampled unless another condition applies. - -### Condition: status_code +#### Status code This condition samples traces based on span status codes (`OK`, `ERROR`, `UNSET`). -| Condition settings | Description | Default value | -| :----------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | -| `status_codes` | Defines an array of span status codes (`OK`, `ERROR`, `UNSET`) to filter traces. Traces are sampled if any span matches a listed status code. For example, `status_codes: [ERROR, UNSET]` captures traces with errors or unset statuses. | | +The status code condition uses the following `conditions` configuration parameters: -Example configuration: +| Key | Description | Default | +| --- | ----------- | --------| +| `type` | Sets the condition type. For the status code condition, this value must be `status_code`. | _none_ | +| `status_codes` | Defines an array of span status codes (`OK`, `ERROR`, `UNSET`) to filter traces. Traces are sampled if any span matches a listed status code.
For example, `status_codes: [ERROR, UNSET]` captures traces with errors or unset statuses. | _none_ | + +The following example configuration samples only traces that contain at least one span with the `ERROR` status code. {% tabs %} {% tab title="fluent-bit.yaml" %} @@ -233,25 +233,25 @@ pipeline: {% endtab %} {% endtabs %} -With this tail-based sampling configuration, a sample set of ingested traces will select only the spans with status codes marked as `ERROR` to the standard output. - -### Condition: string_attribute +#### String attribute -This conditional allows traces to be sampled based on specific span or resource attributes. Users can define key-value filters (e.g., http.method=POST) to selectively capture relevant traces. +This condition lets you sample traces based on specific span or resource attributes. You can define key-value filters (for example, `http.method=POST`) to selectively capture relevant traces. -| Condition settings | Description | Default value | -| :----------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | -| `key` | Specifies the span or resource attribute to match (e.g., "service.name"). | | -| `values` | Defines an array of accepted values for the attribute. A trace is sampled if any span contains a matching key-value pair: `["payment-processing"]` | | -| `match_type` | Defines how attributes are compared: `strict` ensures exact value matching, `exists` checks if the attribute is present regardless of its value, and `regex` enables regular expression pattern matching | `strict` | +The string attribute condition uses the following `conditions` configuration parameters: -#### Match Types +| Key | Description | Default | +| --- | ----------- | --------| +| `type` | Sets the condition type. For the string attribute condition, this value must be `string_attribute`.
| _none_ | +| `key` | Specifies the span or resource attribute to match (for example, `"service.name"`). | _none_ | +| `values` | Defines an array of accepted values for the attribute. A trace is sampled if any span contains a matching key-value pair. For example, `["payment-processing"]`. | _none_ | +| `match_type` | Defines how attributes are compared: `strict` requires an exact, case-sensitive value match, `exists` checks if the attribute is present regardless of its value, and `regex` uses regular expression pattern matching. | `strict` | -- **`strict`**: Exact value matching (case-sensitive) -- **`exists`**: Checks if the attribute key is present, regardless of its value -- **`regex`**: Matches values using regular expression patterns +The following example configuration waits two seconds before making a decision. It then samples traces based on the following string attribute conditions: -Example configuration: +- Traces with `http.method` exactly equal to `GET` +- Traces that have a `service.name` attribute (any value) +- Traces with `http.url` starting with `https://api.` or ending with `/health` +- Traces with `error.message` containing timeout, connection failed, or rate limit patterns {% tabs %} {% tab title="fluent-bit.yaml" %} @@ -285,13 +285,13 @@ pipeline: - type: string_attribute match_type: exists key: "service.name" - + # Regex pattern matching - type: string_attribute match_type: regex key: "http.url" values: ["^https://api\\..*", ".*\\/health$"] - + # Multiple regex patterns for error conditions - type: string_attribute match_type: regex @@ -306,25 +306,21 @@ pipeline: {% endtab %} {% endtabs %} -This tail-based sampling configuration waits 2 seconds before making a decision.
It samples traces based on string matching key value pairs: - -- Traces with `http.method` exactly equal to `GET` -- Traces that have a `service.name` attribute (any value) -- Traces with `http.url` starting with `https://api.` or ending with `/health` -- Traces with `error.message` containing timeout, connection failed, or rate limit patterns - -### Condition: numeric_attribute +#### Numeric attribute This condition samples traces based on numeric attribute values of a defined key where users can configure minimum and maximum thresholds. -| Condition settings | Description | Default value | -| :----------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | -| `key` | Specifies the span or resource attribute to match (e.g., "service.name"). | | -| `min_value` | The minimum inclusive value for the numeric attribute. Traces with values >= the `min_value` are sampled. | | -| `max_value` | The maximum inclusive value for the numeric attribute. Traces with values <= the `max_value` are sampled. | | -| `match_type` | This defines how attribute values are evaluated: `strict` matches exact values, `exists` checks if the attribute is present, regardless of its value. | `strict` | +The numeric attribute condition uses the following `conditions` configuration parameters: -Example configuration: +| Key | Description | Default | +| --- | ----------- | --------| +| `type` | Sets the condition type. For the numeric attribute condition, this value must be `numeric_attribute`. | _none_ | +| `key` | Specifies the span or resource attribute to match (for example, `"http.status_code"`). | _none_ | +| `min_value` | The minimum inclusive value for the numeric attribute. Traces with values greater than or equal to the `min_value` are sampled. | _none_ | +| `max_value` | The maximum inclusive value for the numeric attribute.
Traces with values less than or equal to the `max_value` are sampled. | _none_ | +| `match_type` | Defines how attribute values are evaluated: `strict` matches exact values, and `exists` checks if the attribute is present, regardless of its value. | `strict` | + +The following example configuration samples only traces that contain a span whose `http.status_code` attribute has a numeric value between `400` and `504`, inclusive. {% tabs %} {% tab title="fluent-bit.yaml" %} @@ -361,18 +357,19 @@ pipeline: {% endtab %} {% endtabs %} -With this tail-based sampling configuration, a sample set of ingested traces will select only the spans with a key `http.status code` with numeric values between 400 and 504 inclusive. - -### Condition: boolean_attribute +#### Boolean attribute This condition samples traces based on a boolean attribute value of a defined key. This allows for selection of traces based on flags such as error indicators or debug modes. -| Condition settings | Description | Default value | -| :----------------- | :------------------------------------------------------------------------ | :-----------: | -| `key` | Specifies the span or resource attribute to match (e.g., "service.name"). | | -| `value` | Expected boolean value: `true` or `false` | | +The Boolean attribute condition uses the following `conditions` configuration parameters: -Example configuration: +| Key | Description | Default | +| --- | ----------- | --------| +| `type` | Sets the condition type. For the Boolean attribute condition, this value must be `boolean_attribute`. | _none_ | +| `key` | Specifies the span or resource attribute to match (for example, `"user.logged"`). | _none_ | +| `value` | Expected Boolean value: `true` or `false`. | _none_ | + +The following example configuration waits two seconds before making a decision. It then samples traces that have the key `user.logged` set to `false`.
{% tabs %} {% tab title="fluent-bit.yaml" %} @@ -408,15 +405,16 @@ pipeline: {% endtab %} {% endtabs %} -This tail-based sampling configuration waits 2 seconds before making a decision. It samples traces that do not have the key `user.logged` set to true. Traces are sampled if the key `user.logged` is set to `true`. +#### Trace state -### Condition: trace_state +This condition samples traces based on metadata stored in the W3C `trace_state` field. -This condition samples traces based on metadata stored int he W3C `trace_state` field. +The trace state condition uses the following `conditions` configuration parameters: -| Condition settings | Description | Default value | -| :----------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :-----------: | -| `values` | Defines a list of key, value pairs to match against the `trace_state`. A trace is sampled if any of the specified values exist in the `trace_state` field. Matching follows OR logic, meaning at least one value must be present for sampling to occur. | | +| Key | Description | Default | +| --- | ----------- | --------| +| `type` | Sets the condition type. For the trace state condition, this value must be `trace_state`. | _none_ | +| `values` | Defines a list of key-value pairs to match against the `trace_state` field. A trace is sampled if any of the specified values exists in the `trace_state` field. Matching uses OR logic, so at least one value must be present for sampling to occur. | _none_ | Example configuration: @@ -453,6 +451,4 @@ pipeline: {% endtab %} {% endtabs %} -This tail-based sampling configuration waits 2 seconds before making a decision. It samples traces that do not have the key `user.logged` set to true. Traces are sampled if the key `user.logged` is set to `true`.
- -For more details about further processing, read the [Content Modifier](../processors/content-modifier.md) processor documentation. \ No newline at end of file +For more details about further processing, read the [content modifier](../processors/content-modifier.md) processor documentation.