
@tomekwilk

@tomekwilk tomekwilk commented Jan 2, 2025

This PR is based on PR #3668 but targets Azure Blob Storage. The azure_blob plugin was modified to accept a 'log_key' option. By default, the entire log record is sent to storage. When the 'log_key' option is specified in the output plugin configuration, only the value of that key is sent to the storage blob.

Addresses #9721

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

  • Documentation required for this feature

Doc PR fluent/fluent-bit-docs#1540


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

By default, the entire record is sent to Azure Blob Storage. Here is a sample configuration and the default output:

Configuration

[SERVICE]
    flush     1
    log_level info

[INPUT]
    name      dummy
    dummy     {"name": "Fluent Bit", "year": 2020}
    samples   1
    tag       var.log.containers.app-default-96cbdef2340.log

[OUTPUT]
    name                  azure_blob
    match                 *
    account_name          twilk123
    shared_key            <snip>
    path                  kubernetes
    container_name        test-container
    auto_create_container on
    tls                   on

Record without log_key
{"@timestamp":"2025-01-02T16:56:02.906357Z","name":"Fluent Bit","year":2020}

If 'log_key' is specified, only the value of that key is sent to Azure Blob Storage.

Sample configuration with log_key

[SERVICE]
    flush     1
    log_level info

[INPUT]
    name      dummy
    dummy     {"name": "Fluent Bit", "year": 2020}
    samples   1
    tag       var.log.containers.app-default-96cbdef2340.log

[OUTPUT]
    name                  azure_blob
    match                 *
    account_name          twilk123
    shared_key            <snip>
    path                  kubernetes
    container_name        test-container
    auto_create_container on
    tls                   on
    log_key               name

Record with log_key set to name
Fluent Bit

Example Valgrind output

root@fluent-bit:/tmp# valgrind ./fluent-bit -c azure.conf
==3022== Memcheck, a memory error detector
==3022== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==3022== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==3022== Command: ./fluent-bit -c azure.conf
==3022==
Fluent Bit v3.2.3
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  _____
|  ___| |                | |   | ___ (_) |         |____ |/ __  \
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`' / /'
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \  / /
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /./ /___
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)_____/


[2025/01/02 19:56:50] [ info] [fluent bit] version=3.2.3, commit=addf261e8c, pid=3022
[2025/01/02 19:56:50] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/01/02 19:56:50] [ info] [simd    ] disabled
[2025/01/02 19:56:50] [ info] [cmetrics] version=0.9.9
[2025/01/02 19:56:50] [ info] [ctraces ] version=0.5.7
[2025/01/02 19:56:51] [ info] [output:azure_blob:azure_blob.0] initializing worker
[2025/01/02 19:56:50] [ info] [input:dummy:dummy.0] initializing
[2025/01/02 19:56:51] [ info] [output:azure_blob:azure_blob.0] worker #0 started
[2025/01/02 19:56:50] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2025/01/02 19:56:51] [ info] [output:azure_blob:azure_blob.0] account_name=twilk123, container_name=test-container, blob_type=appendblob, emulator_mode=no, endpoint=twilk123.blob.core.windows.net, auth_type=key
[2025/01/02 19:56:51] [ info] [sp] stream processor started
[2025/01/02 19:56:54] [ info] [output:azure_blob:azure_blob.0] container 'test-container' already exists
[2025/01/02 19:56:54] [ info] [output:azure_blob:azure_blob.0] content uploaded successfully:
[2025/01/02 19:56:54] [ info] [output:azure_blob:azure_blob.0] blob id (null) committed successfully
^C[2025/01/02 19:57:03] [engine] caught signal (SIGINT)
[2025/01/02 19:57:03] [ warn] [engine] service will shutdown in max 5 seconds
[2025/01/02 19:57:03] [ info] [input] pausing dummy.0
[2025/01/02 19:57:03] [ info] [engine] service has stopped (0 pending tasks)
[2025/01/02 19:57:03] [ info] [input] pausing dummy.0
[2025/01/02 19:57:03] [ info] [output:azure_blob:azure_blob.0] thread worker #0 stopping...
[2025/01/02 19:57:03] [ info] [output:azure_blob:azure_blob.0] initializing worker
[2025/01/02 19:57:03] [ info] [output:azure_blob:azure_blob.0] thread worker #0 stopped
==3022==
==3022== HEAP SUMMARY:
==3022==     in use at exit: 0 bytes in 0 blocks
==3022==   total heap usage: 17,894 allocs, 17,894 frees, 2,471,158 bytes allocated
==3022==
==3022== All heap blocks were freed -- no leaks are possible
==3022==
==3022== For lists of detected and suppressed errors, rerun with: -s
==3022== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


Summary by CodeRabbit

  • New Features

    • Added a "log_key" configuration to send only a specific field from each log record.
    • When set, records emit the key's value (string, integer, or float); if missing or unsupported, the record is safely skipped.
    • If "log_key" is unset, behavior remains unchanged — logs continue as JSON lines.
  • Chores

    • Configuration exposure and cleanup updated to manage the new option safely.
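The per-record behavior summarized above (emit a string, integer, or float value; skip the record when the key is missing or the type is unsupported) can be sketched in plain C. This is a simplified model under stated assumptions, not the plugin's code: the hypothetical `struct field` stands in for the msgpack object the plugin obtains through the record accessor, `%g` is an arbitrary choice for floats, and values are separated by '\n' on the assumption that multiple records produce newline-delimited output, as a later review comment suggests documenting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical value model: the plugin reads msgpack objects through the
 * record accessor; this struct only mimics the value types it handles. */
enum val_type { VAL_STR, VAL_INT, VAL_FLOAT, VAL_BOOL };

struct field {
    const char   *key;
    enum val_type type;
    const char   *str;   /* used when type == VAL_STR   */
    long long     i;     /* used when type == VAL_INT   */
    double        f;     /* used when type == VAL_FLOAT */
};

struct record {
    const struct field *fields;
    size_t              count;
};

/* Write the value of `log_key` from each record, one value per line.
 * Records where the key is missing or the value type is unsupported
 * (VAL_BOOL here) contribute nothing. Returns the number of bytes
 * written, or -1 if `out` is too small. */
static int format_log_key(const struct record *recs, size_t nrecs,
                          const char *log_key, char *out, size_t out_size)
{
    size_t used = 0;

    assert(log_key != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    for (size_t r = 0; r < nrecs; r++) {
        for (size_t k = 0; k < recs[r].count; k++) {
            const struct field *fl = &recs[r].fields[k];
            char *dst = out + used;
            size_t room = out_size - used;   /* always >= 1 */
            int len = 0;

            if (strcmp(fl->key, log_key) != 0) {
                continue;
            }
            switch (fl->type) {
            case VAL_STR:
                len = snprintf(dst, room, "%s\n", fl->str);
                break;
            case VAL_INT:
                len = snprintf(dst, room, "%lld\n", fl->i);
                break;
            case VAL_FLOAT:
                len = snprintf(dst, room, "%g\n", fl->f);
                break;
            default:
                break;                       /* unsupported: skip record */
            }
            if (len < 0 || (size_t) len >= room) {
                return -1;                   /* output truncated */
            }
            used += (size_t) len;
            break;                           /* first matching key only */
        }
    }
    return (int) used;
}
```

With the dummy record from the configuration examples ({"name": "Fluent Bit", "year": 2020}), `log_key name` produces `Fluent Bit` followed by a newline, while a record that lacks the key contributes nothing to the payload.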

@adrinaula

@edsiper Can you please give us an update?

@tomekwilk
Author

Memory leak test after the rewrite:

$ valgrind build/bin/fluent-bit -c fluentbit.cfg
==225827== Memcheck, a memory error detector
==225827== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==225827== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==225827== Command: build/bin/fluent-bit -c fluentbit.cfg
==225827==
Fluent Bit v4.0.3
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _             ___  _____
|  ___| |                | |   | ___ (_) |           /   ||  _  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| || |/' |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| ||  /| |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |\ |_/ /
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/


[2025/06/11 14:22:02] [ info] [fluent bit] version=4.0.3, commit=97285bdd2a, pid=225827
[2025/06/11 14:22:03] [ info] [storage] ver=1.5.3, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/06/11 14:22:03] [ info] [simd    ] disabled
[2025/06/11 14:22:03] [ info] [cmetrics] version=1.0.2
[2025/06/11 14:22:03] [ info] [ctraces ] version=0.6.6
[2025/06/11 14:22:03] [ info] [input:dummy:dummy.0] initializing
[2025/06/11 14:22:03] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2025/06/11 14:22:03] [ info] [output:azure_blob:azure_blob.0] account_name=devstoreaccount1, container_name=logs, blob_type=appendblob, emulator_mode=yes, endpoint=http://127.0.0.1:10000, auth_type=key
[2025/06/11 14:22:03] [ info] [sp] stream processor started
[2025/06/11 14:22:03] [ info] [output:azure_blob:azure_blob.0] initializing worker
[2025/06/11 14:22:03] [ info] [output:azure_blob:azure_blob.0] worker #0 started
[2025/06/11 14:22:05] [ info] [output:azure_blob:azure_blob.0] container 'logs' already exists
[2025/06/11 14:22:05] [ info] [output:azure_blob:azure_blob.0] content uploaded successfully:
[2025/06/11 14:22:05] [ info] [output:azure_blob:azure_blob.0] blob id (null) committed successfully
^C[2025/06/11 14:22:18] [engine] caught signal (SIGINT)
[2025/06/11 14:22:18] [ warn] [engine] service will shutdown in max 5 seconds
[2025/06/11 14:22:18] [ info] [input] pausing dummy.0
[2025/06/11 14:22:18] [ info] [engine] service has stopped (0 pending tasks)
[2025/06/11 14:22:18] [ info] [input] pausing dummy.0
[2025/06/11 14:22:18] [ info] [output:azure_blob:azure_blob.0] thread worker #0 stopping...
[2025/06/11 14:22:18] [ info] [output:azure_blob:azure_blob.0] initializing worker
==225827==
==225827== HEAP SUMMARY:
==225827==     in use at exit: 0 bytes in 0 blocks
==225827==   total heap usage: 7,292 allocs, 7,292 frees, 1,413,601 bytes allocated
==225827==
==225827== All heap blocks were freed -- no leaks are possible
==225827==
==225827== For lists of detected and suppressed errors, rerun with: -s
==225827== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@khalillilahk

Hi @tomekwilk @lockewritesdocs,
Just checking in: are there any blockers preventing this PR from being merged?
Let me know if there's anything I can do to help move it forward.

@coderabbitai

coderabbitai bot commented Sep 30, 2025

Walkthrough

Adds optional msgpack log_key extraction to send only a single field value, updates formatter/control-flow and function signature to support this, exposes a log_key config option and struct field, includes record accessor headers, and frees log_key on context destroy.

Changes

Formatter, extraction and API (plugins/out_azure_blob/azure_blob.c): Added cb_azb_msgpack_extract_log_key(...) to extract a single field via record accessor; updated azure_blob_format(...) signature to accept flush/context/event metadata and return out_data/out_size; conditional path: use log_key extraction when set, otherwise JSON lines formatting; added includes <fluent-bit/flb_record_accessor.h> and <fluent-bit/flb_ra_key.h>; added log_key config_map entries; minor whitespace adjustments.
Public struct change (plugins/out_azure_blob/azure_blob.h): Added flb_sds_t log_key to struct flb_azure_blob.
Config cleanup (plugins/out_azure_blob/azure_blob_conf.c): Free ctx->log_key in flb_azure_blob_conf_destroy (calls flb_sds_destroy and NULLs the pointer).
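The cleanup step described for azure_blob_conf.c follows the usual free-then-NULL pattern. A minimal sketch, using plain `free()` and a hypothetical `struct azb_ctx` in place of `flb_sds_destroy()` and `struct flb_azure_blob`:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct flb_azure_blob, reduced to the one
 * field this example needs. The plugin stores it as flb_sds_t. */
struct azb_ctx {
    char *log_key;   /* NULL when the option is not configured */
};

/* Release the option and reset the pointer, so a repeated call, or any
 * later `if (ctx->log_key)` check, sees NULL rather than freed memory. */
static void azb_ctx_release_log_key(struct azb_ctx *ctx)
{
    assert(ctx != NULL);
    free(ctx->log_key);   /* free(NULL) is a no-op */
    ctx->log_key = NULL;
}
```

Resetting the pointer is what makes the destroy path safe to reach twice and keeps the `if (ctx->log_key)` branch in the formatter from ever seeing a dangling value.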

Sequence Diagram(s)

sequenceDiagram
    participant In as Input
    participant AZB as AzureBlob Plugin
    participant Fmt as Formatter
    participant AZ as Azure Blob Service

    In->>AZB: Flush event (msgpack, tag, bytes)
    AZB->>Fmt: azure_blob_format(config, ins, ctx, flush_ctx, event_type, tag, tag_len, data, bytes, out_data, out_size)
    alt log_key configured
        Fmt->>Fmt: cb_azb_msgpack_extract_log_key(ctx, data, bytes)
        Note right of Fmt: locate field via record accessor\nconvert to string/number\nreturn allocated out_data/out_size
        Fmt-->>AZB: out_data, out_size (single-field payload)
    else
        Fmt->>Fmt: format records as JSON lines -> out_data/out_size
        Fmt-->>AZB: out_data, out_size (JSON lines payload)
    end
    AZB->>AZ: Upload formatted payload
    AZ-->>AZB: Response
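The updated azure_blob_format(...) hands its payload back through out_data/out_size. The sketch below shows that output-parameter contract in isolation. `sketch_format` is hypothetical, its copy-plus-newline body is only a placeholder for the real msgpack formatting, and the 0 / -1 return convention is an assumption based on how Fluent Bit formatter callbacks are commonly written.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical formatter with the same shape as the callback in the
 * diagram: the result is reported through out_data/out_size, the return
 * value is 0 on success or -1 on failure, and the caller owns (and must
 * free) *out_data. */
static int sketch_format(const char *data, size_t bytes,
                         void **out_data, size_t *out_size)
{
    char *buf;

    assert(out_data != NULL && out_size != NULL);
    *out_data = NULL;                /* defined outputs on every path */
    *out_size = 0;

    if (data == NULL || bytes == 0) {
        return -1;                   /* nothing to format */
    }
    buf = malloc(bytes + 1);
    if (buf == NULL) {
        return -1;
    }
    memcpy(buf, data, bytes);        /* placeholder for real formatting */
    buf[bytes] = '\n';

    *out_data = buf;
    *out_size = bytes + 1;
    return 0;
}
```

Clearing both outputs up front means a caller that ignores the return code still never frees or uploads a stale pointer.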

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay extra attention to: cb_azb_msgpack_extract_log_key memory allocation/cleanup, error paths and return semantics; updated azure_blob_format signature and all call sites; config_map entries exposing log_key.
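One way to keep the allocation and error paths of an extractor like cb_azb_msgpack_extract_log_key easy to audit is a single `cleanup` label that every exit passes through. This sketch is hypothetical: `extract_value`, the one-field record, and the allocation counter are illustrative, with `track_strdup`/`track_free` standing in for the record accessor create/destroy calls and the value allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count outstanding allocations so every exit path can be checked. */
static int live_allocs = 0;

static char *track_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL) {
        memcpy(p, s, n);
        live_allocs++;
    }
    return p;
}

static void track_free(char *p)
{
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/* Hypothetical extractor: the pattern copy stands in for a compiled
 * record accessor, the record has a single key/value pair, and the result
 * is a caller-owned copy of the value (NULL if the key is missing or an
 * allocation fails). Every path after acquiring `pattern` goes through
 * `cleanup`, so it is released exactly once. */
static char *extract_value(const char *log_key,
                           const char *rec_key, const char *rec_val)
{
    char *pattern;
    char *result = NULL;

    assert(log_key != NULL && rec_key != NULL && rec_val != NULL);

    pattern = track_strdup(log_key);
    if (pattern == NULL) {
        return NULL;                    /* nothing acquired yet */
    }
    if (strcmp(pattern, rec_key) != 0) {
        goto cleanup;                   /* key missing: no output */
    }
    result = track_strdup(rec_val);     /* NULL on allocation failure */

cleanup:
    track_free(pattern);
    return result;
}
```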

Suggested reviewers

  • leonardo-albertovich
  • koleini
  • fujimotos

Poem

I hop through bytes with whiskers keen,
A single key now trims the scene;
If log_key calls, I fetch that prize —
One tidy line beneath the skies.
Otherwise I hum JSON tunes, and send to Azure by the moon. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Description check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
Title check: ✅ Passed. The pull request title 'out_azure_blob: add log_key option' directly and clearly describes the main change in the changeset. It specifies the component (out_azure_blob plugin), the action (add), and the feature being added (log_key option). The title accurately reflects the primary purpose of the PR, which is to introduce a new configuration option to the Azure Blob output plugin.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1116ebc and 1200368.

📒 Files selected for processing (3)
  • plugins/out_azure_blob/azure_blob.c (6 hunks)
  • plugins/out_azure_blob/azure_blob.h (1 hunks)
  • plugins/out_azure_blob/azure_blob_conf.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/out_azure_blob/azure_blob.h
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.

Applied to files:

  • plugins/out_azure_blob/azure_blob.c
🧬 Code graph analysis (2)
plugins/out_azure_blob/azure_blob.c (4)
src/flb_record_accessor.c (3)
  • flb_ra_create (271-358)
  • flb_ra_get_value_object (803-814)
  • flb_ra_destroy (232-248)
src/flb_sds.c (4)
  • flb_sds_create_size (92-95)
  • flb_sds_copy (260-281)
  • flb_sds_cat (120-141)
  • flb_sds_printf (336-387)
src/flb_ra_key.c (1)
  • flb_ra_key_value_destroy (842-851)
src/flb_pack.c (1)
  • flb_pack_msgpack_to_json_format (1169-1450)
plugins/out_azure_blob/azure_blob_conf.c (1)
src/flb_sds.c (1)
  • flb_sds_destroy (389-399)


@tomekwilk
Author

I rebased the PR to resolve the merge conflicts after recent master changes. This PR is waiting to be re-reviewed and merged. Not sure if there is anything else for me to do.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
plugins/out_azure_blob/azure_blob.c (2)

34-35: Good: using Record Accessor APIs.

Including flb_record_accessor.h and flb_ra_key.h aligns with prior guidance to avoid manual map walking.


70-75: Call flb_errno() before flb_plg_error() on RA creation failure.

Swap the calls so errno is captured before logging.

Apply:

-    if (!ra) {
-        flb_plg_error(ctx->ins, "invalid record accessor pattern '%s'", ctx->log_key);
-        flb_errno();
-        return NULL;
-    }
+    if (!ra) {
+        flb_errno();
+        flb_plg_error(ctx->ins, "invalid record accessor pattern '%s'", ctx->log_key);
+        return NULL;
+    }
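The reason for the ordering, as we understand it: flb_errno() reports the current value of errno, and the C standard allows library calls (such as those a logging function makes) to modify errno even when they succeed. Capturing errno immediately after the failing call avoids reporting a clobbered value. A plain-C illustration using fopen(); the function name `open_or_report` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Open `path` for reading. On failure, store errno in *saved_errno before
 * calling anything else: library functions such as fprintf() may change
 * errno even when they succeed, so logging first could report the wrong
 * cause. */
static FILE *open_or_report(const char *path, int *saved_errno)
{
    FILE *fp;

    assert(path != NULL && saved_errno != NULL);
    fp = fopen(path, "r");
    if (fp == NULL) {
        *saved_errno = errno;                       /* capture first */
        fprintf(stderr, "cannot open '%s': %s\n",   /* then log */
                path, strerror(*saved_errno));
        return NULL;
    }
    *saved_errno = 0;
    return fp;
}
```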
🧹 Nitpick comments (2)
plugins/out_azure_blob/azure_blob.c (2)

177-186: Safer behavior: fallback to JSON when extraction yields no output.

Avoid dropping data if log_key is missing/unsupported; gracefully fall back.

Apply:

-    if (ctx->log_key) {
-        out_buf = cb_azb_msgpack_extract_log_key(ctx, data, bytes);
-    }
-    else {
+    if (ctx->log_key) {
+        out_buf = cb_azb_msgpack_extract_log_key(ctx, data, bytes);
+        if (!out_buf) {
+            flb_plg_warn(ctx->ins, "log_key='%s' yielded no data; falling back to JSON lines", ctx->log_key);
+        }
+    }
+    if (!out_buf) {
         out_buf = flb_pack_msgpack_to_json_format(data, bytes,
                                                   FLB_PACK_JSON_FORMAT_LINES,
                                                   FLB_PACK_JSON_DATE_ISO8601,
                                                   ctx->date_key,
                                                   config->json_escape_unicode);
     }
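The suggested fallback can be sketched with the two renderers passed as function pointers, so the JSON rendering only runs when it is actually needed. Note that this models the reviewer's proposal, not the skip-on-missing behavior described in the PR summary. `format_with_fallback`, `render_fn`, and the `demo_*` helpers are hypothetical stand-ins for cb_azb_msgpack_extract_log_key() and flb_pack_msgpack_to_json_format(), and the demo JSON does no escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A renderer turns one chunk of records into a malloc'd string, or NULL. */
typedef char *(*render_fn)(const void *data);

/* Try the log_key extraction first; if log_key is unset or extraction
 * yields nothing, fall back to the JSON-lines rendering, which therefore
 * only runs when needed. The caller frees the result. */
static char *format_with_fallback(const char *log_key, const void *data,
                                  render_fn extract, render_fn to_json_lines)
{
    char *out = NULL;

    assert(extract != NULL && to_json_lines != NULL);
    if (log_key != NULL) {
        out = extract(data);
        if (out == NULL) {
            fprintf(stderr, "log_key='%s' yielded no data; "
                    "falling back to JSON lines\n", log_key);
        }
    }
    if (out == NULL) {
        out = to_json_lines(data);
    }
    return out;
}

/* Demo renderers for a hypothetical record with a single "name" field. */
struct demo_record {
    const char *name;   /* NULL means the key is missing */
};

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL) {
        memcpy(p, s, n);
    }
    return p;
}

static char *demo_extract(const void *data)
{
    const struct demo_record *r = data;

    return r->name != NULL ? dup_str(r->name) : NULL;
}

static char *demo_json(const void *data)
{
    const struct demo_record *r = data;
    char buf[128];

    if (r->name == NULL) {
        return dup_str("{}\n");
    }
    /* No JSON escaping in this demo. */
    snprintf(buf, sizeof(buf), "{\"name\":\"%s\"}\n", r->name);
    return dup_str(buf);
}
```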

1897-1904: Clarify that log_key uses Record Accessor syntax.

Config text says “key name,” but code uses record accessor. Recommend noting RA path examples (e.g., log, kubernetes['labels']['app']) to set user expectations. Also document newline-delimited output when multiple records are present.

I can update the docs snippet accordingly if desired.
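To illustrate what a nested path such as kubernetes['labels']['app'] selects, here is a hypothetical lookup over a hand-built nested record. It does not parse record accessor syntax: the path is given as an array of keys, and `struct node` and `lookup_path` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical nested record: a node is either a string leaf or a map. */
struct node {
    const char        *key;
    const char        *str;        /* non-NULL for a string leaf */
    const struct node *children;   /* non-NULL for a nested map  */
    size_t             nchildren;
};

/* Follow `path` (one key per level) from the top-level map. Returns the
 * string at the end of the path, or NULL if a key is missing, an
 * intermediate value is not a map, or the final value is not a string. */
static const char *lookup_path(const struct node *map, size_t n,
                               const char *const *path, size_t depth)
{
    assert(map != NULL || n == 0);

    for (size_t d = 0; d < depth; d++) {
        const struct node *hit = NULL;

        for (size_t i = 0; i < n; i++) {
            if (strcmp(map[i].key, path[d]) == 0) {
                hit = &map[i];
                break;
            }
        }
        if (hit == NULL) {
            return NULL;               /* key missing at this level */
        }
        if (d + 1 == depth) {
            return hit->str;           /* NULL if the leaf is a map */
        }
        if (hit->children == NULL) {
            return NULL;               /* cannot descend into a string */
        }
        map = hit->children;
        n = hit->nchildren;
    }
    return NULL;                       /* empty path */
}
```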

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between d985e8e and e3339c5.

📒 Files selected for processing (2)
  • plugins/out_azure_blob/azure_blob.c (6 hunks)
  • plugins/out_azure_blob/azure_blob.h (1 hunks)

@SamerJ

SamerJ commented Oct 15, 2025

Hello @edsiper , @adrinaula ,

This PR tackles an issue that we've also recently faced.
Is there anything preventing or blocking the merge?

We would be interested in contributing if needed :)

Thanks in advance,

@eschabell

@tomekwilk Eduardo requested a change, can you take a look at fixing?

@overmeulen

@tomekwilk Eduardo requested a change, can you take a look at fixing?

Which change are we talking about? This one?
flb_errno() needs to be called before flb_plg_error()

If we can help in any way, don't hesitate to ask. We have the exact same requirement, but we don't want to create a new PR that does exactly what @tomekwilk did.

@tomekwilk
Author

I fixed the one place where flb_errno() was called after flb_plg_error() and rebased the PR. I'm not sure what else could be blocking this PR. I requested a re-review after addressing the initial comments but heard nothing back.

If anyone would like to help push this PR forward or verify the change, feel free; it would be appreciated. I am currently traveling and have limited access. Thanks!
