Open
Description
Background
plugins/out_stackdriver/stackdriver.c:stackdriver_format() still contains a mix of:
- recoverable per-record validation failures,
- request-scoped/resource-derivation failures, and
- batch-fatal decoder/serialization failures.
PR #11539 is the first step in cleaning this up.
Task Breakdown
- Task 1: convert invalid `logging.googleapis.com/labels` type from batch-fatal to per-record drop
  - Implemented in PR #11539 (out_stackdriver: fix batch drop on invalid labels).
  - This also centralizes current per-record skip logic in `should_skip_record()` so prescan and packing stay aligned.
- Task 2: classify every `stackdriver_format()` failure path by recovery model
  - For each failure site in or directly used by `stackdriver_format()`, classify it as:
    - `record-fatal`: drop only the current record
    - `request-fatal`: cannot safely build the Cloud Logging request for the surviving records
    - `batch-fatal`: decoder/serialization/internal failure where recovery is not possible
  - This should cover at least:
    - `flb_log_event_decoder_init()` failure
    - `flb_log_event_decoder_next()` failure
    - invalid `insertId`
    - invalid `labels`
    - k8s `local_resource_id` extraction / processing failures
    - final msgpack-to-JSON serialization failure
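One way to make the Task 2 classification concrete is an enum plus a lookup table, sketched below. Every name here (`sketch_failure_class`, `sketch_failure_site`, `sketch_classify_failure`) is hypothetical and does not exist in the plugin. The class assigned to the k8s `local_resource_id` site is only a placeholder, since Task 3 has not decided it, and the decoder/serialization entries reflect what this issue expects rather than a confirmed outcome.

```c
/* Hypothetical recovery models; names are illustrative, not from the plugin. */
enum sketch_failure_class {
    SKETCH_RECORD_FATAL,   /* drop only the current record */
    SKETCH_REQUEST_FATAL,  /* cannot safely build the request for survivors */
    SKETCH_BATCH_FATAL     /* decoder/serialization/internal failure */
};

/* Hypothetical identifiers for the failure sites listed in Task 2. */
enum sketch_failure_site {
    SKETCH_SITE_DECODER_INIT,
    SKETCH_SITE_DECODER_NEXT,
    SKETCH_SITE_INVALID_INSERT_ID,
    SKETCH_SITE_INVALID_LABELS,
    SKETCH_SITE_K8S_LOCAL_RESOURCE_ID,
    SKETCH_SITE_JSON_SERIALIZATION,
    SKETCH_SITE_COUNT
};

/* One entry per site, so an unclassified site is visible in review. */
static const enum sketch_failure_class sketch_site_class[SKETCH_SITE_COUNT] = {
    [SKETCH_SITE_DECODER_INIT]          = SKETCH_BATCH_FATAL,
    [SKETCH_SITE_DECODER_NEXT]          = SKETCH_BATCH_FATAL,
    [SKETCH_SITE_INVALID_INSERT_ID]     = SKETCH_RECORD_FATAL,
    [SKETCH_SITE_INVALID_LABELS]        = SKETCH_RECORD_FATAL,
    /* Placeholder only: the real class depends on the Task 3 decision. */
    [SKETCH_SITE_K8S_LOCAL_RESOURCE_ID] = SKETCH_REQUEST_FATAL,
    [SKETCH_SITE_JSON_SERIALIZATION]    = SKETCH_BATCH_FATAL
};

/* Unknown sites are treated conservatively as batch-fatal. */
static enum sketch_failure_class
sketch_classify_failure(enum sketch_failure_site site)
{
    if ((int) site < 0 || (int) site >= SKETCH_SITE_COUNT) {
        return SKETCH_BATCH_FATAL;
    }
    return sketch_site_class[site];
}
```

Keeping the mapping in a single table would also give Task 7 one place to document which paths stay batch-fatal on purpose.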
- Task 3: define the desired behavior for k8s `local_resource_id` failures
  - The main unresolved gap is request-level monitored resource derivation for:
    - `k8s_container`
    - `k8s_node`
    - `k8s_pod`
  - We need an explicit decision for mixed-validity batches:
    - continue using the first raw record,
    - use the first surviving valid record after prescan,
    - fall back to another monitored resource such as `global`, or
    - keep these cases request-fatal / batch-fatal.
  - This needs to be decided before expanding per-record recovery further, because monitored resource selection is request-scoped.
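To compare the four options for mixed-validity batches side by side, the sketch below models each one as a policy that picks the request-defining record from a per-record validity array. It is not plugin code: the enum, the sentinel values, and `sketch_pick_resource_record()` are all hypothetical, and the exact semantics given to each policy (for example, treating "keep request-fatal" as "fail if any record is invalid") are an assumed interpretation.

```c
#include <stddef.h>

/* Hypothetical policies mirroring the four options listed in Task 3. */
enum sketch_resource_policy {
    SKETCH_POLICY_FIRST_RAW,        /* keep using the first raw record */
    SKETCH_POLICY_FIRST_SURVIVING,  /* first valid record after prescan */
    SKETCH_POLICY_FALLBACK_GLOBAL,  /* fall back to e.g. the global resource */
    SKETCH_POLICY_STRICT            /* any invalid record fails the request */
};

#define SKETCH_PICK_FAIL     (-1)   /* request cannot be built */
#define SKETCH_PICK_FALLBACK (-2)   /* caller should use the fallback resource */

/*
 * valid[i] is non-zero when record i carries a usable local_resource_id.
 * Returns the index of the request-defining record, or a sentinel.
 */
static int sketch_pick_resource_record(const int *valid, size_t count,
                                       enum sketch_resource_policy policy)
{
    size_t i;

    if (valid == NULL || count == 0) {
        return SKETCH_PICK_FAIL;
    }

    if (policy == SKETCH_POLICY_FIRST_RAW) {
        return valid[0] ? 0 : SKETCH_PICK_FAIL;
    }

    if (policy == SKETCH_POLICY_STRICT) {
        for (i = 0; i < count; i++) {
            if (!valid[i]) {
                return SKETCH_PICK_FAIL;
            }
        }
        return 0;
    }

    /* FIRST_SURVIVING and FALLBACK_GLOBAL both scan for a valid record. */
    for (i = 0; i < count; i++) {
        if (valid[i]) {
            return (int) i;
        }
    }

    return policy == SKETCH_POLICY_FALLBACK_GLOBAL ? SKETCH_PICK_FALLBACK
                                                   : SKETCH_PICK_FAIL;
}
```

For a batch whose first record is invalid and whose later records are valid, only the first-surviving and fallback policies produce a usable request, which is the behavior difference the decision needs to settle.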
- Task 4: refactor request-scoped resource derivation to match the chosen design
  - If Task 3 chooses a recoverable model, refactor formatter flow so request metadata is derived consistently with record dropping.
  - Likely requirements:
    - prescan surviving records first,
    - identify the request-defining record if needed,
    - derive monitored resource and request-scoped labels from that source,
    - pack only surviving records.
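The four likely requirements can be sketched as a two-pass flow in which both passes share one skip predicate, so prescan and packing cannot diverge. This is a simplified model, not the formatter: records are plain structs rather than msgpack, `sketch_should_skip()` is a stand-in for `should_skip_record()` with an invented signature, and it assumes Task 3 picks the "first surviving valid record" option.

```c
#include <stddef.h>

#define SKETCH_MAX_RECORDS 16

/* Simplified stand-in for a decoded log record (hypothetical fields). */
struct sketch_record {
    int labels_ok;                  /* labels field has a valid type */
    int insert_id_ok;               /* insertId is valid */
    const char *local_resource_id;  /* NULL when missing */
};

/* Simplified stand-in for the request being built. */
struct sketch_request {
    const char *resource_id;        /* from the request-defining record */
    size_t packed[SKETCH_MAX_RECORDS];
    size_t packed_count;
};

/* Single predicate shared by both passes. */
static int sketch_should_skip(const struct sketch_record *rec)
{
    return !rec->labels_ok || !rec->insert_id_ok ||
           rec->local_resource_id == NULL;
}

/* Returns the number of packed records, 0 for no output, -1 on failure. */
static int sketch_build_request(const struct sketch_record *recs, size_t count,
                                struct sketch_request *out)
{
    size_t i;
    size_t survivors = 0;
    const struct sketch_record *defining = NULL;

    /* Pass 1: prescan survivors and find the request-defining record. */
    for (i = 0; i < count; i++) {
        if (sketch_should_skip(&recs[i])) {
            continue;
        }
        if (defining == NULL) {
            defining = &recs[i];
        }
        survivors++;
    }
    if (survivors == 0) {
        return 0;
    }
    if (survivors > SKETCH_MAX_RECORDS) {
        return -1;
    }

    /* Derive request-scoped metadata from the defining record. */
    out->resource_id = defining->local_resource_id;
    out->packed_count = 0;

    /* Pass 2: pack only survivors, using the same predicate as pass 1. */
    for (i = 0; i < count; i++) {
        if (sketch_should_skip(&recs[i])) {
            continue;
        }
        out->packed[out->packed_count++] = i;
    }
    return (int) out->packed_count;
}
```

Because the survivor count from pass 1 and the packed count from pass 2 come from the same predicate, a mismatch between them would indicate prescan/main-loop divergence and could be asserted in tests.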
- Task 5: add TDD coverage for request-scoped k8s/resource failures
  - Add tests before implementation for at least:
    - `k8s_container`: first record invalid/missing `local_resource_id`, later record valid
    - `k8s_node`: first record invalid/missing `local_resource_id`, later record valid
    - `k8s_pod`: first record invalid/missing `local_resource_id`, later record valid
    - all-invalid k8s batches
    - mixed valid/invalid batches where request-level resource fields come from record content
  - Each test should explicitly assert whether the expected result is:
    - no output,
    - one surviving entry,
    - multiple surviving entries, or
    - request-fatal behavior.
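To force each test to name exactly one of the four expected results, a table-driven layout along the following lines could be used. This is only a sketch: it does not use the project's test harness, the names are hypothetical, and it assumes a formatter wrapper that returns the number of emitted entries or a negative value on request-fatal failure.

```c
/* The four outcomes a Task 5 test must choose between (hypothetical names). */
enum sketch_outcome {
    SKETCH_NO_OUTPUT,
    SKETCH_ONE_ENTRY,
    SKETCH_MULTIPLE_ENTRIES,
    SKETCH_REQUEST_FATAL
};

/* One row per scenario; the expected outcome is mandatory. */
struct sketch_case {
    const char *name;
    int formatter_ret;              /* assumed: entry count, or < 0 on failure */
    enum sketch_outcome expected;
};

/* Map the assumed formatter return value onto one outcome. */
static enum sketch_outcome sketch_outcome_of(int formatter_ret)
{
    if (formatter_ret < 0) {
        return SKETCH_REQUEST_FATAL;
    }
    if (formatter_ret == 0) {
        return SKETCH_NO_OUTPUT;
    }
    return formatter_ret == 1 ? SKETCH_ONE_ENTRY : SKETCH_MULTIPLE_ENTRIES;
}

/* Non-zero when the observed outcome matches the declared expectation. */
static int sketch_case_passes(const struct sketch_case *c)
{
    return sketch_outcome_of(c->formatter_ret) == c->expected;
}
```

The expected values for the k8s scenarios would be filled in only after Task 3 is decided, so the table doubles as a record of that decision.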
- Task 6: add lower-level tests for unrecoverable decoder/serialization paths
  - Some failures are hard to reproduce through the runtime harness because valid msgpack is usually fed into the formatter.
  - Add lower-level/unit coverage for:
    - decoder init failure
    - decoder next/read failure mid-batch
    - serialization failure handling where feasible
  - These tests should confirm which paths intentionally remain batch-fatal.
- Task 7: document intentional non-recoverable behavior
  - If some formatter failures remain batch-fatal after review, document that explicitly rather than leaving them ambiguous.
  - In particular, these likely remain intentionally unrecoverable:
    - decoder initialization failure
    - decoder iteration failure
    - final serialization failure
Current Technical Status
- PR #11539 (out_stackdriver: fix batch drop on invalid labels) addresses one recoverable formatter failure: invalid `logging.googleapis.com/labels` type.
- Invalid `logging.googleapis.com/insertId` was already per-record.
- The main remaining technical gap is k8s `local_resource_id` and other request-scoped resource-derivation behavior.
- Decoder init/read failures and final serialization failure are likely intentionally batch-fatal, but that should be explicitly codified.
Done Definition
This issue is complete when:
- every known `stackdriver_format()` failure path is classified as `record-fatal`, `request-fatal`, or `batch-fatal`,
- the expected behavior for k8s `local_resource_id` failures is explicitly defined,
- any newly recoverable cases are implemented without prescan/main-loop divergence,
- tests cover every documented scenario,
- intentionally batch-fatal paths are documented as such.