Skip to content

Enforce consistent logging metadata #1339

@BenGalewsky

Description

@BenGalewsky

I used Claude to examine the use of extras in our code to guess at a better schema for logging and identify problems

A review of all log calls in transformer_sidecar/src/transformer_sidecar/ revealed
inconsistent use of extra dicts. This issue proposes a canonical schema and tracks violations.


Proposed Schema

Key Type When to include
request_id str Core context — whenever available
dataset_id int Core context — dataset record ID
file_id int Core context — file record ID
place dict Core context — {"host":…, "site":…, "pod":…}
file_path str File-related — input path being processed
object_name str File-related — output object name in object store
elapsed float Completion events — wall-clock seconds
user float Completion events — CPU user time
sys float Completion events — CPU system time
iowait float Completion events — I/O wait time
files_remaining int Completion events — remaining file counter
files_completed int Completion events — completed file counter
files_failed int Completion events — failed file counter
error str Error/exception logs — message or repr
error_type str Error/exception logs — error-category code
log_body str Error/exception logs — captured science container output
task_id str Queue context — Celery task name
num_files int Queue context — number of files in a batch
metric dict Status-update events — structured transformer-progress payload
deleted list[str] Lifecycle events — deletion summary list

Naming convention: snake_case throughout.


Violations in Sidecar

1. Inconsistent key naming (kebab-case, camelCase, and snake_case all in use)

File Line(s) Violating key Proposed key
transformer.py 144–153 "file-id", "result-destination", "result-format", "service-endpoint" file_id, result_destination, result_format,
service_endpoint
transformer.py 172–178 "file-id", "file-path" file_id, file_path
transformer.py 246 "file-id", "file-size", "total-events" file_id, file_size, total_events
transformer.py 260 "file-id", "file-path" file_id, file_path
transformer.py 369–390 "file-id", "objectName" file_id, object_name
transformer.py 395–403 "file-id", "objectName" file_id, object_name
servicex_adapter.py 115–124 "file-id", "s3-object-name" file_id, object_name
servicex_adapter.py 126–129 "requestId" request_id
object_store_manager.py 102–105 "requestId", "object" request_id, object_name

The log_extra base dict in transformer.py uses "requestId" and "file-id" — fixing
these propagates corrections to all call sites that spread **log_extra.

2. Error/exception logs with no extras

These lose all traceability context:

File Line Message
object_store_manager.py 111 "S3Error"
object_store_manager.py 118 "Minio error"
transformer.py 331 "Expected one tree found {data.keys()}"
transformer.py 347 "Failed to convert ROOT to Parquet: {e}"

At minimum these should include request_id and file_path.

3. place absent from some logs that include other extras

File Line Message
transformer.py 324–327 "Converting ROOT to Parquet" — has request_id and source_path, missing place
transformer.py 395–403 "Error uploading file to object store" — has most fields, missing place
object_store_manager.py 111, 118 Error logs have no extras at all, including no place

4. science_container_command.py — no extras on any log call

All 6 log calls in this file emit no extras. Since this module runs in the per-file
transform loop, request_id and file_id should be threaded in, especially for the
error at line 54 ("problem in getting GeT").

5. object_store_manager.py line 102 — "requestId" key contains a bucket name

{"requestId": bucket, "object": result.object_name}

The value passed is a bucket name, not a request ID. This should be "bucket": bucket
(or request_id if the bucket name encodes the request ID).

Violations in App

ID Bad key Correct key Notes
V1 requestId request_id camelCase instead of snake_case; most common violation
V2 file-id file_id hyphen-separated; comes from raw transformer payload key
V3 s3-object-name object_name hyphen-separated; comes from raw transformer payload key
V4 elapsed-time elapsed hyphen-separated and wrong base name
V5 error-type error_type hyphen-separated
V6 _message error underscore-prefixed; carries error message text
V7 exception error passes an exception object; schema expects str
V8 paths (drop) list of all paths in a batch; no schema equivalent — use num_files for count
V9 report_processed_time elapsed wrong key name for a timing value

Violations by File

transformer_manager.py

Line Bad key(s) Violation(s)
75 requestId V1
561 requestId V1
574 requestId V1
586 requestId V1
599 requestId, exception V1, V7

resources/servicex_resource.py

Line Bad key(s) Violation(s)
96 requestId V1

resources/transformation/cancel.py

Line Bad key(s) Violation(s)
49 requestId V1
53 requestId V1
67 requestId V1

resources/transformation/delete.py

Line Bad key(s) Violation(s)
48 requestId V1
53 requestId V1

resources/transformation/deployment.py

Line Bad key(s) Violation(s)
22 requestId V1

resources/transformation/get_one.py

Line Bad key(s) Violation(s)
45 requestId V1

resources/transformation/status.py

Line Bad key(s) Violation(s)
47 requestId V1
76 requestId V1

resources/transformation/submit.py

Line Bad key(s) Violation(s)
181 requestId V1
195 requestId V1
220 requestId V1
244 requestId V1
293 requestId V1
302 requestId V1
314 requestId V1
320 requestId V1

resources/internal/fileset_complete.py

Line Bad key(s) Violation(s)
56 elapsed-time V4

resources/internal/fileset_error.py

Line Bad key(s) Violation(s)
54–62 elapsed-time, error-type, _message V4, V5, V6
65–73 elapsed-time, error-type, _message V4, V5, V6
90–98 requestId, elapsed-time, error-type, _message V1, V4, V5, V6
108–116 requestId, elapsed-time, error-type, _message V1, V4, V5, V6

resources/internal/transform_status.py

Line Bad key(s) Violation(s)
22 requestId V1
34 requestId V1

resources/internal/transformer_file_complete.py

The log_extra dict built at lines 102–106 propagates bad keys into every downstream call site.

Line Bad key(s) Violation(s)
102–106 requestId, file-id, s3-object-name (in log_extra) V1, V2, V3
108 via log_extra V1, V2, V3
117 via log_extra V1, V2, V3
127 via log_extra V1, V2, V3
143–152 via log_extra, report_processed_time V1, V2, V3, V9
156 via log_extra V1, V2, V3
181 via log_extra V1, V2, V3
228 requestId V1

Fix log_extra at line 102–106 and all six downstream sites (108, 117, 127, 143–152, 156, 181) are fixed automatically.


lookup_result_processor.py

No remaining violationstask_id is now in the schema.


celery/server_tasks.py

Line Bad key(s) Violation(s)
87–90 requestId, paths V1, V8

task_id and num_files on those same lines are now valid schema keys; only requestId (→ request_id) and paths (→ drop) remain.


resources/internal/data_lifecycle_ops.py

No remaining violationsdeleted is now in the schema.


resources/internal/add_file_to_dataset.py

No remaining violationsdataset_id is now in the schema.


resources/internal/fileset_complete.py (second log site)

No remaining violations at line 85dataset_id is now in the schema.


Summary Counts (post-schema update)

Violation Occurrences (log statements affected)
V1 requestIdrequest_id 33
V2 file-idfile_id 6 (all via log_extra in transformer_file_complete.py)
V3 s3-object-nameobject_name 6 (all via log_extra in transformer_file_complete.py)
V4 elapsed-timeelapsed 5
V5 error-typeerror_type 4
V6 _messageerror 4
V7 exceptionerror 1
V8 paths → drop 1
V9 report_processed_timeelapsed 1

V1 dominates. Fixing the requestIdrequest_id rename globally (e.g. with a project-wide search-and-replace scoped to extra= dicts) resolves more than two-thirds of
all remaining violations.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions