Enforce consistent logging metadata

I used Claude to examine the use of `extras` in our code to guess at a better schema for logging and identify problems
                                                                                                                                                                               
  A review of all log calls in `transformer_sidecar/src/transformer_sidecar/` revealed                                                                                          
  inconsistent use of `extra` dicts. This issue proposes a canonical schema and tracks violations.
                                                                                                                                                                                
  ---                                              
                                                                                                                                                                                
  ## Proposed Schema                               
          
  | Key | Type | When to include |
  |---|---|---|
  | `request_id` | str | Core context — whenever available |
  | `dataset_id` | int | Core context — dataset record ID |
  | `file_id` | int | Core context — file record ID |
  | `place` | dict | Core context — `{"host":…, "site":…, "pod":…}` |
  | `file_path` | str | File-related — input path being processed |
  | `object_name` | str | File-related — output object name in object store |
  | `elapsed` | float | Completion events — wall-clock seconds |
  | `user` | float | Completion events — CPU user time |
  | `sys` | float | Completion events — CPU system time |
  | `iowait` | float | Completion events — I/O wait time |
  | `files_remaining` | int | Completion events — remaining file counter |
  | `files_completed` | int | Completion events — completed file counter |
  | `files_failed` | int | Completion events — failed file counter |
  | `error` | str | Error/exception logs — message or repr |
  | `error_type` | str | Error/exception logs — error-category code |
  | `log_body` | str | Error/exception logs — captured science container output |
  | `task_id` | str | Queue context — Celery task name |
  | `num_files` | int | Queue context — number of files in a batch |
  | `metric` | dict | Status-update events — structured transformer-progress payload |
  | `deleted` | list[str] | Lifecycle events — deletion summary list |
                                                                                                                                                                                
  **Naming convention:** `snake_case` throughout.  
          
  ---

  # Violations in Sidecar

  ### 1. Inconsistent key naming (kebab-case, camelCase, and snake_case all in use)                                                                                             
   
  | File | Line(s) | Violating key | Proposed key |                                                                                                                             
  |---|---|---|---|                                
  | `transformer.py` | 144–153 | `"file-id"`, `"result-destination"`, `"result-format"`, `"service-endpoint"` | `file_id`, `result_destination`, `result_format`,
  `service_endpoint` |                                                                                                                                                          
  | `transformer.py` | 172–178 | `"file-id"`, `"file-path"` | `file_id`, `file_path` |
  | `transformer.py` | 246 | `"file-id"`, `"file-size"`, `"total-events"` | `file_id`, `file_size`, `total_events` |                                                            
  | `transformer.py` | 260 | `"file-id"`, `"file-path"` | `file_id`, `file_path` |                                                                                              
  | `transformer.py` | 369–390 | `"file-id"`, `"objectName"` | `file_id`, `object_name` |                                                                                       
  | `transformer.py` | 395–403 | `"file-id"`, `"objectName"` | `file_id`, `object_name` |                                                                                       
  | `servicex_adapter.py` | 115–124 | `"file-id"`, `"s3-object-name"` | `file_id`, `object_name` |                                                                              
  | `servicex_adapter.py` | 126–129 | `"requestId"` | `request_id` |                                                                                                            
  | `object_store_manager.py` | 102–105 | `"requestId"`, `"object"` | `request_id`, `object_name` |                                                                             
                                                                                                                                                                                
  The `log_extra` base dict in `transformer.py` uses `"requestId"` and `"file-id"` — fixing                                                                                     
  these propagates corrections to all call sites that spread `**log_extra`.                                                                                                     
                                                                                                                                                                                
  ### 2. Error/exception logs with no `extras`     
          
  These lose all traceability context:                                                                                                                                          
   
  | File | Line | Message |                                                                                                                                                     
  |---|---|---|                                    
  | `object_store_manager.py` | 111 | `"S3Error"` |                                                                                                                             
  | `object_store_manager.py` | 118 | `"Minio error"` |                                                                                                                         
  | `transformer.py` | 331 | `"Expected one tree found {data.keys()}"` |
  | `transformer.py` | 347 | `"Failed to convert ROOT to Parquet: {e}"` |                                                                                                       
                                                                                                                                                                                
  At minimum these should include `request_id` and `file_path`.                                                                                                                 
                                                                                                                                                                                
  ### 3. `place` absent from some logs that include other extras                                                                                                                
                                                   
  | File | Line | Message |                                                                                                                                                     
  |---|---|---|                                    
  | `transformer.py` | 324–327 | `"Converting ROOT to Parquet"` — has `request_id` and `source_path`, missing `place` |
  | `transformer.py` | 395–403 | `"Error uploading file to object store"` — has most fields, missing `place` |                                                                  
  | `object_store_manager.py` | 111, 118 | Error logs have no extras at all, including no `place` |
                                                                                                                                                                                
  ### 4. `science_container_command.py` — no extras on any log call                                                                                                             
                                                                                                                                                                                
  All 6 log calls in this file emit no `extras`. Since this module runs in the per-file                                                                                         
  transform loop, `request_id` and `file_id` should be threaded in, especially for the
  `error` at line 54 (`"problem in getting GeT"`).                                                                                                                              
                                                   
  ### 5. `object_store_manager.py` line 102 — `"requestId"` key contains a bucket name                                                                                          
   
  ```python                                                                                                                                                                     
  {"requestId": bucket, "object": result.object_name}
  ```     

  The value passed is a bucket name, not a request ID. This should be `"bucket": bucket`                                                                                        
  (or `request_id` if the bucket name encodes the request ID).


# Violations in App

| ID | Bad key | Correct key | Notes |
  |---|---|---|---|
  | V1 | `requestId` | `request_id` | camelCase instead of snake_case; most common violation |
  | V2 | `file-id` | `file_id` | hyphen-separated; comes from raw transformer payload key |
  | V3 | `s3-object-name` | `object_name` | hyphen-separated; comes from raw transformer payload key |
  | V4 | `elapsed-time` | `elapsed` | hyphen-separated and wrong base name |
  | V5 | `error-type` | `error_type` | hyphen-separated |
  | V6 | `_message` | `error` | underscore-prefixed; carries error message text |
  | V7 | `exception` | `error` | passes an exception object; schema expects `str` |
  | V8 | `paths` | _(drop)_ | list of all paths in a batch; no schema equivalent — use `num_files` for count |
  | V9 | `report_processed_time` | `elapsed` | wrong key name for a timing value |
  ---

  ## Violations by File

  ### `transformer_manager.py`

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 75 | `requestId` | V1 |
  | 561 | `requestId` | V1 |
  | 574 | `requestId` | V1 |
  | 586 | `requestId` | V1 |
  | 599 | `requestId`, `exception` | V1, V7 |

  ---

  ### `resources/servicex_resource.py`

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 96 | `requestId` | V1 |

  ---

  ### `resources/transformation/cancel.py`

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 49 | `requestId` | V1 |
  | 53 | `requestId` | V1 |
  | 67 | `requestId` | V1 |

  ---

  ### `resources/transformation/delete.py`

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 48 | `requestId` | V1 |
  | 53 | `requestId` | V1 |

  ---

  ### `resources/transformation/deployment.py`

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 22 | `requestId` | V1 |

  ---

  ### `resources/transformation/get_one.py`

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 45 | `requestId` | V1 |

  ---

  ### `resources/transformation/status.py`

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 47 | `requestId` | V1 |
  | 76 | `requestId` | V1 |

  ---

  ### `resources/transformation/submit.py`

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 181 | `requestId` | V1 |
  | 195 | `requestId` | V1 |
  | 220 | `requestId` | V1 |
  | 244 | `requestId` | V1 |
  | 293 | `requestId` | V1 |
  | 302 | `requestId` | V1 |
  | 314 | `requestId` | V1 |
  | 320 | `requestId` | V1 |

  ---

  ### `resources/internal/fileset_complete.py`

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 56 | `elapsed-time` | V4 |

  ---

  ### `resources/internal/fileset_error.py`

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 54–62 | `elapsed-time`, `error-type`, `_message` | V4, V5, V6 |
  | 65–73 | `elapsed-time`, `error-type`, `_message` | V4, V5, V6 |
  | 90–98 | `requestId`, `elapsed-time`, `error-type`, `_message` | V1, V4, V5, V6 |
  | 108–116 | `requestId`, `elapsed-time`, `error-type`, `_message` | V1, V4, V5, V6 |

  ---

  ### `resources/internal/transform_status.py`

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 22 | `requestId` | V1 |
  | 34 | `requestId` | V1 |

  ---

  ### `resources/internal/transformer_file_complete.py`

  The `log_extra` dict built at lines 102–106 propagates bad keys into every downstream call site.

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 102–106 | `requestId`, `file-id`, `s3-object-name` _(in `log_extra`)_ | V1, V2, V3 |
  | 108 | via `log_extra` | V1, V2, V3 |
  | 117 | via `log_extra` | V1, V2, V3 |
  | 127 | via `log_extra` | V1, V2, V3 |
  | 143–152 | via `log_extra`, `report_processed_time` | V1, V2, V3, V9 |
  | 156 | via `log_extra` | V1, V2, V3 |
  | 181 | via `log_extra` | V1, V2, V3 |
  | 228 | `requestId` | V1 |

  > Fix `log_extra` at line 102–106 and all six downstream sites (108, 117, 127, 143–152, 156, 181) are fixed automatically.

  ---

  ### `lookup_result_processor.py`

  _No remaining violations_ — `task_id` is now in the schema.

  ---

  ### `celery/server_tasks.py`

  | Line | Bad key(s) | Violation(s) |
  |---|---|---|
  | 87–90 | `requestId`, `paths` | V1, V8 |

  > `task_id` and `num_files` on those same lines are now valid schema keys; only `requestId` (→ `request_id`) and `paths` (→ drop) remain.

  ---

  ### `resources/internal/data_lifecycle_ops.py`

  _No remaining violations_ — `deleted` is now in the schema.

  ---

  ### `resources/internal/add_file_to_dataset.py`

  _No remaining violations_ — `dataset_id` is now in the schema.

  ---

  ### `resources/internal/fileset_complete.py` (second log site)

  _No remaining violations at line 85_ — `dataset_id` is now in the schema.

  ---

  ## Summary Counts (post-schema update)

  | Violation | Occurrences (log statements affected) |
  |---|---|
  | V1 `requestId` → `request_id` | 33 |
  | V2 `file-id` → `file_id` | 6 (all via `log_extra` in `transformer_file_complete.py`) |
  | V3 `s3-object-name` → `object_name` | 6 (all via `log_extra` in `transformer_file_complete.py`) |
  | V4 `elapsed-time` → `elapsed` | 5 |
  | V5 `error-type` → `error_type` | 4 |
  | V6 `_message` → `error` | 4 |
  | V7 `exception` → `error` | 1 |
  | V8 `paths` → drop | 1 |
  | V9 `report_processed_time` → `elapsed` | 1 |

  > V1 dominates. Fixing the `requestId` → `request_id` rename globally (e.g. with a project-wide search-and-replace scoped to `extra=` dicts) resolves more than two-thirds of
  all remaining violations.


File	Line(s)	Violating key	Proposed key
`transformer.py`	144–153	`"file-id"`, `"result-destination"`, `"result-format"`, `"service-endpoint"`	`file_id`, `result_destination`, `result_format`,
`service_endpoint`
`transformer.py`	172–178	`"file-id"`, `"file-path"`	`file_id`, `file_path`
`transformer.py`	246	`"file-id"`, `"file-size"`, `"total-events"`	`file_id`, `file_size`, `total_events`
`transformer.py`	260	`"file-id"`, `"file-path"`	`file_id`, `file_path`
`transformer.py`	369–390	`"file-id"`, `"objectName"`	`file_id`, `object_name`
`transformer.py`	395–403	`"file-id"`, `"objectName"`	`file_id`, `object_name`
`servicex_adapter.py`	115–124	`"file-id"`, `"s3-object-name"`	`file_id`, `object_name`
`servicex_adapter.py`	126–129	`"requestId"`	`request_id`
`object_store_manager.py`	102–105	`"requestId"`, `"object"`	`request_id`, `object_name`

File	Line	Message
`object_store_manager.py`	111	`"S3Error"`
`object_store_manager.py`	118	`"Minio error"`
`transformer.py`	331	`"Expected one tree found {data.keys()}"`
`transformer.py`	347	`"Failed to convert ROOT to Parquet: {e}"`

Violation	Occurrences (log statements affected)
V1 `requestId` → `request_id`	33
V2 `file-id` → `file_id`	6 (all via `log_extra` in `transformer_file_complete.py`)
V3 `s3-object-name` → `object_name`	6 (all via `log_extra` in `transformer_file_complete.py`)
V4 `elapsed-time` → `elapsed`	5
V5 `error-type` → `error_type`	4
V6 `_message` → `error`	4
V7 `exception` → `error`	1
V8 `paths` → drop	1
V9 `report_processed_time` → `elapsed`	1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Enforce consistent logging metadata #1339

Proposed Schema

Violations in Sidecar

1. Inconsistent key naming (kebab-case, camelCase, and snake_case all in use)

2. Error/exception logs with no `extras`

3. `place` absent from some logs that include other extras

4. `science_container_command.py` — no extras on any log call

5. `object_store_manager.py` line 102 — `"requestId"` key contains a bucket name

Violations in App

Violations by File

`transformer_manager.py`

`resources/servicex_resource.py`

`resources/transformation/cancel.py`

`resources/transformation/delete.py`

`resources/transformation/deployment.py`

`resources/transformation/get_one.py`

`resources/transformation/status.py`

`resources/transformation/submit.py`

`resources/internal/fileset_complete.py`

`resources/internal/fileset_error.py`

`resources/internal/transform_status.py`

`resources/internal/transformer_file_complete.py`

`lookup_result_processor.py`

`celery/server_tasks.py`

`resources/internal/data_lifecycle_ops.py`

`resources/internal/add_file_to_dataset.py`

`resources/internal/fileset_complete.py` (second log site)

Summary Counts (post-schema update)

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Key	Type	When to include
`request_id`	str	Core context — whenever available
`dataset_id`	int	Core context — dataset record ID
`file_id`	int	Core context — file record ID
`place`	dict	Core context — `{"host":…, "site":…, "pod":…}`
`file_path`	str	File-related — input path being processed
`object_name`	str	File-related — output object name in object store
`elapsed`	float	Completion events — wall-clock seconds
`user`	float	Completion events — CPU user time
`sys`	float	Completion events — CPU system time
`iowait`	float	Completion events — I/O wait time
`files_remaining`	int	Completion events — remaining file counter
`files_completed`	int	Completion events — completed file counter
`files_failed`	int	Completion events — failed file counter
`error`	str	Error/exception logs — message or repr
`error_type`	str	Error/exception logs — error-category code
`log_body`	str	Error/exception logs — captured science container output
`task_id`	str	Queue context — Celery task name
`num_files`	int	Queue context — number of files in a batch
`metric`	dict	Status-update events — structured transformer-progress payload
`deleted`	list[str]	Lifecycle events — deletion summary list

File	Line	Message
`transformer.py`	324–327	`"Converting ROOT to Parquet"` — has `request_id` and `source_path`, missing `place`
`transformer.py`	395–403	`"Error uploading file to object store"` — has most fields, missing `place`
`object_store_manager.py`	111, 118	Error logs have no extras at all, including no `place`

ID	Bad key	Correct key	Notes
V1	`requestId`	`request_id`	camelCase instead of snake_case; most common violation
V2	`file-id`	`file_id`	hyphen-separated; comes from raw transformer payload key
V3	`s3-object-name`	`object_name`	hyphen-separated; comes from raw transformer payload key
V4	`elapsed-time`	`elapsed`	hyphen-separated and wrong base name
V5	`error-type`	`error_type`	hyphen-separated
V6	`_message`	`error`	underscore-prefixed; carries error message text
V7	`exception`	`error`	passes an exception object; schema expects `str`
V8	`paths`	(drop)	list of all paths in a batch; no schema equivalent — use `num_files` for count
V9	`report_processed_time`	`elapsed`	wrong key name for a timing value

Line	Bad key(s)	Violation(s)
54–62	`elapsed-time`, `error-type`, `_message`	V4, V5, V6
65–73	`elapsed-time`, `error-type`, `_message`	V4, V5, V6
90–98	`requestId`, `elapsed-time`, `error-type`, `_message`	V1, V4, V5, V6
108–116	`requestId`, `elapsed-time`, `error-type`, `_message`	V1, V4, V5, V6

Line	Bad key(s)	Violation(s)
102–106	`requestId`, `file-id`, `s3-object-name` (in `log_extra`)	V1, V2, V3
108	via `log_extra`	V1, V2, V3
117	via `log_extra`	V1, V2, V3
127	via `log_extra`	V1, V2, V3
143–152	via `log_extra`, `report_processed_time`	V1, V2, V3, V9
156	via `log_extra`	V1, V2, V3
181	via `log_extra`	V1, V2, V3
228	`requestId`	V1

Uh oh!

Enforce consistent logging metadata #1339

Description

Proposed Schema

Violations in Sidecar

1. Inconsistent key naming (kebab-case, camelCase, and snake_case all in use)

2. Error/exception logs with no extras

3. place absent from some logs that include other extras

4. science_container_command.py — no extras on any log call

5. object_store_manager.py line 102 — "requestId" key contains a bucket name

Violations in App

Violations by File

transformer_manager.py

resources/servicex_resource.py

resources/transformation/cancel.py

resources/transformation/delete.py

resources/transformation/deployment.py

resources/transformation/get_one.py

resources/transformation/status.py

resources/transformation/submit.py

resources/internal/fileset_complete.py

resources/internal/fileset_error.py

resources/internal/transform_status.py

resources/internal/transformer_file_complete.py

lookup_result_processor.py

celery/server_tasks.py

resources/internal/data_lifecycle_ops.py

resources/internal/add_file_to_dataset.py

resources/internal/fileset_complete.py (second log site)

Summary Counts (post-schema update)

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

2. Error/exception logs with no `extras`

3. `place` absent from some logs that include other extras

4. `science_container_command.py` — no extras on any log call

5. `object_store_manager.py` line 102 — `"requestId"` key contains a bucket name

`transformer_manager.py`

`resources/servicex_resource.py`

`resources/transformation/cancel.py`

`resources/transformation/delete.py`

`resources/transformation/deployment.py`

`resources/transformation/get_one.py`

`resources/transformation/status.py`

`resources/transformation/submit.py`

`resources/internal/fileset_complete.py`

`resources/internal/fileset_error.py`

`resources/internal/transform_status.py`

`resources/internal/transformer_file_complete.py`

`lookup_result_processor.py`

`celery/server_tasks.py`

`resources/internal/data_lifecycle_ops.py`

`resources/internal/add_file_to_dataset.py`

`resources/internal/fileset_complete.py` (second log site)