Commit 4aca390

nammn and anandsyncs authored
agent release - add traces (#226)
# Summary

We sometimes run into errors while processing the task queue, and tracing them makes it easier to see which tasks failed during later analysis. This pull request enhances the `queue_exception_handling` function in `pipeline.py` by integrating OpenTelemetry tracing and adding detailed metrics to improve observability of task queue error processing.

### Observability Enhancements:

* Added OpenTelemetry tracing to `queue_exception_handling`
* Introduced metrics, recorded as span attributes, to track task queue processing, including:
  - Total number of tasks (`mck.agent.queue.tasks_total`).
  - Count of tasks with exceptions (`mck.agent.queue.exceptions_count`).
  - Success rate of tasks, as a percentage (`mck.agent.queue.success_rate`).
  - Types of exceptions encountered (`mck.agent.queue.exception_types`).
  - Boolean flag indicating if exceptions were found (`mck.agent.queue.has_exceptions`)
* Updated logging to provide more granular details about exceptions encountered in the task queue

## Checklist

- [ ] Have you linked a jira ticket and/or is the ticket in the title?
- [ ] Have you checked whether your jira ticket required DOCSP changes?
- [ ] Have you checked for release_note changes?

---------

Co-authored-by: Anand <[email protected]>
Co-authored-by: Anand Singh <[email protected]>
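The attribute values listed in the description can be sketched as a pure function, which makes the arithmetic easy to check without a tracer. This is a minimal illustration, not the code from the commit: `queue_stats` and `_done` are hypothetical names, the tasks are assumed to be already-completed `concurrent.futures.Future` objects, and the exception types are sorted for a stable order, whereas the commit uses `list(exception_types)`.

```python
from concurrent.futures import Future


def queue_stats(tasks):
    """Compute the span attribute values for an iterable of finished futures."""
    tasks = list(tasks)
    # exception() returns None for a task that completed successfully.
    errors = [exc for exc in (task.exception() for task in tasks) if exc is not None]
    total = len(tasks)
    return {
        "mck.agent.queue.tasks_total": total,
        "mck.agent.queue.exceptions_count": len(errors),
        # Guard against division by zero for an empty queue, as the commit does.
        "mck.agent.queue.success_rate": (total - len(errors)) / total * 100 if total > 0 else 0,
        # Sorted here for a stable order (an assumption of this sketch).
        "mck.agent.queue.exception_types": sorted({type(exc).__name__ for exc in errors}),
        "mck.agent.queue.has_exceptions": bool(errors),
    }


def _done(result=None, error=None):
    """Hypothetical helper: build an already-completed Future."""
    future = Future()
    if error is not None:
        future.set_exception(error)
    else:
        future.set_result(result)
    return future


# Four tasks, one of which failed with a ValueError.
stats = queue_stats([_done("ok"), _done("ok"), _done("ok"), _done(error=ValueError("boom"))])
print(stats["mck.agent.queue.success_rate"])  # 75.0
```

With four tasks and one failure the success rate is (4 - 1) / 4 * 100 = 75.0, and an empty queue yields 0 rather than raising `ZeroDivisionError`.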
1 parent 6c9d6c9 commit 4aca390

1 file changed, +19 -0 lines changed

pipeline.py

Lines changed: 19 additions & 0 deletions
@@ -1352,12 +1352,31 @@ def build_agent_on_agent_bump(build_configuration: BuildConfiguration):
     queue_exception_handling(tasks_queue)
 
 
+@TRACER.start_as_current_span("queue_exception_handling")
 def queue_exception_handling(tasks_queue):
+    span = trace.get_current_span()
+
     exceptions_found = False
+    exception_count = 0
+    total_tasks = len(tasks_queue.queue)
+    exception_types = set()
+
+    span.set_attribute("mck.agent.queue.tasks_total", total_tasks)
+
     for task in tasks_queue.queue:
         if task.exception() is not None:
             exceptions_found = True
+            exception_count += 1
+            exception_types.add(type(task.exception()).__name__)
             logger.fatal(f"The following exception has been found when building: {task.exception()}")
+
+    span.set_attribute("mck.agent.queue.exceptions_count", exception_count)
+    span.set_attribute(
+        "mck.agent.queue.success_rate", ((total_tasks - exception_count) / total_tasks * 100) if total_tasks > 0 else 0
+    )
+    span.set_attribute("mck.agent.queue.exception_types", list(exception_types))
+    span.set_attribute("mck.agent.queue.has_exceptions", exceptions_found)
+
     if exceptions_found:
         raise Exception(
             f"Exception(s) found when processing Agent images. \nSee also previous logs for more info\nFailing the build"
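One property of the change worth checking is ordering: the attributes are set before the final `raise`, so a failing build should still leave the statistics on the span. The sketch below exercises that flow without OpenTelemetry installed. It is an illustration under stated assumptions: `RecordingSpan` is a hypothetical stand-in that only mimics `set_attribute`, `handle_queue` is a hypothetical variant that takes the span as a parameter instead of reading it from the tracer, and only two of the five attributes are recorded to keep it short.

```python
import queue
from concurrent.futures import Future


class RecordingSpan:
    """Hypothetical stand-in for a span: stores attributes in a dict."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


def handle_queue(tasks_queue, span):
    """Same flow as the diff, with the span injected for testing."""
    exception_count = 0
    for task in tasks_queue.queue:
        if task.exception() is not None:
            exception_count += 1

    # Attributes are recorded first, so they survive the raise below.
    span.set_attribute("mck.agent.queue.exceptions_count", exception_count)
    span.set_attribute("mck.agent.queue.has_exceptions", exception_count > 0)

    if exception_count > 0:
        raise Exception("Exception(s) found when processing Agent images.")


# One completed task that failed.
failed = Future()
failed.set_exception(RuntimeError("build failed"))
tasks = queue.Queue()
tasks.put(failed)

span = RecordingSpan()
raised = None
try:
    handle_queue(tasks, span)
except Exception as exc:
    raised = exc

print(span.attributes)
```

Passing the span in as an argument is a choice made for this sketch only; the commit obtains it with `trace.get_current_span()` inside the decorated function.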
