Skip to content

Running spans aren't ended when process terminates #4813

@Helveg

Description

@Helveg

Describe your environment

OS; Ubuntu
Python version: 3.12.0
SDK version: 1.38.0
API version: 1.38.0

What happened?

When an exception interrupts a running stack of with tracer.start_as_current_span(...):, all the current spans correctly finalize and are ended, but when a signal terminates the process, this isn't the case:

    with tracer.start_as_current_span("normal_run_example") as span:
        span.set_attribute("example.attribute", "normal_execution")
        raise Exception("Normal Exception")

this records the exception, ends the span, and shuts down the trace provider and makes sure everything is exported before shutdown. this does not:

    with tracer.start_as_current_span("normal_run_example") as span:
        span.set_attribute("example.attribute", "normal_execution")
        time.sleep(0.2)
        with tracer.start_as_current_span("nested_run_example") as span2:
            time.sleep(0.5)
            os.kill(os.getpid(), signal.SIGTERM)
            time.sleep(10)

Steps to Reproduce

see examples above

Expected Result

Even when the process receives signals, the spans should be ended and the provider shut down, similar to how precautions are taken by using atexit handlers.

Actual Result

Running trace does not arrive at the collector.

Additional context

Currently the token needed to restore the context is only accessible inside opentelemetry.trace.use_span, but if we were to attach that as a private attribute to the created spans then from a signal handler we could do the following:

def shutdown_otel(signum=None):
    # Gracefully end the current span hierarchy if it is still running
    curr = trace.get_current_span()
    while curr and curr.is_recording():
        curr.end()
        token = getattr(curr, "_ctx_token", None) # <-- attribute set in `use_span`
        if token:
            detach(token)
        curr = trace.get_current_span()

    # Gracefully shutdown the trace provider
    try:
        provider.shutdown()
    except Exception:
        pass

    # Reraise the previous signal handler
    if signum is not None:
        signal.signal(signum, prev_handlers[signum])
        signal.raise_signal(signum)


# Run shutdown on interceptable termination signals
prev_handlers = {}
for s in ("SIGINT", "SIGTERM", "SIGHUP"):
    sig = getattr(signal, s, None)
    if sig is None:
        continue
    prev_handlers[sig] = signal.signal(sig, lambda signum, frame, _s=sig: shutdown_otel(_s))

Would you like to implement a fix?

Yes

Tip

React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1 or me too, to help us triage it. Learn more here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions