Skip to content

Garbage pointer dereference when using OTLP GRPC exporter with inacessible endpoint #3215

@tom-neara

Description

@tom-neara

Describe your environment

x86_64, Ubuntu-22.04, bazel 7.40, opentelemetry-cpp 1.16.0 and 1.18.0

Steps to reproduce
Lazily setup otel like so (my setup had a lot more indirection, this code doesn't actually crash for me) and run with OTEL_EXPORTER_OTLP_ENDPOINT=127.0.0.1:1337 or some other address that isn't a valid endpoint.

In my actual environment where it reproduces, I'm launching python then calling into a pybind11 module which I built which calls otel_tracer() above and creates a few spans. The crash occurs in the grpc exporter after main returns, see below.

#include <iostream>
#include <opentelemetry/context/propagation/text_map_propagator.h>
#include <opentelemetry/exporters/memory/in_memory_span_exporter.h>
#include <opentelemetry/exporters/otlp/otlp_grpc_exporter_factory.h>
#include <opentelemetry/sdk/trace/batch_span_processor_factory.h>
#include <opentelemetry/sdk/trace/processor.h>
#include <opentelemetry/sdk/trace/simple_processor.h>
#include <opentelemetry/sdk/trace/simple_processor_factory.h>
#include <opentelemetry/sdk/trace/tracer_provider_factory.h>
#include <opentelemetry/trace/context.h>
#include <opentelemetry/trace/propagation/http_trace_context.h>
#include <opentelemetry/trace/provider.h>
#include <opentelemetry/trace/span_startoptions.h>
#include <opentelemetry/trace/tracer.h>
#include <opentelemetry/trace/tracer_provider.h>

namespace otel = opentelemetry;
namespace trace_sdk = opentelemetry::sdk::trace;

static std::shared_ptr<otel::trace::TracerProvider> api_provider;

void setup_opentelemetry()
{
    auto exporter = otel::exporter::otlp::OtlpGrpcExporterFactory::Create();
    auto processor =
        trace_sdk::BatchSpanProcessorFactory::Create(std::move(exporter), {});
    auto resource = otel::sdk::resource::Resource(
        otel::sdk::resource::Resource::Create({{"service.name", "test"}}));
    auto provider = trace_sdk::TracerProviderFactory::Create(
        std::move(processor), resource);
    api_provider = std::move(provider);
    otel::trace::Provider::SetTracerProvider(api_provider);
}
std::shared_ptr<otel::trace::Tracer> otel_tracer()
{
    if (!api_provider)
    {
        setup_opentelemetry();
    }
    return api_provider->GetTracer("test", "1.0.0");
}

int
main(void)
{
    auto tracer = otel_tracer();

    {
      auto span = tracer->StartSpan("main");
      auto scope = tracer->WithActiveSpan(span);
      span->SetAttribute("key", "value");
    }
    return 0;
}

What is the expected behavior?
Program exits normally

What is the actual behavior?

Crash in https://github.com/open-telemetry/opentelemetry-cpp/blob/main/sdk/include/opentelemetry/sdk/common/global_log_handler.h#L162

Additional context

gdb session:

Thread 714 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ff9bfe00640 (LWP 4009503)]
0x00007fff4b8c0d3f in opentelemetry::v1::exporter::otlp::OtlpGrpcExporter::Export(std::__1::span<std::__1::unique_ptr<opentelemetry::v1::sdk::trace::Recordable, std::__1::default_delete<opentelemetry::v1::sdk::trace::Recordable> >, 18446744073709551615ul> const&) () from /home/tbirch/.cache/bazel/_bazel_tbirch/5dac0c5eac9fcd76e48829b4e488238f/execroot/_main/bazel-out/k8-fastbuild/bin/innerloop/ml/neara_ai/classify.runfiles/_main/innerloop/ml/neara_ai/classifier_module.so
(gdb) bt
#0  0x00007fff4b8c0d3f in opentelemetry::v1::exporter::otlp::OtlpGrpcExporter::Export(std::__1::span<std::__1::unique_ptr<opentelemetry::v1::sdk::trace::Recordable, std::__1::default_delete<opentelemetry::v1::sdk::trace::Recordable> >, 18446744073709551615ul> const&) () from /home/tbirch/.cache/bazel/_bazel_tbirch/5dac0c5eac9fcd76e48829b4e488238f/execroot/_main/bazel-out/k8-fastbuild/bin/innerloop/ml/neara_ai/classify.runfiles/_main/innerloop/ml/neara_ai/classifier_module.so
#1  0x00007fff4c835189 in opentelemetry::v1::sdk::trace::BatchSpanProcessor::Export() ()
   from /home/tbirch/.cache/bazel/_bazel_tbirch/5dac0c5eac9fcd76e48829b4e488238f/execroot/_main/bazel-out/k8-fastbuild/bin/innerloop/ml/neara_ai/classify.runfiles/_main/innerloop/ml/neara_ai/classifier_module.so
#2  0x00007fff4c835003 in opentelemetry::v1::sdk::trace::BatchSpanProcessor::DrainQueue() ()
   from /home/tbirch/.cache/bazel/_bazel_tbirch/5dac0c5eac9fcd76e48829b4e488238f/execroot/_main/bazel-out/k8-fastbuild/bin/innerloop/ml/neara_ai/classify.runfiles/_main/innerloop/ml/neara_ai/classifier_module.so
#3  0x00007fff4c834769 in opentelemetry::v1::sdk::trace::BatchSpanProcessor::DoBackgroundWork() ()
   from /home/tbirch/.cache/bazel/_bazel_tbirch/5dac0c5eac9fcd76e48829b4e488238f/execroot/_main/bazel-out/k8-fastbuild/bin/innerloop/ml/neara_ai/classify.runfiles/_main/innerloop/ml/neara_ai/classifier_module.so
#4  0x00007fff4c839439 in decltype (((*(std::declval<opentelemetry::v1::sdk::trace::BatchSpanProcessor*>())).*(std::declval<void (opentelemetry::v1::sdk::trace::BatchSpanProcessor::*)()>()))()) std::__1::__invoke[abi:v160004]<void (opentelemetry::v1::sdk::trace::BatchSpanProcessor::*)(), opentelemetry::v1::sdk::trace::BatchSpanProcessor*, , void>(void (opentelemetry::v1::sdk::trace::BatchSpanProcessor::*&&)(), opentelemetry::v1::sdk::trace::BatchSpanProcessor*&&) ()
   from /home/tbirch/.cache/bazel/_bazel_tbirch/5dac0c5eac9fcd76e48829b4e488238f/execroot/_main/bazel-out/k8-fastbuild/bin/innerloop/ml/neara_ai/classify.runfiles/_main/innerloop/ml/neara_ai/classifier_module.so
#5  0x00007fff4c8393be in void std::__1::__thread_execute[abi:v160004]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (opentelemetry::v1::sdk::trace::BatchSpanProcessor::*)(), opentelemetry::v1::sdk::trace::BatchSpanProcessor*, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (opentelemetry::v1::sdk::trace::BatchSpanProcessor::*)(), opentelemetry::v1::sdk::trace::BatchSpanProcessor*>&, std::__1::__tuple_indices<2ul>) ()
   from /home/tbirch/.cache/bazel/_bazel_tbirch/5dac0c5eac9fcd76e48829b4e488238f/execroot/_main/bazel-out/k8-fastbuild/bin/innerloop/ml/neara_ai/classify.runfiles/_main/innerloop/ml/neara_ai/classifier_module.so
#6  0x00007fff4c839192 in void* std::__1::__thread_proxy[abi:v160004]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (opentelemetry::v1::sdk::trace::BatchSpanProcessor::*)(), opentelemetry::v1::sdk::trace::BatchSpanProcessor*> >(void*) ()
   from /home/tbirch/.cache/bazel/_bazel_tbirch/5dac0c5eac9fcd76e48829b4e488238f/execroot/_main/bazel-out/k8-fastbuild/bin/innerloop/ml/neara_ai/classify.runfiles/_main/innerloop/ml/neara_ai/classifier_module.so
#7  0x00007ffff6694ac3 in start_thread (arg=<optimised out>) at ./nptl/pthread_create.c:442
#8  0x00007ffff6726850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) disassemble
Dump of assembler code for function _ZN13opentelemetry2v18exporter4otlp16OtlpGrpcExporter6ExportERKNSt3__14spanINS4_10unique_ptrINS0_3sdk5trace10RecordableENS4_14default_deleteIS9_EEEELm18446744073709551615EEE:
<...>
   0x00007fff4b8c0cf9 <+1705>:	call   0x7fff4ac3f240 <_ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE5c_strB7v160004Ev>
   0x00007fff4b8c0cfe <+1710>:	mov    %rax,-0x5d0(%rbp)
   0x00007fff4b8c0d05 <+1717>:	lea    -0x520(%rbp),%rdi
   0x00007fff4b8c0d0c <+1724>:	call   0x7fff4b22a720 <_ZN13opentelemetry2v13sdk6common12AttributeMapC2Ev>
   0x00007fff4b8c0d11 <+1729>:	jmp    0x7fff4b8c0d16 <_ZN13opentelemetry2v18exporter4otlp16OtlpGrpcExporter6ExportERKNSt3__14spanINS4_10unique_ptrINS0_3sdk5trace10RecordableENS4_14default_deleteIS9_EEEELm18446744073709551615EEE+1734>
   0x00007fff4b8c0d16 <+1734>:	mov    -0x5d0(%rbp),%r8
   0x00007fff4b8c0d1d <+1741>:	mov    -0x5c8(%rbp),%rdi
   0x00007fff4b8c0d24 <+1748>:	mov    (%rdi),%rax
   0x00007fff4b8c0d27 <+1751>:	mov    $0x1,%esi
   0x00007fff4b8c0d2c <+1756>:	lea    -0x19f0b56(%rip),%rdx        # 0x7fff49ed01dd
   0x00007fff4b8c0d33 <+1763>:	mov    $0xa0,%ecx
   0x00007fff4b8c0d38 <+1768>:	lea    -0x520(%rbp),%r9
=> 0x00007fff4b8c0d3f <+1775>:	call   *0x10(%rax)
   0x00007fff4b8c0d42 <+1778>:	lea    -0x520(%rbp),%rdi
   0x00007fff4b8c0d49 <+1785>:	call   0x7fff4b20c060 <_ZN13opentelemetry2v13sdk6common12AttributeMapD2Ev>
   0x00007fff4b8c0d4e <+1790>:	lea    -0x4f0(%rbp),%rdi
   0x00007fff4b8c0d55 <+1797>:	call   0x7fff4acedfd0 <_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEED2Ev>
   0x00007fff4b8c0d5a <+1802>:	lea    -0x4c0(%rbp),%rdi
   0x00007fff4b8c0d61 <+1809>:	call   0x7fff4ad05390 <_ZNSt3__118basic_stringstreamIcNS_11char_traitsIcEENS_9allocatorIcEEED1Ev>
   0x00007fff4b8c0d66 <+1814>:	movl   $0x0,-0x3c(%rbp)
   0x00007fff4b8c0d6d <+1821>:	lea    -0x3a8(%rbp),%rdi
   0x00007fff4b8c0d74 <+1828>:	call   0x7fff4b2312d0 <_ZNSt3__110shared_ptrIN13opentelemetry2v13sdk6common12internal_log10LogHandlerEED2B7v160004Ev>
   0x00007fff4b8c0d79 <+1833>:	mov    -0x3c(%rbp),%eax
   0x00007fff4b8c0d7c <+1836>:	test   %eax,%eax
(gdb) info registers
rax            0x7fec7e568f88      140653708611464
rbx            0x7ff9bfe00640      140710642714176
rcx            0xa0                160
rdx            0x7fff49ed01dd      140734433657309
rsi            0x1                 1
rdi            0x7feb80ee82c0      140649457156800
rbp            0x7ff9bfdffc40      0x7ff9bfdffc40
rsp            0x7ff9bfdff670      0x7ff9bfdff670
r8             0x7febc8002030      140650649493552
r9             0x7ff9bfdff720      140710642710304
r10            0x7ff9bfdfe6f0      140710642706160
r11            0x2eef90b907f2fa4   211380060597202852
r12            0x7ff9bfe00640      140710642714176
r13            0x0                 0
r14            0x7ffff66947d0      140737327482832
r15            0x7ffa13dfdb90      140712051989392
rip            0x7fff4b8c0d3f      0x7fff4b8c0d3f <opentelemetry::v1::exporter::otlp::OtlpGrpcExporter::Export(std::__1::span<std::__1::unique_ptr<opentelemetry::v1::sdk::trace::Recordable, std::__1::default_delete<opentelemetry::v1::sdk::trace::Recordable> >, 18446744073709551615ul> const&)+1775>
eflags         0x10206             [ PF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) x/20x 0x7fec7e568f88
0x7fec7e568f88:	0x00000000	0x00000000	0x00000000	0x00000000
0x7fec7e568f98:	0x00000000	0x00000000	0x00000000	0x00000000
0x7fec7e568fa8:	0x00000000	0x00000000	0x00000000	0x00000000
0x7fec7e568fb8:	0x00000000	0x00000000	0x00000000	0x00000000
0x7fec7e568fc8:	0x00000000	0x00000000	0x00000000	0x00000000

Applying this patch prevents the destruction of the global handler and thus prevents it from crashing

diff --git sdk/src/common/global_log_handler.cc sdk/src/common/global_log_handler.cc
index f60527ee..66620793 100644
--- sdk/src/common/global_log_handler.cc
+++ sdk/src/common/global_log_handler.cc
@@ -59,9 +59,8 @@ void NoopLogHandler::Handle(LogLevel,
.
 std::pair<nostd::shared_ptr<LogHandler>, LogLevel> &GlobalLogHandler::GetHandlerAndLevel() noexcept
 {
-  static std::pair<nostd::shared_ptr<LogHandler>, LogLevel> handler_and_level{
-      nostd::shared_ptr<LogHandler>(new DefaultLogHandler), LogLevel::Warning};
-  return handler_and_level;
+  static std::pair<nostd::shared_ptr<LogHandler>, LogLevel> *handler_and_level{new std::pair<nostd::shared_ptr<LogHandler>, LogLevel>{nostd::shared_ptr<LogHandler>(new DefaultLogHandler), LogLevel::Warning}};
+  return *handler_and_level;
 }
.
 }  // namespace internal_log

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingtriage/acceptedIndicates an issue or PR is ready to be actively worked on.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions