invalid pthread_t 0x<sanitized> passed to pthread_kill coming from startProfiling #5441

@coolsoftwaretyler

Description

What React Native libraries do you use?

Hermes, RN New Architecture, React Navigation, Expo Application Services (EAS), Expo (mobile only)

Are you using sentry.io or on-premise?

sentry.io (SaaS)

@sentry/react-native SDK Version

7.7.0

What does your development environment look like?

We're in a pnpm monorepo with Expo, so I don't think the output is very useful. Here's what I get:

⚠️ react-native depends on @react-native-community/cli for cli commands. To fix update your package.json to include:


  "devDependencies": {
    "@react-native-community/cli": "latest",
  }

Sentry.init()

Sentry.init({
    environment: <we get this dynamically with a function call>,
    release: <we get this dynamically with a function call>,
    dist,
    dsn: sentryEnabled ? 'our-dsn' : undefined,

    // Enable this only if you're testing Sentry changes in development
    debug: false,
    tracesSampleRate: 0.1,
    profilesSampleRate: 0.1,

    replaysOnErrorSampleRate: enableSessionReplay ? 0.05 : 0,
    replaysSessionSampleRate: 0,

    integrations: [
      navigationInstrumentation, // Capture navigation breadcrumbs & performance spans
      sentryMobileReplayIntegration({
        maskAllText: true,
        maskAllImages: false,
        maskAllVectors: false,
      }),
    ],
    initialScope: {
      tags: (() => {
        const baseTags: Record<string, string | undefined> = {...someTagsHere};
        return baseTags;
      })(),
    },
});

Steps to Reproduce

I did my best to reproduce this, but it seems to be an intermittent issue in production. I think the root cause may be a threading issue, which is hard to force artificially.

Here's what I know:

Google Play Console is reporting a crash with this error:

invalid pthread_t 0x<sanitized> passed to pthread_kill

Here's the stack trace in Google Play:

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 5131 >>> com.my.app <<<

backtrace:
  #00  pc 0x000000000007123c  /apex/com.android.runtime/lib64/bionic/libc.so (abort+160)
  #01  pc 0x0000000000082c28  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_internal_find(long, char const*)+196)
  #02  pc 0x0000000000082b44  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_internal_gettid(long, char const*)+12)
  #03  pc 0x0000000000083948  /apex/com.android.runtime/lib64/bionic/libc.so (pthread_kill+52)
  #04  pc 0x00000000001105e0  /data/app/~~GVAFjiv3MRHi6wN_jyFNGQ==/com.my.app-dfsyqHFTuzXmYnuve2e92g==/split_config.arm64_v8a.apk!libhermes.so (BuildId: b06c77a49801680608345e3c05bd59aba90e9f19)
  #05  pc 0x0000000000110a60  /data/app/~~GVAFjiv3MRHi6wN_jyFNGQ==/com.my.app-dfsyqHFTuzXmYnuve2e92g==/split_config.arm64_v8a.apk!libhermes.so (BuildId: b06c77a49801680608345e3c05bd59aba90e9f19)
  #06  pc 0x000000000011095c  /data/app/~~GVAFjiv3MRHi6wN_jyFNGQ==/com.my.app-dfsyqHFTuzXmYnuve2e92g==/split_config.arm64_v8a.apk!libhermes.so (BuildId: b06c77a49801680608345e3c05bd59aba90e9f19)
  #07  pc 0x0000000000110d78  /data/app/~~GVAFjiv3MRHi6wN_jyFNGQ==/com.my.app-dfsyqHFTuzXmYnuve2e92g==/split_config.arm64_v8a.apk!libhermes.so (BuildId: b06c77a49801680608345e3c05bd59aba90e9f19)
  #08  pc 0x0000000000111480  /data/app/~~GVAFjiv3MRHi6wN_jyFNGQ==/com.my.app-dfsyqHFTuzXmYnuve2e92g==/split_config.arm64_v8a.apk!libhermes.so (BuildId: b06c77a49801680608345e3c05bd59aba90e9f19)
  #09  pc 0x0000000000082600  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+184)
  #10  pc 0x0000000000074a58  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68)

And here's that same trace symbolicated with Hermes:

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 15474 >>> com.my.app <<<

backtrace:
  #00  pc 0x000000000007137c  /apex/com.android.runtime/lib64/bionic/libc.so (abort+160)
  #01  pc 0x0000000000082d68  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_internal_find(long, char const*)+196)
  #02  pc 0x0000000000082c84  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_internal_gettid(long, char const*)+12)
  #03  pc 0x0000000000083a88  /apex/com.android.runtime/lib64/bionic/libc.so (pthread_kill+52)
  #04  pc 0x00000000001105e0  libhermes.so
       hermes::vm::sampling_profiler::Sampler::platformSuspendVMAndWalkStack(hermes::vm::SamplingProfiler*)
       SamplingProfilerPosix.cpp:321
  #05  pc 0x0000000000110a60  libhermes.so
       hermes::vm::sampling_profiler::Sampler::sampleStack(hermes::vm::SamplingProfiler*)
       SamplingProfilerSampler.cpp:99
  #06  pc 0x000000000011095c  libhermes.so
       hermes::vm::sampling_profiler::Sampler::sampleStacks()
       SamplingProfilerSampler.cpp:62
  #07  pc 0x0000000000110d78  libhermes.so
       hermes::vm::sampling_profiler::Sampler::timerLoop(double)
       SamplingProfilerSampler.cpp:162
  #08  pc 0x0000000000111480  libhermes.so
       std::__ndk1::__thread_proxy<...>(void*)
       __thread/thread.h:199
  #09  pc 0x0000000000082740  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+184)
  #10  pc 0x0000000000074b98  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68)

---
Analysis: Crash in Hermes Sampling Profiler

The sampling profiler's background thread called pthread_kill() to send a signal 
to a JavaScript thread for stack sampling, but the target thread no longer exists 
(or has an invalid thread ID). This caused __pthread_internal_find() to fail and 
call abort().

This is a race condition: the profiler's timer loop tried to sample a thread that 
was terminated between when the profiler registered it and when the sample was attempted.

Hermes version: 0.79.6 (React Native 0.79.6)
Build ID: b06c77a49801680608345e3c05bd59aba90e9f19
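
To make the race described in the analysis concrete, here is a toy model in TypeScript. It is only an illustration: `ThreadTable` and `simulateStaleSample` are hypothetical names, not Hermes internals, and the thrown error stands in for the native abort(), which JavaScript or Java code could not actually catch.

```typescript
// Toy model of the check-then-act race. All names here are hypothetical
// and exist only for illustration; they do not mirror Hermes internals.
class ThreadTable {
  private live = new Set<number>();

  spawn(id: number): void {
    this.live.add(id);
  }

  exit(id: number): void {
    this.live.delete(id);
  }

  // Stand-in for pthread_kill: signaling an id that is no longer live fails.
  // In the real crash this failure is abort(), not a catchable exception.
  kill(id: number): void {
    if (!this.live.has(id)) {
      throw new Error(`invalid pthread_t ${id} passed to pthread_kill`);
    }
  }
}

function simulateStaleSample(): string {
  const threads = new ThreadTable();
  threads.spawn(1);

  // The profiler remembers the thread id when the thread is registered.
  const registeredId = 1;

  // The thread terminates before the next timer tick.
  threads.exit(1);

  // The timer loop then tries to signal the stale id.
  try {
    threads.kill(registeredId);
    return 'sampled';
  } catch (e) {
    return (e as Error).message;
  }
}
```

The point of the sketch is the ordering: the id is captured at registration, the thread goes away, and the sampler only finds out when it tries to signal it.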

I brought this up with the Hermes maintainers. This crash can happen when something enables the Hermes sampling profiler and the thread it tries to sample can no longer be addressed. A long time ago, React Native Reanimated hit this issue in some cases. I found that @sentry/react-native also calls methods that enable this profiler.

startProfiling calls HermesSamplingProfiler.enable, which in turn calls hermesAPI->enableSamplingProfiler():

public WritableMap startProfiling(boolean platformProfilers) {
    final WritableMap result = new WritableNativeMap();
    if (androidProfiler == null && platformProfilers) {
      initializeAndroidProfiler();
    }

    try {
      HermesSamplingProfiler.enable();
      if (androidProfiler != null) {
        androidProfiler.start();
      }

      result.putBoolean("started", true);
    } catch (Throwable e) { // NOPMD - We don't want to crash in any case
      result.putBoolean("started", false);
      result.putString("error", e.toString());
    }
    return result;
  }

(I confirmed this code is the same at the 7.7.0 tag as well.)

Which calls:

void HermesSamplingProfiler::enable(jni::alias_ref<jclass> /*unused*/) {
  auto* hermesAPI =
      castInterface<hermes::IHermesRootAPI>(hermes::makeHermesRootAPI());
  hermesAPI->enableSamplingProfiler();
}

The Hermes team considers this to be atypical.

Again, I don't have a formal reproducer (I believe we'd need to line up exactly the right conditions: get traces started on threads that become unaddressable - I can't figure out the code needed to reproduce). But I'm hoping this is enough information for you all to track it down. If you've got ideas of what might reproduce this, that would be helpful to me (and I'd be happy to put together a clearer reproduction).

I'm hoping the solution is as simple as adding some kind of error handling in your startProfiling method so that if we hit this pthread error, the app does not crash. Right now we seem to be getting SIGABRT errors. If we can just catch and handle those, I think that would fix my issue. Again, without a reproducer I can't quite reason out what needs to change, but I'm hoping y'all would have a good idea.
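
In the meantime, a possible stopgap on the app side is to leave profiling off on Android only. The sketch below is a minimal example under two assumptions that have not been verified here: that the crash is limited to Android, and that the SDK does not enable the Hermes sampling profiler when `profilesSampleRate` is unset. `profilesSampleRateFor` is a hypothetical helper, not part of @sentry/react-native.

```typescript
// Hypothetical helper, not part of @sentry/react-native.
// Assumption: leaving profilesSampleRate undefined keeps the SDK from
// enabling the Hermes sampling profiler on that platform.
type PlatformName = 'ios' | 'android' | 'web' | 'windows' | 'macos';

function profilesSampleRateFor(
  platform: PlatformName,
  defaultRate: number,
): number | undefined {
  // Skip profiling on Android, where the pthread_kill abort was reported.
  return platform === 'android' ? undefined : defaultRate;
}

// Intended usage inside Sentry.init (Platform comes from 'react-native'):
//   profilesSampleRate: profilesSampleRateFor(Platform.OS, 0.1),
```

This trades away Android profiles entirely, so it would only be a workaround until the underlying race is fixed.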

Expected Result

App should never crash when Sentry profiling is run. Right now this happens fairly infrequently (about 1-3% of users, maybe?) - but I think this should be 0%.

Actual Result

Seeing crashes in Google Play Console.
