feat(crashtracking): capture unhandled exception with the crashtracker #5321

Merged
gyuheon0h merged 6 commits into master from gyuheon0h/capture-non-signal-crash
Feb 10, 2026
@gyuheon0h
Contributor

@gyuheon0h gyuheon0h commented Feb 5, 2026

What does this PR do?
This PR adds crash report collection and emission for unhandled Ruby exceptions. We hook into at_exit and read the exception's stack, send that stack from the Ruby side to the native code, and use it to build a crash report. We also send a crash ping, mainly for parity.

Native stack collection is planned but is out of scope for this stage.
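
The at_exit approach described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `UnhandledExceptionHook` module, its `reportable?` and `install` methods, and the choice to skip `SystemExit` and `SignalException` are all assumptions made for the example. The only Ruby facts relied on are that `at_exit` blocks run at process shutdown and that `$!` holds the exception terminating the process, if any.

```ruby
# Hypothetical sketch: module and method names are illustrative only and are
# not part of dd-trace-rb.
module UnhandledExceptionHook
  # Assumption: normal exits and signal-driven exits are not reported here.
  IGNORED = [SystemExit, SignalException].freeze

  # Returns true when the exception looks like an unhandled error.
  def self.reportable?(exception)
    return false if exception.nil?

    IGNORED.none? { |klass| exception.is_a?(klass) }
  end

  # Registers an at_exit hook; reporter is any callable taking the exception.
  def self.install(reporter)
    at_exit do
      exception = $! # exception that is terminating the process, or nil
      begin
        reporter.call(exception) if reportable?(exception)
      rescue StandardError
        # A failure while reporting must not mask the original exception.
      end
    end
  end
end

# Example reporter that only writes to stderr.
UnhandledExceptionHook.install(->(e) { warn "Unhandled #{e.class}: #{e.message}" })
```

Keeping the filtering in a separate predicate makes it testable without actually terminating a process.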

Motivation:
It is useful to see non-signal-based crashes, which regular error tracking does not capture. This was also a feature request from the SSI team.

Ticket: PROF-13673
Change log entry
Yes. Crashtracking: unhandled exceptions are caught and reported by the crashtracker

Additional Notes:

How to test the change?
Unit tests

Run a test Ruby program instrumented with the crashtracker and inspect the report it sends.

{
  "data_schema_version": "1.4",
  "error": {
    "is_crash": true,
    "kind": "UnhandledException",
    "message": "Unhandled ArgumentError: Test argument crash",
    "source_type": "Crashtracking",
    "stack": {
      "format": "Datadog Crashtracker 1.0",
      "frames": [
        {
          "file": "/home/bits/go/src/github.com/DataDog/dd-trace-rb/spec/datadog/core/crashtracking/component_spec.rb",
          "function": "block (4 levels) in <top (required)>",
          "line": 161
        },
        {
          "file": "/home/bits/go/src/github.com/DataDog/dd-trace-rb/spec/datadog/core/crashtracking/component_spec.rb",
          "function": "block (6 levels) in <top (required)>",
          "line": 168
        },
        ...
        {
          "file": "/var/lib/gems/3.0.0/gems/rspec-core-3.13.6/lib/rspec/core/runner.rb",
          "function": "invoke",
          "line": 45
        },
        {
          "file": "/var/lib/gems/3.0.0/gems/rspec-core-3.13.6/exe/rspec",
          "function": "<top (required)>",
          "line": 4
        },
        {
          "file": "/usr/local/bin/rspec",
          "function": "load",
          "line": 25
        },
        {
          "file": "/usr/local/bin/rspec",
          "function": "<main>",
          "line": 25
        }
      ],
      "incomplete": false
    }
  },
  "incomplete": false,
  "metadata": {
    "library_name": "dd-trace-rb",
    "library_version": "2.29.0",
    "family": "ruby",
    "tags": [
      "tag1:value1",
      "tag2:value2",
      "language:ruby-testing-123",
      "service:ruby-testing-123"
    ]
  },
  "os_info": {
    "architecture": "x86_64",
    "bitness": "64-bit",
    "os_type": "Ubuntu",
    "version": "22.4.0"
  },
  "proc_info": {
    "pid": 220117
  },
  "timestamp": "2026-02-06 00:25:31.590807434 UTC",
  "uuid": "9082567b-686a-4897-95cb-e596c929ba78"
}
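
For illustration, the "message" and "frames" fields in the report above could be derived from a Ruby exception along these lines. This is a sketch under stated assumptions: `exception_to_error_payload` and its `max_frames` parameter (including the default of 400) are hypothetical, and treating truncation as the trigger for `"incomplete": true` is a guess, not something the PR description confirms. It uses only `Exception#backtrace_locations` and the `Thread::Backtrace::Location` accessors from core Ruby.

```ruby
require "json"

# Hypothetical helper, not the PR's code: maps an exception to a hash shaped
# like the "error" section of the sample report.
def exception_to_error_payload(exception, max_frames: 400)
  # backtrace_locations is nil for exceptions that were never raised.
  locations = exception.backtrace_locations || []

  frames = locations.first(max_frames).map do |loc|
    {
      "file" => loc.absolute_path || loc.path, # absolute_path may be nil (e.g. eval)
      "function" => loc.label,
      "line" => loc.lineno
    }
  end

  {
    "kind" => "UnhandledException",
    "message" => "Unhandled #{exception.class}: #{exception.message}",
    "stack" => {
      "frames" => frames,
      # Assumption: a truncated stack is flagged as incomplete.
      "incomplete" => locations.size > max_frames
    }
  }
end

begin
  raise ArgumentError, "Test argument crash"
rescue ArgumentError => e
  puts JSON.pretty_generate(exception_to_error_payload(e))
end
```

The remaining report fields (metadata, os_info, proc_info, timestamp, uuid) are not covered by this sketch.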

@gyuheon0h gyuheon0h marked this pull request as ready for review February 5, 2026 20:52
@gyuheon0h gyuheon0h requested review from a team as code owners February 5, 2026 20:52
@gyuheon0h gyuheon0h marked this pull request as draft February 5, 2026 20:52
@github-actions
github-actions bot commented Feb 5, 2026

Thank you for updating Change log entry section 👏

Visited at: 2026-02-10 01:22:21 UTC

@github-actions github-actions bot added the core Involves Datadog core libraries label Feb 5, 2026
@gyuheon0h gyuheon0h force-pushed the gyuheon0h/capture-non-signal-crash branch 2 times, most recently from c5d3fce to e4b1623 Compare February 5, 2026 21:35
@datadog-official
datadog-official bot commented Feb 6, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 87.37%
Overall Coverage: 95.14% (-0.04%)

🔗 Commit SHA: ba2fa9e

@pr-commenter
pr-commenter bot commented Feb 6, 2026

Benchmarks

Benchmark execution time: 2026-02-10 14:30:49

Comparing candidate commit ba2fa9e in PR branch gyuheon0h/capture-non-signal-crash with baseline commit 7631952 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics.

@gyuheon0h gyuheon0h marked this pull request as ready for review February 6, 2026 01:05
Member

@ivoanjo ivoanjo left a comment

I've given it a pass!

@gyuheon0h gyuheon0h requested a review from ivoanjo February 6, 2026 19:16
@gyuheon0h gyuheon0h force-pushed the gyuheon0h/capture-non-signal-crash branch from 6f5fc9b to 25077d0 Compare February 6, 2026 19:50
Member

@p-datadog p-datadog left a comment

I read the C code and while nothing jumped out at me I also don't know if everything there is correct.

I left comments for the Ruby code.

In general, since we do have a crash tracker for crashes, I would like to see "unhandled exceptions" (and more precisely, "unhandled exceptions on main thread") NOT be referred to as "crashes" in Ruby code or documentation. I understand that eventually the libdatadog data structures will be created that have "crash" in their name, but I would prefer to see everything upstream of that use correct terminology and refer to "unhandled exceptions".

@gyuheon0h gyuheon0h force-pushed the gyuheon0h/capture-non-signal-crash branch from 87fad47 to 808b3f6 Compare February 6, 2026 22:16
Member

@ivoanjo ivoanjo left a comment

Gave it another pass!

@ivoanjo
Member

ivoanjo commented Feb 9, 2026

In general, since we do have a crash tracker for crashes, I would like to see "unhandled exceptions" (and more precisely, "unhandled exceptions on main thread") NOT be referred to as "crashes" in Ruby code or documentation. I understand that eventually the libdatadog data structures will be created that have "crash" in their name, but I would prefer to see everything upstream of that use correct terminology and refer to "unhandled exceptions".

+1 on this -- I suggest updating the PR title and changelog entry as well.

@gyuheon0h gyuheon0h changed the title feat(crashtracking): capture non signal based crashes feat(crashtracking): capture unhandled exception based crashes Feb 9, 2026
@github-actions
github-actions bot commented Feb 9, 2026

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 1 partially typed method, and clears 1 partially typed method. It increases the percentage of typed methods from 59.87% to 59.96% (+0.09%).

Partially typed methods (+1 -1)
Introduced:
sig/datadog/core/crashtracking/component.rbs:15
└── def initialize: (
          tags: ::Hash[::String, ::String],
          agent_base_url: ::String,
          ld_library_path: ::String,
          path_to_crashtracking_receiver_binary: ::String,
          logger: untyped
        ) -> void
Cleared:
sig/datadog/core/crashtracking/component.rbs:11
└── def initialize: (
          tags: ::Hash[::String, ::String],
          agent_base_url: ::String,
          ld_library_path: ::String,
          path_to_crashtracking_receiver_binary: ::String,
          logger: untyped
        ) -> void

Untyped other declarations

This PR introduces 1 untyped other declaration, and clears 1 untyped other declaration.

Untyped other declarations (+1 -1)
Introduced:
sig/datadog/core/crashtracking/component.rbs:35
└── attr_reader logger: untyped
Cleared:
sig/datadog/core/crashtracking/component.rbs:31
└── attr_reader logger: untyped

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept on the line before the definition to remove it from the stats.

@gyuheon0h gyuheon0h changed the title feat(crashtracking): capture unhandled exception based crashes feat(crashtracking): capture unhandled exception with the crashtracker Feb 9, 2026
@gyuheon0h gyuheon0h requested a review from ivoanjo February 9, 2026 15:57
@gyuheon0h gyuheon0h force-pushed the gyuheon0h/capture-non-signal-crash branch from ea94407 to 3fabed1 Compare February 9, 2026 18:33
Signal based crash report (crash done, need to do ping)

Revert "Gitignore weird files that keep popping up (will pop this commit later)"

This reverts commit aeb3017.

Revert "Remove VS Code config files from tracking"

This reverts commit 2b30b86.

Use locations array

Clean

Lazy logging

Fix memory leak
Fmt

fmt

Do work on ruby side, fix sus calls

Remove noisy log

Update symbol name

Check result, build message in ruby

unit test and test cleanup

Inline + no order dependency + cleanup

Number of frames logic on ruby side

frame processing in helper

Restore accidentally deleted comment

Update tags on fork

Fmt

Fix potential mem leak

move to core

clean

Extract into helper

Fix more potential leaks

Fmt

Remove comment from Ruby exception crash reporting context

Removed comment about Ruby exception crash reporting tests.

Respond to oleg -(rescuing all exceptions)

Flip negation

No more do-while, crash vs exception naming, test sleep fix, minor refactoring

Tag builder helper func, move all logic into ct component, move builder into build function
Trigger CI

rbs file

Trigger CI

CI debug

We need to explicitly check, not depend on order

Be explicit with typing

Trigger CI

Incomplete stack

Clarity in tests
@gyuheon0h gyuheon0h force-pushed the gyuheon0h/capture-non-signal-crash branch from 9050813 to c0aec88 Compare February 9, 2026 20:37
Member

@ivoanjo ivoanjo left a comment

👍 LGTM, I like this latest iteration!

@gyuheon0h gyuheon0h merged commit 40f46ee into master Feb 10, 2026
1459 of 1466 checks passed
@gyuheon0h gyuheon0h deleted the gyuheon0h/capture-non-signal-crash branch February 10, 2026 14:57
@github-actions github-actions bot added this to the 2.29.0 milestone Feb 10, 2026
@@ -0,0 +1,205 @@
#include <datadog/common.h>

I know that it's crashing, but we could call ddog_Error_drop on the error (cleaner).

Member

Uh, right, we are definitely leaking the error, if it ever gets triggered -- very worth cleaning it up. Can you look into it @gyuheon0h ?

(Again, this API is sooooooooo awkward and I'm looking forward to actually having libdatadog handle things in a much nicer way instead of dropping all this complexity on the Ruby side)

Labels

core Involves Datadog core libraries

4 participants