feat(prof): improve time sample accuracy by gathering during allocation samples #3559

morrisonlevi · 2026-01-07T19:00:37Z

Description

Gathers time samples during allocation samples if interrupt_count > 0. This has two benefits:

A slight accuracy improvement.
A slight performance improvement in latency and CPU.

The accuracy benefit comes from the fact that when an interrupt is set, on master it won't be handled until the next VM interrupt. If we're gathering a memory sample and that count is non-zero, that means that we're gathering it sooner than it would have otherwise been handled, so less VM code can change. Notably, this should help with frameless functions (added in PHP 8.4) which allocate.

The performance benefit mainly* comes from combining the two stack walks into one. It is dampened a bit by incurring an extra atomic load, we might revisit the Ordering used in the future. When I say a slight performance benefit, here's a comparison to v1.16.0:

library_version	latency_p99	cpu%	memory
local	114.38 ms	127.8%	121.95 MB
1.16.0	114.35 ms	128.5%	120.19 MB

* I was going to look at the profiles and post screenshots but for whatever reason I cannot see any frames related to memory allocations in the full host profiler. I may edit this later to show flamegraphs when I can get them.

Reviewer checklist

Test coverage seems ok.
Appropriate labels assigned.

When allocation profiler takes a sample and interrupt_count > 0, collect both allocation and time samples in a single stack walk. Benefits: - Eliminates redundant stack walks when samples coincide. - Improves time sample accuracy (closer to timer expiry).

Let's do this in a dedicated PR.

profiling/src/allocation/mod.rs

profiling/src/profiling/uploader.rs

codecov-commenter · 2026-01-07T19:03:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.91%. Comparing base (bb3cb7c) to head (1e42042).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3559      +/-   ##
==========================================
- Coverage   62.01%   61.91%   -0.11%     
==========================================
  Files         140      140              
  Lines       13311    13311              
  Branches     1762     1762              
==========================================
- Hits         8255     8241      -14     
- Misses       4267     4280      +13     
- Partials      789      790       +1

see 3 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb3cb7c...1e42042. Read the comment docs.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

pr-commenter · 2026-01-07T19:08:00Z

Benchmarks [ profiler ]

Benchmark execution time: 2026-01-21 15:34:11

Comparing candidate commit 1e42042 in PR branch levi/perf-time-alloc-piggyback with baseline commit bb3cb7c in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 27 metrics, 9 unstable metrics.

profiling/tests/correctness/allocation_time_combined.php

realFlowControl

I've created an upstream PR to fix the empty/malformed expected profile JSON problem that this PR showed. Can you fix the tests, especially the allocation_time_combined.json file?

realFlowControl · 2026-01-08T07:49:51Z

profiling/tests/correctness/allocation_time_combined.php

+        $s = str_repeat("x", 10_000_000);
+        strlen($s); // Use the string to prevent optimization


This still uses str_repeat() which is not a frameless function, strlen() is, but it does not allocate and as such will not trigger a combined sample.
Can we add a text, maybe some Lorem Ipsum (or more nerdy https://future-lives.com/wp-content/uploads/2014/09/TheSentinel.pdf) and replace random words in that text? I'd think using an array as the needle to replace should take time and allocate enough to trigger sampling. Setting DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE=1 should also help taking more allocation samples and increase the probability of triggering a combined sample.

profiling/tests/correctness/allocation_time_combined.json

profiling/src/profiling/mod.rs

profiling/src/allocation/mod.rs

.github/workflows/prof_correctness.yml

realFlowControl · 2026-01-08T10:20:35Z

I've created an upstream PR to fix the empty/malformed expected profile JSON [...]

This got merged and I re-run the correctness tests, which are now failing.

profiling/tests/correctness/allocation_time_combined.json

datadog-datadog-prod-us1 · 2026-01-12T22:05:19Z

⚠️ Tests

✨ Fix all issues with Cursor

⚠️ Warnings

❄️ 1 New flaky test detected

testScenario with data set "A simple GET request returning a string" from

slim-312-test.DDTrace\Tests\Integrations\Slim\V3_12\CommonScenariosTest.DDTrace\Tests\Integrations\Slim\V3_12\CommonScenariosTest::testScenario

(Datadog) (Fix with Cursor)

DDTrace\Tests\Integrations\Slim\V3_12\CommonScenariosTest::testScenario with data set "A simple GET request returning a string" (DDTrace\Tests\Frameworks\Util\Request\GetSpec Object (...), array(DDTrace\Tests\Common\SpanAssertion Object (...)))
Failed web request to 'http://localhost/simple?key=value&pwd=should_redact': Failed to connect to localhost port 9998 after 0 ms: Couldn't connect to server, error code 7

tests/Common/WebFrameworkTestCase.php:285
tests/Common/WebFrameworkTestCase.php:214
tests/Integrations/Slim/V3_12/CommonScenariosTest.php:38
tests/Common/TracerTestTrait.php:343
tests/Integrations/Slim/V3_12/CommonScenariosTest.php:37
tests/Common/RetryTraitVersionGeneric.php:28

🧪 1024 Tests failed

testSearchPhpBinaries from integration.DDTrace\Tests\Integration\PHPInstallerTest (Datadog) (Fix with Cursor)
DDTrace\Tests\Integration\PHPInstallerTest::testSearchPhpBinaries
Test code or tested code printed unexpected output: Searching for available php binaries, this operation might take a while.
testSimplePushAndProcess from laravel-58-test.DDTrace\Tests\Integrations\Laravel\V5_8\QueueTest (Fix with Cursor)
testSimplePushAndProcess from laravel-8x-test.DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest (Datadog) (Fix with Cursor)
DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest::testSimplePushAndProcess
Test code or tested code printed unexpected output: spanLinksTraceId: 6970f12b0000000002aad1619aeaab2c
tid: 6970f12b00000000
hexProcessTraceId: 02aad1619aeaab2c
hexProcessSpanId: 216bb450c7306686
processTraceId: 192196151257770796
processSpanId: 2408216684791883398

phpvfscomposer://tests/vendor/phpunit/phpunit/phpunit:106
View all

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 1e42042 | Docs | Datadog PR Page | Was this helpful? Give us feedback!}

This reverts commit a3e3e4e.

This reverts commit e16e1b1.

morrisonlevi · 2026-01-17T02:14:44Z

So it's very easy to see in the log output that the optimization works, you'll see messages like this (slightly more info than we'll probably ship, but this is what it shows right now, job):

[TRACE datadog_php_profiling::profiling] Sent stack sample with leaf frame "standard|str_replace" of 4 frames, 2 labels, 10000032 bytes allocated, 1 allocations, and 12 time interrupts to profiler.

However, getting the prof-correctness tests right has been quite tricky. Locally, I can see that alloc-size is clearly working on PHP 8.4:

Flat	Flat%	Sum%	Cum	Cum%	Name
591.28MB	66.67%	66.67%	591.28MB	66.67%	standard\|str_replace
295.64MB	33.33%	100.00%	295.64MB	33.33%	standard\|str_repeat
0	0.00%	100.00%	886.92MB	100.00%	main
0	0.00%	100.00%	886.92MB	100.00%	<?php

And the graph is right, so the path <?php;main;standard\|str_replace$ should be good:

But when I run prof-correctness in CI, we get this failure:

    pprof_analysis.go:647: Analyzing results in /github/workspace/profiling/tests/correctness/allocation_time_combined/test.pprof.1.zst for profile type alloc-size
    pprof_analysis.go:650: Found a profile duration of 10.0 seconds (in test.pprof.1.zst)
    pprof_analysis.go:296: Results stored in  /github/workspace/profiling/tests/correctness/allocation_time_combined/test.pprof.1.json
    pprof_analysis.go:442: Assertion failed: stack '<?php;main;standard\|str_replace$' (labels=[]) should have been 66% +/- 1% of the profile but was 0% with 66% error
    pprof_analysis.go:445: Assertion succeeded: stack '<?php;main;standard\|str_repeat$' (labels=[]) is 33% +/- 1% of the profile (was 33% with 0% error)

What could be making this fail? Why is prof correctness not seeing these frames in its test logic?

realFlowControl · 2026-01-20T13:32:01Z

I ran the prof-correctness tests locally and I saw the alloc-size and alloc-samples match the expected JSON. So it looks like there is a difference between CI PHP and our local PHP, so I added a commit that brings an artifact upload to make sure we get ahold of the pprof file of that test run. If you download that artifact and look at the samples you see something interesting:

The pprof --raw output confirms those frames:

...
          0          0          0         31  310000992          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0: 6 6 7 3 
                thread name:[cli]
                thread id:[140646745758080 id]
...
Locations
     1: 0x0 M=1 [idle] :0:0 s=0()
     2: 0x0 M=1 [require] :0:0 s=0()
     3: 0x0 M=1 <?php /home/runner/work/dd-trace-php/dd-trace-php/profiling/tests/correctness/allocation_time_combined.php:25:0 s=0()
     4: 0x0 M=1 standard|str_repeat :0:0 s=0()
     5: 0x0 M=1 main /home/runner/work/dd-trace-php/dd-trace-php/profiling/tests/correctness/allocation_time_combined.php:18:0 s=0()
     6: 0x0 M=1 standard|str_replace :0:0 s=0()
     7: 0x0 M=1 main /home/runner/work/dd-trace-php/dd-trace-php/profiling/tests/correctness/allocation_time_combined.php:19:0 s=0()
     8: 0x0 M=1 main /home/runner/work/dd-trace-php/dd-trace-php/profiling/tests/correctness/allocation_time_combined.php:20:0 s=0()
     9: 0x0 M=1 standard|microtime :0:0 s=0()
    10: 0x0 M=1 main /home/runner/work/dd-trace-php/dd-trace-php/profiling/tests/correctness/allocation_time_combined.php:16:0 s=0()

See the 6 6 7 3 at the end of the line?

This behaviour is also visible from the logs that the test emits:

[TRACE datadog_php_profiling::profiling] Sent stack sample with leaf frame "standard|str_repeat" of 3 frames, 2 labels, 10000032 bytes allocated, 1 allocations, and 0 time interrupts to profiler.
[TRACE datadog_php_profiling::profiling] Sent stack sample with leaf frame "standard|str_replace" of 4 frames, 2 labels, 10000032 bytes allocated, 1 allocations, and 1 time interrupts to profiler.
[TRACE datadog_php_profiling::profiling] Sent stack sample with leaf frame "standard|str_replace" of 4 frames, 2 labels, 10000032 bytes allocated, 1 allocations, and 11 time interrupts to profiler.

The first line (str_repeat()) has 3 frames, while the other lines (str_replace()) claim to have 4 frames...

(also noting the it feels weird that there are 11 time interrupts in the last message, that'd mean that this str_replace() took > 110ms).

I have no idea what is going on, so I created #3584 to validate if this is a problem in master already and it seems so ... checking https://github.com/DataDog/dd-trace-php/actions/runs/21172074850/job/60891070189?pr=3584#step:9:1200 and you see:

[TRACE datadog_php_profiling::profiling] Sent stack sample of 4 frames, 2 labels, 12288032 bytes allocated, and 1 allocations to profiler.
[TRACE datadog_php_profiling::profiling] Sent stack sample of 5 frames, 2 labels, 12288032 bytes allocated, and 1 allocations to profiler.

Where according to the source code the first one is the str_repeat(), while the second one is the str_replace().

realFlowControl · 2026-01-20T14:12:43Z

I have to add that I am not able to replicate this behaviour locally, not even in a Linux Docker container ...

realFlowControl · 2026-01-20T15:16:13Z

Got it fixed in #3584, it is this line that fixes the tests:

dd-trace-php/.github/workflows/prof_correctness.yml

Line 30 in a418d97

coverage: none

This makes sure that PHP will not have Xdebug installed, as Xdebug sets it's own opcode handlers for frameless functions, that seem to interfer: https://github.com/xdebug/xdebug/blob/e1222340fb27d446cee09e525f047b1540644d29/src/coverage/code_coverage.c#L1294-L1299

This reverts commit 44e2734.

pr-commenter · 2026-01-20T16:45:32Z

Benchmarks [ tracer ]

Benchmark execution time: 2026-01-20 16:43:16

Comparing candidate commit ea6bc22 in PR branch levi/perf-time-alloc-piggyback with baseline commit 617ba06 in branch master.

Found 1 performance improvements and 2 performance regressions! Performance is the same for 191 metrics, 0 unstable metrics.

scenario:ComposerTelemetryBench/benchTelemetryParsing

🟩 execution_time [-1306.061ns; -293.939ns] or [-10.705%; -2.409%]

scenario:MessagePackSerializationBench/benchMessagePackSerialization

🟥 execution_time [+6.314µs; +6.726µs] or [+5.987%; +6.377%]

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

🟥 execution_time [+9.001µs; +9.939µs] or [+8.694%; +9.600%]

profiling/tests/correctness/allocation_time_combined.json

realFlowControl

I left a comment and created #3592 to unblock this PR, looks good now.

profiling/tests/correctness/allocation_time_combined.json

profiling/tests/correctness/allocation_time_combined.php

Co-authored-by: Florian Engelhardt <[email protected]>

morrisonlevi added 6 commits January 5, 2026 11:29

refactor orderings of collect_* and extract helpers

5d9030b

refactor(test): allocation + CPU combined tests

989876d

refactor: make jq for combined samples less brittle

c0119e9

refactor: revert Ordering optimizations

f43a597

Let's do this in a dedicated PR.

style: use SAFETY instead of Safety

243d028

morrisonlevi requested a review from a team as a code owner January 7, 2026 19:00

github-actions bot added profiling Relates to the Continuous Profiler tracing labels Jan 7, 2026

morrisonlevi commented Jan 7, 2026

View reviewed changes

profiling/src/allocation/mod.rs Show resolved Hide resolved

morrisonlevi commented Jan 7, 2026

View reviewed changes

profiling/src/profiling/uploader.rs Show resolved Hide resolved

realFlowControl reviewed Jan 7, 2026

View reviewed changes

profiling/tests/correctness/allocation_time_combined.php Outdated Show resolved Hide resolved

morrisonlevi added 2 commits January 7, 2026 15:10

test: hammer piggybacking

a9c822e

refactor: simplify test and hook it up properly

5ee032d

morrisonlevi requested a review from a team as a code owner January 7, 2026 22:27

realFlowControl requested changes Jan 8, 2026

View reviewed changes

test: fix prof-correctness

5d79ecd

realFlowControl reviewed Jan 12, 2026

View reviewed changes

profiling/tests/correctness/allocation_time_combined.json Outdated Show resolved Hide resolved

realFlowControl reviewed Jan 12, 2026

View reviewed changes

profiling/tests/correctness/allocation_time_combined.json Outdated Show resolved Hide resolved

realFlowControl reviewed Jan 12, 2026

View reviewed changes

profiling/tests/correctness/allocation_time_combined.json Show resolved Hide resolved

realFlowControl reviewed Jan 12, 2026

View reviewed changes

profiling/tests/correctness/allocation_time_combined.json Outdated Show resolved Hide resolved

morrisonlevi added 3 commits January 12, 2026 11:10

ci(profiling): run allocation_time_combined on PHP 8.4+

aa62d66

test(profiling): fix allocation_time_combined float->int

b1765c0

Merge branch 'master' into levi/perf-time-alloc-piggyback

8b15661

morrisonlevi added 2 commits January 12, 2026 15:52

ci(profiling): update expected data

23c9bc1

Merge branch 'master' into levi/perf-time-alloc-piggyback

7e93679

morrisonlevi added 5 commits January 16, 2026 17:03

Revert "DEBUG: more info in panic"

73c86a0

This reverts commit a3e3e4e.

Revert "DEBUG: do we hit frameless icalls?"

520c158

This reverts commit e16e1b1.

undo experimental debugging

35ac0b6

test: fix expected log message

87a52f4

test: use more replaces to exaggerate effect

093ce66

morrisonlevi and others added 2 commits January 16, 2026 19:30

EXPERIMENT: show leaf frame

d2d11b1

Merge branch 'master' into levi/perf-time-alloc-piggyback

78d9cbc

realFlowControl force-pushed the levi/perf-time-alloc-piggyback branch from 2356ff2 to 9b0f22f Compare January 20, 2026 10:53

realFlowControl mentioned this pull request Jan 20, 2026

ci(prof): Add frameless function call to validate stack walking #3584

Merged

2 tasks

realFlowControl added 4 commits January 20, 2026 16:34

do not fail fast, finish prof correctness

b64dd9a

upload artifacts

44e2734

Disable Xdebug for prof-correctness

d063129

Revert "upload artifacts"

ea6bc22

This reverts commit 44e2734.

realFlowControl force-pushed the levi/perf-time-alloc-piggyback branch from 9b0f22f to ea6bc22 Compare January 20, 2026 15:35

morrisonlevi added 2 commits January 20, 2026 10:03

Merge branch 'master' into levi/perf-time-alloc-piggyback

ddb1dfc

test: allow any non-zero wall-time for piggy-backed samples

5114acd

morrisonlevi commented Jan 20, 2026

View reviewed changes

profiling/tests/correctness/allocation_time_combined.json Show resolved Hide resolved

morrisonlevi changed the title ~~perf(prof): gather time samples during allocation samples~~ feat(prof): improve time sample accuracy by gathering during allocation samples Jan 21, 2026

remove first_frame for performance

9cb6ba1

realFlowControl mentioned this pull request Jan 21, 2026

ci(prof): run additional allocation test on all PHP versions #3592

Open

2 tasks

realFlowControl approved these changes Jan 21, 2026

View reviewed changes

profiling/tests/correctness/allocation_time_combined.json Show resolved Hide resolved

profiling/tests/correctness/allocation_time_combined.php Outdated Show resolved Hide resolved

Remove misleading strlen and comment

1e42042

Co-authored-by: Florian Engelhardt <[email protected]>

morrisonlevi merged commit a700b4d into master Jan 21, 2026
2002 of 2009 checks passed

morrisonlevi deleted the levi/perf-time-alloc-piggyback branch January 21, 2026 18:01

github-actions bot added this to the 1.17.0 milestone Jan 21, 2026

		$s = str_repeat("x", 10_000_000);
		strlen($s); // Use the string to prevent optimization

feat(prof): improve time sample accuracy by gathering during allocation samples #3559

feat(prof): improve time sample accuracy by gathering during allocation samples #3559

Conversation

morrisonlevi commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Reviewer checklist

Uh oh!

Uh oh!

Uh oh!

codecov-commenter commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [ profiler ]

Uh oh!

Uh oh!

realFlowControl left a comment

Choose a reason for hiding this comment

Uh oh!

realFlowControl Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

realFlowControl commented Jan 8, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

datadog-datadog-prod-us1 bot commented Jan 12, 2026 • edited by datadog-official bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Warnings

Uh oh!

morrisonlevi commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

realFlowControl commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

realFlowControl commented Jan 20, 2026

Uh oh!

realFlowControl commented Jan 20, 2026

Uh oh!

pr-commenter bot commented Jan 20, 2026

Benchmarks [ tracer ]

scenario:ComposerTelemetryBench/benchTelemetryParsing

scenario:MessagePackSerializationBench/benchMessagePackSerialization

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

Uh oh!

Uh oh!

realFlowControl left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

morrisonlevi commented Jan 7, 2026 •

edited

Loading

codecov-commenter commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

datadog-datadog-prod-us1 bot commented Jan 12, 2026 •

edited by datadog-official bot

Loading

morrisonlevi commented Jan 17, 2026 •

edited

Loading

realFlowControl commented Jan 20, 2026 •

edited

Loading