Skip to content

Conversation

@morrisonlevi
Copy link
Collaborator

@morrisonlevi morrisonlevi commented Jan 7, 2026

Description

Gathers time samples during allocation samples if interrupt_count > 0. This has two benefits:

  1. A slight accuracy improvement.
  2. A slight performance improvement in latency and CPU.

The accuracy benefit comes from the fact that when an interrupt is set, on master it won't be handled until the next VM interrupt. If we're gathering a memory sample and that count is non-zero, that means that we're gathering it sooner than it would have otherwise been handled, so less VM code can change. Notably, this should help with frameless functions (added in PHP 8.4) which allocate.

The performance benefit mainly* comes from combining the two stack walks into one. It is dampened a bit by incurring an extra atomic load, we might revisit the Ordering used in the future. When I say a slight performance benefit, here's a comparison to v1.16.0:

library_version latency_p99 cpu% memory
local 114.38 ms 127.8% 121.95 MB
1.16.0 114.35 ms 128.5% 120.19 MB

* I was going to look at the profiles and post screenshots but for whatever reason I cannot see any frames related to memory allocations in the full host profiler. I may edit this later to show flamegraphs when I can get them.

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

When allocation profiler takes a sample and interrupt_count > 0,
collect both allocation and time samples in a single stack walk.

Benefits:
- Eliminates redundant stack walks when samples coincide.
- Improves time sample accuracy (closer to timer expiry).
Let's do this in a dedicated PR.
@morrisonlevi morrisonlevi requested a review from a team as a code owner January 7, 2026 19:00
@github-actions github-actions bot added profiling Relates to the Continuous Profiler tracing labels Jan 7, 2026
@codecov-commenter
Copy link

codecov-commenter commented Jan 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.91%. Comparing base (bb3cb7c) to head (1e42042).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3559      +/-   ##
==========================================
- Coverage   62.01%   61.91%   -0.11%     
==========================================
  Files         140      140              
  Lines       13311    13311              
  Branches     1762     1762              
==========================================
- Hits         8255     8241      -14     
- Misses       4267     4280      +13     
- Partials      789      790       +1     

see 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb3cb7c...1e42042. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pr-commenter
Copy link

pr-commenter bot commented Jan 7, 2026

Benchmarks [ profiler ]

Benchmark execution time: 2026-01-21 15:34:11

Comparing candidate commit 1e42042 in PR branch levi/perf-time-alloc-piggyback with baseline commit bb3cb7c in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 27 metrics, 9 unstable metrics.

@morrisonlevi morrisonlevi requested a review from a team as a code owner January 7, 2026 22:27
Copy link
Member

@realFlowControl realFlowControl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created an upstream PR to fix the empty/malformed expected profile JSON problem that this PR showed. Can you fix the tests, especially the allocation_time_combined.json file?

Comment on lines 16 to 17
$s = str_repeat("x", 10_000_000);
strlen($s); // Use the string to prevent optimization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still uses str_repeat() which is not a frameless function, strlen() is, but it does not allocate and as such will not trigger a combined sample.
Can we add a text, maybe some Lorem Ipsum (or more nerdy https://future-lives.com/wp-content/uploads/2014/09/TheSentinel.pdf) and replace random words in that text? I'd think using an array as the needle to replace should take time and allocate enough to trigger sampling. Setting DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE=1 should also help taking more allocation samples and increase the probability of triggering a combined sample.

@realFlowControl
Copy link
Member

I've created an upstream PR to fix the empty/malformed expected profile JSON [...]

This got merged and I re-run the correctness tests, which are now failing.

@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Jan 12, 2026

⚠️ Tests

Fix all issues with Cursor

⚠️ Warnings

❄️ 1 New flaky test detected

testScenario with data set "A simple GET request returning a string" from slim-312-test.DDTrace\Tests\Integrations\Slim\V3_12\CommonScenariosTest.DDTrace\Tests\Integrations\Slim\V3_12\CommonScenariosTest::testScenario (Datadog) (Fix with Cursor)
DDTrace\Tests\Integrations\Slim\V3_12\CommonScenariosTest::testScenario with data set "A simple GET request returning a string" (DDTrace\Tests\Frameworks\Util\Request\GetSpec Object (...), array(DDTrace\Tests\Common\SpanAssertion Object (...)))
Failed web request to 'http://localhost/simple?key=value&pwd=should_redact': Failed to connect to localhost port 9998 after 0 ms: Couldn't connect to server, error code 7

tests/Common/WebFrameworkTestCase.php:285
tests/Common/WebFrameworkTestCase.php:214
tests/Integrations/Slim/V3_12/CommonScenariosTest.php:38
tests/Common/TracerTestTrait.php:343
tests/Integrations/Slim/V3_12/CommonScenariosTest.php:37
tests/Common/RetryTraitVersionGeneric.php:28

🧪 1024 Tests failed

testSearchPhpBinaries from integration.DDTrace\Tests\Integration\PHPInstallerTest (Datadog) (Fix with Cursor)
DDTrace\Tests\Integration\PHPInstallerTest::testSearchPhpBinaries
Test code or tested code printed unexpected output: Searching for available php binaries, this operation might take a while.

    testSimplePushAndProcess from laravel-58-test.DDTrace\Tests\Integrations\Laravel\V5_8\QueueTest (Fix with Cursor)

testSimplePushAndProcess from laravel-8x-test.DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest (Datadog) (Fix with Cursor)
DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest::testSimplePushAndProcess
Test code or tested code printed unexpected output: spanLinksTraceId: 6970f12b0000000002aad1619aeaab2c
tid: 6970f12b00000000
hexProcessTraceId: 02aad1619aeaab2c
hexProcessSpanId: 216bb450c7306686
processTraceId: 192196151257770796
processSpanId: 2408216684791883398

phpvfscomposer://tests/vendor/phpunit/phpunit/phpunit:106
View all
This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 1e42042 | Docs | Datadog PR Page | Was this helpful? Give us feedback!

@morrisonlevi
Copy link
Collaborator Author

morrisonlevi commented Jan 17, 2026

So it's very easy to see in the log output that the optimization works, you'll see messages like this (slightly more info than we'll probably ship, but this is what it shows right now, job):

[TRACE datadog_php_profiling::profiling] Sent stack sample with leaf frame "standard|str_replace" of 4 frames, 2 labels, 10000032 bytes allocated, 1 allocations, and 12 time interrupts to profiler.

However, getting the prof-correctness tests right has been quite tricky. Locally, I can see that alloc-size is clearly working on PHP 8.4:

Flat Flat% Sum% Cum Cum% Name
591.28MB 66.67% 66.67% 591.28MB 66.67% standard|str_replace
295.64MB 33.33% 100.00% 295.64MB 33.33% standard|str_repeat
0 0.00% 100.00% 886.92MB 100.00% main
0 0.00% 100.00% 886.92MB 100.00% <?php

And the graph is right, so the path <?php;main;standard\|str_replace$ should be good:

Screenshot 2026-01-16 at 6 56 23 PM

But when I run prof-correctness in CI, we get this failure:

    pprof_analysis.go:647: Analyzing results in /github/workspace/profiling/tests/correctness/allocation_time_combined/test.pprof.1.zst for profile type alloc-size
    pprof_analysis.go:650: Found a profile duration of 10.0 seconds (in test.pprof.1.zst)
    pprof_analysis.go:296: Results stored in  /github/workspace/profiling/tests/correctness/allocation_time_combined/test.pprof.1.json
    pprof_analysis.go:442: Assertion failed: stack '<?php;main;standard\|str_replace$' (labels=[]) should have been 66% +/- 1% of the profile but was 0% with 66% error
    pprof_analysis.go:445: Assertion succeeded: stack '<?php;main;standard\|str_repeat$' (labels=[]) is 33% +/- 1% of the profile (was 33% with 0% error)

What could be making this fail? Why is prof correctness not seeing these frames in its test logic?

@realFlowControl
Copy link
Member

realFlowControl commented Jan 20, 2026

I ran the prof-correctness tests locally and I saw the alloc-size and alloc-samples match the expected JSON. So it looks like there is a difference between CI PHP and our local PHP, so I added a commit that brings an artifact upload to make sure we get ahold of the pprof file of that test run. If you download that artifact and look at the samples you see something interesting:
image
The pprof --raw output confirms those frames:

...
          0          0          0         31  310000992          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0: 6 6 7 3 
                thread name:[cli]
                thread id:[140646745758080 id]
...
Locations
     1: 0x0 M=1 [idle] :0:0 s=0()
     2: 0x0 M=1 [require] :0:0 s=0()
     3: 0x0 M=1 <?php /home/runner/work/dd-trace-php/dd-trace-php/profiling/tests/correctness/allocation_time_combined.php:25:0 s=0()
     4: 0x0 M=1 standard|str_repeat :0:0 s=0()
     5: 0x0 M=1 main /home/runner/work/dd-trace-php/dd-trace-php/profiling/tests/correctness/allocation_time_combined.php:18:0 s=0()
     6: 0x0 M=1 standard|str_replace :0:0 s=0()
     7: 0x0 M=1 main /home/runner/work/dd-trace-php/dd-trace-php/profiling/tests/correctness/allocation_time_combined.php:19:0 s=0()
     8: 0x0 M=1 main /home/runner/work/dd-trace-php/dd-trace-php/profiling/tests/correctness/allocation_time_combined.php:20:0 s=0()
     9: 0x0 M=1 standard|microtime :0:0 s=0()
    10: 0x0 M=1 main /home/runner/work/dd-trace-php/dd-trace-php/profiling/tests/correctness/allocation_time_combined.php:16:0 s=0()

See the 6 6 7 3 at the end of the line?

This behaviour is also visible from the logs that the test emits:

[TRACE datadog_php_profiling::profiling] Sent stack sample with leaf frame "standard|str_repeat" of 3 frames, 2 labels, 10000032 bytes allocated, 1 allocations, and 0 time interrupts to profiler.
[TRACE datadog_php_profiling::profiling] Sent stack sample with leaf frame "standard|str_replace" of 4 frames, 2 labels, 10000032 bytes allocated, 1 allocations, and 1 time interrupts to profiler.
[TRACE datadog_php_profiling::profiling] Sent stack sample with leaf frame "standard|str_replace" of 4 frames, 2 labels, 10000032 bytes allocated, 1 allocations, and 11 time interrupts to profiler.

The first line (str_repeat()) has 3 frames, while the other lines (str_replace()) claim to have 4 frames...

(also noting the it feels weird that there are 11 time interrupts in the last message, that'd mean that this str_replace() took > 110ms).

I have no idea what is going on, so I created #3584 to validate if this is a problem in master already and it seems so ... checking https://github.com/DataDog/dd-trace-php/actions/runs/21172074850/job/60891070189?pr=3584#step:9:1200 and you see:

[TRACE datadog_php_profiling::profiling] Sent stack sample of 4 frames, 2 labels, 12288032 bytes allocated, and 1 allocations to profiler.
[TRACE datadog_php_profiling::profiling] Sent stack sample of 5 frames, 2 labels, 12288032 bytes allocated, and 1 allocations to profiler.

Where according to the source code the first one is the str_repeat(), while the second one is the str_replace().

@realFlowControl
Copy link
Member

I have to add that I am not able to replicate this behaviour locally, not even in a Linux Docker container ...

@realFlowControl
Copy link
Member

Got it fixed in #3584, it is this line that fixes the tests:

This makes sure that PHP will not have Xdebug installed, as Xdebug sets it's own opcode handlers for frameless functions, that seem to interfer: https://github.com/xdebug/xdebug/blob/e1222340fb27d446cee09e525f047b1540644d29/src/coverage/code_coverage.c#L1294-L1299

@realFlowControl realFlowControl force-pushed the levi/perf-time-alloc-piggyback branch from 9b0f22f to ea6bc22 Compare January 20, 2026 15:35
@pr-commenter
Copy link

pr-commenter bot commented Jan 20, 2026

Benchmarks [ tracer ]

Benchmark execution time: 2026-01-20 16:43:16

Comparing candidate commit ea6bc22 in PR branch levi/perf-time-alloc-piggyback with baseline commit 617ba06 in branch master.

Found 1 performance improvements and 2 performance regressions! Performance is the same for 191 metrics, 0 unstable metrics.

scenario:ComposerTelemetryBench/benchTelemetryParsing

  • 🟩 execution_time [-1306.061ns; -293.939ns] or [-10.705%; -2.409%]

scenario:MessagePackSerializationBench/benchMessagePackSerialization

  • 🟥 execution_time [+6.314µs; +6.726µs] or [+5.987%; +6.377%]

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

  • 🟥 execution_time [+9.001µs; +9.939µs] or [+8.694%; +9.600%]

@morrisonlevi morrisonlevi changed the title perf(prof): gather time samples during allocation samples feat(prof): improve time sample accuracy by gathering during allocation samples Jan 21, 2026
Copy link
Member

@realFlowControl realFlowControl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment and created #3592 to unblock this PR, looks good now.

Co-authored-by: Florian Engelhardt <[email protected]>
@morrisonlevi morrisonlevi merged commit a700b4d into master Jan 21, 2026
2002 of 2009 checks passed
@morrisonlevi morrisonlevi deleted the levi/perf-time-alloc-piggyback branch January 21, 2026 18:01
@github-actions github-actions bot added this to the 1.17.0 milestone Jan 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

profiling Relates to the Continuous Profiler tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants