-
Notifications
You must be signed in to change notification settings - Fork 167
feat(prof): improve time sample accuracy by gathering during allocation samples #3559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When allocation profiler takes a sample and interrupt_count > 0, collect both allocation and time samples in a single stack walk. Benefits: - Eliminates redundant stack walks when samples coincide. - Improves time sample accuracy (closer to timer expiry).
Let's do this in a dedicated PR.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3559 +/- ##
==========================================
- Coverage 62.01% 61.91% -0.11%
==========================================
Files 140 140
Lines 13311 13311
Branches 1762 1762
==========================================
- Hits 8255 8241 -14
- Misses 4267 4280 +13
- Partials 789 790 +1 see 3 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Benchmarks [ profiler ]Benchmark execution time: 2026-01-21 15:34:11 Comparing candidate commit 1e42042 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 27 metrics, 9 unstable metrics. |
realFlowControl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created an upstream PR to fix the empty/malformed expected profile JSON problem that this PR showed. Can you fix the tests, especially the allocation_time_combined.json file?
| $s = str_repeat("x", 10_000_000); | ||
| strlen($s); // Use the string to prevent optimization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still uses str_repeat() which is not a frameless function, strlen() is, but it does not allocate and as such will not trigger a combined sample.
Can we add a text, maybe some Lorem Ipsum (or more nerdy https://future-lives.com/wp-content/uploads/2014/09/TheSentinel.pdf) and replace random words in that text? I'd think using an array as the needle to replace should take time and allocate enough to trigger sampling. Setting DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE=1 should also help taking more allocation samples and increase the probability of triggering a combined sample.
This got merged and I re-run the correctness tests, which are now failing. |
|
|
So it's very easy to see in the log output that the optimization works, you'll see messages like this (slightly more info than we'll probably ship, but this is what it shows right now, job): However, getting the prof-correctness tests right has been quite tricky. Locally, I can see that
And the graph is right, so the path
But when I run prof-correctness in CI, we get this failure: What could be making this fail? Why is prof correctness not seeing these frames in its test logic? |
2356ff2 to
9b0f22f
Compare
|
I ran the prof-correctness tests locally and I saw the See the This behaviour is also visible from the logs that the test emits: The first line ( (also noting the it feels weird that there are I have no idea what is going on, so I created #3584 to validate if this is a problem in Where according to the source code the first one is the |
|
I have to add that I am not able to replicate this behaviour locally, not even in a Linux Docker container ... |
|
Got it fixed in #3584, it is this line that fixes the tests:
This makes sure that PHP will not have Xdebug installed, as Xdebug sets it's own opcode handlers for frameless functions, that seem to interfer: https://github.com/xdebug/xdebug/blob/e1222340fb27d446cee09e525f047b1540644d29/src/coverage/code_coverage.c#L1294-L1299 |
9b0f22f to
ea6bc22
Compare
Benchmarks [ tracer ]Benchmark execution time: 2026-01-20 16:43:16 Comparing candidate commit ea6bc22 in PR branch Found 1 performance improvements and 2 performance regressions! Performance is the same for 191 metrics, 0 unstable metrics. scenario:ComposerTelemetryBench/benchTelemetryParsing
scenario:MessagePackSerializationBench/benchMessagePackSerialization
scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache
|
realFlowControl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment and created #3592 to unblock this PR, looks good now.
Co-authored-by: Florian Engelhardt <[email protected]>


Description
Gathers time samples during allocation samples if
interrupt_count > 0. This has two benefits:The accuracy benefit comes from the fact that when an interrupt is set, on master it won't be handled until the next VM interrupt. If we're gathering a memory sample and that count is non-zero, that means that we're gathering it sooner than it would have otherwise been handled, so less VM code can change. Notably, this should help with frameless functions (added in PHP 8.4) which allocate.
The performance benefit mainly* comes from combining the two stack walks into one. It is dampened a bit by incurring an extra atomic load, we might revisit the
Orderingused in the future. When I say a slight performance benefit, here's a comparison to v1.16.0:* I was going to look at the profiles and post screenshots but for whatever reason I cannot see any frames related to memory allocations in the full host profiler. I may edit this later to show flamegraphs when I can get them.
Reviewer checklist