refactor(profiling): use module globals for ZMM state#3608
refactor(profiling): use module globals for ZMM state#3608morrisonlevi merged 5 commits intomasterfrom
Conversation
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3608 +/- ##
==========================================
+ Coverage 62.12% 62.14% +0.02%
==========================================
Files 141 141
Lines 13387 13387
Branches 1753 1753
==========================================
+ Hits 8317 8320 +3
Misses 4270 4270
+ Partials 800 797 -3 see 3 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Benchmarks [ profiler ]Benchmark execution time: 2026-01-30 16:42:46 Comparing candidate commit 4c14aa4 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 28 metrics, 8 unstable metrics. |
|
I've had AI help me set up a container that uses PHP 8.4 ZTS with FrankenPHP as well as NTS with php-fpm. I used PHP 8.4 because that's what the current release of FrankenPHP embeds. Things look good so far. |
realFlowControl
left a comment
There was a problem hiding this comment.
We should (in a follow up PR) move all the other "global state" we have in other modules to PHP globals, but so far this is looking good.
Can you do a short run with DoE? I'd not expect this to show any performance problems, but just to be sure and I've added two comments.
| // TODO: Florian, do we need this? | ||
| // let globals = globals_ptr.cast::<ProfilerGlobals>(); | ||
| // (*globals).zend_mm_state = ZendMMState::new(); | ||
|
|
There was a problem hiding this comment.
| // TODO: Florian, do we need this? | |
| // let globals = globals_ptr.cast::<ProfilerGlobals>(); | |
| // (*globals).zend_mm_state = ZendMMState::new(); |
I think we do not, we did not do so earlier also. Also we unregister our hooks in RSHUTDOWN, nothing in allocation profiling touches the global state after RSHUTDOWN and PHP itself takes care of freeing the memory, I guess it is fine the way it is.
Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com>
|
One of our internal performance tests is showing extreme improvements, which seems suspect, like maybe it's failing to collect the amount of data it should. I will investigate this more before committing. |
|
Subsequent tests have been "normal." Going to proceed. |
* refactor(profiling): use module globals for ZMM state * style: fix clippy warnings * Apply suggestions from code review Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com> * docs: note ZTS vs NTS differences --------- Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com>
* Adds process tags to profiler uploader * remove useless utils function * remove empty lines and fix spelling * add function to ddtrace.sym * feat(CI: installer tests): fix installer tests by changing enabling check on appsec extension (#3604) Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * refactor(profiling): use module globals for ZMM state (#3608) * refactor(profiling): use module globals for ZMM state * style: fix clippy warnings * Apply suggestions from code review Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com> * docs: note ZTS vs NTS differences --------- Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com> * refactor(profiling): extract Backtrace type (#3612) * refactor(profiling): extract Backtrace type In a future change, this may hold a refcount for another object, so we need to encapsulate it. * fix `test_collect_stack_sample` not running --------- Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com> * Propagate RELIABILITY_ENV_BRANCH to downstream pipeline (#3605) * Add simple_onboarding_appsec to SSI system tests (#3617) * Stores remote config requests in request-replayer (#3585) * feat(profiling): internal metrics for overhead (#3616) * feat(profiling): internal metrics for overhead * feat(profiling): move CPU time capture to include serialization for `ddprof_upload` for current profile exported Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(profiling): add CPU time tracking for `ddprof_time` thread Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(profiling): separate CPU time tracking per background thread Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * fix(tracing): hook is_internal was backwards (#3625) * Fix phpt asm standalone tests (#3628) * fix readme file links (#3610) * test(language-tests): properly skip online tests and disabled soap_qname_crash.phpt on all version (#3632) Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * Collect framework endpoints (#3548) * Only run publishing jobs when all dependent pipelines succeed (#3635) Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com> * chore(profiling): update libdatadog to 26 (#3633) * test(CI): manually handle git operation for windows jobs (#3634) * test(CI): add aggressive git cleanup on windows runner Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * test(CI): add manual cleanup in before_script step Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> --------- Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * feat(CI): add healthcheck to SQLSRV server setup (#3619) * feat(CI): add healthcheck to SQLSRV server setup Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * chore: add troubleshooting script for SQLSRV Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * feat: add explicit memory limit and paths Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * chore: replace sqlsrv docker image Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> --------- Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * fix(CI: test_metrics): add explicit flush in logging (#3637) * fix(logging): fsync crash logs before _Exit() to prevent data loss When a SIGSEGV occurs, the signal handler logs "Segmentation fault encountered" and then calls _Exit() which terminates the process immediately. Without fsync(), kernel write buffers may not be flushed to disk before termination, causing a race condition where the error log file is sometimes not created. This fix adds fsync() on Unix/Linux and _commit() on Windows after write() in ddtrace_log_with_time() to ensure crash logs persist to disk before process termination. The issue affects production (rare but possible during power loss, kernel panic, or I/O errors) and causes consistent test failures where tests check for log files immediately after crashes (before kernel writeback completes). Fixes flaky test_metrics SigSegVTest::testGet failures on Kubernetes where dd_php_error.log was not being created consistently. * fix(signals): move flush in sigsegv handler Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> --------- Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * Adds process_tags to live debugger payloads (#3580) * init process tags for APM Co-Authored-By: PROFeNoM <alexandre.choura@datadoghq.com> * feat(process_tags): add process_tags to tracing payloads * small auto review and fix test * bwoebi review * fix test * Adds process_tags to live debugger payloads * temporary libdatadog bump * auto review * bump libdatadog * fix build * update makefile && make cbindgen * fixing test * fixing test * fix appsec tests --------- Co-authored-by: PROFeNoM <alexandre.choura@datadoghq.com> * chore(profiling): update libdatadog 26 to 27 (#3640) * chore(profiling): update libdatadog 26 to 27 * process tags were removed while rebasing to sign commit --------- Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com> Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com> Co-authored-by: Alexandre Rulleau <55387832+Leiyks@users.noreply.github.com> Co-authored-by: Levi Morrison <levi.morrison@datadoghq.com> Co-authored-by: Laplie Anderson <randomanderson@users.noreply.github.com> Co-authored-by: Alejandro Estringana Ruiz <alejandro.estringanaruiz@datadoghq.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Bob Weinand <bob.weinand@datadoghq.com> Co-authored-by: PROFeNoM <alexandre.choura@datadoghq.com>
* Adds process tags to profiler uploader * remove useless utils function * remove empty lines and fix spelling * add function to ddtrace.sym * feat(CI: installer tests): fix installer tests by changing enabling check on appsec extension (#3604) Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * refactor(profiling): use module globals for ZMM state (#3608) * refactor(profiling): use module globals for ZMM state * style: fix clippy warnings * Apply suggestions from code review Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com> * docs: note ZTS vs NTS differences --------- Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com> * refactor(profiling): extract Backtrace type (#3612) * refactor(profiling): extract Backtrace type In a future change, this may hold a refcount for another object, so we need to encapsulate it. * fix `test_collect_stack_sample` not running --------- Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com> * Propagate RELIABILITY_ENV_BRANCH to downstream pipeline (#3605) * Add simple_onboarding_appsec to SSI system tests (#3617) * Stores remote config requests in request-replayer (#3585) * feat(profiling): internal metrics for overhead (#3616) * feat(profiling): internal metrics for overhead * feat(profiling): move CPU time capture to include serialization for `ddprof_upload` for current profile exported Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(profiling): add CPU time tracking for `ddprof_time` thread Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(profiling): separate CPU time tracking per background thread Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * fix(tracing): hook is_internal was backwards (#3625) * Fix phpt asm standalone tests (#3628) * fix readme file links (#3610) * test(language-tests): properly skip online tests and disabled soap_qname_crash.phpt on all version (#3632) Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * Collect framework endpoints (#3548) * Only run publishing jobs when all dependent pipelines succeed (#3635) Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com> * chore(profiling): update libdatadog to 26 (#3633) * test(CI): manually handle git operation for windows jobs (#3634) * test(CI): add aggressive git cleanup on windows runner Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * test(CI): add manual cleanup in before_script step Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> --------- Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * feat(CI): add healthcheck to SQLSRV server setup (#3619) * feat(CI): add healthcheck to SQLSRV server setup Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * chore: add troubleshooting script for SQLSRV Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * feat: add explicit memory limit and paths Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * chore: replace sqlsrv docker image Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> --------- Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * fix(CI: test_metrics): add explicit flush in logging (#3637) * fix(logging): fsync crash logs before _Exit() to prevent data loss When a SIGSEGV occurs, the signal handler logs "Segmentation fault encountered" and then calls _Exit() which terminates the process immediately. Without fsync(), kernel write buffers may not be flushed to disk before termination, causing a race condition where the error log file is sometimes not created. This fix adds fsync() on Unix/Linux and _commit() on Windows after write() in ddtrace_log_with_time() to ensure crash logs persist to disk before process termination. The issue affects production (rare but possible during power loss, kernel panic, or I/O errors) and causes consistent test failures where tests check for log files immediately after crashes (before kernel writeback completes). Fixes flaky test_metrics SigSegVTest::testGet failures on Kubernetes where dd_php_error.log was not being created consistently. * fix(signals): move flush in sigsegv handler Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> --------- Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * Adds process_tags to live debugger payloads (#3580) * init process tags for APM Co-Authored-By: PROFeNoM <alexandre.choura@datadoghq.com> * feat(process_tags): add process_tags to tracing payloads * small auto review and fix test * bwoebi review * fix test * Adds process_tags to live debugger payloads * temporary libdatadog bump * auto review * bump libdatadog * fix build * update makefile && make cbindgen * fixing test * fixing test * fix appsec tests --------- Co-authored-by: PROFeNoM <alexandre.choura@datadoghq.com> * chore(profiling): update libdatadog 26 to 27 (#3640) * chore(profiling): update libdatadog 26 to 27 * process tags were removed while rebasing to sign commit --------- Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com> Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com> Co-authored-by: Alexandre Rulleau <55387832+Leiyks@users.noreply.github.com> Co-authored-by: Levi Morrison <levi.morrison@datadoghq.com> Co-authored-by: Laplie Anderson <randomanderson@users.noreply.github.com> Co-authored-by: Alejandro Estringana Ruiz <alejandro.estringanaruiz@datadoghq.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Bob Weinand <bob.weinand@datadoghq.com> Co-authored-by: PROFeNoM <alexandre.choura@datadoghq.com>
Description
ProfilerGlobalsstruct which for now just containszend_mm_state: Cell<ZendMMState>.module_globals.rsto centralize PHP globals management.GLOBALS(NTS) into module_globals.rs.GLOBALS_ID_PTR(ZTS) into module_globals.rs and renamed itGLOBALS_ID(because it's not a pointer, it's the thing the pointer points at).ginit()andgshutdown()lifecycle functions into module_globals.rs, and gave them export names ofddog_php_prof_{ginit,gshutdown).zend::ModuleEntryto useProfilerGlobalsand pointers to module globals. Previously, this was faked just so we could observe the ginit/gshutdown phases but now we have real module globals.mod.rsbecause they are the same for ge84 and le83.Motivation
In #3542 we worked around a compatibility issue with ext/grpc by mimicking the PHP module globals pattern. Why mimic it when we can use it?
For accessing module globals, we do have to translate the C macro
ZEND_MODULE_GLOBALS_BULKto Rust, though. Here are its expansions:So for us,
module_name##_globals_idis justGLOBALS_IDandmodule_name##_globalsis justGLOBALS. The rest matches C directly.Reviewer checklist