refactor(profiling): use module globals for ZMM state#3608

Merged
morrisonlevi merged 5 commits into master from levi/module-globals
Jan 31, 2026
Conversation

@morrisonlevi (Collaborator) commented Jan 29, 2026

Description

  • Introduced a ProfilerGlobals struct, which for now contains only zend_mm_state: Cell<ZendMMState>.
  • Created module_globals.rs to centralize PHP globals management.
    • Moved GLOBALS (NTS) into module_globals.rs.
    • Moved GLOBALS_ID_PTR (ZTS) into module_globals.rs and renamed it GLOBALS_ID (because it's not a pointer, it's the thing the pointer points at).
    • Moved the ginit() and gshutdown() lifecycle functions into module_globals.rs, and gave them export names of ddog_php_prof_{ginit,gshutdown}.
  • Updated zend::ModuleEntry to use ProfilerGlobals and pointers to module globals. Previously, this was faked just so we could observe the ginit/gshutdown phases, but now we have real module globals.
  • Moved some TLS macros in the allocation profiler into mod.rs because they are the same for ge84 and le83.
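
As a rough illustration of the lifecycle described in the list above, here is a minimal sketch of a globals struct with ginit/gshutdown-style callbacks. It is not the code from this PR: the `hooks_installed` field, the `lifecycle_demo` helper, and the explicit write in `ginit` are assumptions made for the example (the review discussion in this PR concluded that explicit initialization was not needed), and the export names are left out.

```rust
use std::cell::Cell;
use std::ffi::c_void;
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical stand-in for the real ZendMMState; the field is invented.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct ZendMMState {
    pub hooks_installed: bool,
}

/// Per-thread (ZTS) or per-process (NTS) module globals.
#[repr(C)]
pub struct ProfilerGlobals {
    pub zend_mm_state: Cell<ZendMMState>,
}

/// Called by the engine with a pointer to uninitialized globals storage.
/// Safety: `globals_ptr` must point to writable memory that is large
/// enough and suitably aligned for `ProfilerGlobals`.
pub unsafe extern "C" fn ginit(globals_ptr: *mut c_void) {
    let globals = globals_ptr.cast::<ProfilerGlobals>();
    // Assumption for this sketch: write an explicit initial value.
    unsafe {
        ptr::write(
            globals,
            ProfilerGlobals {
                zend_mm_state: Cell::new(ZendMMState::default()),
            },
        );
    }
}

/// Called by the engine before the globals storage is released.
/// Safety: `globals_ptr` must have been initialized by `ginit`.
pub unsafe extern "C" fn gshutdown(globals_ptr: *mut c_void) {
    unsafe { ptr::drop_in_place(globals_ptr.cast::<ProfilerGlobals>()) };
}

/// Hypothetical helper that plays the role of the engine for one cycle.
pub fn lifecycle_demo() -> bool {
    let mut slot = MaybeUninit::<ProfilerGlobals>::uninit();
    let raw = slot.as_mut_ptr().cast::<c_void>();
    unsafe {
        ginit(raw);
        let globals = &*raw.cast::<ProfilerGlobals>();
        globals.zend_mm_state.set(ZendMMState { hooks_installed: true });
        let seen = globals.zend_mm_state.get().hooks_installed;
        gshutdown(raw);
        seen
    }
}
```

Using `Cell` keeps mutation possible through a shared reference, which suits state that is only ever touched from the thread that owns the globals.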

Motivation

In #3542 we worked around a compatibility issue with ext/grpc by mimicking the PHP module globals pattern. Why mimic it when we can use it?

For accessing module globals, we do have to translate the C macro ZEND_MODULE_GLOBALS_BULK to Rust, though. Here are its expansions:

(ZTS) ZEND_MODULE_GLOBALS_BULK(module_name)
→ TSRMG_BULK(module_name##_globals_id, zend_##module_name##_globals *)
→ ((zend_##module_name##_globals *) (*((void ***) tsrm_get_ls_cache()))[TSRM_UNSHUFFLE_RSRC_ID(module_name##_globals_id)])
→ ((zend_##module_name##_globals *) (*((void ***) tsrm_get_ls_cache()))[module_name##_globals_id - 1])

(NTS) ZEND_MODULE_GLOBALS_BULK(module_name)
→ (&module_name##_globals)

So for us, module_name##_globals_id is just GLOBALS_ID and module_name##_globals is just GLOBALS. The rest matches C directly.
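
The two expansions above could be translated to Rust roughly as follows. This is a sketch under stated assumptions, not the code merged here: the TSRM cache pointer is passed in as a parameter instead of calling tsrm_get_ls_cache(), the `NtsGlobals` wrapper, the `hooks_installed` field, and the `*_demo` helpers are invented for the example, and the ZTS demo builds a fake one-slot table so the pointer arithmetic can be exercised without PHP.

```rust
use std::cell::Cell;
use std::ffi::c_void;

/// Hypothetical stand-in for the real ZendMMState; the field is invented.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct ZendMMState {
    pub hooks_installed: bool,
}

#[repr(C)]
pub struct ProfilerGlobals {
    pub zend_mm_state: Cell<ZendMMState>,
}

/// ZTS: ((T *) (*((void ***) ls_cache))[globals_id - 1])
/// Safety: `ls_cache` must point to a valid slot table pointer, and slot
/// `globals_id - 1` must hold a pointer to a live `ProfilerGlobals`.
pub unsafe fn globals_zts(ls_cache: *mut c_void, globals_id: i32) -> *mut ProfilerGlobals {
    unsafe {
        // (void ***) ls_cache, dereferenced once, gives the slot table.
        let table: *mut *mut c_void = *ls_cache.cast::<*mut *mut c_void>();
        // Resource ids are 1-based, hence the `- 1`.
        (*table.add((globals_id - 1) as usize)).cast::<ProfilerGlobals>()
    }
}

/// NTS: (&module_name##_globals) is the address of a plain static.
/// The wrapper exists only because `Cell` is not `Sync`.
struct NtsGlobals(ProfilerGlobals);
// Assumption: NTS builds only touch the globals from a single thread.
unsafe impl Sync for NtsGlobals {}

static GLOBALS: NtsGlobals = NtsGlobals(ProfilerGlobals {
    zend_mm_state: Cell::new(ZendMMState { hooks_installed: false }),
});

pub fn globals_nts() -> &'static ProfilerGlobals {
    &GLOBALS.0
}

/// Hypothetical demo: fake a one-slot TSRM table and look it up with id 1.
pub fn zts_demo() -> bool {
    let globals = ProfilerGlobals {
        zend_mm_state: Cell::new(ZendMMState::default()),
    };
    let mut slots: [*mut c_void; 1] = [&globals as *const ProfilerGlobals as *mut c_void];
    let mut table: *mut *mut c_void = slots.as_mut_ptr();
    let ls_cache: *mut c_void = (&mut table as *mut *mut *mut c_void).cast();
    let found = unsafe { &*globals_zts(ls_cache, 1) };
    found.zend_mm_state.set(ZendMMState { hooks_installed: true });
    globals.zend_mm_state.get().hooks_installed
}

/// Hypothetical demo: write and read back through the NTS accessor.
pub fn nts_demo() -> bool {
    globals_nts().zend_mm_state.set(ZendMMState { hooks_installed: true });
    globals_nts().zend_mm_state.get().hooks_installed
}
```

In a real extension the two accessors would sit behind a cfg switch for ZTS versus NTS builds, so callers see a single function.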

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

@morrisonlevi morrisonlevi requested a review from a team as a code owner January 29, 2026 18:38
@morrisonlevi morrisonlevi added the profiling Relates to the Continuous Profiler label Jan 29, 2026
datadog-datadog-prod-us1 bot commented Jan 29, 2026

⚠️ Tests

⚠️ Warnings

🧪 1026 Tests failed

    testSearchPhpBinaries from integration.DDTrace\Tests\Integration\PHPInstallerTest

testSimplePushAndProcess from laravel-58-test.DDTrace\Tests\Integrations\Laravel\V5_8\QueueTest (Datadog)
DDTrace\Tests\Integrations\Laravel\V5_8\QueueTest::testSimplePushAndProcess
Test code or tested code printed unexpected output: spanLinksTraceId: 697cde8f000000005596de911bb29288
tid: 697cde8f00000000
hexProcessTraceId: 5596de911bb29288
hexProcessSpanId: c975a2947e84e912
processTraceId: 6167361454546784904
processSpanId: 14516687732560161042

phpvfscomposer://tests/vendor/phpunit/phpunit/phpunit:106
testSimplePushAndProcess from laravel-8x-test.DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest (Datadog)
DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest::testSimplePushAndProcess
Test code or tested code printed unexpected output: spanLinksTraceId: 697cde8c00000000d066843fc28ee4d6
tid: 697cde8c00000000
hexProcessTraceId: d066843fc28ee4d6
hexProcessSpanId: 737d0ef296ebbefb
processTraceId: 15016835416895448278
processSpanId: 8321824121527451387

ℹ️ Info

❄️ No new flaky tests detected

🔗 Commit SHA: 4c14aa4

codecov-commenter commented Jan 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.14%. Comparing base (e380043) to head (4c14aa4).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3608      +/-   ##
==========================================
+ Coverage   62.12%   62.14%   +0.02%     
==========================================
  Files         141      141              
  Lines       13387    13387              
  Branches     1753     1753              
==========================================
+ Hits         8317     8320       +3     
  Misses       4270     4270              
+ Partials      800      797       -3     

see 3 files with indirect coverage changes



pr-commenter bot commented Jan 29, 2026

Benchmarks [ profiler ]

Benchmark execution time: 2026-01-30 16:42:46

Comparing candidate commit 4c14aa4 in PR branch levi/module-globals with baseline commit e380043 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 28 metrics, 8 unstable metrics.

@morrisonlevi (Collaborator, Author)

I've had AI help me set up a container that uses PHP 8.4 ZTS with FrankenPHP as well as NTS with php-fpm. I used PHP 8.4 because that's what the current release of FrankenPHP embeds. Things look good so far.

@realFlowControl (Member) left a comment

We should (in a follow-up PR) move all the other "global state" we have in other modules to PHP globals, but so far this is looking good.
Can you do a short run with DoE? I don't expect this to show any performance problems, but I'd like to be sure. I've also added two comments.

Comment on lines +105 to +108
// TODO: Florian, do we need this?
// let globals = globals_ptr.cast::<ProfilerGlobals>();
// (*globals).zend_mm_state = ZendMMState::new();

Member


Suggested change
// TODO: Florian, do we need this?
// let globals = globals_ptr.cast::<ProfilerGlobals>();
// (*globals).zend_mm_state = ZendMMState::new();

I think we do not; we did not do so earlier either. We also unregister our hooks in RSHUTDOWN, nothing in allocation profiling touches the global state after RSHUTDOWN, and PHP itself takes care of freeing the memory, so I guess it is fine the way it is.

Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com>
@morrisonlevi (Collaborator, Author)

One of our internal performance tests is showing extreme improvements, which seems suspect, like maybe it's failing to collect the amount of data it should. I will investigate this more before committing.

@morrisonlevi (Collaborator, Author)

Subsequent tests have been "normal." Going to proceed.

@morrisonlevi morrisonlevi merged commit f7f9271 into master Jan 31, 2026
2027 of 2038 checks passed
@morrisonlevi morrisonlevi deleted the levi/module-globals branch January 31, 2026 17:44
@github-actions github-actions bot added this to the 1.17.0 milestone Jan 31, 2026
dubloom pushed a commit that referenced this pull request Feb 12, 2026
* refactor(profiling): use module globals for ZMM state

* style: fix clippy warnings

* Apply suggestions from code review

Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com>

* docs: note ZTS vs NTS differences

---------

Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com>
dubloom added a commit that referenced this pull request Feb 12, 2026
* Adds process tags to profiler uploader

* remove useless utils function

* remove empty lines and fix spelling

* add function to ddtrace.sym

* feat(CI: installer tests): fix installer tests by changing enabling check on appsec extension (#3604)

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

* refactor(profiling): use module globals for ZMM state (#3608)

* refactor(profiling): use module globals for ZMM state

* style: fix clippy warnings

* Apply suggestions from code review

Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com>

* docs: note ZTS vs NTS differences

---------

Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com>

* refactor(profiling): extract Backtrace type (#3612)

* refactor(profiling): extract Backtrace type

In a future change, this may hold a refcount for another object, so
we need to encapsulate it.

* fix `test_collect_stack_sample` not running

---------

Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com>

* Propagate RELIABILITY_ENV_BRANCH to downstream pipeline (#3605)

* Add simple_onboarding_appsec to SSI system tests (#3617)

* Stores remote config requests in request-replayer (#3585)

* feat(profiling): internal metrics for overhead (#3616)

* feat(profiling): internal metrics for overhead

* feat(profiling): move CPU time capture to include serialization for `ddprof_upload` for current profile exported

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(profiling): add CPU time tracking for `ddprof_time` thread

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(profiling): separate CPU time tracking per background thread

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* fix(tracing): hook is_internal was backwards (#3625)

* Fix phpt asm standalone tests (#3628)

* fix readme file links (#3610)

* test(language-tests): properly skip online tests and disabled soap_qname_crash.phpt on all version (#3632)

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

* Collect framework endpoints (#3548)

* Only run publishing jobs when all dependent pipelines succeed (#3635)

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>

* chore(profiling): update libdatadog to 26 (#3633)

* test(CI): manually handle git operation for windows jobs (#3634)

* test(CI): add aggressive git cleanup on windows runner

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

* test(CI): add manual cleanup in before_script step

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

---------

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

* feat(CI): add healthcheck to SQLSRV server setup (#3619)

* feat(CI): add healthcheck to SQLSRV server setup

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

* chore: add troubleshooting script for SQLSRV

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

* feat: add explicit memory limit and paths

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

* chore: replace sqlsrv docker image

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

---------

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

* fix(CI: test_metrics): add explicit flush in logging (#3637)

* fix(logging): fsync crash logs before _Exit() to prevent data loss

When a SIGSEGV occurs, the signal handler logs "Segmentation fault encountered"
and then calls _Exit() which terminates the process immediately. Without fsync(),
kernel write buffers may not be flushed to disk before termination, causing
a race condition where the error log file is sometimes not created.

This fix adds fsync() on Unix/Linux and _commit() on Windows after write() in
ddtrace_log_with_time() to ensure crash logs persist to disk before process
termination.

The issue affects production (rare but possible during power loss, kernel panic,
or I/O errors) and causes consistent test failures where tests check for log
files immediately after crashes (before kernel writeback completes).

Fixes flaky test_metrics SigSegVTest::testGet failures on Kubernetes where
dd_php_error.log was not being created consistently.

* fix(signals): move flush in sigsegv handler

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

---------

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

* Adds process_tags to live debugger payloads (#3580)

* init process tags for APM

Co-Authored-By: PROFeNoM <alexandre.choura@datadoghq.com>

* feat(process_tags): add process_tags to tracing payloads

* small auto review and fix test

* bwoebi review

* fix test

* Adds process_tags to live debugger payloads

* temporary libdatadog bump

* auto review

* bump libdatadog

* fix build

* update makefile && make cbindgen

* fixing test

* fixing test

* fix appsec tests

---------

Co-authored-by: PROFeNoM <alexandre.choura@datadoghq.com>

* chore(profiling): update libdatadog 26 to 27 (#3640)

* chore(profiling): update libdatadog 26 to 27

* process tags were removed while rebasing to sign commit

---------

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>
Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>
Co-authored-by: Florian Engelhardt <florian.engelhardt@datadoghq.com>
Co-authored-by: Alexandre Rulleau <55387832+Leiyks@users.noreply.github.com>
Co-authored-by: Levi Morrison <levi.morrison@datadoghq.com>
Co-authored-by: Laplie Anderson <randomanderson@users.noreply.github.com>
Co-authored-by: Alejandro Estringana Ruiz <alejandro.estringanaruiz@datadoghq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Bob Weinand <bob.weinand@datadoghq.com>
Co-authored-by: PROFeNoM <alexandre.choura@datadoghq.com>
BridgeAR pushed a commit that referenced this pull request Feb 19, 2026
Labels

profiling Relates to the Continuous Profiler tracing

3 participants