Add PHP Health Metrics (PHM) with /proc-based worker collection#145
Conversation
There was a problem hiding this comment.
Pull request overview
Adds PHP Health Metrics (PHM) collection and reporting from the forked reporter worker, using Linux /proc to sample the parent PHP-FPM process and exporting the data via SkyWalking’s native meter protocol.
Changes:
- Add
/proc-based PHM collector and wire it into the worker lifecycle/config. - Add gRPC
collectBatch-based meter batching path and filter meter items out of the existing trace/log collect stream. - Update e2e expectations/docs and adjust Rust CI to reliably install
php*-fpmviappa:ondrej/phpon Ubuntu 24.04.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| worker/src/reporter/reporter_grpc.rs | Spawns meter batch reporter and wraps consumer to divert meters to batch path. |
| worker/src/reporter/mod.rs | Registers new reporter submodules. |
| worker/src/reporter/meter_filter.rs | Filters CollectItem::Meter out of main stream and forwards to meter batch channel. |
| worker/src/reporter/meter_batch.rs | Implements collect_batch flush + bounded retry/retention logic for meters. |
| worker/src/phm.rs | Implements Linux /proc sampling for PHM meters and reports them into the worker channel. |
| worker/src/lib.rs | Adds PHM config to worker config and starts collector alongside heartbeat. |
| src/worker.rs | Builds PHM worker configuration based on INI settings. |
| src/module.rs | Adds lazy INI reads for new PHM settings and ensures logger uses cloned reporter. |
| src/lib.rs | Registers new INI settings for PHM enablement and report period. |
| tests/common/mod.rs | Enables PHM in one FPM fixture instance with shorter reporting period. |
| tests/e2e.rs | Extends wait time to allow PHM meters to be reported before validation. |
| tests/data/expected_context.yaml | Adds expected meter items assertions for PHM. |
| docs/en/setup/service-agent/php-agent/README.md | Documents PHM feature, platform constraints, and meter list. |
| docs/en/configuration/ini-settings.md | Documents new INI settings for PHM. |
| README.md | Mentions PHM in the project description. |
| Cargo.toml | Bumps workspace version to 1.2.0. |
| .github/workflows/rust.yml | Adds explicit ondrej PPA setup before installing php*-fpm on Ubuntu 24.04. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| phm: if *METRICS_ENABLE { | ||
| Some(PhmConfiguration { | ||
| service_name: SERVICE_NAME.clone(), | ||
| service_instance: SERVICE_INSTANCE.clone(), | ||
| report_period_secs: *METRICS_REPORT_PERIOD, | ||
| php_process_pid: libc::getpid() as i32, | ||
| }) | ||
| } else { | ||
| None | ||
| }, |
There was a problem hiding this comment.
Thanks for catching this. Addressed in 5ed3fb9: phm_configuration() now stores the parent PHP-FPM PID via getppid() for the fallback path, and PHM is only enabled on Linux when metrics_enable is On.
| "".to_string(), | ||
| Policy::System, | ||
| ); | ||
| module.add_ini(SKYWALKING_AGENT_METRICS_ENABLE, true, Policy::System); |
There was a problem hiding this comment.
Agreed. Addressed in 5ed3fb9: metrics_enable now defaults to true on Linux and false on other platforms via #[cfg(target_os = linux)] on the INI registration.
| | skywalking_agent.instance_name | Instance name. You can set `${HOSTNAME}`, refer to [Example #1](https://www.php.net/manual/en/install.fpm.configuration.php) | | | ||
| | skywalking_agent.standalone_socket_path | Unix domain socket file path of standalone skywalking php worker. Only available when `reporter_type` is `standalone`. | | | ||
| | skywalking_agent.psr_logging_level | The log level reported to SkyWalking, based on PSR-3, one of `Off`, `Debug`, `Info`, Notice`, Warning`, Error`, Critical`, Alert`, Emergency`. | Off | | ||
| | skywalking_agent.metrics_enable | Enable PHP Health Metrics (PHM) meter reporting via native MeterReportService. **Linux only** (requires `/proc`). Enabled by default when the agent is active; set to `Off` to disable. Reports six process meters: CPU utilization, memory used/peak, virtual memory, thread count, and open FD count. See [PHP agent README](../setup/service-agent/php-agent/README.md#php-health-metrics-phm). | On | |
There was a problem hiding this comment.
Updated in 5ed3fb9: the default value column now reads On (Linux); Off (other), matching the platform-specific INI default.
|
|
||
| [workspace.package] | ||
| version = "1.1.0" | ||
| version = "1.2.0" |
There was a problem hiding this comment.
This change needs to be reflected in Cargo.lock.
| let (meter_tx, meter_rx) = mpsc::channel(128); | ||
| let meter_channel = channel.clone(); | ||
| let meter_authentication = config.authentication.clone(); | ||
| tokio::spawn(async move { | ||
| if let Err(err) = | ||
| meter_batch::run_meter_batch_reporter(meter_channel, meter_authentication, meter_rx) | ||
| .await | ||
| { | ||
| warn!(?err, "Meter batch reporter failed"); | ||
| } | ||
| }); | ||
|
|
||
| let consumer = MeterFilteringConsumer::new(consumer, meter_tx); | ||
|
|
There was a problem hiding this comment.
We shouldn't implement a separate gRPC metrics reporting logic. Instead, we should use the skywalking::metrics module.
Example: https://github.com/apache/skywalking-rust/blob/master/examples/simple_metric_report.rs.
Registering a Gauge metric in the init method of src/module.rs should meet our requirements.
There was a problem hiding this comment.
Thanks — I agree we should use skywalking::metrics rather than a custom gRPC meter batch implementation.
|
I've pushed a follow-up commit ( Metricer refactor (review feedback)Per your suggestion, PHM no longer uses a custom meter gRPC batch path. The changes:
PHM sampling modelPHM samples the current PHP process via PHM is not enabled when CI stabilityRust workflow e2e setup now retries DocsREADME + ini-settings updated to match the Metricer architecture. |
|
|
||
| let reporter = Arc::new(Reporter::new(&*SOCKET_FILE_PATH)); | ||
|
|
||
| init_phm_metrics(reporter.clone()); |
There was a problem hiding this comment.
The module::init method should not contain any thread creation operations; that was an error in my previous hint.
You can refer to report_properties_and_keep_alive, as its heartbeat reporting logic is very similar to reporting metrics.
Please move the skywalking::metrics related operations into start_worker.
Report six instance_php_* process meters on Linux via /proc sampling in the forked reporter worker. Use skywalking::metrics::Metricer (booted from start_worker alongside report_properties_and_keep_alive, not module::init); sample the parent PHP process via getppid(). PHM defaults On on Linux; disabled for standalone reporter. Remove custom meter gRPC path; stabilize e2e CI (compose retry, PHM on fpm-test-1 only).
aec9c01 to
4a75da4
Compare
| let mut metricer = Metricer::new(config.service_name, config.service_instance, reporter); | ||
| metricer.set_report_interval(report_period); | ||
| register_gauges(&mut metricer, samples); | ||
| let _booting = metricer.boot(); |
There was a problem hiding this comment.
We cannot allow the return value of metricer.boot() to be dropped.
| if let Some(mb) = read_status_kib(pid, "VmRSS") { | ||
| PhmSamples::store(&samples.memory_used_mb, mb); | ||
| } | ||
| if let Some(mb) = read_status_kib(pid, "VmHWM") { | ||
| PhmSamples::store(&samples.memory_peak_mb, mb); | ||
| } | ||
| if let Some(mb) = read_status_kib(pid, "VmSize") { | ||
| PhmSamples::store(&samples.virtual_memory_mb, mb); | ||
| } | ||
| if let Some(count) = read_status_count(pid, "Threads") { | ||
| PhmSamples::store(&samples.thread_count, count as f64); | ||
| } |
There was a problem hiding this comment.
These operations repeatedly read the /proc/{pid}/status file and need to be consolidated into a single read.
Return Booting from boot_phm_metrics and keep it alive for the worker
lifetime so metric reporting is not shut down on drop. Consolidate
VmRSS/VmHWM/VmSize/Threads sampling into a single /proc/{pid}/status read.
c587c2d to
2f88b7e
Compare
|
both points are addressed in the latest commit. 1. Keep
|
Summary
Add PHP Health Metrics (PHM): the reporter worker samples the parent PHP-FPM process via Linux
/procand reports sixinstance_php_*meters through nativeMeterReportService/collectBatch.Meters (aligned with OAP
php-runtime.yamland Horizon UI widgets):instance_php_process_cpu_utilization/proc/{pid}/statutime+stime deltainstance_php_memory_used_mb/proc/{pid}/statusVmRSSinstance_php_memory_peak_mb/proc/{pid}/statusVmHWMinstance_php_virtual_memory_mb/proc/{pid}/statusVmSizeinstance_php_thread_count/proc/{pid}/statusThreadsinstance_php_open_fd_count/proc/{pid}/fdcountLinux only (
/proc); requires a forked reporter worker (reporter_type=grpcorkafka, notstandalone).New INI settings:
skywalking_agent.metrics_enable— default On when the agent is active (aligned with Python PVM / Ruby runtime meters); setOffto disable.skywalking_agent.metrics_report_period— default 30 seconds.skywalking_agent.enableremains Off by default (unchanged PHP agent behavior). New deployments without the agent enabled are unaffected.Bump workspace version to 1.2.0; documentation in
docs/en/and README.CI: explicit
ppa:ondrej/phpbeforephp-*-fpminstall inrust.yml(see Appendix; independent of PHM logic).Design notes
executehooks.getppid())./proc/{pid}/statdelta over the report period; the first interval emits no CPU point (baseline sample required).collectBatch(short-lived gRPC stream), matching periodic reporting rather than a long-lived meter stream.Development branch CI verification (commit
e6a279f)All pipelines passed on the fork:
Testing
tests/e2e.rs):-d metrics_enable=On,-d metrics_report_period=5./dataValidatevalidatestests/data/expected_context.yaml.serviceName: skywalking-agent-test-1(ge 0/ge 1for thread and FD counts)./procsamples from the CI Ubuntu runner's php-fpm worker, not mocked.Related work (separate PRs, not in this change)
meter-analyzer-config/php-runtime.yaml, PHP e2e — open after this merges; pinSW_AGENT_PHP_COMMITto the apache merge SHA.Agent-first order matches Python PVM and Go runtime meter: no proto change; safe to merge agent before OAP/UI.
References
Test plan
e6a279f)enable=On: six meters reported to OAP (after OAP PR merges)metrics_enable=Off: no PHM meters; tracing unchangedstandalonereporter: no PHM (documented)/procsamplingNotes for reviewers
enabledefault Off means no behavior change until the agent is explicitly enabled.worker/src/phm.rs—/procparsing and CPU delta mathworker/src/reporter/meter_batch.rs— batch flush and retry semanticsmaster.Appendix: CI fix — explicit
ppa:ondrej/phpin Setup php-fpmBackground
On the fork, the Rust workflow began failing at Setup php-fpm for Linux after GitHub runner image updates (logs show
ubuntu24/20260615.205), with errors such as:ubuntu24/20260607.184ubuntu24/20260615.205Some runs on 6/17 failed overall due to
cargo fmt/clippy, unrelated to php-fpm.Other workflows at the same time: License and PECL still passed; Rust failed at apt install before reaching
cargo clippy/cargo test.Apache upstream (inferred, not re-tested): Rust last succeeded on master push around 2026-03-12; no master pushes since. Re-running the same
rust.ymlon master or a PR today may hit the same fpm issue (same workflow, rollingubuntu-24.04image). This is an inference only.Relation to PHM: Even without PHM, new runners may exhibit this CI failure. The Rust CI fix and PHM business logic are independent.
Likely causes (analysis)
runs-on: ubuntu-24.04is a rolling label; apt environment can change after image rollout (~6/15).php*-fpmpackages may not be in Ubuntu 24.04 default repos and typically requireppa:ondrej/php(same family asshivammathur/setup-phpfor CLI).setup-phpthenapt install php${version}-fpm, implicitly assuming ondrej is ready.setup-phpconfigures ondrej for CLI, but the fpm step did not explicitly refresh apt sources; worked on older runners, may fail on newer ones.Change in this PR
Before installing fpm, Setup php-fpm for Linux now runs:
Then the existing
apt install php${version}-fpmand symlink.Intent: Not introducing ondrej for the first time, but making fpm install explicitly depend on ondrej + refreshed apt index, instead of implicit state left by
setup-php.Unchanged: matrix, Rust toolchain, docker compose, cargo tests; each job still installs only one matrix PHP fpm version.
Full step as committed:
Verification (fork)
After push to
feature/php-metrics-dev(commite6a279f):Single fork validation; does not guarantee all upstream runner regions/times. Not proven to be the only possible fix.
Summary for review
php*-fpminstall fails on new runners; fork change restores CIsetup-phppractice? Reproduce on upstream first?