
Add PHP Health Metrics (PHM) with /proc-based worker collection #145

Merged
jmjoy merged 2 commits into apache:master from songzhendong:feature/php-metrics-dev
Jun 23, 2026

Conversation

@songzhendong

Contributor

This description is provided for review reference. If later verification differs from what is stated here, corrections and feedback are welcome.

Summary

  • Add PHP Health Metrics (PHM): the reporter worker samples the parent PHP-FPM process via Linux /proc and reports six instance_php_* meters through native MeterReportService / collectBatch.

  • Meters (aligned with OAP php-runtime.yaml and Horizon UI widgets):

    | Agent meter name | Source |
    | --- | --- |
    | instance_php_process_cpu_utilization | /proc/{pid}/stat utime+stime delta |
    | instance_php_memory_used_mb | /proc/{pid}/status VmRSS |
    | instance_php_memory_peak_mb | /proc/{pid}/status VmHWM |
    | instance_php_virtual_memory_mb | /proc/{pid}/status VmSize |
    | instance_php_thread_count | /proc/{pid}/status Threads |
    | instance_php_open_fd_count | /proc/{pid}/fd count |

  • Linux only (/proc); requires a forked reporter worker (reporter_type=grpc or kafka, not standalone).

  • New INI settings:

    • skywalking_agent.metrics_enable — default On when the agent is active (aligned with Python PVM / Ruby runtime meters); set Off to disable.
    • skywalking_agent.metrics_report_period — default 30 seconds.
  • skywalking_agent.enable remains Off by default (unchanged PHP agent behavior). New deployments without the agent enabled are unaffected.

  • Bump workspace version to 1.2.0; documentation in docs/en/ and README.

  • CI: explicit ppa:ondrej/php before php-*-fpm install in rust.yml (see Appendix; independent of PHM logic).
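
For readers unfamiliar with the `/proc/{pid}/fd` source listed for `instance_php_open_fd_count`, here is a minimal sketch of counting open file descriptors by listing that directory. The helper names (`proc_path`, `count_dir_entries`, `open_fd_count`) are hypothetical and are not taken from the PR; only the standard library is used.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Builds `/proc/<pid>/<leaf>`. Hypothetical helper name, not from the PR.
fn proc_path(pid: u32, leaf: &str) -> PathBuf {
    Path::new("/proc").join(pid.to_string()).join(leaf)
}

/// Counts the entries of a directory. When pointed at `/proc/<pid>/fd`,
/// each entry corresponds to one open file descriptor of that process.
fn count_dir_entries(dir: &Path) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(dir)? {
        entry?; // surface per-entry I/O errors instead of miscounting
        count += 1;
    }
    Ok(count)
}

/// Open-FD count for `pid`, or `None` when `/proc` is unavailable
/// (for example on non-Linux hosts) or the process cannot be read.
fn open_fd_count(pid: u32) -> Option<usize> {
    count_dir_entries(&proc_path(pid, "fd")).ok()
}
```

Returning `None` rather than an error keeps a failed sample from aborting the whole collection cycle; whether the merged collector behaves this way is not stated in the description.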

Design notes

  • PHM uses the same gRPC transport as trace reporting; meters are not collected in PHP execute hooks.
  • Collector runs in the forked worker subprocess; target PID is the parent PHP-FPM worker (getppid()).
  • CPU utilization uses /proc/{pid}/stat delta over the report period; the first interval emits no CPU point (baseline sample required).
  • Meters are flushed via collectBatch (short-lived gRPC stream), matching periodic reporting rather than a long-lived meter stream.
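
The CPU delta described above can be sketched as follows. This is an illustration under stated assumptions, not the code in worker/src/phm.rs: `cpu_ticks` and `CpuSampler` are hypothetical names, the result is expressed as a percentage of one core, and the clock tick rate is passed in by the caller (commonly 100 on Linux, but that is an assumption here).

```rust
/// Sum of utime + stime (in clock ticks) from one `/proc/<pid>/stat` line.
fn cpu_ticks(stat_line: &str) -> Option<u64> {
    // comm (field 2) may contain spaces or parentheses, so split after the LAST ')'.
    let rest = &stat_line[stat_line.rfind(')')? + 1..];
    let fields: Vec<&str> = rest.split_whitespace().collect();
    // `rest` starts at field 3 (state); utime is field 14 and stime is field 15.
    let utime: u64 = fields.get(11)?.parse().ok()?;
    let stime: u64 = fields.get(12)?.parse().ok()?;
    Some(utime + stime)
}

/// Hypothetical sampler that turns successive tick totals into a utilization value.
struct CpuSampler {
    prev_ticks: Option<u64>,
    ticks_per_sec: f64, // assumption: supplied by the caller
}

impl CpuSampler {
    fn new(ticks_per_sec: f64) -> Self {
        CpuSampler { prev_ticks: None, ticks_per_sec }
    }

    /// Returns `None` on the first call, because a baseline sample is needed
    /// before any delta can be computed.
    fn sample(&mut self, ticks: u64, elapsed_secs: f64) -> Option<f64> {
        let prev = self.prev_ticks.replace(ticks)?;
        if elapsed_secs <= 0.0 {
            return None;
        }
        let delta = ticks.saturating_sub(prev) as f64;
        Some(delta / self.ticks_per_sec / elapsed_secs * 100.0)
    }
}
```

With a 5 second period and 100 ticks per second, a jump from 200 to 450 ticks corresponds to 2.5 CPU seconds, or 50% of one core.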

Development branch CI verification (commit e6a279f)

All pipelines passed on the fork:

| Workflow | Result |
| --- | --- |
| Rust | ✅ fmt, clippy, build, e2e (PHP 7.2–8.5 matrix + kafka-reporter) |
| License | |
| PECL | |

Testing

  • Agent e2e (tests/e2e.rs):
    • FPM #1 only: -d metrics_enable=On, -d metrics_report_period=5.
    • After plugin/trace requests and an 8s wait (longer than the 5s report period), mock collector /dataValidate validates tests/data/expected_context.yaml.
    • Asserts six meters for serviceName: skywalking-agent-test-1 (ge 0 / ge 1 for thread and FD counts).
    • PHM values are real /proc samples from the CI Ubuntu runner's php-fpm worker, not mocked.

Related work (separate PRs, not in this change)

  • OAP: meter-analyzer-config/php-runtime.yaml, PHP e2e — open after this merges; pin SW_AGENT_PHP_COMMIT to the apache merge SHA.
  • UI: PHM widgets on General → Instance dashboard in skywalking-horizon-ui.

Agent-first order matches Python PVM and Go runtime meter: no proto change; safe to merge agent before OAP/UI.

References

Test plan

  • Fork CI: Rust / License / PECL green (e6a279f)
  • Upstream CI green after PR opened
  • Linux + enable=On: six meters reported to OAP (after OAP PR merges)
  • metrics_enable=Off: no PHM meters; tracing unchanged
  • standalone reporter: no PHM (documented)
  • Non-Linux: documented Linux-only; no /proc sampling

Notes for reviewers

  • Safe to merge agent first: enable default Off means no behavior change until the agent is explicitly enabled.
  • Please focus on:
    • worker/src/phm.rs: /proc parsing and CPU delta math
    • Worker lifecycle and parent PID resolution
    • worker/src/reporter/meter_batch.rs — batch flush and retry semantics
  • 17 files, +620 / −11 vs master.

Appendix: CI fix — explicit ppa:ondrej/php in Setup php-fpm

Scope: .github/workflows/rust.yml only · Independent of PHM feature code

Background

On the fork, the Rust workflow began failing at Setup php-fpm for Linux after GitHub runner image updates (logs show ubuntu24/20260615.205), with errors such as:

E: Unable to locate package php7.2-fpm
E: Package 'php7.4-fpm' has no installation candidate

| When | Runner image | Observation |
| --- | --- | --- |
| 2026-06-17 | ubuntu24/20260607.184 | Setup php-fpm and PHP matrix jobs succeeded |
| From 2026-06-19 | ubuntu24/20260615.205 | Above apt errors recur |

Some runs on 6/17 failed overall due to cargo fmt / clippy, unrelated to php-fpm.

Other workflows at the same time: License and PECL still passed; Rust failed at apt install before reaching cargo clippy / cargo test.

Apache upstream (inferred, not re-tested): Rust last succeeded on master push around 2026-03-12; no master pushes since. Re-running the same rust.yml on master or a PR today may hit the same fpm issue (same workflow, rolling ubuntu-24.04 image). This is an inference only.

Relation to PHM: Even without PHM, new runners may exhibit this CI failure. The Rust CI fix and PHM business logic are independent.

Likely causes (analysis)

  1. runs-on: ubuntu-24.04 is a rolling label; apt environment can change after image rollout (~6/15).
  2. Matrix covers PHP 7.2–8.5; matching php*-fpm packages may not be in Ubuntu 24.04 default repos and typically require ppa:ondrej/php (same family as shivammathur/setup-php for CLI).
  3. The current flow runs setup-php and then apt install php${version}-fpm, implicitly assuming the ondrej PPA is already configured. setup-php configures ondrej for the CLI, but the fpm step did not explicitly refresh the apt sources; this worked on older runner images and may fail on newer ones.
  4. Not caused by PHM/agent changes; transient ondrej/apt issues cannot be fully ruled out.

Change in this PR

Before installing fpm, Setup php-fpm for Linux now runs:

sudo apt-get install -y software-properties-common
sudo add-apt-repository -y ppa:ondrej/php
sudo apt-get update

Then the existing apt install php${version}-fpm and symlink.

Intent: Not introducing ondrej for the first time, but making fpm install explicitly depend on ondrej + refreshed apt index, instead of implicit state left by setup-php.

Unchanged: matrix, Rust toolchain, docker compose, cargo tests; each job still installs only one matrix PHP fpm version.

Full step as committed:

- name: Setup php-fpm for Linux
  if: matrix.os == 'ubuntu-24.04'
  run: |
    sudo apt-get update
    sudo apt-get install -y software-properties-common
    sudo add-apt-repository -y ppa:ondrej/php
    sudo apt-get update
    sudo apt-get install -y php${{ matrix.flag.php_version }}-fpm
    sudo ln -sf /usr/sbin/php-fpm${{ matrix.flag.php_version }} /usr/sbin/php-fpm

Verification (fork)

After push to feature/php-metrics-dev (commit e6a279f):

| Workflow | Result |
| --- | --- |
| Rust | ✅ (incl. PHP 7.2 / 7.4 / 8.5 matrix) |
| License | |
| PECL | |

Single fork validation; does not guarantee all upstream runner regions/times. Not proven to be the only possible fix.

Summary for review

| Dimension | Note |
| --- | --- |
| Symptom | php*-fpm install fails on new runners; fork change restores CI |
| Hypothesis | Runner update + implicit ondrej dependency; upstream re-run may reproduce |
| Review focus | Suitable for upstream? Consistent with setup-php practice? Reproduce on upstream first? |

@wu-sheng requested review from Copilot and jmjoy June 21, 2026 08:22
@wu-sheng added this to the 1.2.0 milestone Jun 21, 2026
@wu-sheng added the enhancement label Jun 21, 2026

Copilot AI left a comment

Pull request overview

Adds PHP Health Metrics (PHM) collection and reporting from the forked reporter worker, using Linux /proc to sample the parent PHP-FPM process and exporting the data via SkyWalking’s native meter protocol.

Changes:

  • Add /proc-based PHM collector and wire it into the worker lifecycle/config.
  • Add gRPC collectBatch-based meter batching path and filter meter items out of the existing trace/log collect stream.
  • Update e2e expectations/docs and adjust Rust CI to reliably install php*-fpm via ppa:ondrej/php on Ubuntu 24.04.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| worker/src/reporter/reporter_grpc.rs | Spawns meter batch reporter and wraps consumer to divert meters to batch path. |
| worker/src/reporter/mod.rs | Registers new reporter submodules. |
| worker/src/reporter/meter_filter.rs | Filters CollectItem::Meter out of main stream and forwards to meter batch channel. |
| worker/src/reporter/meter_batch.rs | Implements collect_batch flush + bounded retry/retention logic for meters. |
| worker/src/phm.rs | Implements Linux /proc sampling for PHM meters and reports them into the worker channel. |
| worker/src/lib.rs | Adds PHM config to worker config and starts collector alongside heartbeat. |
| src/worker.rs | Builds PHM worker configuration based on INI settings. |
| src/module.rs | Adds lazy INI reads for new PHM settings and ensures logger uses cloned reporter. |
| src/lib.rs | Registers new INI settings for PHM enablement and report period. |
| tests/common/mod.rs | Enables PHM in one FPM fixture instance with shorter reporting period. |
| tests/e2e.rs | Extends wait time to allow PHM meters to be reported before validation. |
| tests/data/expected_context.yaml | Adds expected meter items assertions for PHM. |
| docs/en/setup/service-agent/php-agent/README.md | Documents PHM feature, platform constraints, and meter list. |
| docs/en/configuration/ini-settings.md | Documents new INI settings for PHM. |
| README.md | Mentions PHM in the project description. |
| Cargo.toml | Bumps workspace version to 1.2.0. |
| .github/workflows/rust.yml | Adds explicit ondrej PPA setup before installing php*-fpm on Ubuntu 24.04. |


Comment thread src/worker.rs Outdated
Comment on lines +84 to +93
phm: if *METRICS_ENABLE {
    Some(PhmConfiguration {
        service_name: SERVICE_NAME.clone(),
        service_instance: SERVICE_INSTANCE.clone(),
        report_period_secs: *METRICS_REPORT_PERIOD,
        php_process_pid: libc::getpid() as i32,
    })
} else {
    None
},

Contributor Author

Thanks for catching this. Addressed in 5ed3fb9: phm_configuration() now stores the parent PHP-FPM PID via getppid() for the fallback path, and PHM is only enabled on Linux when metrics_enable is On.

Comment thread src/lib.rs
"".to_string(),
Policy::System,
);
module.add_ini(SKYWALKING_AGENT_METRICS_ENABLE, true, Policy::System);

@songzhendong Jun 21, 2026

Contributor Author

Agreed. Addressed in 5ed3fb9: metrics_enable now defaults to true on Linux and false on other platforms via #[cfg(target_os = "linux")] on the INI registration.
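
A platform-specific default of this kind can be sketched in two equivalent ways. The names `METRICS_ENABLE_DEFAULT` and `default_metrics_enable` are hypothetical and only illustrate the `cfg` mechanism; the actual INI registration in src/lib.rs may be structured differently.

```rust
// Attribute form: exactly one of these two items is compiled in.
#[cfg(target_os = "linux")]
const METRICS_ENABLE_DEFAULT: bool = true;
#[cfg(not(target_os = "linux"))]
const METRICS_ENABLE_DEFAULT: bool = false;

/// Macro form: `cfg!` expands to a boolean literal at compile time,
/// so both platforms share a single function body.
fn default_metrics_enable() -> bool {
    cfg!(target_os = "linux")
}
```

The attribute form suits cases where the non-Linux build should not even compile the Linux-only code; the macro form is shorter when only a value differs.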

Comment thread docs/en/configuration/ini-settings.md Outdated
| skywalking_agent.instance_name | Instance name. You can set `${HOSTNAME}`, refer to [Example #1](https://www.php.net/manual/en/install.fpm.configuration.php) | |
| skywalking_agent.standalone_socket_path | Unix domain socket file path of standalone skywalking php worker. Only available when `reporter_type` is `standalone`. | |
| skywalking_agent.psr_logging_level | The log level reported to SkyWalking, based on PSR-3, one of `Off`, `Debug`, `Info`, `Notice`, `Warning`, `Error`, `Critical`, `Alert`, `Emergency`. | Off |
| skywalking_agent.metrics_enable | Enable PHP Health Metrics (PHM) meter reporting via native MeterReportService. **Linux only** (requires `/proc`). Enabled by default when the agent is active; set to `Off` to disable. Reports six process meters: CPU utilization, memory used/peak, virtual memory, thread count, and open FD count. See [PHP agent README](../setup/service-agent/php-agent/README.md#php-health-metrics-phm). | On |

Contributor Author

Updated in 5ed3fb9: the default value column now reads On (Linux); Off (other), matching the platform-specific INI default.

Comment thread Cargo.toml

[workspace.package]
- version = "1.1.0"
+ version = "1.2.0"

Member

This change needs to be reflected in Cargo.lock.

Comment thread worker/src/reporter/reporter_grpc.rs Outdated
Comment on lines +41 to +54
let (meter_tx, meter_rx) = mpsc::channel(128);
let meter_channel = channel.clone();
let meter_authentication = config.authentication.clone();
tokio::spawn(async move {
    if let Err(err) =
        meter_batch::run_meter_batch_reporter(meter_channel, meter_authentication, meter_rx)
            .await
    {
        warn!(?err, "Meter batch reporter failed");
    }
});

let consumer = MeterFilteringConsumer::new(consumer, meter_tx);

Member

We shouldn't implement a separate gRPC metrics reporting logic. Instead, we should use the skywalking::metrics module.

Example: https://github.com/apache/skywalking-rust/blob/master/examples/simple_metric_report.rs.

Registering a Gauge metric in the init method of src/module.rs should meet our requirements.

Contributor Author

Thanks — I agree we should use skywalking::metrics rather than a custom gRPC meter batch implementation.

@songzhendong

Contributor Author

I've pushed a follow-up commit (aec9c01) that addresses the skywalking::metrics / Metricer refactor and a few CI/doc items.

Metricer refactor (review feedback)

Per your suggestion, PHM no longer uses a custom meter gRPC batch path. The changes:

  • Gauge registration + Metricer::boot() happen during extension init (module::init() → src/phm_runtime.rs), not in the forked reporter worker.
  • Reporting goes through the standard skywalking::metrics::Metricer + socket Reporter channel; the forked gRPC worker forwards meter data via the normal reporter path.
  • Removed worker/src/reporter/meter_batch.rs and meter_filter.rs.

PHM sampling model

PHM samples the current PHP process via /proc (getpid()), in a dedicated collector thread inside the PHP process. This differs from the earlier getppid() parent-PID fallback in 5ed3fb9: with Metricer booted in the extension process, we sample the process that owns the Gauges (typically a php-fpm pool worker). Trace reporting is unchanged.

PHM is not enabled when reporter_type = standalone (documented in the PHP agent README).

CI stability

Rust workflow e2e setup now retries docker compose up -d --wait up to 3 times (10s backoff) to reduce flakes from Docker Hub / daemon readiness. Test teardown uses SIGTERM → 5s timeout → SIGKILL; PHM is enabled only on fpm-test-1 in integration tests.
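
The retry behavior described above lives in the CI setup, not in Rust code. Purely as an illustration of the pattern (a bounded number of attempts with a fixed pause between failures), here is a minimal sketch; `retry_with_backoff` is a hypothetical helper and is not part of this PR.

```rust
use std::thread;
use std::time::Duration;

/// Runs `op` until it succeeds or `max_attempts` is reached, sleeping
/// `backoff` after each failed attempt. `op` receives the 1-based attempt
/// number and always runs at least once.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    backoff: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the last error back to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(backoff);
                attempt += 1;
            }
        }
    }
}
```

For the case in this comment, the call would use 3 attempts and `Duration::from_secs(10)`, with `op` wrapping the `docker compose up -d --wait` invocation.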

Docs

README + ini-settings updated to match the Metricer architecture.

Comment thread src/module.rs Outdated

let reporter = Arc::new(Reporter::new(&*SOCKET_FILE_PATH));

init_phm_metrics(reporter.clone());

Member

The module::init method should not contain any thread creation operations; that was an error in my previous hint.
You can refer to report_properties_and_keep_alive, as its heartbeat reporting logic is very similar to reporting metrics.
Please move the skywalking::metrics related operations into start_worker.

Contributor Author

done

Report six instance_php_* process meters on Linux via /proc sampling in the
forked reporter worker. Use skywalking::metrics::Metricer (booted from
start_worker alongside report_properties_and_keep_alive, not module::init);
sample the parent PHP process via getppid(). PHM defaults On on Linux;
disabled for standalone reporter. Remove custom meter gRPC path; stabilize
e2e CI (compose retry, PHM on fpm-test-1 only).
@songzhendong force-pushed the feature/php-metrics-dev branch from aec9c01 to 4a75da4 Compare June 23, 2026 03:29
Comment thread worker/src/phm.rs Outdated
let mut metricer = Metricer::new(config.service_name, config.service_instance, reporter);
metricer.set_report_interval(report_period);
register_gauges(&mut metricer, samples);
let _booting = metricer.boot();

Member

We cannot allow the return value of metricer.boot() to be dropped.
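
To illustrate why the binding in the snippet above is a problem: a value bound with `let _booting = ...` lives only until the end of the enclosing function, so a handle that stops its background work on drop is torn down as soon as that function returns. The sketch below uses a hypothetical `ReportGuard` as a stand-in; it does not reproduce the real `skywalking::metrics` types and only assumes the drop-stops-reporting behavior described in this thread.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a boot handle: dropping it stops the work.
struct ReportGuard {
    running: Arc<AtomicBool>,
}

impl Drop for ReportGuard {
    fn drop(&mut self) {
        self.running.store(false, Ordering::SeqCst);
    }
}

/// Marks the (simulated) background reporting as started.
fn boot(running: Arc<AtomicBool>) -> ReportGuard {
    running.store(true, Ordering::SeqCst);
    ReportGuard { running }
}

/// Problem shape: the guard dies at the end of this function.
fn boot_and_drop(running: Arc<AtomicBool>) {
    let _guard = boot(running);
    // `_guard` is dropped here, so reporting stops when this returns.
}

/// Fix shape: hand the guard to the caller, who decides its lifetime.
fn boot_and_return(running: Arc<AtomicBool>) -> ReportGuard {
    boot(running)
}
```

Returning the guard lets the caller hold it for as long as reporting should continue, for example for the lifetime of the worker.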

Comment thread worker/src/phm.rs Outdated
Comment on lines +112 to +123
if let Some(mb) = read_status_kib(pid, "VmRSS") {
    PhmSamples::store(&samples.memory_used_mb, mb);
}
if let Some(mb) = read_status_kib(pid, "VmHWM") {
    PhmSamples::store(&samples.memory_peak_mb, mb);
}
if let Some(mb) = read_status_kib(pid, "VmSize") {
    PhmSamples::store(&samples.virtual_memory_mb, mb);
}
if let Some(count) = read_status_count(pid, "Threads") {
    PhmSamples::store(&samples.thread_count, count as f64);
}

Member

These operations repeatedly read the /proc/{pid}/status file and need to be consolidated into a single read.
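
One way to consolidate the reads is to load the file once and pick out all four keys in a single pass over its lines. The following is a minimal sketch with hypothetical names (`StatusSample`, `parse_status`, `kib_to_mb`, `read_status`), not the code that was eventually merged. Dividing by 1024 to obtain MB is an assumption about how the `_mb` meters are derived.

```rust
/// Values of interest from one `/proc/<pid>/status` snapshot.
#[derive(Debug, Default, PartialEq)]
struct StatusSample {
    vm_rss_kib: Option<u64>,
    vm_hwm_kib: Option<u64>,
    vm_size_kib: Option<u64>,
    threads: Option<u64>,
}

/// Parses all four fields in one pass over the already-read text.
fn parse_status(text: &str) -> StatusSample {
    let mut sample = StatusSample::default();
    for line in text.lines() {
        if let Some((key, rest)) = line.split_once(':') {
            // The first token after the colon is the number; a trailing
            // unit such as "kB" is ignored.
            let value = rest
                .split_whitespace()
                .next()
                .and_then(|v| v.parse::<u64>().ok());
            match key {
                "VmRSS" => sample.vm_rss_kib = value,
                "VmHWM" => sample.vm_hwm_kib = value,
                "VmSize" => sample.vm_size_kib = value,
                "Threads" => sample.threads = value,
                _ => {}
            }
        }
    }
    sample
}

/// Assumption: the MB meters are KiB values divided by 1024.
fn kib_to_mb(kib: u64) -> f64 {
    kib as f64 / 1024.0
}

/// Single file read per sample; parsing happens on the in-memory text.
fn read_status(pid: u32) -> std::io::Result<StatusSample> {
    let text = std::fs::read_to_string(format!("/proc/{}/status", pid))?;
    Ok(parse_status(&text))
}
```

Keeping `parse_status` separate from the file read also makes the parser testable with a fixed string on any platform.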

Return Booting from boot_phm_metrics and keep it alive for the worker
lifetime so metric reporting is not shut down on drop. Consolidate
VmRSS/VmHWM/VmSize/Threads sampling into a single /proc/{pid}/status read.
@songzhendong force-pushed the feature/php-metrics-dev branch from c587c2d to 2f88b7e Compare June 23, 2026 07:10
@songzhendong

Contributor Author

Both points are addressed in the latest commit.

1. Keep Metricer::boot() alive

boot_phm_metrics() now returns Booting instead of dropping it. In start_worker, we hold it as _phm_booting for the same lifetime as run_reporter, so the background meter reporting task is not shut down when boot() returns.

2. Single read of /proc/{pid}/status

VmRSS, VmHWM, VmSize, and Threads are now parsed from one read_proc_status() call per sample, rather than opening /proc/{pid}/status four times.
Please take another look when convenient.

@jmjoy merged commit de311c9 into apache:master Jun 23, 2026
18 checks passed

Labels

enhancement New feature or request

4 participants