Skip to content

feat: add metrics_optional_domains and metrics_optional_packages for users to configure optional domains#245

Merged
richm merged 1 commit intolinux-system-roles:mainfrom
richm:metrics_pcp_optional_agents
Jul 24, 2025
Merged

feat: add metrics_optional_domains and metrics_optional_packages for users to configure optional domains#245
richm merged 1 commit intolinux-system-roles:mainfrom
richm:metrics_pcp_optional_agents

Conversation

@richm
Copy link
Copy Markdown
Collaborator

@richm richm commented Jul 17, 2025

Feature: The metrics role has access to many data collection domains.
The metrics role will automatically enable
domains for features being managed e.g. the metrics role will automatically
enable the elasticsearch domain if metrics_from_elasticsearch: true. The
new parameter metrics_optional_domains allows the user to provide a list
of additional domains to enable. The new parameter metrics_optional_packages
allows the user to provide a list of additional packages that may be required
to support the additional domains.

Reason: Users should be able to enable domains other than the ones for
features managed by the metrics role, and provide additional packages needed
to enable those domains.

Result: Users can enable the domains they need to use, and specify additional
packages required for those domains.

Signed-off-by: Rich Megginson rmeggins@redhat.com

Summary by Sourcery

Introduce support for user-configurable optional PCP agents in the metrics role, allowing users to enable additional agents beyond those automatically managed by the role.

New Features:

  • Add metrics_optional_domains variable to allow users to specify additional domains to enable.

Enhancements:

  • Update task logic to enable both automatically managed and user-specified PCP agents.

Documentation:

  • Document the new metrics_optional_domains variable and its usage in README.md.

Tests:

  • Add test coverage for enabling optional PCP agents via metrics_optional_domains.

@richm richm requested a review from natoscott as a code owner July 17, 2025 00:59
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai bot commented Jul 17, 2025

Reviewer's Guide

This pull request adds a new parameter, metrics_pcp_optional_agents, enabling users to specify additional PCP agents to be enabled beyond those automatically managed by the metrics role, with corresponding updates to documentation, defaults, implementation logic, and tests.

File-Level Changes

Change Details Files
Introduced the metrics_pcp_optional_agents parameter to allow users to specify additional PCP agents to enable.
  • Added documentation for metrics_pcp_optional_agents in README.md, including usage example.
  • Defined metrics_pcp_optional_agents with a default empty list in defaults/main.yml.
  • Modified the PCP agent configuration logic to merge user-specified optional agents with automatically managed agents in tasks/main.yml.
  • Updated test playbook to include metrics_pcp_optional_agents in tests/tests_verify_fullstack.yml.
README.md
defaults/main.yml
tasks/main.yml
tests/tests_verify_fullstack.yml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @richm - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@richm richm force-pushed the metrics_pcp_optional_agents branch from 4552371 to ec93ce1 Compare July 17, 2025 01:01
README.md Outdated
If you want to remove policy, you will need to use the selinux system
role directly.

### metrics_pcp_optional_agents: []
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've not included the backend project name ('pcp') directly in any variables before, don't think we should start now. How about 'metrics_optional_domains' instead here?

@richm richm force-pushed the metrics_pcp_optional_agents branch from ec93ce1 to 0437e00 Compare July 17, 2025 13:09
@richm richm changed the title feat: add metrics_pcp_optional_agents for users to configure optional agents feat: add metrics_optional_domains for users to configure optional domains Jul 17, 2025
@richm
Copy link
Copy Markdown
Collaborator Author

richm commented Jul 17, 2025

@natoscott 1) What optional agents should this test with? Ideally, agents that are available in el7 - el10 and fedora. 2) How to verify that the optional agents are active/enabled?

@richm richm force-pushed the metrics_pcp_optional_agents branch from 0437e00 to 8a0f902 Compare July 17, 2025 13:15
@natoscott
Copy link
Copy Markdown
Collaborator

  1. What optional agents should this test with? Ideally, agents that are available in el7 - el10 and fedora.

The apache agent (pcp-pmda-apache rpm) has been around forever and is never default installed in pmcd.conf - it is also tolerant of Apache not running, so seems like it might be a good option.

  1. How to verify that the optional agents are active/enabled?

A good way is to ask pmcd via:

$ pminfo -f pmcd.agent.status | grep apache
    inst [68 or "apache"] value 0

Alternatively, can request values from the agent - this is problematic for CI though because it requires the source of metrics for that domain (apache) to be available, accessible and configured - which is not necessarily the case. From memory there are existing tests using pmcd.agent.status so I recommend that approach.

@richm
Copy link
Copy Markdown
Collaborator Author

richm commented Jul 21, 2025

@natoscott one catch is that I had to add this:

  vars:
    ...
    pcp_optional_packages: [pcp-pmda-apache]

which works - but it begs the question - how will customers install the custom packages required for metrics_optional_domains? Do we need to add a metrics_optional_packages which sets pcp_optional_packages?

@richm
Copy link
Copy Markdown
Collaborator Author

richm commented Jul 21, 2025

@natoscott on an unrelated topic - any idea why tests_verify_from_spark.yml test is failing? I can reproduce locally - this is supposed to work: pmprobe -I openmetrics.control.status but I get openmetrics.control.status -12357 Unknown metric name and pmprobe -I | grep openmetrics shows nothing.

@natoscott
Copy link
Copy Markdown
Collaborator

@natoscott on an unrelated topic - any idea why tests_verify_from_spark.yml test is failing? I can reproduce locally - this is supposed to work: pmprobe -I openmetrics.control.status but I get openmetrics.control.status -12357 Unknown metric name and pmprobe -I | grep openmetrics shows nothing.

Hi @richm - had a look through but couldn't see a root cause - the check has 10x 1sec retry logic, which is definitely needed to give pmdaopenmetrics time to start. The next thing to do would be to check both /var/log/pcp/pmcd/{pmcd,openmetrics}.log files for additional clues.

I'm guessing it just takes too long to get started, though on my local machine it completes in less than 1sec - the lack of any output from pminfo/pmprobe at all for the openmetrics.control.status metric suggests the PMDA is not yet completely setup. Another check that could be added is the pmcd.agent.status based check from check_from_elasticsearch.yml (with .find("openmetrics") instead) - this will let us know that pmcd has started the PMDA at least.

@richm
Copy link
Copy Markdown
Collaborator Author

richm commented Jul 22, 2025

@natoscott on an unrelated topic - any idea why tests_verify_from_spark.yml test is failing? I can reproduce locally - this is supposed to work: pmprobe -I openmetrics.control.status but I get openmetrics.control.status -12357 Unknown metric name and pmprobe -I | grep openmetrics shows nothing.

Hi @richm - had a look through but couldn't see a root cause - the check has 10x 1sec retry logic, which is definitely needed to give pmdaopenmetrics time to start.

I tried retries: 120 - same result - does not work

The next thing to do would be to check both /var/log/pcp/pmcd/{pmcd,openmetrics}.log files for additional clues.

I don't see openmetrics.log

I don't see any errors or other clues in pmcd.log

I'm guessing it just takes too long to get started, though on my local machine it completes in less than 1sec - the lack of any output from pminfo/pmprobe at all for the openmetrics.control.status metric suggests the PMDA is not yet completely setup. Another check that could be added is the pmcd.agent.status based check from check_from_elasticsearch.yml (with .find("openmetrics") instead) - this will let us know that pmcd has started the PMDA at least.

# pmprobe -I pmcd.agent.status
pmcd.agent.status 11 "root" "pmcd" "proc" "pmproxy" "xfs" "linux" "nfsclient" "mmv" "kvm" "jbd2" "dm"

I'm baffled . . .

@richm
Copy link
Copy Markdown
Collaborator Author

richm commented Jul 22, 2025

I'm testing locally like this:

tox -e qemu-ansible-core-2.19 -- --image-name centos-10 --log-level debug -- tests/tests_verify_from_spark.ym

pcp-pmda-openmetrics-6.3.7-5.el10.x86_64

@natoscott
Copy link
Copy Markdown
Collaborator

@richm I think we're missing something to stitch the needed agent in at the top level, e.g.

$ git diff
tasks/main.yml
--- /tmp/git-blob-xTgL42/main.yml       2025-07-22 11:23:16.334441108 +1000
+++ tasks/main.yml      2025-07-22 11:23:15.192109410 +1000
@@ -31,6 +31,11 @@
     __metrics_domains: "{{ __metrics_domains + ['elasticsearch'] }}"
   when: metrics_from_elasticsearch | d(false) | bool
 
+- name: Add OpenMetrics to metrics domain list
+  set_fact:
+    __metrics_domains: "{{ __metrics_domains + ['openmetrics'] }}"
+  when: metrics_from_spark | d(false) | bool
+
 - name: Add SQL Server to metrics domain list
   set_fact:
     __metrics_domains: "{{ __metrics_domains + ['mssql'] }}"

@richm
Copy link
Copy Markdown
Collaborator Author

richm commented Jul 22, 2025

#246

@natoscott
Copy link
Copy Markdown
Collaborator

@natoscott one catch is that I had to add this:
[...]
which works - but it begs the question - how will customers install the custom packages required for metrics_optional_domains? Do we need to add a metrics_optional_packages which sets pcp_optional_packages?

Yep, good point - I think that will be needed here.

@richm
Copy link
Copy Markdown
Collaborator Author

richm commented Jul 23, 2025

@natoscott one catch is that I had to add this:
[...]
which works - but it begs the question - how will customers install the custom packages required for metrics_optional_domains? Do we need to add a metrics_optional_packages which sets pcp_optional_packages?

Yep, good point - I think that will be needed here.

Done

@richm
Copy link
Copy Markdown
Collaborator Author

richm commented Jul 23, 2025

[citest]

…users to configure optional domains

Feature: The metrics role has access to many data collection domains.
The metrics role will automatically enable
domains for features being managed e.g. the metrics role will automatically
enable the `elasticsearch` domain if `metrics_from_elasticsearch: true`.  The
new parameter `metrics_optional_domains` allows the user to provide a list
of additional domains to enable.  The new parameter `metrics_optional_packages`
allows the user to provide a list of additional packages that may be required
to support the additional domains.

Reason: Users should be able to enable domains other than the ones for
features managed by the metrics role, and provide additional packages needed
to enable those domains.

Result: Users can enable the domains they need to use, and specify additional
packages required for those domains.

Signed-off-by: Rich Megginson <rmeggins@redhat.com>
@richm richm force-pushed the metrics_pcp_optional_agents branch from 4a4f522 to 4894af9 Compare July 24, 2025 14:34
@richm richm changed the title feat: add metrics_optional_domains for users to configure optional domains feat: add metrics_optional_domains and metrics_optional_packages for users to configure optional domains Jul 24, 2025
@richm
Copy link
Copy Markdown
Collaborator Author

richm commented Jul 24, 2025

[citest]

@richm
Copy link
Copy Markdown
Collaborator Author

richm commented Jul 24, 2025

@natoscott ok to merge?

@natoscott
Copy link
Copy Markdown
Collaborator

@natoscott ok to merge?

LGTM.

@richm richm merged commit 479633c into linux-system-roles:main Jul 24, 2025
19 of 28 checks passed
@richm richm deleted the metrics_pcp_optional_agents branch July 24, 2025 23:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants