Skip to content
Merged
Show file tree
Hide file tree
Changes from 20 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
9272bfe
minor test refactor
SylvainJuge Sep 20, 2024
6f20784
enhance tests
SylvainJuge Sep 24, 2024
d1a195b
add more assertions in tests
SylvainJuge Sep 25, 2024
3f2f151
add state mapping + tests
SylvainJuge Sep 30, 2024
5973b01
impl + test
SylvainJuge Sep 30, 2024
4fd10aa
Merge branch 'main' of github.com:open-telemetry/opentelemetry-java-i…
SylvainJuge Sep 30, 2024
6155206
spotless
SylvainJuge Sep 30, 2024
af37bb4
thou shall lint
SylvainJuge Oct 1, 2024
6551200
minor add @Nullable
SylvainJuge Oct 1, 2024
376acc5
fix state metric detection
SylvainJuge Oct 1, 2024
679c460
lint again
SylvainJuge Oct 1, 2024
e5ff7c5
implement new syntax
SylvainJuge Oct 8, 2024
8e73f10
cleanup
SylvainJuge Oct 8, 2024
e9e8e83
Merge branch 'main' of github.com:open-telemetry/opentelemetry-java-i…
SylvainJuge Oct 8, 2024
550b003
spotless
SylvainJuge Oct 8, 2024
1bb4302
add documentation
SylvainJuge Oct 9, 2024
ebf58ff
minor code refactor to use lists instead of arrays
SylvainJuge Oct 9, 2024
6c10e6d
port-review change in docs
SylvainJuge Oct 10, 2024
652ab32
switch to _ + enhance doc for 2 state
SylvainJuge Oct 14, 2024
e24b9ae
Update instrumentation/jmx-metrics/library/src/main/java/io/opentelem…
SylvainJuge Oct 16, 2024
2698be3
Merge branch 'main' of github.com:open-telemetry/opentelemetry-java-i…
SylvainJuge Oct 17, 2024
e4b9fe5
restore '*' as wildcard + clarify quoting
SylvainJuge Oct 17, 2024
111c26b
Merge branch 'state-jmx-metrics' of github.com:SylvainJuge/openteleme…
SylvainJuge Oct 17, 2024
96bb6cc
restore '*' again
SylvainJuge Oct 17, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 61 additions & 0 deletions instrumentation/jmx-metrics/javaagent/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,67 @@ Thus, the above definitions will create several metrics, named `my.kafka.streams

The metric descriptions will remain undefined, unless they are provided by the queried MBeans.

### State Metrics

Some JMX attributes expose current state as a non-numeric MBean attribute, in order to capture those as metrics it is recommended to use the special `state` metric type.
For example, with Tomcat connector, the `Catalina:type=Connector,port=*` MBean has `stateName` (of type `String`), we can define the following rule:

```yaml
---
rules:
- bean: Catalina:type=Connector,port=*
mapping:
stateName:
type: state
metric: tomcat.connector
metricAttribute:
port: param(port)
connector_state:
ok: STARTED
failed: [STOPPED,FAILED]
degraded: _
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just thought of open-telemetry/opentelemetry-configuration#115

though we're already not doing this in the jmx metrics (prefering yaml brevity), so this comment should not hold up this PR

@jack-berg do you think this approach (using objects for brevity) in jmx metrics module is ok long-term?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we would switch to using an array of objects, for example with name and value, then we could add a third one with default attribute key with value true|false to indicate which entry would be the default instead of using a "magic value" to represent this.

```

For a given value of `port`, let's say `8080` This will capture the `tomcat.connector.state` metric of type `updowncounter` with value `0` or `1` and the `state` metric attribute will have a value in [`ok`,`failed`,`degraded`].
For every sample, 3 metrics will be captured for each value of `state` depending on the value of `stateName`:

When `stateName` = `STARTED`, we have:

- `tomcat.connector` value = `1`, attributes `port` = `8080` and `connector_state` = `ok`
- `tomcat.connector` value = `0`, attributes `port` = `8080` and `connector_state` = `failed`
- `tomcat.connector` value = `0`, attributes `port` = `8080` and `connector_state` = `degraded`

When `stateName` = `STOPPED` or `FAILED`, we have:

- `tomcat.connector` value = `0`, attributes `port` = `8080` and `connector_state` = `ok`
- `tomcat.connector` value = `1`, attributes `port` = `8080` and `connector_state` = `failed`
- `tomcat.connector` value = `0`, attributes `port` = `8080` and `connector_state` = `degraded`

For other values of `stateName`, we have:

- `tomcat.connector` value = `0`, attributes `port` = `8080` and `connector_state` = `ok`
- `tomcat.connector` value = `0`, attributes `port` = `8080` and `connector_state` = `failed`
- `tomcat.connector` value = `1`, attributes `port` = `8080` and `connector_state` = `degraded`

Each state key can be mapped to one or more values of the MBean attribute using:
- a string literal or a string array
- a `_` character to provide default option and avoid enumerating all values.

Exactly one `_` value must be present in the mapping to ensure all possible values of the MBean attribute can be mapped to a state key.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand why this is a requirement. If there are only certain states that you are interested in, why not allow to only specify those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a requirement, but allows to ensure that state metrics are always captured consistently, this is more best practice than a technical requirement.

Doing so means that at a given time, the sum of all samples is equal to 1, which translates to "the state of the thing we observe is known at all times". I think it's worth doing that for now unless we have good examples where it can be problematic.

The example I have in mind when adding this restriction is again probably biased by current focus on Tomcat state, so I'll elaborate a bit:

Let's say we monitor 3 tomcat servers, each of them with a single connector and a state metric to get the connector state with on|off values. When consuming those metrics to create a dashboard:

Scenario 1:

  • default set to off to indicate something is not right.
  • creating a linear chart (sum) with breakdown on state will always sum to 3, and the breakdown provides an obvious clue when something is not in on state.
  • creating a pie-chart with the latest value with breakdown on state also provides a good representation of the 3 tomcat instances.
  • the end-user does not even need to know how many tomcat instances exist, you get information from the ratio of on|off in the charts and then can investigate what could be wrong if needed.
  • creating an extra dedicated unknown state could be used to gather those states as they might represent a gap in the metrics mapping.

Scenario 2:

  • state value on when connector is RUNNING, off when connector is STOPPED, other possible values are not mapped.
  • creating a linear chart (sum) with breakdown on state will sum to 3 when the state is either on or off, but will go down whenever an un-mapped state is present
  • the end-user needs to know that the "usual value is 3", and that "when it goes down to less than 3 something is unknown and maybe we have a gap in our mapping"
  • it is possible to trigger an alert by monitoring when the value goes down, but this is definitely more complex and depends mostly on the backend and metrics processing.
  • with some visualizations it's even more tricky, for example when creating a pie-chart of the latest reported values to represent the "current state", if not all instances can be represented because their state is unknown then they will be ignored. In an extreme case, if only a single instance is in on state and the two others are in an unknown state then the pie-chart could be misleading representing "100% is running and it's fine", when in practice it's "only 1/3 fine".

Given that we usually need to monitor things that are working properly and even more the ones that not working properly (which are not in an expected state), having this as a requirement makes it harder to shoot yourself in the foot by missing a state whose value comes from an implementation detail of the monitored product.


The default value indicated by `_` does not require a dedicated state key. For example, if we want to have `connector_state` metric attribute with values `on` or `off`, we can use:
```yaml
connector_state:
on: STARTED
off: [STOPPED,FAILED,_]
```
In the particular case where only two values are defined, we can simplify further by explicitly defining one state and rely on default for the other.
```yaml
connector_state:
on: STARTED
off: _
```

### General Syntax

Here is the general description of the accepted configuration file syntax. The whole contents of the file is case-sensitive, with exception for `type` as described in the table below.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 +151,7 @@ AttributeInfo getAttributeInfo(MBeanServerConnection connection, ObjectName obje

// Verify correctness of configuration by attempting to extract the metric value.
// The value will be discarded, but its type will be checked.
Object sampleValue = extractAttributeValue(connection, objectName, logger);
Object sampleValue = getSampleValue(connection, objectName);

// Only numbers can be used to generate metric values
if (sampleValue instanceof Number) {
Expand Down Expand Up @@ -194,6 +194,11 @@ AttributeInfo getAttributeInfo(MBeanServerConnection connection, ObjectName obje
return null;
}

@Nullable
protected Object getSampleValue(MBeanServerConnection connection, ObjectName objectName) {
return extractAttributeValue(connection, objectName, logger);
}
Comment on lines +198 to +200
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] this allows to bypass the sample value check for the "artificial" attribute value that we need for the state metric.


/**
* Extracts the specified attribute value. In case the value is a CompositeData, drills down into
* it to find the correct singleton value (usually a Number or a String).
Expand All @@ -203,7 +208,7 @@ AttributeInfo getAttributeInfo(MBeanServerConnection connection, ObjectName obje
* pattern
* @param logger the logger to use, may be null. Typically we want to log any issues with the
* attributes during MBean discovery, but once the attribute is successfully detected and
* confirmed to be eligble for metric evaluation, any further attribute extraction
* confirmed to be eligible for metric evaluation, any further attribute extraction
* malfunctions will be silent to avoid flooding the log.
* @return the attribute value, if found, or {@literal null} if an error occurred
*/
Expand Down Expand Up @@ -253,7 +258,8 @@ private Object extractAttributeValue(MBeanServerConnection connection, ObjectNam
}

@Nullable
Number extractNumericalAttribute(MBeanServerConnection connection, ObjectName objectName) {
protected Number extractNumericalAttribute(
MBeanServerConnection connection, ObjectName objectName) {
Object value = extractAttributeValue(connection, objectName);
if (value instanceof Number) {
return (Number) value;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,11 @@

package io.opentelemetry.instrumentation.jmx.engine;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.annotation.Nullable;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.QueryExp;

Expand All @@ -16,7 +20,7 @@
public class BeanGroup {
// How to specify the MBean(s)
@Nullable private final QueryExp queryExp;
private final ObjectName[] namePatterns;
private final List<ObjectName> namePatterns;

/**
* Constructor for BeanGroup.
Expand All @@ -25,17 +29,29 @@ public class BeanGroup {
* @param namePatterns an array of ObjectNames used to look for MBeans; usually they will be
* patterns. If multiple patterns are provided, they work as logical OR.
*/
public BeanGroup(@Nullable QueryExp queryExp, ObjectName... namePatterns) {
private BeanGroup(@Nullable QueryExp queryExp, List<ObjectName> namePatterns) {
this.queryExp = queryExp;
this.namePatterns = namePatterns;
}

public static BeanGroup forSingleBean(String bean) throws MalformedObjectNameException {
return new BeanGroup(null, Collections.singletonList(new ObjectName(bean)));
}

public static BeanGroup forBeans(List<String> beans) throws MalformedObjectNameException {
List<ObjectName> list = new ArrayList<>();
for (String name : beans) {
list.add(new ObjectName(name));
}
return new BeanGroup(null, list);
}

@Nullable
QueryExp getQueryExp() {
return queryExp;
}

ObjectName[] getNamePatterns() {
List<ObjectName> getNamePatterns() {
return namePatterns;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,10 @@ public MetricAttribute(String name, MetricAttributeExtractor extractor) {
this.extractor = extractor;
}

public boolean isStateAttribute() {
return extractor == null;
}

public String getAttributeName() {
return name;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@

package io.opentelemetry.instrumentation.jmx.engine;

import java.util.List;

/**
* A class providing a complete definition on how to create an Open Telemetry metric out of the JMX
* system: how to extract values from MBeans and how to model, name and decorate them with
Expand Down Expand Up @@ -74,7 +76,7 @@ public class MetricDef {
private final BeanGroup beans;

// Describes how to get the metric values and their attributes, and how to report them
private final MetricExtractor[] metricExtractors;
private final List<MetricExtractor> metricExtractors;

/**
* Constructor for MetricDef.
Expand All @@ -84,7 +86,7 @@ public class MetricDef {
* MetricExtractor is provided, they should use unique metric names or unique metric
* attributes
*/
public MetricDef(BeanGroup beans, MetricExtractor... metricExtractors) {
public MetricDef(BeanGroup beans, List<MetricExtractor> metricExtractors) {
this.beans = beans;
this.metricExtractors = metricExtractors;
}
Expand All @@ -93,7 +95,7 @@ BeanGroup getBeanGroup() {
return beans;
}

MetricExtractor[] getMetricExtractors() {
List<MetricExtractor> getMetricExtractors() {
return metricExtractors;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

package io.opentelemetry.instrumentation.jmx.engine;

import java.util.List;
import javax.annotation.Nullable;

/**
Expand All @@ -22,14 +23,14 @@ public class MetricExtractor {
private final BeanAttributeExtractor attributeExtractor;

// Defines the Measurement attributes to be used when reporting the metric value.
private final MetricAttribute[] attributes;
private final List<MetricAttribute> attributes;

@Nullable private volatile DetectionStatus status;

public MetricExtractor(
BeanAttributeExtractor attributeExtractor,
MetricInfo metricInfo,
MetricAttribute... attributes) {
List<MetricAttribute> attributes) {
this.attributeExtractor = attributeExtractor;
this.metricInfo = metricInfo;
this.attributes = attributes;
Expand All @@ -43,7 +44,7 @@ BeanAttributeExtractor getMetricValueExtractor() {
return attributeExtractor;
}

MetricAttribute[] getAttributes() {
List<MetricAttribute> getAttributes() {
return attributes;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,9 @@ public class MetricInfo {
public enum Type {
COUNTER,
UPDOWNCOUNTER,
GAUGE
GAUGE,
/** state metric captured as updowncounter */
STATE
}

// How to report the metric using OpenTelemetry API
Expand All @@ -44,21 +46,21 @@ public MetricInfo(
this.type = type == null ? Type.GAUGE : type;
}

String getMetricName() {
public String getMetricName() {
return metricName;
}

@Nullable
String getDescription() {
public String getDescription() {
return description;
}

@Nullable
String getUnit() {
public String getUnit() {
return unit;
}

Type getType() {
public Type getType() {
return type;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,13 @@ void enrollExtractor(
}
logger.log(INFO, "Created Gauge for {0}", metricName);
}
break;
// CHECKSTYLE:OFF
case STATE:
{
// CHECKSTYLE:ON
throw new IllegalStateException("state metrics should not be registered");
}
Comment on lines +125 to +129
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] this is needed to have the switch statement cover all cases.

}
}

Expand Down Expand Up @@ -173,9 +180,8 @@ static Consumer<ObservableLongMeasurement> longTypeCallback(MetricExtractor extr
*/
static Attributes createMetricAttributes(
MBeanServerConnection connection, ObjectName objectName, MetricExtractor extractor) {
MetricAttribute[] metricAttributes = extractor.getAttributes();
AttributesBuilder attrBuilder = Attributes.builder();
for (MetricAttribute metricAttribute : metricAttributes) {
for (MetricAttribute metricAttribute : extractor.getAttributes()) {
String attributeValue = metricAttribute.acquireAttributeValue(connection, objectName);
if (attributeValue != null) {
attrBuilder = attrBuilder.put(metricAttribute.getAttributeName(), attributeValue);
Expand Down
Loading
Loading