@dol dol commented Jun 11, 2025

This pull request introduces a new CelBasedSampler to the OpenTelemetry Java SDK, enabling advanced sampling rules using the Common Expression Language (CEL). It also includes updates to the existing RuleBasedRoutingSampler for consistency and clarity. Below is a summary of the most important changes grouped by theme:

New CelBasedSampler Implementation:

  • Added the CelBasedSampler class, which evaluates CEL expressions to make sampling decisions based on span attributes. It includes a fallback sampler and supports multiple expressions. (samplers/src/main/java/io/opentelemetry/contrib/sampler/CelBasedSampler.java)
  • Introduced the CelBasedSamplerBuilder to construct CelBasedSampler instances with methods for adding expressions and specifying actions (e.g., DROP or RECORD_AND_SAMPLE). (samplers/src/main/java/io/opentelemetry/contrib/sampler/CelBasedSamplerBuilder.java)
  • Added the CelBasedSamplingExpression class to encapsulate individual CEL expressions and their associated samplers. (samplers/src/main/java/io/opentelemetry/contrib/sampler/CelBasedSamplingExpression.java)
  • Implemented a declarative configuration provider for CelBasedSampler, enabling configuration through YAML files. (samplers/src/main/java/io/opentelemetry/contrib/sampler/internal/CelBasedSamplerComponentProvider.java)
# The fallback sampler to use if no expressions match.
fallback_sampler:
  always_on:
# List of CEL expressions to evaluate. Expressions are evaluated in order.
expressions:
  # The action to take when the expression evaluates to true. Must be one of: DROP, RECORD_AND_SAMPLE.
  - action: DROP
    # The CEL expression to evaluate. Must return a boolean.
    expression: attribute['url.path'].startsWith('/actuator')
  - action: RECORD_AND_SAMPLE
    expression: attribute['http.method'] == 'GET' && attribute['http.status_code'] < 400
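
The ordering rule in the configuration above can be sketched in plain Java. This is a minimal illustration, not the PR's implementation: ordinary predicates stand in for compiled CEL expressions, the fallback is reduced to a fixed decision rather than a full sampler, and the names FirstMatchSketch, Rule, decide, and exampleRules are hypothetical. It assumes the first matching expression decides the outcome, as the test notes later in this conversation describe.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustration only: plain Java predicates stand in for compiled CEL expressions. */
final class FirstMatchSketch {

  enum Decision {
    DROP,
    RECORD_AND_SAMPLE
  }

  /** Hypothetical rule type: a predicate over span attributes plus the action it triggers. */
  static final class Rule {
    final Predicate<Map<String, Object>> matches;
    final Decision action;

    Rule(Predicate<Map<String, Object>> matches, Decision action) {
      this.matches = matches;
      this.action = action;
    }
  }

  /** Returns the action of the first matching rule, or the fallback if no rule matches. */
  static Decision decide(List<Rule> rules, Decision fallback, Map<String, Object> attributes) {
    for (Rule rule : rules) {
      if (rule.matches.test(attributes)) {
        return rule.action;
      }
    }
    return fallback;
  }

  /** Mirrors the two expressions from the YAML example, in the same order. */
  static List<Rule> exampleRules() {
    Predicate<Map<String, Object>> isActuator =
        a -> a.get("url.path") instanceof String
            && ((String) a.get("url.path")).startsWith("/actuator");
    Predicate<Map<String, Object>> isSuccessfulGet =
        a -> "GET".equals(a.get("http.method"))
            && a.get("http.status_code") instanceof Number
            && ((Number) a.get("http.status_code")).longValue() < 400;
    return List.of(
        new Rule(isActuator, Decision.DROP),
        new Rule(isSuccessfulGet, Decision.RECORD_AND_SAMPLE));
  }
}
```

Under this assumption, a successful GET to /actuator/health is dropped because the DROP rule is listed first, even though the second rule would also match.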

Documentation Updates:

  • Updated samplers/README.md to document the new CelBasedSampler, including its usage, schema, and example configurations. (samplers/README.md) [1] [2]

Dependency Additions:

  • Added a dependency on the dev.cel:cel:0.9.0 library to enable CEL expression evaluation. (samplers/build.gradle.kts)

Updates to RuleBasedRoutingSampler:

  • Renamed SamplingRule to RuleBasedRoutingSamplingRule for clarity and updated all related references in the RuleBasedRoutingSampler and its builder. (samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSampler.java, samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSamplerBuilder.java, samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSamplingRule.java) [1] [2] [3] [4] [5] [6] [7]

These changes collectively enhance the SDK's sampling capabilities, allowing users to define sophisticated sampling rules using CEL while maintaining compatibility with existing samplers.

@dol dol requested a review from a team June 11, 2025 15:28

linux-foundation-easycla bot commented Jun 11, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: dol / name: Dominic Lüchinger (776abf3)

@trask
Member

trask commented Jun 11, 2025

@dol this looks nice!

@johnbley what do you think of CEL vs JEXL in the context of open-telemetry/opentelemetry-java-instrumentation#13590?

@dol
Author

dol commented Jun 12, 2025

Just a note to myself. I need to expand the test cases with the following additions.

  • Define multiple expressions to test the handling and ordering of expressions
    • Add the same expression with DROP and RECORD_AND_SAMPLE. The first expression and its action should take effect.
    • Add more complex expressions that pass the compile step and should work in the verify step, like attribute['http.status_code'] < 400, and a time-based example to sample outside of business hours

@johnbley
Member

@dol this looks nice!

@johnbley what do you think of CEL vs JEXL in the context of open-telemetry/opentelemetry-java-instrumentation#13590?

Very interesting! I'll take a look at it!

@dol
Author

dol commented Jun 12, 2025

Just a note to myself. I need to expand the test cases with the following additions.

* Define multiple expressions to test the handling and ordering of expressions

  * Add the same expression with DROP and RECORD_AND_SAMPLE. The first expression and its action should take effect.
  * Add more complex expressions that pass the compile step and should work in the verify step, like `attribute['http.status_code'] < 400`, and a time-based example to sample outside of business hours

As I refined the tests I found some edge cases in the setup and mapping of the CEL engine. I'll mark the PR as work in progress. Sorry for that.

@dol dol changed the title from "Common Expression Language (CEL) sampler" to "[WIP] Common Expression Language (CEL) sampler" Jun 12, 2025
@trask
Member

trask commented Jun 12, 2025

I'll mark the PR as work in progress. Sorry for that.

no worries! you can click "Convert to draft" (hidden under the list of reviewers), and then when you're ready you can click "Ready for review"

@dol dol marked this pull request as draft June 12, 2025 15:51
@dol
Author

dol commented Jun 18, 2025

@trask @jack-berg: I wanted to give you an update on the PR. As I was adding more test cases to the PR, I encountered a problem with single/double quote parsing in the declarative config. At first I thought it was an issue with a complex expression input. But after digging deep into the source code of this project and opentelemetry-java, I figured out that the https://opentelemetry.io/docs/specs/otel/configuration/data-model/#environment-variable-substitution implementation on top of snakeyaml cannot handle a mix of single and double quotes very well.

I created the following bug report: open-telemetry/opentelemetry-java#7429

As the chances are very high that a complex expression will need to mix single and double quotes, this bug should be addressed first.

@dol
Author

dol commented Jun 19, 2025

open-telemetry/opentelemetry-java#7433 should fix this regression.

@dol dol force-pushed the feature/cel-sampler branch from c21379b to a6e408c Compare June 23, 2025 22:09
@dol
Author

dol commented Jun 23, 2025

The latest version adds more unit tests and better coverage. I think the new CEL sampler is ready for review.
The tests still require open-telemetry/opentelemetry-java#7433 to be approved, merged and released.
Until then the build and tests will fail.

@dol dol changed the title from "[WIP] Common Expression Language (CEL) sampler" to "Common Expression Language (CEL) sampler" Jun 25, 2025
@dol dol marked this pull request as ready for review June 25, 2025 05:48
@breedx-splk
Contributor

Build is pretty broken. Can you look at this @dol thanks!

@dol
Author

dol commented Jul 7, 2025

The latest version adds more unit tests and better coverage. I think the new CEL sampler is ready for review. The tests still require open-telemetry/opentelemetry-java#7433 to be approved, merged and released. Until then the build and tests will fail.

@breedx-splk I'm waiting for the v1.52.0 release, which should be released soon. This will fix all the broken tests.

Adds new CelBasedSampler to the OpenTelemetry Java SDK, enabling advanced
sampling rules using the Common Expression Language (CEL).
It also includes updates to the existing RuleBasedRoutingSampler for
consistency and clarity.
@dol dol force-pushed the feature/cel-sampler branch from 154ab66 to 776abf3 Compare July 21, 2025 20:08
@dol
Author

dol commented Aug 6, 2025

@jack-berg @trask The PR is ready for review now that the fix has been merged and released ( open-telemetry/opentelemetry-java#7433 ) and the BOM version has been bumped ( 850933f#diff-df7d51fc1db73056c56958a9784e26310dae8ec239fb940820f5ddea4b655693L5 )

Contributor

@breedx-splk breedx-splk left a comment

Thanks! I think this is an awesome addition that users will start trying immediately. I left a few comments/questions that I'd like to see addressed...but this is looking solid.

Comment on lines +77 to +78
fallback_sampler:
  always_on:

Is this required, or does it default to always_on if not present? Might be nice to mention that here?

Comment on lines +16 to +18
final CelAbstractSyntaxTree abstractSyntaxTree;
final String expression;
final Sampler delegate;

I believe our convention is to make these private and use explicit getters, or use @AutoValue with a create() method.
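
The first option mentioned here, private fields with explicit getters, might look roughly like the following. This is a sketch under stated assumptions: the class name SamplingExpressionSketch is hypothetical, and the type parameters A and S are stand-ins for the PR's CelAbstractSyntaxTree and Sampler types so that the example compiles without any dependencies.

```java
import java.util.Objects;

/**
 * Illustration only. A and S stand in for CelAbstractSyntaxTree and Sampler;
 * the real class in the PR may be shaped differently.
 */
final class SamplingExpressionSketch<A, S> {
  private final A abstractSyntaxTree;
  private final String expression;
  private final S delegate;

  SamplingExpressionSketch(A abstractSyntaxTree, String expression, S delegate) {
    // Reject nulls up front so the getters never return null.
    this.abstractSyntaxTree = Objects.requireNonNull(abstractSyntaxTree, "abstractSyntaxTree");
    this.expression = Objects.requireNonNull(expression, "expression");
    this.delegate = Objects.requireNonNull(delegate, "delegate");
  }

  A getAbstractSyntaxTree() {
    return abstractSyntaxTree;
  }

  String getExpression() {
    return expression;
  }

  S getDelegate() {
    return delegate;
  }
}
```

Keeping the fields private means callers depend only on the getters, so the internal representation can change later without touching them.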

Comment on lines +41 to +52
DeclarativeConfigProperties fallbackModel = config.getStructured("fallback_sampler");
if (fallbackModel == null) {
  throw new DeclarativeConfigException(
      "cel_based sampler .fallback_sampler is required but is null");
}
Sampler fallbackSampler;
try {
  fallbackSampler = DeclarativeConfiguration.createSampler(fallbackModel);
} catch (DeclarativeConfigException e) {
  throw new DeclarativeConfigException(
      "cel_based sampler failed to create .fallback_sampler sampler", e);
}

Factoring this out to a method called getFallbackModel() could probably help to shrink this long method and improve readability. 👍🏻

# The fallback sampler to use if no expressions match.
fallback_sampler:
  always_on:
# List of CEL expressions to evaluate. Expressions are evaluated in order.

I think it would help to be very explicit. Calling out that it's in-order is great, but is it that the first expression to match is applied and the remaining do not get evaluated, or is it the last one to match "wins"?

DeclarativeConfiguration.parseAndCreate(
    new ByteArrayInputStream(yaml.getBytes(StandardCharsets.UTF_8)));
Sampler sampler = openTelemetrySdk.getSdkTracerProvider().getSampler();
assertThat(sampler.toString())

I think it's fine for this first pass (shouldn't gate this PR), but testing through the toString() is a bit of an antipattern. We can look at ways of making that more testable later tho.

try {
  builder.recordAndSample(expression);
} catch (CelValidationException e) {
  // Delegate to the provider to handle the exception

Same comment as above about fail/catch.

+ " - action: DROP\n"
+ " expression: 'invalid cel expression!'\n",
"Failed to compile CEL expression: invalid cel expression!"));
}

Maybe I missed it, but it would be cool to add coverage for the case where expressions exists but is empty.

CelBasedSampler.celCompiler
.compile(
"spanKind == 'SERVER' && attribute[\""
+ URL_FULL

This works because AttributeKey.toString() merely returns the key (String), but if the toString() one day had type information in it or something, this could fail. It can be made slightly more explicit/robust by using URL_FULL.getKey().
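
The risk described here can be shown with a small stand-in. TypedKey below is a hypothetical class, not the OpenTelemetry AttributeKey, and its toString() deliberately includes type information to imitate the imagined future change. The helper attributeLookup is also hypothetical and builds only the attribute lookup portion of an expression.

```java
/** Illustration only: TypedKey is a hypothetical stand-in, not the OpenTelemetry AttributeKey. */
final class KeyNameSketch {

  static final class TypedKey {
    private final String key;
    private final String type;

    TypedKey(String key, String type) {
      this.key = key;
      this.type = type;
    }

    /** Returns only the key name, independent of how toString() is formatted. */
    String getKey() {
      return key;
    }

    /** Imagined toString() that prefixes the key with its type. */
    @Override
    public String toString() {
      return type + ":" + key;
    }
  }

  /** Builds the attribute lookup part of an expression for the given key name. */
  static String attributeLookup(String keyName) {
    return "attribute[\"" + keyName + "\"]";
  }
}
```

With such a toString(), implicit string concatenation of the key would produce a lookup for "STRING:url.full", while passing getKey() keeps the lookup on "url.full".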

CelBasedSampler.celCompiler
.compile(
"spanKind == 'SERVER' && attribute[\""
+ URL_FULL

same comment as above

@ExtendWith(MockitoExtension.class)
class CelBasedSamplerTest {
  private static final String SPAN_NAME = "MySpanName";
  private static final SpanKind SPAN_KIND = SpanKind.SERVER;

In some of the tests at least, this additional redirection of an existing constant makes the tests slightly harder to read/understand. I'd just inline SpanKind.SERVER instead of having this indirection.
