@majanjua-amzn majanjua-amzn commented Aug 12, 2025

AWS X-Ray Adaptive Sampling Support

Description

AWS X-Ray will soon support adaptive sampling through the sampling rule APIs, allowing customers to configure sampling rate boosts based on anomaly rate. GetSamplingTargets and GetSamplingRules will be updated to support new inputs and outputs relevant to this feature, so the SDK must be updated to match.

Sampling boost

The SDK now supports sending "boost statistics" to the GetSamplingTargets API. These statistics include the total number of requests (traces), the number of traces with anomalies detected (more on this later), and the number of anomalies sampled. The server then responds with instructions on what sampling rate to set and for how long, and the SDK adjusts its sampling rate accordingly.
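
As an illustration of the three counters described above, here is a minimal sketch of per-rule boost statistics. The class and method names (`BoostStatistics`, `recordTrace`, `snapshotAndReset`) are hypothetical and are not the names used in the patch; the sketch only shows how the total, anomaly, and sampled-anomaly counts relate to each other.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-rule counters; names are illustrative, not taken from the patch. */
final class BoostStatistics {
  private final AtomicLong totalCount = new AtomicLong();
  private final AtomicLong anomalyCount = new AtomicLong();
  private final AtomicLong sampledAnomalyCount = new AtomicLong();

  /** Records one trace: every trace counts toward the total, anomalies are counted separately. */
  void recordTrace(boolean anomalous, boolean sampled) {
    totalCount.incrementAndGet();
    if (anomalous) {
      anomalyCount.incrementAndGet();
      if (sampled) {
        sampledAnomalyCount.incrementAndGet();
      }
    }
  }

  /**
   * Returns {total, anomalies, sampledAnomalies} and resets the counters for the next
   * reporting interval. The three resets are not atomic as a group, which is acceptable
   * for a sketch but may matter in a real implementation.
   */
  long[] snapshotAndReset() {
    return new long[] {
      totalCount.getAndSet(0), anomalyCount.getAndSet(0), sampledAnomalyCount.getAndSet(0)
    };
  }
}
```

A reporter would call `snapshotAndReset()` once per GetSamplingTargets request and attach the three values to the request for the matching rule.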

Local configuration

Customers also need to be able to define what an anomaly is for their applications in order to provide meaningful boost statistics. By default, any 5XX error (or fallback to ERROR attribute) is treated as an anomaly. If provided, a local configuration can define specific criteria, including an error code regex, an operation, and a high latency threshold, that determine which spans are counted as anomalies.
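
To make the matching criteria concrete, here is a rough sketch of an anomaly check. Everything in it is an assumption for illustration: the class name `AnomalyCondition`, its constructor parameters, and in particular how the criteria combine (the operation scopes the condition, then either an error code match or a latency breach counts as an anomaly). The actual schema and combination logic live in AwsXrayAdaptiveSamplingConfig and may differ.

```java
import java.util.regex.Pattern;

/** Hypothetical anomaly condition; field names are illustrative, not the real config schema. */
final class AnomalyCondition {
  private final Pattern errorCodePattern; // null means "fall back to the default 5XX check"
  private final String operation;         // null means "any operation"
  private final long highLatencyMs;       // <= 0 means "no latency threshold"

  AnomalyCondition(String errorCodeRegex, String operation, long highLatencyMs) {
    this.errorCodePattern = errorCodeRegex == null ? null : Pattern.compile(errorCodeRegex);
    this.operation = operation;
    this.highLatencyMs = highLatencyMs;
  }

  /** Default behavior described in the PR: any 5XX status is an anomaly. */
  static boolean isDefaultAnomaly(int statusCode) {
    return statusCode >= 500 && statusCode <= 599;
  }

  /** Assumed semantics: operation must match (if set), then error OR latency triggers. */
  boolean matches(String spanOperation, int statusCode, long latencyMs) {
    if (operation != null && !operation.equals(spanOperation)) {
      return false;
    }
    boolean errorMatch =
        errorCodePattern != null
            ? errorCodePattern.matcher(Integer.toString(statusCode)).matches()
            : isDefaultAnomaly(statusCode);
    boolean latencyMatch = highLatencyMs > 0 && latencyMs >= highLatencyMs;
    return errorMatch || latencyMatch;
  }
}
```

For example, `new AnomalyCondition("^4\\d\\d$", "GET /orders", 2000)` would treat 4XX responses and requests slower than two seconds on `GET /orders` as anomalies, while ignoring other operations.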

Anomaly Capture (disabled by default)

Anomalies can also be captured directly when left unsampled. When anomalous spans are detected, a reservoir-style span capturing mechanism configured through the above local configuration will send the span directly to the spanExporter. These will appear in the console as partial traces and ensure the customer can see spans even if the boosted sampling rate was unable to capture the anomalies.
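
The PR does not spell out the reservoir mechanics, so the following is only a sketch of one common reservoir shape: a fixed quota of captures per second. The class name `AnomalyCaptureReservoir` and the per-second quota are assumptions, not the patch's implementation.

```java
/** Hypothetical per-second capture quota, loosely modeled on a sampling reservoir. */
final class AnomalyCaptureReservoir {
  private static final long NANOS_PER_SECOND = 1_000_000_000L;

  private final int quotaPerSecond;
  private long currentSecond = Long.MIN_VALUE;
  private int usedThisSecond;

  AnomalyCaptureReservoir(int quotaPerSecond) {
    this.quotaPerSecond = quotaPerSecond;
  }

  /** Returns true if an anomalous span observed at nowNanos may be exported directly. */
  synchronized boolean tryTake(long nowNanos) {
    // floorDiv keeps one-second windows consistent even for negative clock readings.
    long second = Math.floorDiv(nowNanos, NANOS_PER_SECOND);
    if (second != currentSecond) {
      currentSecond = second;
      usedThisSecond = 0;
    }
    if (usedThisSecond < quotaPerSecond) {
      usedThisSecond++;
      return true;
    }
    return false;
  }
}
```

In this sketch, an unsampled anomalous span would be handed to the span exporter only when `tryTake(System.nanoTime())` returns true, which bounds the extra export volume during an error spike.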

Changes

AWS X-Ray component patch update for OTel Java Contrib - see the following diff between the changes here and the release of the contrib we currently consume: link (includes diff from previous patch on the sampler)

  • Added an import for OTel semantic conventions
  • Created a class called AwsSamplingResult that includes the matched sampling rule in the trace state or propagates the received sampling rule from an upstream call using the trace state
  • Added a class called AwsXrayAdaptiveSamplingConfig for the local SDK configuration option
  • Added the config object and a batch span exporter to the AwsXrayRemoteSampler to allow identification and export of anomalies
  • Added adaptSampling function that is called on each span and acts if and only if adaptive sampling configurations are present - this is where the core logic of the feature is
  • Updated calls to GetSamplingTargets to include boost statistics
  • Updated GetSamplingRules and GetSamplingTargets request and response classes where relevant
  • In SamplingRuleApplier:
    • Able to receive boost information
    • Able to receive boost related statistics from the XrayRulesSampler
    • Fixed bug where sampling rules are scheduled to call GetSamplingTargets at slightly different times
  • In XrayRulesSampler:
    • Added attribute to spans when boost config/anomaly capturing is enabled to be able to identify boost-enabled systems in X-Ray backend
    • Accept AwsXrayAdaptiveSamplingConfig and apply it in adaptSampling to change anomaly capturing/boost logic
    • Get and propagate upstream sampling rule in shouldSample using AwsSamplingResult
    • Core implementation of adaptSampling
      • If account has no boost, return
      • [1] Gets rule to report to based on upstream matched rule propagated through trace state, [2] identifies anomalies based on local config or default 5XX, [3] captures anomaly if error capture is enabled, [4] counts boost statistics
      • Maintain anomalyTracesSet that holds trace IDs for anomaly spans to ensure we don't double count anomalies in one trace. When the local root span for this trace is encountered, it is removed from the set
    • Add generateIngressOperation based on ADOT SDK logic for getting operation - used for matching with operations provided in local configuration
  • Add unit tests
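
The anomalyTracesSet deduplication described above can be sketched roughly as follows. The class `AnomalyTraceTracker` and its method names are hypothetical, and trace IDs are plain strings here for simplicity; the sketch only shows the "count once per trace, forget on local root" idea.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker that counts at most one anomaly per trace. */
final class AnomalyTraceTracker {
  private final Set<String> anomalyTraceIds = ConcurrentHashMap.newKeySet();

  /**
   * Called for each ended span. Returns true only the first time an anomaly is seen
   * for the given trace, so the caller can increment the anomaly count exactly once.
   * When the local root span ends, the trace ID is dropped to keep the set bounded.
   */
  boolean onSpanEnd(String traceId, boolean anomalous, boolean localRoot) {
    boolean firstAnomalyForTrace = anomalous && anomalyTraceIds.add(traceId);
    if (localRoot) {
      anomalyTraceIds.remove(traceId);
    }
    return firstAnomalyForTrace;
  }

  /** Number of traces currently tracked; useful for tests and diagnostics. */
  int trackedTraces() {
    return anomalyTraceIds.size();
  }
}
```

Removing the entry when the local root ends assumes child spans end before their local root, which is the usual ordering but is not guaranteed for asynchronous work.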

ADOT SDK Changes

  • Add YAML import for reading local configuration
  • Remove B3 and multi propagators as they remove/override the sampling rule propagated through trace state and are no longer needed
  • Update customizeSampler to provide the sampler the span exporter and the local adaptive sampling configuration and pass the sampler to the span metrics processor
  • Call adaptSampling on each span from the span metrics processor
  • Add parsing function and associated test
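
Since several of the changes above depend on the matched sampling rule travelling through trace state, here is a simplified sketch of that idea using a plain `tracestate` header string. The key `xrsr` and the class `RuleTraceState` are invented for illustration; the real patch works with AwsSamplingResult and the OpenTelemetry trace state types, and its key and encoding may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper showing "keep the upstream rule, else record the local one". */
final class RuleTraceState {
  static final String RULE_KEY = "xrsr"; // invented key, not the one used by the patch

  /** Parses a comma-separated list of key=value members, ignoring malformed ones. */
  static Map<String, String> parse(String header) {
    Map<String, String> entries = new LinkedHashMap<>();
    if (header == null || header.trim().isEmpty()) {
      return entries;
    }
    for (String member : header.split(",")) {
      int eq = member.indexOf('=');
      if (eq > 0) {
        entries.put(member.substring(0, eq).trim(), member.substring(eq + 1).trim());
      }
    }
    return entries;
  }

  /**
   * Returns the header unchanged when an upstream service already recorded a rule;
   * otherwise prepends the locally matched rule. Rule names are assumed to contain
   * only characters that are valid in a tracestate value.
   */
  static String withRule(String header, String locallyMatchedRule) {
    String current = header == null ? "" : header.trim();
    if (parse(current).containsKey(RULE_KEY)) {
      return current;
    }
    String member = RULE_KEY + "=" + locallyMatchedRule;
    return current.isEmpty() ? member : member + "," + current;
  }
}
```

A propagator that rebuilds or drops the trace state would lose this entry, which is consistent with the reason given above for removing the B3 and multi propagators.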

Testing

  • Automated release test passing and going through PR here: aws-application-signals-test-framework#442
  • Manual testing done using 3 services, A -> B -> C, where A has a boosted sampling rule and B and C produce anomalies that are sent to the backend, boosting the sampling rate in A
  • Performance testing using opentelemetry-java-instrumentation/benchmark-overhead test framework revealed little to no difference in performance, including average/max CPU usage %, min/max heap usage, and average/p95 latency.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@majanjua-amzn majanjua-amzn self-assigned this Aug 12, 2025
@majanjua-amzn majanjua-amzn added labels enhancement (New feature or request), X-Ray (AWS X-Ray components), traces (Tracing related issues), java (Pull requests that update Java code) Aug 12, 2025
@majanjua-amzn majanjua-amzn force-pushed the adaptive-sampling branch 3 times, most recently from 2ae32ca to 7a3deb0 Compare August 13, 2025 16:26
@majanjua-amzn majanjua-amzn force-pushed the adaptive-sampling branch 3 times, most recently from cff1b13 to c7d90aa Compare August 20, 2025 20:40
@majanjua-amzn majanjua-amzn changed the base branch from main to release/v2.11.x August 21, 2025 20:24
@majanjua-amzn majanjua-amzn marked this pull request as ready for review August 21, 2025 22:32
@majanjua-amzn majanjua-amzn requested a review from a team as a code owner August 21, 2025 22:32
wangzlei previously approved these changes Aug 21, 2025
@majanjua-amzn majanjua-amzn merged commit ed7e1c8 into release/v2.11.x Aug 22, 2025
5 of 8 checks passed
@majanjua-amzn majanjua-amzn deleted the adaptive-sampling branch August 22, 2025 19:41
lukeina2z pushed a commit to lukeina2z/aws-otel-java-instrumentation that referenced this pull request Sep 5, 2025
Updating patch for `v2.10.0` to `v2.11.0` bump.

Reference for how this patch was created:
yiyuan-he/opentelemetry-java-instrumentation#1

```
The following dependencies are using the latest release version:
 - com.sparkjava:spark-core:2.9.4
 - com.squareup.okhttp3:okhttp:4.12.0
 - io.opentelemetry:opentelemetry-extension-aws:1.20.1

The following dependencies have later release versions:
 - com.amazonaws:aws-java-sdk-bom [1.12.599 -> 1.12.783]
     https://aws.amazon.com/sdkforjava
 - com.fasterxml.jackson:jackson-bom [2.16.0 -> 2.19.0]
     https://github.com/FasterXML/jackson-bom
 - com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin [0.50.0 -> 0.52.0]
 - com.google.guava:guava-bom [33.0.0-jre -> 33.4.8-jre]
     https://github.com/google/guava
 - com.google.protobuf:protobuf-bom [3.25.1 -> 4.31.0]
     https://developers.google.com/protocol-buffers/
 - com.linecorp.armeria:armeria-bom [1.26.4 -> 1.32.5]
     https://armeria.dev/
 - commons-logging:commons-logging [1.2 -> 1.3.5]
     https://commons.apache.org/proper/commons-logging/
 - io.grpc:grpc-bom [1.59.1 -> 1.72.0]
     https://github.com/grpc/grpc-java
 - io.opentelemetry.contrib:opentelemetry-aws-resources [1.39.0-alpha -> 1.46.0-alpha]
     https://github.com/open-telemetry/opentelemetry-java-contrib
 - io.opentelemetry.contrib:opentelemetry-aws-xray [1.39.0 -> 1.46.0]
     https://github.com/open-telemetry/opentelemetry-java-contrib
 - io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom-alpha [2.11.0-adot1-alpha -> 2.16.0-alpha]
     https://github.com/open-telemetry/opentelemetry-java-instrumentation
 - io.opentelemetry.javaagent:opentelemetry-javaagent [2.11.0-adot1 -> 2.16.0]
     https://github.com/open-telemetry/opentelemetry-java-instrumentation
 - io.opentelemetry.proto:opentelemetry-proto [1.0.0-alpha -> 1.7.0-alpha]
     https://github.com/open-telemetry/opentelemetry-proto-java
 - net.bytebuddy:byte-buddy [1.14.10 -> 1.17.5]
     https://bytebuddy.net
 - org.apache.logging.log4j:log4j-bom [2.21.1 -> 2.24.3]
     https://logging.apache.org/log4j/2.x/
 - org.assertj:assertj-core [3.24.2 -> 3.27.3]
     https://assertj.github.io/doc/#assertj-core
 - org.curioswitch.curiostack:protobuf-jackson [2.2.0 -> 2.7.0]
     https://github.com/curioswitch/protobuf-jackson
 - org.junit:junit-bom [5.10.1 -> 5.12.2]
     https://junit.org/junit5/
 - org.slf4j:slf4j-api [1.7.36 -> 2.0.17]
     http://www.slf4j.org
 - org.slf4j:slf4j-simple [1.7.36 -> 2.0.17]
     http://www.slf4j.org
 - org.springframework.boot:spring-boot-dependencies [2.7.17 -> 3.5.0]
     https://spring.io/projects/spring-boot
 - org.testcontainers:testcontainers-bom [1.19.3 -> 1.21.0]
     https://java.testcontainers.org
 - software.amazon.awssdk:bom [2.21.33 -> 2.31.49]
     https://aws.amazon.com/sdkforjava

Gradle release-candidate updates:
 - Gradle: [8.10 -> 8.14.1]

```

---------

Co-authored-by: ADOT Patch workflow

Empty commit to trigger main build (aws-observability#1084)

Blank commit to trigger Java Agent Main Build with the latest commit
from our test framework repo.

```
git commit --allow-empty -m "Empty commit to trigger main build"
```

Release/v2.11.1 (aws-observability#1094)

*Description of changes:*

Merges changes from mainline to v2.11.1

Namely:
aws-observability#1085
and
aws-observability#1089

```
The following dependencies are using the latest release version:
 - com.sparkjava:spark-core:2.9.4
 - com.squareup.okhttp3:okhttp:4.12.0
 - io.opentelemetry:opentelemetry-extension-aws:1.20.1

The following dependencies have later release versions:
 - com.amazonaws:aws-java-sdk-bom [1.12.599 -> 1.12.785]
     https://aws.amazon.com/sdkforjava
 - com.fasterxml.jackson:jackson-bom [2.16.0 -> 2.19.0]
     https://github.com/FasterXML/jackson-bom
 - com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin [0.50.0 -> 0.52.0]
 - com.google.guava:guava-bom [33.0.0-jre -> 33.4.8-jre]
     https://github.com/google/guava
 - com.google.protobuf:protobuf-bom [3.25.1 -> 4.31.1]
     https://developers.google.com/protocol-buffers/
 - com.linecorp.armeria:armeria-bom [1.26.4 -> 1.32.5]
     https://armeria.dev/
 - commons-logging:commons-logging [1.2 -> 1.3.5]
     https://commons.apache.org/proper/commons-logging/
 - io.grpc:grpc-bom [1.59.1 -> 1.73.0]
     https://github.com/grpc/grpc-java
 - io.opentelemetry.contrib:opentelemetry-aws-resources [1.39.0-alpha -> 1.46.0-alpha]
     https://github.com/open-telemetry/opentelemetry-java-contrib
 - io.opentelemetry.contrib:opentelemetry-aws-xray [1.39.0-adot1 -> 1.46.0]
     https://github.com/open-telemetry/opentelemetry-java-contrib
 - io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom-alpha [2.11.0-adot2-alpha -> 2.16.0-alpha]
     https://github.com/open-telemetry/opentelemetry-java-instrumentation
 - io.opentelemetry.javaagent:opentelemetry-javaagent [2.11.0-adot2 -> 2.16.0]
     https://github.com/open-telemetry/opentelemetry-java-instrumentation
 - io.opentelemetry.proto:opentelemetry-proto [1.0.0-alpha -> 1.7.0-alpha]
     https://github.com/open-telemetry/opentelemetry-proto-java
 - net.bytebuddy:byte-buddy [1.14.10 -> 1.17.5]
     https://bytebuddy.net
 - org.apache.logging.log4j:log4j-bom [2.21.1 -> 2.24.3]
     https://logging.apache.org/log4j/2.x/
 - org.assertj:assertj-core [3.24.2 -> 3.27.3]
     https://assertj.github.io/doc/#assertj-core
 - org.curioswitch.curiostack:protobuf-jackson [2.2.0 -> 2.7.0]
     https://github.com/curioswitch/protobuf-jackson
 - org.junit:junit-bom [5.10.1 -> 5.13.0]
     https://junit.org/junit5/
 - org.slf4j:slf4j-api [1.7.36 -> 2.0.17]
     http://www.slf4j.org
 - org.slf4j:slf4j-simple [1.7.36 -> 2.0.17]
     http://www.slf4j.org
 - org.springframework.boot:spring-boot-dependencies [2.7.17 -> 3.5.0]
     https://spring.io/projects/spring-boot
 - org.testcontainers:testcontainers-bom [1.19.3 -> 1.21.1]
     https://java.testcontainers.org
 - software.amazon.awssdk:bom [2.21.33 -> 2.31.56]
     https://aws.amazon.com/sdkforjava

Gradle release-candidate updates:
 - Gradle: [8.10 -> 8.14.1]

```

---------

Co-authored-by: Jonathan Lee
Co-authored-by: Thomas Pierce
Co-authored-by: Michael He
Co-authored-by: ADOT Patch workflow
Co-authored-by: Prashant Srivastava
Co-authored-by: Mohamed Asaker

Update rust version (aws-observability#1097)

*Description of changes:*

Release build failed with:
<img width="1200" alt="image"
src="https://github.com/user-attachments/assets/3df092be-b9f3-4e62-9652-32cf4823d0ef"
/>

Updating rust version required for `edition2024`

[Lambda Java v2.11.x] Merge All Code Changes from v1.33.x Branch into v2.11.x (aws-observability#1114)

This change merges all private Lambda Java updates from the v1.33 branch
into the v2.11.x branch. I performed a 'git rebase v2.11' on the v1.33
branch, reviewed all changes, and completed the build and testing
process. The resulting Lambda layer generated trace data identical to
the version built directly from the v2.11.x branch (excluding this PR).

Here is the list of all migrated PRs:

Build layer during CI/CD workflows + some minor refactoring aws-observability#989

support java11 runtime for lambda aws-observability#1001

Unique artifact names for upload and merge for download aws-observability#1014

[Bug fixes] Lambda - duplicate lambda spans + appsignals from unsampled
spans aws-observability#1000

Fix: Lambda Topology Issue (aws-observability#1016)

Fix: Lambda Topology Issue (aws-observability#1016) aws-observability#1085

feat: Support microservice span in Lambda Java environment. aws-observability#1053

Test
Tested Java 11, 17, and 21 Lambda functions. Manually tested PR-1000 and
PR-1053; both work as expected in the v2.11 branch. Microservice
(Spring Boot) support works well. I verified that the attribute
Trace.lambda.multiple server appears in the Lambda server span once
Servlet instrumentation is enabled with
OTEL_INSTRUMENTATION_SERVLET_ENABLED.

Note: The changes in the patch files are not included in this PR. They
should have been reviewed and incorporated as part of this migration:

Upgrade Java Lambda Layer to 2.x aws-observability#1076

Lambda with SpringBoot MicroService:
<img width="1367" alt="lambda"
src="https://github.com/user-attachments/assets/5cf5be29-4986-454c-b61b-773d6cde3848"
/>

Service Map and added microservice attribute 'Trace.lambda.multiple
server'.
<img width="1864" alt="traceMap"
src="https://github.com/user-attachments/assets/f7ff1771-61f0-4013-b571-90370a726aa9"
/>

AppSignals
<img width="1875" alt="appSignals"
src="https://github.com/user-attachments/assets/24f1b3a8-851c-4c97-bb50-087ee275b86d"
/>

Release/v2.11.2 (aws-observability#1131)

*Description of changes:*
Cherry-picked commits from the mainline to my branch release/v2.11.2

438802b Send main build metrics (aws-observability#1127)
f9b24f2 [AppSignal E2E Testing] Validate E2E Tests Are Accounted For
(aws-observability#1126)
d672f84 Fix Otlp Aws exporters failures for GZIP compressed telemetry
exports (aws-observability#1124)
0be84b6 AWS SDK v1.11 Patch Migration (aws-observability#1117)
2c3ef71 AWS SDK v2.2 SPI Patch Migration (aws-observability#1113)
dac0fd8 Sigv4 - Add Missing STS Dependency (aws-observability#1101)
ce91366 fix compatibility issue with java v8 (aws-observability#1118)
691c970 Base of AWS SDK v1.11 SPI Implementation (aws-observability#1115)
f425675 Base of AWS SDK v2.2 SPI Implementation (aws-observability#1111)
a41c7f3 feat: Extract account/access key id and region for cross-account
support (aws-observability#1081)
e524eda update local operation of lambda span based on span attribute
(aws-observability#1106)
43198cf Add lambda layer default region (aws-observability#1104)
fe2ec3a Add YYC, BKK, KUL, QRO, ZHY, BJS to the lambda layer release
workflow (aws-observability#1103)
542b209 Update Sonatype publishing URL to Central Portal (aws-observability#1090)
31e4de1 Release safety (aws-observability#1096)
e45a0ab Update image scan to point to 2.11.1 release (aws-observability#1099)

Also bumped the adot2 to adot3 as we are doing all of this under Java
SDK 2.11.2 patch release.

---------

Co-authored-by: Jeel Mehta
Co-authored-by: Steve Liu
Co-authored-by: Prashant Srivastava
Co-authored-by: Harry
Co-authored-by: Ping Xiang
Co-authored-by: Blair Huang
Co-authored-by: Anahat
Co-authored-by: Thomas Pierce
Co-authored-by: Jonathan Lee
Co-authored-by: Eric Zhang

Revert "Release/v2.11.2 (aws-observability#1131)"

This reverts commit 4f5704f.

Release/v2.11.2 v2 (aws-observability#1133)

Description of changes:
Cherry-picked commits from the mainline to my branch release/v2.11.2

Release/v2.11.3 (aws-observability#1146)

*Description of changes:*

Merges changes from mainline to v2.11.3
Namely: aws-observability#1111 aws-observability#1115 aws-observability#1113 aws-observability#1117 and aws-observability#1120

Steps followed:

1. Fork `aws-otel-java-instrumentation` repo
2. Checkout `release/2.11.x`
3. Create branch `release/2.11.3` based off `release/2.11.x` (`git
checkout -b release/2.11.3`)
4. `git cherry-pick 572215e ac3c0c7 9a76dda 8a3b772 25b2cd8`
5. Resolved merge conflict for 25b2cd8
6. run `./gradlew dependencyUpdates`
7. Create PR

```
The following dependencies are using the latest release version:
 - com.sparkjava:spark-core:2.9.4
 - io.opentelemetry:opentelemetry-extension-aws:1.20.1

The following dependencies have later release versions:
 - com.amazonaws:aws-java-sdk-bom [1.12.599 -> 1.12.788]
     https://aws.amazon.com/sdkforjava
 - com.fasterxml.jackson:jackson-bom [2.16.0 -> 2.19.2]
     https://github.com/FasterXML/jackson-bom
 - com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin [0.50.0 -> 0.52.0]
 - com.google.guava:guava-bom [33.0.0-jre -> 33.4.8-jre]
     https://github.com/google/guava
 - com.google.protobuf:protobuf-bom [3.25.1 -> 4.31.1]
     https://developers.google.com/protocol-buffers/
 - com.linecorp.armeria:armeria-bom [1.26.4 -> 1.33.1]
     https://armeria.dev/
 - com.squareup.okhttp3:okhttp [4.12.0 -> 5.1.0]
     https://square.github.io/okhttp/
 - commons-logging:commons-logging [1.2 -> 1.3.5]
     https://commons.apache.org/proper/commons-logging/
 - io.grpc:grpc-bom [1.59.1 -> 1.74.0]
     https://github.com/grpc/grpc-java
 - io.opentelemetry.contrib:opentelemetry-aws-resources [1.39.0-alpha -> 1.48.0-alpha]
     https://github.com/open-telemetry/opentelemetry-java-contrib
 - io.opentelemetry.contrib:opentelemetry-aws-xray [1.39.0-adot1 -> 1.48.0]
     https://github.com/open-telemetry/opentelemetry-java-contrib
 - io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom-alpha [2.11.0-alpha -> 2.18.1-alpha]
     https://github.com/open-telemetry/opentelemetry-java-instrumentation
 - io.opentelemetry.javaagent:opentelemetry-javaagent [2.11.0 -> 2.18.1]
     https://github.com/open-telemetry/opentelemetry-java-instrumentation
 - io.opentelemetry.proto:opentelemetry-proto [1.0.0-alpha -> 1.7.0-alpha]
     https://github.com/open-telemetry/opentelemetry-proto-java
 - net.bytebuddy:byte-buddy [1.14.10 -> 1.17.6]
     https://bytebuddy.net
 - org.apache.logging.log4j:log4j-bom [2.21.1 -> 2.25.1]
     https://logging.apache.org/log4j/2.x/
 - org.assertj:assertj-core [3.24.2 -> 3.27.4]
     https://assertj.github.io/doc/#assertj-core
 - org.curioswitch.curiostack:protobuf-jackson [2.2.0 -> 2.8.1]
     https://github.com/curioswitch/protobuf-jackson
 - org.junit:junit-bom [5.10.1 -> 5.13.4]
     https://junit.org/
 - org.slf4j:slf4j-api [1.7.36 -> 2.0.17]
     http://www.slf4j.org
 - org.slf4j:slf4j-simple [1.7.36 -> 2.0.17]
     http://www.slf4j.org
 - org.springframework.boot:spring-boot-dependencies [2.7.17 -> 3.5.4]
     https://spring.io/projects/spring-boot
 - org.testcontainers:testcontainers-bom [1.19.3 -> 1.21.3]
     https://java.testcontainers.org
 - software.amazon.awssdk:bom [2.30.17 -> 2.32.22]
     https://aws.amazon.com/sdkforjava

Gradle release-candidate updates:
 - Gradle: [8.10 -> 9.0.0 -> 9.1.0-rc-1]
```

---------

Co-authored-by: Thomas Pierce
Co-authored-by: Steve Liu

AWS X-Ray Adaptive Sampling Support (aws-observability#1141)

Propagate sampling decision as attribute (aws-observability#1161)

Shorten trace state usage for adaptive sampling (aws-observability#1164)

[Adaptive Sampling] Improve trace capturing and counting using cache

Fix disk-buffering build failure in contrib (aws-observability#1169)