AWS X-Ray Adaptive Sampling Support #1141
Draft
Description
AWS X-Ray will soon support adaptive sampling through the sampling rule APIs, allowing customers to configure sampling rate boosts based on anomaly rate. GetSamplingTargets and GetSamplingRules will be updated to support new inputs and outputs relevant to this feature, so the SDK must be updated accordingly.
Sampling boost
The SDK now supports sending "boost statistics" to the GetSamplingTargets API. These statistics include the total number of requests (traces), the number of traces with anomalies detected (more on this later), and the number of anomalies sampled. The server then responds with instructions on what sampling rate to set and for how long, and the SDK adjusts accordingly.
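As a rough illustration of the three counters described above, the following is a minimal sketch of how boost statistics might be accumulated between GetSamplingTargets calls. The class name `BoostStatistics` and its methods are hypothetical and are not taken from this PR; the actual field names and reporting format are defined by the X-Ray API and the contrib patch.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-rule counters; names are illustrative, not the SDK's actual fields. */
final class BoostStatistics {
    private final LongAdder totalCount = new LongAdder();
    private final LongAdder anomalyCount = new LongAdder();
    private final LongAdder sampledAnomalyCount = new LongAdder();

    /** Records one trace: always counted, plus anomaly counters when applicable. */
    void record(boolean isAnomaly, boolean isSampled) {
        totalCount.increment();
        if (isAnomaly) {
            anomalyCount.increment();
            if (isSampled) {
                sampledAnomalyCount.increment();
            }
        }
    }

    /** Returns {total, anomalies, sampled anomalies} and resets for the next reporting interval. */
    long[] snapshotAndReset() {
        return new long[] {
            totalCount.sumThenReset(),
            anomalyCount.sumThenReset(),
            sampledAnomalyCount.sumThenReset()
        };
    }
}
```

Resetting on each snapshot assumes the statistics are reported as per-interval deltas; if the real API expects cumulative values, the reset would be dropped.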
Local configuration
Customers also need to be able to define what an anomaly is for their applications in order to provide effective boost statistics. By default, any 5XX error (falling back to the ERROR attribute) is treated as an anomaly. If provided, a local configuration can define specific criteria, including error code regex, operation, and high latency threshold, on which to base the statistics.
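The anomaly criteria above could be evaluated along the following lines. This is only a sketch: `AnomalyCondition` and its fields are hypothetical names, the fallback to the error flag is assumed to apply only when no HTTP status is present, and the way criteria combine (operation must match, then either the error code regex or the latency threshold) is an assumption rather than the behavior confirmed by this PR.

```java
import java.util.regex.Pattern;

/** Hypothetical anomaly condition; the real config class is AwsXrayAdaptiveSamplingConfig. */
final class AnomalyCondition {
    private final Pattern errorCodePattern; // null means "not configured"
    private final String operation;         // null means "any operation"
    private final Long highLatencyMs;       // null means "not configured"

    AnomalyCondition(String errorCodeRegex, String operation, Long highLatencyMs) {
        this.errorCodePattern = errorCodeRegex == null ? null : Pattern.compile(errorCodeRegex);
        this.operation = operation;
        this.highLatencyMs = highLatencyMs;
    }

    /** Default rule: 5XX status, or the span's error flag when no status code is present. */
    static boolean isDefaultAnomaly(Integer httpStatus, boolean errorStatus) {
        if (httpStatus != null) {
            return httpStatus >= 500 && httpStatus <= 599;
        }
        return errorStatus;
    }

    /** Assumed semantics: operation must match (if set), then regex OR latency must hit. */
    boolean matches(String spanOperation, Integer httpStatus, long latencyMs) {
        if (operation != null && !operation.equals(spanOperation)) {
            return false;
        }
        boolean codeHit = errorCodePattern != null
            && httpStatus != null
            && errorCodePattern.matcher(String.valueOf(httpStatus)).matches();
        boolean latencyHit = highLatencyMs != null && latencyMs >= highLatencyMs;
        return codeHit || latencyHit;
    }
}
```

For example, a condition built with the regex `4\d\d`, the operation `GET /checkout`, and a 2000 ms threshold would flag a 429 response or a 2.5 s request on that operation, and ignore other operations entirely.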
Anomaly Capture (disabled by default)
Anomalies can also be captured directly when left unsampled. When anomalous spans are detected, a reservoir-style span capturing mechanism configured through the above local configuration will send the span directly to the spanExporter. These will appear in the console as partial traces and ensure the customer can see spans even if the boosted sampling rate was unable to capture the anomalies.
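One way to picture the reservoir-style capture is a fixed per-second quota that gates how many unsampled anomalous spans are forwarded to the span exporter. The sketch below assumes such a quota; `AnomalyCaptureReservoir` and `capturesPerSecond` are hypothetical names, and the actual mechanism and its configuration keys in the patch may differ.

```java
/** Hypothetical fixed-window reservoir limiting anomaly span captures per second. */
final class AnomalyCaptureReservoir {
    private final int capturesPerSecond;
    private long currentSecond = Long.MIN_VALUE;
    private int usedThisSecond = 0;

    AnomalyCaptureReservoir(int capturesPerSecond) {
        this.capturesPerSecond = capturesPerSecond;
    }

    /**
     * Returns true if the caller may export the anomalous span directly.
     * The timestamp is passed in (e.g. from System.nanoTime()) so the logic is testable.
     */
    synchronized boolean tryCapture(long nowNanos) {
        long second = Math.floorDiv(nowNanos, 1_000_000_000L);
        if (second != currentSecond) {
            // A new one-second window starts: refill the quota.
            currentSecond = second;
            usedThisSecond = 0;
        }
        if (usedThisSecond < capturesPerSecond) {
            usedThisSecond++;
            return true;
        }
        return false;
    }
}
```

A caller would check `tryCapture(System.nanoTime())` for each unsampled anomalous span and hand the span to the exporter only when it returns true, which bounds the volume of partial traces.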
Changes
AWS X-Ray component patch update for OTel Java Contrib - see the following diff between the changes here and the release of the contrib we currently consume: link (includes diff from previous patch on the sampler)
AwsSamplingResult that includes the matched sampling rule in the trace state or propagates the received sampling rule from an upstream call using the trace state
AwsXrayAdaptiveSamplingConfig for the local SDK configuration option
AwsXrayRemoteSampler to allow identification and export of anomalies
adaptSampling function that is called on each span and acts if and only if adaptive sampling configurations are present - this is where the core logic of the feature is
SamplingRuleApplier: XrayRulesSampler
XrayRulesSampler: AwsXrayAdaptiveSamplingConfig and apply it in adaptSampling to change anomaly capturing/boost logic
shouldSample using AwsSamplingResult
adaptSampling
anomalyTracesSet that holds trace IDs for anomaly spans to ensure we don't double count anomalies in one trace. When the local root span for this trace is encountered, it is removed from the set
generateIngressOperation based on ADOT SDK logic for getting operation - used for matching with operations provided in local configuration
ADOT SDK Changes
customizeSampler to provide the sampler the span exporter and the local adaptive sampling configuration and pass the sampler to the span metrics processor
adaptSampling on each span from the span metrics processor
Testing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.