| 1 | +# Compressing spans |
| 2 | + |
| 3 | +To mitigate the potential flood of spans to a backend, |
| 4 | +agents SHOULD implement the strategies laid out in this section to avoid sending almost identical and very similar spans. |
| 5 | + |
| 6 | +While compressing multiple similar spans into a single composite span can't fully eliminate the collection overhead, |
| 7 | +it can significantly reduce the impact on the following areas, |
| 8 | +with very little loss of information: |
| 9 | +- Agent reporter queue utilization |
| 10 | +- Capturing stack traces, serialization, compression, and sending events to APM Server |
| 11 | +- Potential to re-use span objects, significantly reducing allocations |
| 12 | +- Downstream effects like reducing impact on APM Server, ES storage, and UI performance |
| 13 | + |
| 14 | +### Configuration option `span_compression_enabled` |
| 15 | + |
| 16 | +Setting this option to `true` enables the span compression feature. |
| 17 | +Span compression reduces the collection, processing, and storage overhead, and removes clutter from the UI. |
| 18 | +The tradeoff is that some information, such as the DB statements of all the compressed spans, will not be collected. |
| 19 | + |
| 20 | +| | | |
| 21 | +|----------------|----------| |
| 22 | +| Type | `boolean`| |
| 23 | +| Default | `false` | |
| 24 | +| Dynamic | `true` | |
| 25 | + |
| 26 | + |
| 27 | +## Consecutive-Exact-Match compression strategy |
| 28 | + |
| 29 | +Among the biggest sources of excessive data collection are n+1 type queries and repetitive requests to a cache server. |
| 30 | +This strategy detects consecutive spans that hold the same information (except for the duration) |
| 31 | +and creates a single [composite span](#composite-span). |
| 32 | + |
| 33 | +``` |
| 34 | +[                              ] |
| 35 | +GET /users |
| 36 | + [] [] [] [] [] [] [] [] [] [] |
| 37 | + 10x SELECT FROM users |
| 38 | +``` |
| 39 | + |
| 40 | +Two spans are considered to be an exact match if they are of the [same kind](#consecutive-same-kind-compression-strategy) and their span names are equal, that is, if all of the following properties are equal: |
| 41 | +- `type` |
| 42 | +- `subtype` |
| 43 | +- `destination.service.resource` |
| 44 | +- `name` |
| 45 | + |
| 46 | +### Configuration option `span_compression_exact_match_max_duration` |
| 47 | + |
| 48 | +Consecutive spans that are an exact match and that are under this threshold will be compressed into a single composite span. |
| 49 | +This option does not apply to [composite spans](#composite-span). |
| 50 | +This reduces the collection, processing, and storage overhead, and removes clutter from the UI. |
| 51 | +The tradeoff is that the DB statements of all the compressed spans will not be collected. |
| 52 | + |
| 53 | +| | | |
| 54 | +|----------------|----------| |
| 55 | +| Type | `duration`| |
| 56 | +| Default | `5ms` | |
| 57 | +| Dynamic | `true` | |
| 58 | + |
| 59 | +## Consecutive-Same-Kind compression strategy |
| 60 | + |
| 61 | +Another pattern that often occurs is a high number of alternating queries to the same backend. |
| 62 | +Especially if the individual spans are quite fast, recording every single query is unlikely to be worth the overhead. |
| 63 | + |
| 64 | +``` |
| 65 | +[                              ] |
| 66 | +GET /users |
| 67 | + [] [] [] [] [] [] [] [] [] [] |
| 68 | + 10x Calls to mysql |
| 69 | +``` |
| 70 | + |
| 71 | +Two spans are considered to be of the same kind if the following properties are equal: |
| 72 | +- `type` |
| 73 | +- `subtype` |
| 74 | +- `destination.service.resource` |
| 75 | + |
| 76 | +```java |
| 77 | +boolean isSameKind(Span other) { |
| 78 | +    return Objects.equals(type, other.type) // java.util.Objects: null-safe comparison by value |
| 79 | +        && Objects.equals(subtype, other.subtype) |
| 80 | +        && Objects.equals(destination.service.resource, other.destination.service.resource); |
| 81 | +} |
| 82 | +``` |
| 83 | + |
| 84 | +When applying this compression strategy, the `span.name` is set to `Calls to $span.destination.service.resource`. |
| 85 | +The rest of the context, such as the `db.statement`, is determined by the first compressed span, which is turned into a composite span. |
| 86 | + |
| 87 | +### Configuration option `span_compression_same_kind_max_duration` |
| 88 | + |
| 89 | +Consecutive spans to the same destination that are under this threshold will be compressed into a single composite span. |
| 90 | +This option does not apply to [composite spans](#composite-span). |
| 91 | +This reduces the collection, processing, and storage overhead, and removes clutter from the UI. |
| 92 | +The tradeoff is that the DB statements of all the compressed spans will not be collected. |
| 93 | + |
| 94 | +| | | |
| 95 | +|----------------|----------| |
| 96 | +| Type | `duration`| |
| 97 | +| Default | `5ms` | |
| 98 | +| Dynamic | `true` | |
| 99 | + |
| 100 | +## Composite span |
| 101 | + |
| 102 | +Compressed spans don't have a physical span document. |
| 103 | +Instead, multiple compressed spans are represented by a composite span. |
| 104 | + |
| 105 | +### Data model |
| 106 | + |
| 107 | +For composite spans, the `timestamp` and `duration` have slightly different semantics, |
| 108 | +and composite spans define additional properties under the `composite` context. |
| 109 | + |
| 110 | +- `timestamp`: The start timestamp of the first span. |
| 111 | +- `duration`: gross duration (i.e., _<last compressed span's end timestamp>_ - _<first compressed span's start timestamp>_). |
| 112 | +- `composite` |
| 113 | +    - `count`: The number of compressed spans this composite span represents. |
| 114 | +      The minimum count is 2 as a composite span represents at least two spans. |
| 115 | +    - `sum.us`: sum of durations of all compressed spans this composite span represents in microseconds. |
| 116 | +      Thus `sum.us` is the net duration of all the compressed spans while `duration` is the gross duration (including "whitespace" between the spans). |
| 117 | +    - `compression_strategy`: A string value indicating which compression strategy was used. The valid values are: |
| 118 | +        - `exact_match` - [Consecutive-Exact-Match compression strategy](tracing-spans-compress.md#consecutive-exact-match-compression-strategy) |
| 119 | +        - `same_kind` - [Consecutive-Same-Kind compression strategy](tracing-spans-compress.md#consecutive-same-kind-compression-strategy) |
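
To make the difference between gross and net duration concrete, here is a minimal sketch of how the fields above could be derived from a run of compressed spans. The `SpanTiming` and `CompositeFields` classes are hypothetical helpers for illustration only and are not part of any agent API; all values are assumed to be in microseconds.

```java
import java.util.List;

/** Hypothetical minimal span: start timestamp and duration, both in microseconds. */
final class SpanTiming {
    final long startUs;
    final long durationUs;

    SpanTiming(long startUs, long durationUs) {
        this.startUs = startUs;
        this.durationUs = durationUs;
    }

    long endUs() {
        return startUs + durationUs;
    }
}

/** Hypothetical holder for the composite span fields described above. */
final class CompositeFields {
    final long timestampUs; // start timestamp of the first compressed span
    final long durationUs;  // gross duration, including gaps between spans
    final int count;        // composite.count
    final long sumUs;       // composite.sum.us, the net duration

    private CompositeFields(long timestampUs, long durationUs, int count, long sumUs) {
        this.timestampUs = timestampUs;
        this.durationUs = durationUs;
        this.count = count;
        this.sumUs = sumUs;
    }

    /** Derives the fields from consecutive compressed spans, ordered by start time. */
    static CompositeFields of(List<SpanTiming> spans) {
        if (spans.size() < 2) {
            throw new IllegalArgumentException("a composite span represents at least two spans");
        }
        SpanTiming first = spans.get(0);
        SpanTiming last = spans.get(spans.size() - 1);
        long sumUs = 0;
        for (SpanTiming span : spans) {
            sumUs += span.durationUs;
        }
        return new CompositeFields(first.startUs, last.endUs() - first.startUs, spans.size(), sumUs);
    }
}
```

For three spans starting at 1000, 1500, and 2000 µs with durations of 200, 300, and 100 µs, this sketch yields a `timestamp` of 1000, a gross `duration` of 1100, a `count` of 3, and a `sum.us` of 600.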
| 120 | + |
| 121 | +### Effects on metric processing |
| 122 | + |
| 123 | +As laid out in the [span destination spec](tracing-spans-destination.md#contextdestinationserviceresource), |
| 124 | +APM Server tracks span destination metrics. |
| 125 | +To prevent compressed spans from skewing latency metrics and causing throughput metrics to be under-counted, |
| 126 | +APM Server will take `composite.count` into account when tracking span destination metrics. |
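
As an illustration only, the following sketch shows one way a metrics consumer could account for composite spans. It is not APM Server's implementation: the `DestinationMetrics` class is hypothetical, and using `composite.sum.us` as the latency contribution is an assumption made for this example.

```java
/** Hypothetical accumulator for the span destination metrics of one destination. */
final class DestinationMetrics {
    long spanCount;     // number of represented spans (throughput)
    long durationSumUs; // summed span durations in microseconds (latency)

    /**
     * Records one span document. A compositeCount of 0 marks a regular span;
     * otherwise the document stands for compositeCount compressed spans.
     */
    void record(long durationUs, int compositeCount, long compositeSumUs) {
        if (compositeCount > 0) {
            spanCount += compositeCount;     // count every compressed span
            durationSumUs += compositeSumUs; // assumption: use the net duration
        } else {
            spanCount += 1;
            durationSumUs += durationUs;
        }
    }

    double averageLatencyUs() {
        return spanCount == 0 ? 0.0 : (double) durationSumUs / spanCount;
    }
}
```

Without the `composite.count` adjustment, a composite span that stands for four 500 µs calls would be counted as a single slow call, which is the skew the paragraph above describes.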
| 127 | + |
| 128 | +## Compression algorithm |
| 129 | + |
| 130 | +### Eligibility for compression |
| 131 | + |
| 132 | +A span is eligible for compression if all the following conditions are met: |
| 133 | +1. It's an [exit span](tracing-spans.md#exit-spans) |
| 134 | +2. The trace context of this span has not been propagated to a downstream service |
| 135 | +3. If the span has an `outcome` (i.e., `outcome` is present and it's not `null`), then it should be `success`. |
| 136 | +   This means that spans whose outcome indicates an issue of potential interest should not be compressed. |
| 137 | + |
| 138 | +The second condition is important so that we don't remove (compress) a span that may be the parent of a downstream service. |
| 139 | +This would orphan the sub-graph started by the downstream service and cause it to not appear in the waterfall view. |
| 140 | + |
| 141 | +```java |
| 142 | +boolean isCompressionEligible() { |
| 143 | +    return exit && !context.hasPropagated && (outcome == null || "success".equals(outcome)); |
| 144 | +} |
| 145 | +``` |
| 146 | + |
| 147 | +### Span buffering |
| 148 | + |
| 149 | +Non-compression-eligible spans may be reported immediately after they have ended. |
| 150 | +When a compression-eligible span ends, it does not immediately get reported. |
| 151 | +Instead, the span is buffered within its parent. |
| 152 | +A span/transaction can buffer at most one child span. |
| 153 | + |
| 154 | +Span buffering allows the agent to "look back" one span when determining whether a given span should be compressed. |
| 155 | + |
| 156 | +A buffered span gets reported when: |
| 157 | +1. its parent ends |
| 158 | +2. a sibling ends that is not compression-eligible or that cannot be compressed with the buffered span |
| 159 | + |
| 160 | +```java |
| 161 | +void onEnd() { |
| 162 | +    if (buffered != null) { // the parent ended: flush the buffered child |
| 163 | +        report(buffered); |
| 164 | +    } |
| 165 | +} |
| 166 | + |
| 167 | +void onChildEnd(Span child) { |
| 168 | +    if (!child.isCompressionEligible()) { |
| 169 | +        if (buffered != null) { // flush first so that spans are reported in order |
| 170 | +            report(buffered); |
| 171 | +            buffered = null; |
| 172 | +        } |
| 173 | +        report(child); |
| 174 | +        return; |
| 175 | +    } |
| 176 | + |
| 177 | +    if (buffered == null) { // nothing to look back at yet |
| 178 | +        buffered = child; |
| 179 | +        return; |
| 180 | +    } |
| 181 | + |
| 182 | +    if (!buffered.tryToCompress(child)) { // the sequence ends: report it and start a new one |
| 183 | +        report(buffered); |
| 184 | +        buffered = child; |
| 185 | +    } |
| 186 | +} |
| 187 | +``` |
| 188 | + |
| 189 | +### Turning compressed spans into a composite span |
| 190 | + |
| 191 | +Spans have a `tryToCompress` method that is called on a span buffered by its parent. |
| 192 | +On the first call, the span checks whether it can be compressed with the given sibling and selects the best compression strategy. |
| 193 | +Note that the compression strategy is selected only once, based on the first two spans of the sequence. |
| 194 | +The compression strategy cannot be changed by the rest of the spans in the sequence. |
| 195 | +So when the current sibling span cannot be added to the ongoing sequence under the selected compression strategy, |
| 196 | +the ongoing sequence is terminated: it is sent out as a composite span and the current sibling span is buffered. |
| 197 | + |
| 198 | +If the spans are of the same kind, have the same name, and both spans' `duration` <= `span_compression_exact_match_max_duration`, |
| 199 | +we apply the [Consecutive-Exact-Match compression strategy](tracing-spans-compress.md#consecutive-exact-match-compression-strategy). |
| 200 | +Note that if the spans are an _exact match_ |
| 201 | +but the duration threshold requirement is not satisfied, we just stop the compression sequence. |
| 202 | +In particular, this means that the implementation should not proceed to try the _same kind_ strategy. |
| 203 | +Otherwise, the user would have to lower both `span_compression_exact_match_max_duration` and `span_compression_same_kind_max_duration` |
| 204 | +to prevent longer _exact match_ spans from being compressed. |
| 205 | + |
| 206 | +If the spans are of the same kind but have different span names and both spans' `duration` <= `span_compression_same_kind_max_duration`, |
| 207 | +we compress them using the [Consecutive-Same-Kind compression strategy](tracing-spans-compress.md#consecutive-same-kind-compression-strategy). |
| 208 | + |
| 209 | +```java |
| 210 | +boolean tryToCompress(Span sibling) { |
| 211 | +    boolean isAlreadyComposite = composite != null; |
| 212 | +    boolean canBeCompressed = isAlreadyComposite ? tryToCompressComposite(sibling) : tryToCompressRegular(sibling); |
| 213 | +    if (!canBeCompressed) { |
| 214 | +        return false; |
| 215 | +    } |
| 216 | + |
| 217 | +    if (!isAlreadyComposite) { // this span becomes the composite and counts itself first |
| 218 | +        composite.count = 1; |
| 219 | +        composite.sumUs = duration; |
| 220 | +    } |
| 221 | + |
| 222 | +    ++composite.count; |
| 223 | +    composite.sumUs += sibling.duration; |
| 224 | +    return true; |
| 225 | +} |
| 226 | + |
| 227 | +boolean tryToCompressRegular(Span sibling) { |
| 228 | +    if (!isSameKind(sibling)) { |
| 229 | +        return false; |
| 230 | +    } |
| 231 | + |
| 232 | +    if (name.equals(sibling.name)) { |
| 233 | +        if (duration <= span_compression_exact_match_max_duration && sibling.duration <= span_compression_exact_match_max_duration) { |
| 234 | +            composite.compressionStrategy = "exact_match"; |
| 235 | +            return true; |
| 236 | +        } |
| 237 | +        return false; // an exact match over the threshold does not fall back to same_kind |
| 238 | +    } |
| 239 | + |
| 240 | +    if (duration <= span_compression_same_kind_max_duration && sibling.duration <= span_compression_same_kind_max_duration) { |
| 241 | +        composite.compressionStrategy = "same_kind"; |
| 242 | +        name = "Calls to " + destination.service.resource; |
| 243 | +        return true; |
| 244 | +    } |
| 245 | + |
| 246 | +    return false; |
| 247 | +} |
| 248 | + |
| 249 | +boolean tryToCompressComposite(Span sibling) { |
| 250 | +    switch (composite.compressionStrategy) { |
| 251 | +        case "exact_match": |
| 252 | +            return isSameKind(sibling) && name.equals(sibling.name) && sibling.duration <= span_compression_exact_match_max_duration; |
| 253 | +        case "same_kind": |
| 254 | +            return isSameKind(sibling) && sibling.duration <= span_compression_same_kind_max_duration; |
| 255 | +        default: return false; // unknown strategy: never compress |
| 256 | +    } |
| 257 | +} |
| 258 | +``` |
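
The following self-contained sketch puts buffering and strategy selection together so the behavior can be traced on a concrete sequence. It is a simplified illustration under stated assumptions, not a reference implementation: `SketchSpan` and `SketchParent` are hypothetical classes, eligibility is passed in as a precomputed flag, durations are plain microsecond values, and both thresholds use the documented 5ms default.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified span used only for this walk-through. */
final class SketchSpan {
    final String type;
    final String subtype;
    final String resource;   // stands in for destination.service.resource
    final long durationUs;
    final boolean eligible;  // precomputed result of the eligibility check
    String name;
    String strategy;         // null until this span becomes a composite
    int count;               // composite.count
    long sumUs;              // composite.sum.us

    SketchSpan(String type, String subtype, String resource, String name,
               long durationUs, boolean eligible) {
        this.type = type;
        this.subtype = subtype;
        this.resource = resource;
        this.name = name;
        this.durationUs = durationUs;
        this.eligible = eligible;
    }

    boolean isSameKind(SketchSpan other) {
        return type.equals(other.type)
            && subtype.equals(other.subtype)
            && resource.equals(other.resource);
    }

    /** Tries to fold the sibling into this span; thresholds are in microseconds. */
    boolean tryToCompress(SketchSpan sibling, long exactMatchMaxUs, long sameKindMaxUs) {
        if (!isSameKind(sibling)) {
            return false;
        }
        boolean sameName = name.equals(sibling.name);
        if (strategy == null) {
            // First pair of the sequence: the strategy is chosen once, here.
            long maxUs = sameName ? exactMatchMaxUs : sameKindMaxUs;
            if (durationUs > maxUs || sibling.durationUs > maxUs) {
                return false; // a slow exact match does not fall back to same_kind
            }
            strategy = sameName ? "exact_match" : "same_kind";
            if (!sameName) {
                name = "Calls to " + resource;
            }
            count = 1;
            sumUs = durationUs;
        } else if (strategy.equals("exact_match")) {
            if (!sameName || sibling.durationUs > exactMatchMaxUs) {
                return false;
            }
        } else if (sibling.durationUs > sameKindMaxUs) {
            return false;
        }
        count++;
        sumUs += sibling.durationUs;
        return true;
    }
}

/** Hypothetical parent (span or transaction) that buffers at most one child. */
final class SketchParent {
    static final long EXACT_MATCH_MAX_US = 5_000; // 5ms
    static final long SAME_KIND_MAX_US = 5_000;   // 5ms

    final List<SketchSpan> reported = new ArrayList<>();
    private SketchSpan buffered;

    void onChildEnd(SketchSpan child) {
        if (!child.eligible) {
            flush();
            reported.add(child);
        } else if (buffered == null) {
            buffered = child;
        } else if (!buffered.tryToCompress(child, EXACT_MATCH_MAX_US, SAME_KIND_MAX_US)) {
            flush();
            buffered = child;
        }
    }

    void onEnd() {
        flush();
    }

    private void flush() {
        if (buffered != null) {
            reported.add(buffered);
            buffered = null;
        }
    }
}
```

Feeding this sketch three `SELECT FROM users` spans, then `SELECT FROM orders`, then `SELECT FROM items` (all eligible MySQL spans under 5ms), followed by one non-eligible span, reports three spans: an `exact_match` composite with a count of 3, a `same_kind` composite named `Calls to mysql` with a count of 2, and the non-eligible span unchanged. The `SELECT FROM orders` span ends the first sequence because the strategy chosen for it was `exact_match`.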
| 259 | + |
| 260 | +### Concurrency |
| 261 | + |
| 262 | +The pseudo-code in this spec is intentionally not written in a thread-safe manner to make it more concise. |
| 263 | +Also, thread safety is highly platform/runtime dependent, and some don't support parallelism or concurrency. |
| 264 | + |
| 265 | +However, if there can be a situation where multiple spans may end concurrently, agents MUST guard against race conditions. |
| 266 | +To do that, agents should prefer [lock-free algorithms](https://en.wikipedia.org/wiki/Non-blocking_algorithm) |
| 267 | +paired with retry loops over blocking algorithms that use mutexes or locks. |
| 268 | + |
| 269 | +In particular, operations that work with the buffer require special attention: |
| 270 | +- Setting a span into the buffer must be handled atomically. |
| 271 | +- Retrieving a span from the buffer must be handled atomically. |
| 272 | + Retrieving includes atomically getting and clearing the buffer. |
| 273 | + This makes sure that only one thread at a time can compare span properties and call mutating methods, such as `tryToCompress`. |
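
On the JVM, one possible way to meet both requirements is a single-slot buffer built on `java.util.concurrent.atomic.AtomicReference`. The `SpanBuffer` class below is a hypothetical sketch, not part of any agent, and other runtimes will need their own primitives.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical single-slot buffer for the one child span a parent may hold. */
final class SpanBuffer<S> {
    private final AtomicReference<S> slot = new AtomicReference<>();

    /**
     * Atomically gets and clears the buffer. Only the thread that receives a
     * non-null result may inspect or mutate that span.
     */
    S take() {
        return slot.getAndSet(null);
    }

    /**
     * Atomically stores a span if the buffer is empty. Returns false if another
     * thread filled the buffer in the meantime, in which case the caller retries.
     */
    boolean offer(S span) {
        return slot.compareAndSet(null, span);
    }
}
```

A caller would `take()` the buffered span, attempt the compression on its private copy of the reference, and then `offer()` the result back, looping when `offer()` returns `false`.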