ocagent exporter's bundler stores spans indefinitely in memory #71
Description
What version of the Exporter are you using?
version = "v0.5.0"
What version of OpenCensus are you using?
version = "v0.19.0"
What version of Go are you using?
go1.11.5 darwin/amd64
What did you do?
Consider the below scenario:
An application A is instrumenting its traces using the ocagent-exporter and sending those traces via an oc-collector to a backend.
- As long as there is a valid connection to the collector, the “handler” function, i.e. uploadTraces, will be called on the bundle to offload the bundled traces to the collector (see https://github.com/googleapis/google-api-go-client/blob/c75846e6b94d2eded794529e4016d3d19ae6eeb1/support/bundler/bundler.go#L114). The handler function is called once the bundle has BundleCountThreshold items (see https://github.com/googleapis/google-api-go-client/blob/c75846e6b94d2eded794529e4016d3d19ae6eeb1/support/bundler/bundler.go#L57-L61). This is set to 300, i.e. the size of the bundle, in ocagent:
  opencensus-go-exporter-ocagent/ocagent.go, Line 118 in bbad334
  traceBundler.BundleCountThreshold = spanDataBufferSize
- If there is no active connection to the collector, then we don’t call the handler function, i.e. uploadTraces, and the bundle stays in memory:
  opencensus-go-exporter-ocagent/ocagent.go, Lines 442 to 444 in bbad334
  if !ae.connected() { return }
- Once a bundle is full, a new bundle is created. There can be an infinite number of bundles held in memory (possible if the state is disconnected for a long time).
Hence, if the collector is unreachable for long enough, the bundler causes the application's memory to grow without bound, especially when traces are sampled at a high rate. The application eventually goes down and can no longer serve production traffic.
We have a hunch that this relates to census-instrumentation/opencensus-service#524 as well.
What did you expect to see?
There should be a mechanism that drops unsent bundles when the downstream collector is not able to receive the spans.
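For illustration only, here is a minimal sketch of one such drop policy, written against the standard library rather than the exporter's real code. `Span`, `BoundedSpanBuffer`, and all of its methods are hypothetical names, not part of ocagent or the bundler package. It assumes that evicting the oldest span is acceptable when the buffer is full. The point is just that memory stays capped while the collector is unreachable.

```go
package main

import "sync"

// Span is a hypothetical stand-in for the exporter's span data type.
type Span struct {
	Name string
}

// BoundedSpanBuffer (hypothetical) holds at most limit spans.
// When it is full, adding a span evicts the oldest one.
type BoundedSpanBuffer struct {
	mu      sync.Mutex
	limit   int
	spans   []Span
	dropped int
}

// NewBoundedSpanBuffer returns a buffer capped at limit spans (minimum 1).
func NewBoundedSpanBuffer(limit int) *BoundedSpanBuffer {
	if limit < 1 {
		limit = 1
	}
	return &BoundedSpanBuffer{limit: limit, spans: make([]Span, 0, limit)}
}

// Add appends a span, dropping the oldest span if the buffer is at capacity.
func (b *BoundedSpanBuffer) Add(s Span) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.spans) == b.limit {
		copy(b.spans, b.spans[1:]) // shift left, discarding the oldest span
		b.spans = b.spans[:b.limit-1]
		b.dropped++
	}
	b.spans = append(b.spans, s)
}

// Flush passes all buffered spans to upload when connected is true and
// returns how many were handed over. When disconnected it does nothing,
// so the buffer stays in memory but never grows past limit.
func (b *BoundedSpanBuffer) Flush(connected bool, upload func([]Span)) int {
	b.mu.Lock()
	if !connected || len(b.spans) == 0 {
		b.mu.Unlock()
		return 0
	}
	batch := make([]Span, len(b.spans))
	copy(batch, b.spans)
	b.spans = b.spans[:0]
	b.mu.Unlock() // release the lock before calling out to upload
	upload(batch)
	return len(batch)
}

// Len reports how many spans are currently buffered.
func (b *BoundedSpanBuffer) Len() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.spans)
}

// Dropped reports how many spans have been evicted so far.
func (b *BoundedSpanBuffer) Dropped() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.dropped
}
```

With a limit of 300 (the bundle size mentioned above), a long disconnection would cost at most 300 buffered spans, and `Dropped()` could be exported as a metric so that the data loss is visible. Dropping the newest span instead of the oldest would be an equally reasonable choice.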
What did you see instead?
Huge memory explosion, causing downtime of the application.
Note: Most of the above analysis was done by @elynnyap, I'm just a messenger.