perf: limit goroutine usage while unmarshaling prometheus and influx points/values #909
This PR adds a simple goroutine limit to the prometheus/influx backends' "unmarshal time series" code paths, using errgroup to cap goroutine creation at the current GOMAXPROCS value.
Not having a limit seems like a bad idea for these particular goroutine creations. I debated removing these goroutines entirely, but settled on mapping the limit to the GOMAXPROCS value (exposing it via config would require many more LOC changes). If we want to remove concurrency here altogether, I can pull that out of my git stash. The rest of our goroutine creation seems more naturally tied to e.g. the number of series being processed, which is easier to predict than a goroutine per value/point. A rough sketch of the approach is below.
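A minimal sketch of the errgroup pattern described above, not the actual Trickster code; `parsePoint` and `unmarshalValues` are hypothetical stand-ins for the per-value unmarshal work in the prometheus/influx backends:

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"

	"golang.org/x/sync/errgroup"
)

// parsePoint is a hypothetical stand-in for the per-value unmarshal work.
func parsePoint(raw string) (float64, error) {
	return strconv.ParseFloat(strings.TrimSpace(raw), 64)
}

// unmarshalValues fans the per-point parsing out across goroutines, but
// caps concurrency at GOMAXPROCS instead of spawning one goroutine per value.
func unmarshalValues(raw []string) ([]float64, error) {
	out := make([]float64, len(raw))
	var eg errgroup.Group
	// Bound concurrent goroutines to the current GOMAXPROCS value;
	// eg.Go blocks once the limit is reached until a slot frees up.
	eg.SetLimit(runtime.GOMAXPROCS(0))
	for i, r := range raw {
		i, r := i, r // capture loop variables (pre-Go 1.22 semantics)
		eg.Go(func() error {
			v, err := parsePoint(r)
			if err != nil {
				return err
			}
			out[i] = v // each goroutine writes a distinct index, so no mutex is needed
			return nil
		})
	}
	if err := eg.Wait(); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	vals, err := unmarshalValues([]string{"1.5", "2.25", "3"})
	if err != nil {
		panic(err)
	}
	fmt.Println(vals)
}
```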
There's still a chance for somewhat-unbounded goroutine creation: Trickster is still accepting requests, and the number of time series a result contains can vary widely.
If we wanted to improve in that area, we'd need more drastic changes, such as some sort of generic worker-pool concept. On the simpler side, perhaps we could recommend limiting this indirectly via rate limiting plus http.Server options?
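To illustrate the indirect option mentioned above (not a concrete proposal for Trickster's config): capping in-flight connections at the listener with golang.org/x/net/netutil and setting http.Server timeouts so slow clients can't hold goroutines open indefinitely. The port and limit values here are arbitrary.

```go
package main

import (
	"net"
	"net/http"
	"time"

	"golang.org/x/net/netutil"
)

func main() {
	ln, err := net.Listen("tcp", ":8080") // arbitrary port for illustration
	if err != nil {
		panic(err)
	}
	// Cap concurrent connections; 256 is an arbitrary example value.
	ln = netutil.LimitListener(ln, 256)

	srv := &http.Server{
		ReadHeaderTimeout: 5 * time.Second,
		ReadTimeout:       10 * time.Second,
		WriteTimeout:      30 * time.Second,
		IdleTimeout:       60 * time.Second,
		Handler:           http.DefaultServeMux,
	}
	_ = srv.Serve(ln)
}
```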