I'm unsure about the current performance of the Rare Terms aggregation, but from a high-level look at the code, it appears that this aggregation also iterates through each document.
The idea is to use the term frequencies that Lucene already stores, similar to #11643, and avoid iterating through individual documents.
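To make the idea concrete, here is a rough sketch, not the actual aggregator code. It assumes each segment can hand us a term -> document-frequency map (in Lucene that number would come from `TermsEnum.docFreq()`), sums those counts across segments, and keeps the terms whose total is at or below `max_doc_count`. The class name `RareTermsSketch`, the method `rareTerms`, and the plain `Map` stand-in for a segment's term dictionary are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RareTermsSketch {

    /**
     * Hypothetical helper: each map stands in for one segment's term dictionary
     * (term -> docFreq). No per-document iteration happens here.
     */
    static List<String> rareTerms(List<Map<String, Integer>> segmentDocFreqs, int maxDocCount) {
        // Sum the per-segment document frequencies for every term.
        // TreeMap keeps the result in sorted term order.
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Integer> segment : segmentDocFreqs) {
            for (Map.Entry<String, Integer> entry : segment.entrySet()) {
                totals.merge(entry.getKey(), (long) entry.getValue(), Long::sum);
            }
        }

        // Keep only the terms that appear in at most maxDocCount documents overall.
        List<String> rare = new ArrayList<>();
        for (Map.Entry<String, Long> entry : totals.entrySet()) {
            if (entry.getValue() <= maxDocCount) {
                rare.add(entry.getKey());
            }
        }
        return rare;
    }

    public static void main(String[] args) {
        // Made-up sample data for two segments.
        Map<String, Integer> segment1 = Map.of("error", 5, "timeout", 1);
        Map<String, Integer> segment2 = Map.of("error", 3, "oom", 1, "timeout", 1);

        System.out.println(rareTerms(List.of(segment1, segment2), 1)); // prints [oom]
    }
}
```

One caveat: Lucene's `docFreq` does not account for deleted documents and is not restricted by the query, so a real version would presumably need to fall back to per-document collection when a segment has deletions or when the query does not match all documents.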
Next Steps:
- Measure the existing performance of the rare terms aggregation
- Improve the implementation if the idea above turns out to be feasible