As an operator, I need guidance on aggregating monitoring and logging data across multiple clusters in multiple regions. Monitoring latency must be close to real time (less than a 5-second delay).
Walmart: one service will output ~60K events per second. Total logs from all systems will be ~1.1 million per second.
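
To make the scale concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the figures stated above, treats the ~1.1 million/second total as events per second (an assumption), and introduces a purely hypothetical average event size (`ASSUMED_EVENT_BYTES`) that does not come from this request and should be replaced with a measured value.

```python
# Figures taken from the requirement above.
LATENCY_BUDGET_S = 5              # "less than 5 second delay"
SERVICE_EVENTS_PER_S = 60_000     # one service, ~60K events/second
TOTAL_EVENTS_PER_S = 1_100_000    # all systems, ~1.1 million/second (assumed to be events)

# Hypothetical assumption, NOT from the source: average serialized event size.
ASSUMED_EVENT_BYTES = 500


def events_in_flight(rate_per_s: int, budget_s: int) -> int:
    """Upper bound on events not yet visible while staying inside the latency budget."""
    return rate_per_s * budget_s


def bandwidth_mb_per_s(rate_per_s: int, event_bytes: int) -> float:
    """Sustained ingest bandwidth in megabytes per second (1 MB = 10**6 bytes)."""
    return rate_per_s * event_bytes / 1_000_000


if __name__ == "__main__":
    print("single service, events in flight:",
          events_in_flight(SERVICE_EVENTS_PER_S, LATENCY_BUDGET_S))
    print("all systems, events in flight:",
          events_in_flight(TOTAL_EVENTS_PER_S, LATENCY_BUDGET_S))
    print("all systems, MB/s at assumed event size:",
          bandwidth_mb_per_s(TOTAL_EVENTS_PER_S, ASSUMED_EVENT_BYTES))
```

Under these assumptions, any aggregation pipeline has to tolerate up to about 5.5 million events in transit across regions at any moment without exceeding the 5-second budget; the bandwidth figure scales linearly with whatever the real average event size turns out to be.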