Chances are, if you have thousands of servers, you are running some sort of hyperscale environment. But is your monitoring hyperscale-friendly?
In the beginning you might well have had 10 servers, all running business-critical applications. You dutifully monitored everything on each server. Well, you dutifully monitored once you had suffered too many issues with no monitoring at all.
Then, over time, each new outage brought a new set of checks, and before you knew it your boxes were heavily monitored, with a multitude of ways to set the pager off. Your monitoring strategy continues like this as your server farm grows. Before you know it, you have hundreds, maybe even thousands, of machines with very basic monitoring that is prone to false positives.