Strange days indeed.
~John Lennon, Nobody Told Me
It was a strange day indeed for the Internet yesterday. If it seemed like your connection was slow or broken, it probably wasn't your imagination. According to a post on CNN, a router glitch caused sites across the Internet to go down on Monday.
Indeed, Networkworld reported that several major networks, including Comcast, Time Warner, and poor, pitiful RIM, experienced issues. My own Comcast connection didn't break completely, but things slowed down considerably, and some sites, including LinkedIn and Facebook, weren't working correctly throughout the day.
According to a Computerworld article, problems started Monday morning in the US Eastern time zone and were apparently the result of a Juniper Networks firmware upgrade that caused routers to reset, after which all hell broke loose on sites across the Internet. It seems that many large providers on the Internet backbone use these routers.
Juniper, meanwhile, released a statement saying it was aware of the problem and that it affected a small percentage of customers. It's probably not possible to say with absolute certainty that this was the root of Monday's issues. But if it was, then describing a problem that seemed to affect the entire freaking Internet as hitting "a small percentage of customers" would have to be the public relations understatement of the year.
As the Networkworld post pointed out, regardless of which company was at fault, the slowdown didn't stop people from pulling out their mobile phones and complaining bitterly on social networks. That proves once again that when things go wrong on public networks, the embarrassment for network administrators can extend far beyond simply resolving the issues at hand.
If you were going nuts yesterday trying to figure out what was wrong with your application or web site, you might not have realized right away that it was the victim of a piece of hardware on the Internet backbone. That shows that the Internet infrastructure itself is one of the dizzying number of factors that can affect your web site or application.
When that infrastructure spins out of control, you may be stuck in crisis mode trying to track down why your site or application is slow or not working until the news starts to leak out. And when you're in the middle of a major issue, chances are you're not monitoring the Internet for the latest news. It may not be a bad idea to have someone on the team doing just that.
It may make sense to have all hands on deck trying to resolve the issue. But when the problem is beyond your company's ability to fix, as it was yesterday, someone checking social networks and news sources would have realized that fairly quickly.
It's not often you hear about a problem affecting the worldwide Internet backbone, but apparently this one did. Just another crazy day in the life of a monitoring pro.