Software and Monitoring Tool to Blame in Two Blackouts
There were two major blackouts yesterday around the US. One affected Tampa, FL and the other affected several states in the Southwest and Mexico. According to reports, these outages were caused by a software glitch and a monitoring tool.
Let’s start with the blackout in Tampa, Florida, which according to reports left more than 1 million people without power for more than 5 hours during the day yesterday (and some were still without power as evening approached).
According to a report on the Suncoast News web site, the outage was caused by a software glitch, and redundant backup systems failed as the network became overwhelmed. If this has ever happened at your company, you know it’s a helpless feeling, and the engineers worked feverishly to get the system back online.
The other blackout left more than 6 million people without power in the Southwest, including residents of San Diego, many of whom still didn’t have power overnight. The blackout left people living in the desert in scorching triple-digit heat without power for air conditioning.
According to a report on CBSNews.com, “Power officials said the massive blackout was likely caused by an employee removing a piece of monitoring equipment at a power substation in southwest Arizona.”
So a piece of equipment meant to help network officials monitor network performance caused a major outage when it was taken offline. Talk about unintended consequences.
From a monitoring standpoint, these two outages present a compelling case about the impact monitoring can have on an organization and the effect outages, in this case involving electricity delivery, can have on end users.
If you picture your end users sitting in the dark without the heat or air conditioning when your network goes down, perhaps you can grasp the enormity of the situation for them.
If the whole service goes down, which happened to Google Docs for a short time earlier this week, it can have a big impact on users who come to depend on your service to get their work done.
And that’s what you have to keep in mind as you go about your monitoring tasks, whatever type of business you work in. Your service might not be quite as critical as electricity delivery (especially when it was reported that a couple of nuclear power plants lost power in the San Diego area), but that doesn’t mean you can’t picture your users sitting idle all the same.
If nothing else, these blackouts serve to reinforce just how critical it is to keep your systems up and running and, when disaster strikes, to get them back up as quickly as you can.