Bad Stuff Can Still Happen to Prepared Companies

It happens to the best of companies. You might have all your monitoring bases covered and all your carefully crafted disaster plans in place, but sometimes no matter what you do, outages happen.

If you doubt that, look at the world’s biggest cloud companies, the ones whose livelihood depends on being up, and you’ll see they have had some very well-publicized outages. Unlike those at, say, Amazon, Google or Microsoft, your outages might not be so public, but that doesn’t mean the people affected are any less frustrated.

The difference is that your users probably aren’t on Twitter complaining, and the big cloud company’s customers probably are. The fact is, in spite of all the plans and schemes that Amazon and Google have in place, sometimes some extraordinary stuff happens and the site goes down.

For instance, in August both Microsoft and Amazon had cloud outages when the power source near their plants in Ireland was struck by lightning, resulting in an outage of several hours. Natural disasters happen from time to time, folks. Your users and your bosses have to understand there are such things as “circumstances beyond your control.”

Last spring, Amazon had a pretty significant outage that caused the system to go down for a couple of days and took some pretty popular services running on it down with it. There were also reports of data loss. It wasn’t pretty, and you would think Amazon of all companies would be right on top of that, wouldn’t you?

You really would, and they probably are most of the time, when the only thing we hear about them is the latest enhancement. But every once in a while, as happened that week in April, the situation spins out of their control and it gets pretty darn uncomfortable.

You can’t know when or why your next outage is going to happen. You just have to know that no matter how diligent you are, outages happen. They just do. Sometimes it’s careless human error or something that was simply overlooked, but as often as not it’s just something fluky, like weather or the system behaving in a way that’s not expected. That’s what happened recently in the Southwest blackout, when a fail-over system simply stopped working the way it was supposed to, resulting in a rolling blackout across the Southwest.

I’m not suggesting you give in to the outages or that you make excuses when you get them. “Gee, boss, I read that even Amazon has big outages.” Um, no, that’s really not what I’m suggesting.

But you should be aware that sometimes, no matter what you do or how hard you try to prevent it, bad stuff happens and systems go down for a whole host of reasons. You won’t be measured by that yardstick so much as by how you react to the crisis and what you do to prevent it from happening again, because everyone knows stuff happens.
