You probably think you have a challenge when it comes to monitoring your systems, and you probably do, but chances are your tasks are minuscule when compared to what Facebook’s IT pros have to deal with. Today, I learned about some tools Facebook’s developers created in-house to keep up with its gigantic monitoring task.
Like many companies faced with the blessing and curse of so much data, Facebook could usually tell when something was amiss, but it couldn’t tell why. It’s a data problem that many large companies face, even if not quite on the scale Facebook has to deal with.
In a blog/note (whatever they call it on Facebook), Facebook engineer Lior Abraham explained how the Facebook team relies on real-time instrumentation to monitor how well Facebook is performing over time. The trouble is they accumulate so much data across so many different dimensions that their traditional data analysis tools couldn’t keep up with their need for information.
Larger data sets let you develop more sophisticated queries and, as you mix and match data in new ways, answer questions you might not have even considered. But you need your tools to come up with those answers quickly.
When you’re Facebook, in fact, any slowdown is unacceptable. When you’re dealing with literally hundreds of millions of people hitting your site every hour of every day and sharing tons of new data, you need to know if your site is performing up to par.
And if it’s not, you need to be able to figure out why. For Facebook, even more so than for your average large data center, the cause could be anything: someone turning on a test, a new feature somewhere, or a data center problem in one of the many countries where Facebook operates.
When faced with this issue, Facebook did what many companies are doing when it comes to processing and understanding big data sets. In the best hacking and open source style, Facebook engineers built a tool they call Scuba themselves, on the fly. One engineer created the back end, another the front end, an intern improved the speed, and so forth. It has become an invaluable tool for Facebook to track what’s happening on its systems and why.
Unfortunately, there is no indication that Facebook plans to open source this tool (as Netflix did with their suite of monitoring tools). Still, it shows the potential of big data sets, the creativity of engineers when faced with a monitoring problem, and the power of need to drive innovation, proving after all these years that necessity is still the mother of invention.
You might not have the data issues that a company like Facebook does, but surely you can learn from what they did to solve their issues, and perhaps apply that same ingenuity to help solve the monitoring issues going on in your organization.