I’m skeptical of IBM’s Real-time Compression Appliances.
More accurately, I’m skeptical of our ability to judge them usefully. The technical summary is mouth-watering: unpack one of these boxes, plug it into your storage network, and a few seconds later the load on your storage and network plummets “by up to 80 percent”. That kind of factor translates in business terms to the potential to skip a couple of years of storage purchases and to slash power, cooling, and space requirements.
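To see how a reduction factor could turn into deferred purchases, here is a back-of-the-envelope sketch. It assumes that stored data grows at a constant annual rate and that the reduction applies uniformly to everything stored. The function name and the 40 percent growth rate are illustrative assumptions, not figures from IBM.

```python
import math


def years_of_purchases_deferred(reduction: float, annual_growth: float) -> float:
    """Years until data growth consumes the capacity freed by compression.

    reduction: fraction of stored bytes eliminated (0.5 means data shrinks by half).
    annual_growth: assumed yearly growth of stored data (0.4 means 40 percent).
    """
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    # Shrinking data by `reduction` multiplies effective capacity by this factor.
    capacity_multiplier = 1 / (1 - reduction)
    # Solve (1 + annual_growth) ** years == capacity_multiplier for years.
    return math.log(capacity_multiplier) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # The 40 percent growth rate is an assumption chosen for illustration.
    for reduction in (0.5, 0.8):
        years = years_of_purchases_deferred(reduction, annual_growth=0.4)
        print(f"{reduction:.0%} reduction: about {years:.1f} years deferred")
```

Under these assumptions, a 50 percent reduction defers roughly two years of purchases, and the headline 80 percent figure defers nearly five. The real number depends entirely on how well a given datacenter’s data compresses and how fast it grows.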
IBM certainly is positioned to deliver a product with such nearly miraculous capabilities: it has unparalleled experience in the datacenter and a well-endowed stable of research, development, and manufacturing specialists. I want such an appliance.
Part of the purchase price, though, is trust in the vendor. Even if we shake down an appliance like the STN6500 during an evaluation period, the methods and materials are all proprietary. How much does a twenty-day trial with 500 gigabytes say about how the appliance will operate over five years, managing 50 terabytes? Will its performance hold up throughout its lifetime? We can perform due diligence to investigate, for instance, how the appliance has performed for others. But who knows whether storage of financial transactions compresses more or less easily than video content, engineering measurements, administrative documents, medical records, or any of the dozen other prototypical storage loads that our datacenters might be called on to bear? And even if we were confident of such comparisons five years ago, do they still hold across today’s range of virtualized architectures?
My answer: I suspect no one knows. Storage engineering is a hard problem, and I’m convinced that almost none of the people good enough to understand it well can afford the time to research it thoroughly. We’re left lurching around largely in the dark, operating on the basis of folklore and vendor recommendations.
Most of the time, that’s good enough. What if it turns out, though, that IBM’s appliances are only mediocre performers on the traffic specific to a particular datacenter? This isn’t about IBM, particularly; I’ve been a contractor for the company in the past, generally admire its hardware, and dislike several of its management practices. Whatever my personal feelings, and however conscientiously I discount them, storage remains an area where many of the incentives are skewed. Storage evaluation smells faintly like the financial regulation highlighted in the US economic crisis of 2007: most of the people with the knowledge to judge matters accurately also have a stake in promotion of the incumbent organizations. ESG reviewed the Appliances in 2011, and found them “simple to deploy”, high in performance, recoverable, and otherwise up to their specifications. We still have a responsibility to judge how those specifications apply in our own datacenters.
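One modest step toward that judgment is to measure how compressible our own data is before a trial begins. The following is a minimal sketch, not the appliance’s method: it uses zlib as a stand-in, since the appliance’s algorithm is proprietary, so it says nothing about the ratios the appliance itself would achieve. The function name and the two synthetic sample payloads are hypothetical.

```python
import os
import zlib


def space_saved(data: bytes, level: int = 6) -> float:
    """Return the fraction of bytes saved by zlib compression (0.0 if none)."""
    if not data:
        return 0.0
    compressed = zlib.compress(data, level)
    # Incompressible input can grow slightly; report that as zero savings.
    return max(0.0, 1 - len(compressed) / len(data))


if __name__ == "__main__":
    # Hypothetical stand-ins for two workloads: repetitive text records and
    # already-dense content such as encoded video (simulated with random bytes).
    samples = {
        "text-like records": b"2011-06-01,ACCT-0001,DEBIT,100.00\n" * 10_000,
        "video-like content": os.urandom(340_000),
    }
    for name, payload in samples.items():
        print(f"{name}: {space_saved(payload):.0%} saved")
```

In practice the inputs would be representative files drawn from each class of storage rather than synthetic bytes, and results from a general-purpose compressor remain only a rough proxy for any particular product.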
That’s tough, because vendors keep product information proprietary, data organizations generally regard operational information as proprietary, and even the little common knowledge that does diffuse through the devops community quickly becomes obsolete. By the time we learn enough about an appliance to judge it accurately, much of its value as an off-the-shelf solution has eroded.
What role does that leave for appliances, then? Sober commentators like Robin Bloor have already written that appliances are on the way out. I’m not so pessimistic as that. Do be realistic, though. While I’m inclined to agree with IBM’s marketing literature that deduplication and compression can complement each other, document ahead of time what your strategy is for combining the two. Recognize that most installations of compression appliances have been only for network-attached storage (NAS) based on CIFS or NFS. There appears to be little real-world experience to this point with block storage.
Keep clearly in mind that decompression is at least as proprietary as compression: once the data are in storage, a working appliance is required to retrieve them. That suggests that the best storage loads for the appliances are episodic ones, like litigation support, where retention is required only for a relatively short interval.