Outage happens

Chase Bank went off-line. Could application performance management (APM) have saved it?

That question is far thornier than the simple words that go into it. None of us on the outside know enough of the facts to be sure; from my own off-the-record conversations with financial and security specialists, I suspect that no insider has a full picture that includes all the actors.

The main website of JPMorgan Chase Bank has had several apparent outages in recent years, including a rather high-profile one at mid-month. This most recent one was interesting because Chase acknowledged it during the incident. This is an innovation; during past incidents, even Chase support technicians answering live telephone calls have been relatively tight-lipped, so for Chase to publish an acknowledgment that there’s a problem is quite a step toward transparency.

There’s a long way to go, of course. While there’s far, far more intrigue than one brief posting can capture–Chase has been linked at various times to infiltrators from Anonymous, Hamas, China, Iran, Saudi Arabia, Israel, and more–I have less interest in speculation than in the parts of the engineering that appear more certain to me.

From this perspective, “outage” doesn’t align with “terrorism” or “espionage”; outage is a routine aspect of operation that arises from a variety of root causes, among which “cyberwar” and terrorism are almost certainly exaggerated. “[B]anks are likely to face more software glitches in 2013,” as IT Ops noted at the beginning of the month, and mundane programming stumbles are at least as large a problem as targeted attacks.

One of the biggest lessons I draw from study of such operations and incidents is that organizations make many of their own choices. Chase is an aggressively profit-oriented bank which cultivates narratives of fear and machismo. Its “Go Paperless” on-line advertisement hyperlinks only to an enrollment form; the specification of how “Paperless statements are secure and green” is minimal and misleading (“… [online] access to up to seven years of statements …” does not mean “seven years”).

Engineering polish is surely secondary for a highly “transactional” company that targets mass markets. If Chase executives think about website outages at all, they surely classify them as a cost of doing business, somewhere between taxes and nuisance litigation in magnitude and gravity.

That’s not the only way to run a bank, of course. Simple presents itself as a radical contrast, a bank which cultivates the trust of savvy consumers with an emphasis on engineering, transparency, and low costs.

The point for today is not at all to praise one model and condemn the other; it’s to illustrate how much diversity an apparently straightforward description such as “online bank security” cloaks. Both Chase and Simple are banks with important online operations. Because they have different cultures and goals, though, their digital operations diverge considerably. Their systems and fault modes surely reflect those different patterns.

The first step in clear thinking about APM, then, is to understand the organization’s true priorities. Choices for a company where end-user experience (EUE) is paramount should be different from choices for one that processes consumers like any other commodity. When EUE is important–when delays in website response truly are intolerable–then APM is a necessary part of the toolkit a smart engineering team deploys to keep processes humming smoothly. We don’t yet have certainty about the magnitude of cybercrime and cyberespionage risks, and likely never will. It’s a reasonable bet, though, that good engineering can protect a well-managed operation, and APM is part of that good engineering.
