E-commerce is sexy. All those dollars spinning across the Web – who doesn’t want that? Contingency planning, however, is as far from sexy as it gets.
Unfortunately, the first doesn’t exist long without the second.
Look at eBay and Bell Canada. One is an e-comm success story, the other a significant on-line infrastructure supplier. Both have suffered damaging and embarrassing outages lately — failures that could have been prevented.
Bell’s network suffered a major meltdown recently when a fire broke out at its central office exchange site in Toronto. The blaze took out a power supply feeding the central switches, and Bell workers were not permitted to start the back-up generator because automatic sprinklers had soaked the facility, making the site too dangerous to enter.
Some voice and data services were out for more than five hours, trading on the TSE dropped almost $1 billion compared to the previous day, access slowed on the supposedly nuclear-war-proof Internet, traffic lights at more than 500 intersections were affected, and 911 service limped along on reduced capacity.
This city-wide calamity was caused – and this is almost funny – by a worker who dropped a tool in the electrical room, setting off a one-in-a-million domino-run of events.
Next, consider eBay. Servers at the successful on-line auction site have been elevator-like in their recent ups and downs. Calculations by research firm Illuminata Inc. point to more than 62 hours of downtime over a six-month period. Over five days in June the company posted more than 30 messages on its announcement board, seeking to explain its outages and reassure users. And the longest single failure, which stretched almost 24 hours, cost an estimated US$3.9 million in lost revenue.
And the problems continue. The site went down on the morning of Aug. 6 and stayed dark for at least eight hours. The company pointed to a “network anomaly” as the cause, and has previously blamed Sun’s Solaris, saying it corrupted the company’s Oracle database.
A Sun official, however, suggested eBay could have avoided all its problems if it had installed a long-available Solaris patch. Also, had eBay employed a simple second-server failover system, the public would probably never have known the company had any technical problems.
In light of eBay’s revenue figures, the cost of those precautions is negligible. The same can be said of Bell Canada. One near-site back-up generator would have saved the day.
So why weren’t these companies better prepared? Quite likely, IT staff at both organizations are stretched to capacity, budgets are tight, and other projects are already running late. All the usual reasons.
But just as a sports team wins games by endlessly drilling the fundamentals, an IT department should think first of basics: Is the power supply reliable? Could we add back-up network connectivity? Have we installed software patches? Is data backed up, and are the back-up tapes stored in a safe location?
Few companies have the luxury of unlimited budgets, but some precautions are relatively cheap. And it’s irresponsible not to explore those options.