The fire alarms in the “WAR” (work area recovery) room of secure managed services provider QuoVadis Ltd. in Hamilton, Bermuda went off accidentally during a crisis that destroyed the country’s sole electricity supplier.
With strobe lights flashing and horns blaring, it appeared as though the country had just declared war.
Fire detection at the bunker that houses the backup servers and computers is a serious event. You only have 40 seconds to get out of the room before the fire suppression system goes off, according to Walter Cooke, chief security officer at QuoVadis.
“Supposedly you can survive being inside the room when the systems go off, but [in reality] the fire retardant might freeze your eyeballs if you are standing anywhere near the sprays.”
Cooke barely had enough time to grab a wide-eyed client and throw him out of the room.
Fortunately it was a false alarm.
“This was not the calm and orderly process one wants to follow in an emergency,” Cooke said in hindsight, addressing attendees at the 16th World Conference on Disaster Management held in Toronto recently.
He said QuoVadis learned several key lessons when a fire struck Bermuda Electric Light Company Ltd. (BELCO) on July 14, 2005 and knocked out Bermuda’s lone electricity provider for days, but the one that stood out was to “keep your cool” when disaster strikes. “That goes for your presence of mind as well as your computer equipment.”
Although the hosting firm was able to restore client operations within hours of getting its own system up and running, it nearly lost everything when back-up generators and cooling systems overloaded and threatened to blow up.
The 24-square-mile mid-Atlantic island of Bermuda is just 700 miles off North Carolina, and is one of the world’s largest centres for reinsurance and offshore financial management services. Unfortunately, it also lies in the path of hurricanes.
Having survived the 2003 storm Fabian, which packed 225 km/h winds and tornadoes, QuoVadis mistakenly thought its infrastructure could weather anything. The firm did not expect a disaster that would leave buildings intact but render most communications and data resources inoperable. After Hurricane Fabian, new contracts for IT hosting and WAR sites began popping up on the island, followed by local computer security firms and Internet service providers.
Initially QuoVadis provided secure hosting in its SecureCentre bunker with redundant environmental and network systems. With an increasing number of customers, the company opened a mid-sized WAR room with 140 seats in a secure area near its Hamilton office. One area that worked very effectively for QuoVadis was the deployment of disaster recovery (DR) systems, raw server power and disk space allocation for its clients. The company uses virtualization software from VMware of Palo Alto, Calif. to optimize use of its physical servers and maximize processing and disk capacity.
“Rather than each client having their own physical server hardware, we provide one giant pot of CPU and disk resources,” said Cooke. VMware offers a virtualized computing environment in which a master supervisor program allocates the real resources available to a large number of virtual environments. This allows QuoVadis to run between 60 and 90 virtual computer servers in an eight-blade computer environment with a six-terabyte attached SAN disk array.
“Starting up a customer’s DR server simply involves booting their virtual piece of the large disk array, restoring their latest data from a backup tape and then bringing their service online.”
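The consolidation figures Cooke cites can be put into rough per-blade terms. The short Python sketch below uses only the numbers quoted above (60 to 90 virtual servers, eight blades, six terabytes of SAN disk); the function names are illustrative, the even split across blades and clients is an assumption, and the decimal conversion of 1 TB to 1,000 GB is a simplification rather than a description of how QuoVadis actually allocates resources.

```python
# Figures quoted in the article.
BLADES = 8            # physical blades in the environment
SAN_TB = 6            # attached SAN disk array, in terabytes
VM_RANGE = (60, 90)   # low and high count of virtual servers

def vms_per_blade(vms: int, blades: int = BLADES) -> float:
    """Average number of virtual servers per blade, assuming an even spread."""
    return vms / blades

def disk_per_vm_gb(vms: int, san_tb: float = SAN_TB) -> float:
    """Average SAN share per virtual server in GB (assumes 1 TB = 1000 GB)."""
    return san_tb * 1000 / vms

for vms in VM_RANGE:
    print(f"{vms} VMs: {vms_per_blade(vms):.2f} per blade, "
          f"{disk_per_vm_gb(vms):.1f} GB of SAN each")
```

On these averages, each blade carries roughly 7 to 11 virtual servers and each virtual server gets somewhere between about 67 GB and 100 GB of shared disk.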
Technology was not the problem; it was the human element that became a challenge, according to Cooke. “The harder part of our job was assuring clients their business was secure and would continue to be so.”
Cooke said clients were shortsighted in certain key areas. For instance, they had not prepared for a disaster that leaves facilities intact but knocks out most of the equipment in them.
With the power out across the entire island, QuoVadis relied on its diesel generators to keep its air conditioners running. The generators were in an underground parking lot. The separate room containing the uninterruptible power supply (UPS) for the bunker was located 100 yards from the generator, and the AC air exchange unit was outside the UPS room.
With the generator pumping out massive amounts of heat and the cooling system in close proximity, the AC system eventually buckled. The temperature in the UPS room shot above 120 degrees Fahrenheit, causing batteries in the bank to go offline or fail one by one.
This left little or no “cushion” for smoothing out generator power fluctuations, or for switching back over to city power when full power was restored.
Giant fans were brought into service to blow hot air out of the car park and small portable AC units were rigged to augment the struggling cooling units.
“We had to sit in the 120-degree UPS room 24/7, nursing the ACs, replacing UPS batteries as they died and resetting circuit breakers as they failed,” said Cooke.
Looking back, Cooke said the generators didn’t work when desperately needed because they had not been properly serviced and regularly tested. The generators and UPS systems were overloaded because the demands on them were not planned for and “incorrect assumptions had been made about power capacity.
“You always have to back up your systems.”
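The overload Cooke describes is, at bottom, a load budget that was never checked. The Python sketch below shows one way such a check might look. Everything in it is hypothetical: the equipment names, the kilowatt figures, the generator rating and the 75 per cent utilisation ceiling are illustrative assumptions, not QuoVadis data.

```python
def check_load(loads_kw: dict[str, float],
               generator_kw: float,
               max_utilisation: float = 0.75) -> tuple[float, float, bool]:
    """Compare total planned load with a derated generator budget.

    Returns (total load, usable budget, whether the load fits).
    The 0.75 default ceiling is an assumed safety margin.
    """
    total = sum(loads_kw.values())
    budget = generator_kw * max_utilisation
    return total, budget, total <= budget

# Hypothetical demands on a single back-up generator, in kW.
loads = {
    "servers": 40.0,
    "air_conditioning": 30.0,
    "ups_charging": 10.0,
    "lighting": 5.0,
}

total, budget, ok = check_load(loads, generator_kw=100.0)
print(f"load {total} kW, budget {budget} kW, within budget: {ok}")
```

With these made-up figures the planned load of 85 kW exceeds the 75 kW budget, which is the kind of shortfall that regular testing under real load is meant to expose before an outage does.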
Patrick Dempster, principal of PBD Consulting in Atlantic Highlands, New Jersey, agrees with Cooke’s recommendation.
“You have to back up power supply and operation capabilities to what you need to get back in operation mode during a disaster,” said Dempster. Dempster, whose company assisted several U.S. government agencies including the Internal Revenue Service (IRS) during the Hurricane Katrina disaster, said a “safe assumption would be to provision for 80 per cent capacity of operation.” He also advises that firms set up a “hot site,” the equivalent of QuoVadis’s WAR room for clients.
“You have to have an office to run operations from during a disaster, ideally [located] 50 miles from your original office so it’s close to get to but hopefully far enough to avoid whatever might hit your first building.”
Another key challenge that cropped up in Bermuda was the absence of communication facilities.
Cell towers failed, office PBX phones couldn’t be switched over, and fibre network switches didn’t work.
“Always keep communication lines open with multiple back-up systems in place, be prepared to have human messengers around to carry messages between sites,” Dempster said.
But preparing for the worst, apart from the needed capital outlay, is largely a state of mind, according to Gerry Cummings, regional director for the Middle East of the risk consultancy firm SecureRisks Group PLC, based in Essex, U.K.
Cummings, who advises several multinational firms in Dubai, said it could be a challenge to convince some clients to invest in risk-mitigating infrastructure and systems. “The multinational firms realize the need, but some domestic businesses still choose to keep their eyes closed to the possible risks,” he said.
Cooke said in some cases his DR plans would not work for clients who are not as prepared as his company is. His job is to “gently point them in the right direction.”
“You have to plan for peace but build for war.”