Seven strategies for keeping disaster recovery on target

By: Craig Sands and Andrew Truscott  On: 30 Apr 2008  For: CIO Canada

Too often, business-continuity efforts are aimed at the failure of a single IT processing site or component, overlooking the multitude of smaller incidents that also pose a risk. To help keep you armed for all eventualities, here are seven cost-effective approaches to smart IT disaster recovery, as outlined by Craig Sands and Andrew Truscott of Accenture Canada.

It was a normal Monday batch process at a well-respected global bank – until, that is, a critical back-office system failed. At first, IT administrators took it in stride. This wasn’t the first time they’d had to recover lost data. But soon it became clear something more ominous was occurring: the bank’s multi-terabyte database had become corrupted.

The administrators tried to switch to the hot offsite backup. No luck: it had mirrored the corruption. In the IT world, the situation was beginning to spell ‘crisis’. Applications teams and anyone else who could help had to suspend all priorities to focus on the failure. Despite best efforts, the target recovery time – four hours – came and went without a clue as to the problem’s root cause or fix.

It began to look like an episode of ‘House’, with IT managers anxiously brainstorming for more than a day, trying to diagnose the mysterious disorder in their dying patient. They knew a premature move could make matters worse.

To the outside world, the bank showed no sign of its grave condition. Customers continued trading, unaware that this high-profile institution was on the verge of losing millions, being investigated by regulators, and spoiling its good reputation.

Out of view of customers, the IT teams struggled to keep the patient alive. They scrambled to find a clean backup. They found out the corruption had happened two days before the crash; it would take 36 hours to check earlier copies of the data to see whether they were clean. They worked on updating the production system, rerunning transaction log files to catch up to the crash point, and processing days of transactions that had since accumulated. Senior managers burned the midnight oil to decide which processes to give priority. By end of day Friday, the bank was uncertain it could open for business on Monday. It might be too risky to go more than five days without accurate settlement reconciliation. The bank alerted regulators. The team plugged away on catch-up processing over the weekend. Fortunately, they completed it in time. By Monday the patient was out of danger and the bank was able to open its doors.

A MATTER OF WHEN, NOT IF

This bank is not alone. Indeed, similar near misses are increasingly common. One global retailer had its point-of-sale transactions freeze for 18 hours during the holiday shopping season. The cause: a storage-network software bug that was never precisely identified. Despite the happy ending at the global bank, its senior managers and IT teams were left troubled. Losses had been modest, but had the failure struck at year-end instead – when trading was running at full tilt as investors tidied their portfolios – the outcome could have been disastrous.


Craig Sands and Andrew Truscott are contributors to the International Data Group (IDG) News Service, which publishes global technology stories from bureaus around the world to more than 300 publications in more than 60 countries.
