Preparing for the worst

So close your eyes for just a moment and imagine this scenario. It’s a perfectly wonderful, sunny Monday morning as you arrive at your office a whole ten minutes early. A cup of java in hand, you stroll over to your PC and turn it on. Only there is no response, and try as you might, you can’t find a heartbeat. Now, as the caffeine begins to work its magic, two questions rise from the misty haze: Do we have a disaster recovery plan, and how long am I going to be without access to my data?

Not long ago, discussions in IT focused on uptime and the mystical five nines. However, recent events in the U.S. are leading organizations to refocus on recovery and the steps IT departments can take to ensure systems are back up and running within a reasonable amount of time, should something go wrong.

“We back up on a disaster recovery basis, with that off site and in our day-to-day iterative scenario by offloading it to a remote site on a daily basis,” said Michael Beirne, president of Total Uptime in Toronto. He said the company, which provides disaster recovery solutions, recommends remote backup via VPN. But because data lines impose bandwidth limits, the company also offers tape solutions. And when planning for a recovery, Total Uptime tells its customers to plan for the worst.

“We’ve planned a protocol (for) a building burning down. A typical disaster recovery will require a combination of our disaster recovery copy plus the most up-to-date read-out, which we have in our colocation facility, (being) restored to a server in our location and then served up to the staff on a new workstation.” The cost for the software and services ranges from approximately $10,000 for a small- to medium-sized business to $35,000 for a high-end solution.
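Stripped to its essentials, the daily remote-backup routine Beirne describes — archive the data, ship it off site, keep a bounded number of copies — might be sketched as follows. Every path, file name and retention count here is an illustrative assumption, not Total Uptime’s actual implementation.

```python
import tarfile
from datetime import date
from pathlib import Path

def nightly_backup(data_dir: str, remote_dir: str, keep: int = 7) -> Path:
    """Archive data_dir and write the archive to remote_dir (imagined
    here as a VPN-mounted off-site share), pruning all but the newest
    `keep` archives. Names and retention are hypothetical."""
    remote = Path(remote_dir)
    remote.mkdir(parents=True, exist_ok=True)
    archive = remote / f"backup-{date.today().isoformat()}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname=Path(data_dir).name)
    # Bound off-site storage: delete everything older than the
    # newest `keep` archives.
    for stale in sorted(remote.glob("backup-*.tar.gz"))[:-keep]:
        stale.unlink()
    return archive
```

In practice such a script would run from a scheduler each night, with the remote path reachable only over the VPN — which is exactly where the bandwidth limits Beirne mentions make tape a necessary fallback for large data sets.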

But in Canada, medium-sized businesses are ignoring the importance of a contingency plan, according to some.

“When you get down to the medium-size companies in Canada, no I don’t think it’s taken seriously enough,” said Rob Colraine, director of infrastructure deployment at IDC Canada in Toronto. Those companies that were able to recover quickly from the events of Sept. 11 had good disaster recovery routines in place. Of the larger vendors, he said IBM offers one of the most comprehensive services and equipment available in the market.

At Tivoli, senior systems engineer Hugo Garcia said the company provides management software that helps administer networks, systems and applications across a variety of operating systems. Its storage management solution monitors systems and alerts users to a potential problem that could cause a slowdown or a complete server failure. He said customers should be aware that in systems management, even the most expensive tools won’t prevent a disaster; what matters is the proper implementation of those products.

For a large organization with mission-critical applications, Garcia said a replicated off-site location is the ideal approach because it becomes a mirror image of the infrastructure. “So if their main site goes down because of a disaster, they can switch their operations over to the off-site location, recover the data and continue operating until the site is back online.”

As its name implies, Markham, Ont.-based DataMirror’s High Availability Suite mirrors databases and anything else in the operating environment. “What we allow you to do is (we) actually mirror everything from one system to a redundant system so now you have a hot backup site. If the first system goes down, users are switched over to the recovery node and the downtime can range anywhere from several minutes to half an hour,” said Brian Butler, product marketing manager at DataMirror.
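The hot-backup arrangement Butler describes — every change replicated to a standby system, with users switched over when the primary fails — can be illustrated in miniature with a toy key-value store. This is a sketch of the general technique, not DataMirror’s product; all names are invented.

```python
class MirroredStore:
    """Toy hot-backup pair: every write is replicated to a standby,
    so the standby can take over with no data loss on failover."""

    def __init__(self):
        self.primary = {}
        self.standby = {}
        self.primary_up = True

    def write(self, key, value):
        if self.primary_up:
            self.primary[key] = value
        # Replicate at the "database level": ship the change itself
        # to the redundant system rather than a periodic full copy.
        self.standby[key] = value

    def read(self, key):
        # Failover: if the primary is down, serve from the standby.
        store = self.primary if self.primary_up else self.standby
        return store.get(key)
```

Real products do this asynchronously over a network, which is where the “several minutes to half an hour” of switchover time Butler cites comes from; the toy version switches instantly because both copies live in one process.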

Butler said while some companies opt for tape as backup, he called it redundant. “(Ours) is on the database level so it’s capturing all the changes and flows them between the two systems…You won’t have to worry about data loss. If there is corruption on one server as it’s going down, it’s not going to be replicated across the other server and corrupt any of your information.” An entry-level price ranges from $60,000 to $80,000 for the company’s product.

Halcyon also provides data recovery solutions. Its application-monitoring suite essentially lives inside Sun Microsystems Inc.’s Management Center to provide uptime in a data centre. In the event of a disaster, the program alerts customers. Albert Lee, the head of sales and marketing for the Toronto-based Halcyon, said customers are notified nearly instantly if, for example, their database was not being replicated. “It would forewarn you that some action was going to be required and (to) make sure that your backup site was ready to go,” he said.
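The kind of forewarning Lee describes — flagging that replication has fallen behind before a disaster strikes — reduces, in outline, to comparing the replica’s last-update time against a lag threshold. The threshold and message below are invented for illustration; this is not Halcyon’s software.

```python
import time

def replication_alert(last_replicated, max_lag_seconds=300, now=None):
    """Return a warning string if the replica is more than
    max_lag_seconds behind, else None. Threshold is illustrative."""
    now = time.time() if now is None else now
    lag = now - last_replicated
    if lag > max_lag_seconds:
        # In a real monitor this would page an operator or raise an
        # event in the management console.
        return f"ALERT: replication is {lag:.0f}s behind; verify backup site"
    return None
```

A monitoring suite would run such a check continuously against every replicated component, which is how “nearly instant” notification becomes possible.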

Lee said that while downtime is being reported less often on a component-by-component basis, improvements are still needed.

“You may be able to report that your router is being managed better or your storage but what is not being improved in a great way is the availability of the whole enterprise as a unit.” The old model of systems management focused on ensuring all of the point-by-point components were operating. Now the industry as a whole is moving toward analyzing entire applications and business processes, he said.

Beirne said what is considered acceptable downtime is changing: where clients previously conceded hours to weeks as appropriate, the window is getting smaller. Businesses are aware that extended downtime can hurt financially and can permanently tarnish a company’s reputation.

The private sector, however, is only part of the discussion. The government-run Workers’ Compensation Board (WCB) in Edmonton recently examined its disaster recovery practices. Previously, the WCB believed it could live without its computer systems for up to six weeks and cope through its manual process, said the WCB’s Deborah Harrop. However, with the events of Y2K, the WCB determined there were at least three critical systems that would need to be running again within three days in the event of a major disaster, she said.

“After that, the cost of trying to recover gets to be horrific. You start getting huge stockpiled transactions and the cost of re-entering those transactions is very high. And then there’s also a degradation in the service we provide to our clients.”

After some deliberation, the WCB decided to work with a hot site vendor. A hot site is a separate, geographically removed location that serves as the backup site for NT servers and mainframes. The WCB chose IBM and ran its first disaster recovery exercise back in May; over three days, staff recovered data from the board’s critical mainframe applications and all of its databases.

“And we tested connectivity from the IBM hot site in the Toronto area back to Edmonton where we had some of our staff working.” Harrop said organizations that have stellar data recovery plans often forget that testing is crucial. “A disaster recovery plan that has never been tested is not very valuable – it will help but it sure won’t get you through a disaster.”

Typically, a system fails because of a power or communications failure, but the events of Sept. 11 will lead to closer scrutiny of disaster practices in Canada and around the globe.

“The events of [Sept. 11] have repercussions. Certainly within the Canadian data and customer base we’re seeing increased visibility because of this and companies reviewing their disaster recovery plans,” said Dick Bird, NonStop business manager at Compaq in Richmond Hill, Ont. While the company provides assistance in setting up a recovery plan and encourages organizations to run disaster testing annually, Bird said it’s the customers who must initiate such programs.

But as Colraine suggested, whether or not an organization opts to implement a disaster recovery strategy is determined by how critical it considers its data. Some companies consider tape backup and off-site storage to be sufficient. It generally depends on who needs access to the data. “It’s when they realize how important the data is not just to themselves but for other people as well. Companies that have purely internal access don’t think about it as much,” Colraine said.

John Webster agreed that some organizations believe more firmly in the value of their IT departments than others, and consequently approach disaster recovery accordingly. The senior analyst at Illuminata in Nashua, N.H., argued that companies will break down their potential losses per day, hour or even minute to determine how long they can survive without their data or critical systems. That kind of formula then helps the organization define the type of solution to put in place.
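The loss arithmetic Webster describes amounts to a simple break-even calculation: divide the cost of a recovery solution by what an hour of downtime costs the business. The dollar figures below are invented for illustration only.

```python
def downtime_budget(loss_per_hour: float, solution_cost: float) -> float:
    """Hours of avoided downtime at which a recovery solution pays
    for itself: solution cost divided by hourly downtime loss."""
    return solution_cost / loss_per_hour

# Hypothetical example: a $60,000 mirroring suite, against $5,000
# per hour in lost business, breaks even after 12 hours of
# avoided downtime.
hours = downtime_budget(loss_per_hour=5_000, solution_cost=60_000)
```

A business that cannot survive even that many hours offline has its answer; one that can comfortably absorb days of downtime may reasonably settle for tape and off-site storage.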

Webster said the Internet has been a catalyst in establishing the importance of data to a company and that organizations will invariably look at what they have in place. “Most organizations will take a second look at their disaster recovery capabilities and their potential vulnerabilities. It will force everybody to at least wonder about their vulnerability.”

Beirne said Total Uptime is constantly encouraging companies to implement a solid disaster recovery plan, and more organizations are indeed spending the resources needed to ensure the data can be recovered. The data has become more valuable as the bulk of the information moves to the end user’s computer.

Harrop said working with a disaster recovery vendor is invaluable. “This sort of a relationship with a disaster recovery vendor is one of the best ways to ensure that recoverability is there, that your people have documented the process, that it works and that they’re familiar with their role at the same time.”

Sidebar: disaster recovery checklist

- How much of the data you keep is critical?
- How long can your site stay down?
- Do you have the staff in-house to implement a plan?
- Where is the data stored, and is outsourcing a viable alternative?
- Are you testing what you have on an annual basis?
- Is the replication system dispersed far enough geographically?
- Is tape changed daily, and who is tracking that it is taken off site?
- When is your IT department planning for scheduled downtimes?