As the CIO of a large organization, you’ve made a significant investment in a distributed computing environment so that essential information can be shared over enterprise-wide networks. From the factory floor to external Web sites to internal personnel, your people and customers are learning to work together more efficiently, competitively and profitably.
Imagine, then, that everything uniting your enterprise is suddenly lost to a serious systems malfunction. When a single event such as a natural disaster, an Internet attack or some other unplanned incident can disrupt your business, steps should be taken to minimize the risk. The problem is that in many organizations, these critical steps are not being taken.
In June of this year, The Yankee Group, a Boston, Mass.-based consulting firm, issued a research note describing the potentially devastating effect record high temperatures had on businesses in the San Francisco Bay Area. “Last week,” the report stated, “Silicon Valley came face-to-face with yet another challenge of doing business – Mother Nature. Rising temperatures pushed California’s Pacific Gas and Electric Co. (PG&E) into a Stage 3-level emergency, when fast-depleting electricity reserves threaten the supply of power to all customers. The weather caused PG&E to initiate intermittent and rolling outages of electrical power. The interruption of the power supply threatened the lifeblood of the high-technology industry.”
This incident highlights what The Yankee Group has been counselling for years, namely that business continuity and disaster recovery are services that require extensive thought and comprehensive implementation. Each business continuity plan should be designed to meet a firm’s business requirements, and the stakes can be high: some reports have indicated that the cost of being down approached $1 million per hour for some firms.
In less than a decade, the importance of end-user recovery has grown immeasurably. As recently as five years ago, it was sufficient to have a recovery plan for the main CPU that relied on a “hotsite”. With the advent of client/server environments, companies saw the need to recover the many diverse platforms and servers that were spreading across the organization.
Then as now, an organization’s vital operations, whether in the public or private sector, were as vulnerable to disaster as the human body is to bacteria, viruses and disease. External or internal forces, including the weather, computer failures, human negligence or some unforeseen cause, can interrupt critical business functions and cost an organization dearly. It goes without saying that your organization needs to be proactive in planning for any calamity that may occur. Less obvious is how much effort an organization should put into this kind of planning, and on what basis that decision should rest.
Two Traditional Options
End-user recovery has traditionally been handled in one of two ways. The first was to send a fixed number of users to a hotsite. Recovery vendors responded by providing space at their major centres. Each end-user seat provided a work area, client PC, telephone and access to the network. As user requirements grew, additional capacity became more important; however, vendors often found it financially impossible to keep adding idle space to their locations. Companies could rarely go beyond 150 seats before having to split their personnel between two cities or recovery locations.
The second option was for a company to use its own spare facilities and have equipment shipped in at the time of the disaster. Although this limited the money spent up front on hotsite-type facilities, companies ran into similar problems with lack of capacity and the transportation of personnel. Also, the adage “nature abhors a vacuum” takes on a whole new meaning in an age of cost cutting and downsizing: spare facility space rarely stayed vacant and available.
Mobile recovery has been available for many years, usually providing “data centre”-type recovery for midrange and LAN server environments. Although this addressed the main shortcoming of traditional end-user recovery, the need to keep end users local and close to home, until recently it still could not accommodate large requirements.
When a company wanted to develop a mobile end-user solution, the vendor would provide the work area, technology, and data/telephone/power hookups. Until recently, it was up to the company to work out the details of the plan, including logistics and network connectivity. CIOs soon realized that the project was a complex planning process requiring an abundance of research, legwork and preparation. More often than not, IS personnel ended up spending inordinate amounts of time away from their “real” jobs creating a strategy for recovery.
It’s a New Game
The rules have all changed in today’s networked environments. The shock wave of disaster travels immediately outward to engulf end users, other operations, strategic business partners, suppliers, customers and ultimately consumers. Even if redundant systems exist away from a disaster site, employees who cannot function because their systems are destroyed, down or inaccessible will blunt a business’s edge. If sales and customer service functions are compromised, or if contractual obligations are not being met, a business can falter.
The threat of an IS meltdown is even greater as a result of the e-business revolution, which has forced the CIO community to come to grips with the risks, rewards and returns of operating 24-7 with an absolute minimum of downtime over the course of the year. In the pre-Internet age, you might have been able to get away with a system crash lasting 24 or 48 hours because the business world operated under a different set of expectations.
There are clear consequences any time an organization’s information systems shut down long enough to affect service quality. The immediate effect is on productivity and profit. The longer-term results can affect management and shareholders, and can even threaten the survival of the business.
Today, the challenge of disaster recovery is twofold: recover the technology, and enable the people who use that technology to keep delivering service to the customer.
Needs also differ depending on whether an organization operates in the business-to-business (B2B) or the business-to-consumer (B2C) segment. B2B relationships tend to be much more formal, backed up by agreements, contractual arrangements and an ongoing relationship that can survive a minor outage. On the B2C side, there isn’t that luxury. If consumers visit a Web site and cannot get the product or the answers to their questions, they will leave and they won’t come back.
The trend is toward much faster recovery and much higher availability for the business. CIOs who have dealt with recovery requirements in the past are finding it is no longer acceptable to be back in service within 24 or 48 hours; systems have to be available continuously. From a purely technical perspective, that means higher operating costs.
The biggest challenge is weighing the risks, rewards and returns of that level of availability.
The number one concern facing the CIO is keeping up with changing business methods. In the legacy days it was far easier. An organization had a mainframe and used it for batch processing of transactions and generating financial statements at the end of the month. The customer’s point of contact was the telephone, or a sales representative who called on the customer and wrote up an order by hand for entry into the system. Today, technology supports an environment where deliveries take place immediately.
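The difference between a 24- or 48-hour recovery window and near-continuous availability is easiest to see as simple arithmetic. The minimal Python sketch below converts an availability percentage into the hours of downtime it permits per year, then multiplies that by an hourly cost of being down. The function names are hypothetical, the calculation assumes a 365-day year and a flat hourly cost, and the $1 million figure is only the upper-end estimate reported for some firms, not a benchmark for any particular business.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumption: ignores leap years


def annual_downtime_hours(availability_pct: float) -> float:
    """Hours per year a system may be down at the given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def annual_downtime_cost(availability_pct: float, cost_per_hour: float) -> float:
    """Permitted downtime hours multiplied by an assumed flat hourly cost of being down."""
    return annual_downtime_hours(availability_pct) * cost_per_hour


if __name__ == "__main__":
    # Hypothetical rate: the roughly $1M/hour some firms have reported.
    COST_PER_HOUR = 1_000_000
    for pct in (99.0, 99.9, 99.99):
        hours = annual_downtime_hours(pct)
        cost = annual_downtime_cost(pct, COST_PER_HOUR)
        print(f"{pct}% available -> {hours:.2f} h/year down, about ${cost:,.0f}")
```

At 99.9 per cent availability the whole year’s downtime allowance is about 8.8 hours, so a single 24-hour outage of the kind once tolerated would use up that allowance almost three times over.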
The winners are those organizations where the wall that once separated a CIO and his or her IS department from the rest of the business crumbled long ago. In these cases, business decisions are made based on what technology is capable of delivering.
Where’s the Back-up?
Sadly, many organizations have implemented systems designed for the New Economy without seriously considering the need for a sound data recovery strategy. By some estimates, as many as 80 per cent of the new systems installed today are implemented with little thought given to data back-up or recoverability.
There are reasons for this. Not only is there a tremendous shortage of technical resources, but there is also an overwhelming need to keep up with the competition. A CIO may say, “If we don’t put the system in place we’re going to lose ground to the competition, so let’s get it up and running and worry about the other aspects later.” Such CIOs take an ostrich-like approach, losing sight of the fact that a true high-availability environment costs twice as much as the stand-alone environment it replaced. Some CIOs have even declined to implement a disaster recovery program because of the cost involved, despite statistics showing that the business loss from a catastrophic failure would far exceed the cost of implementation. It is a cultural issue, and one that many organizations are being forced to confront.
It’s like buying a new car and then waiving the insurance. In these cases, the CIO needs to realize that only a wire separates the end consumer of goods from the manufacturer or supplier of those goods.
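The cost comparison these CIOs skip can be framed with annualized loss expectancy (ALE), a standard risk-management formula that the article itself does not cite: the loss from a single event multiplied by the number of such events expected per year. The sketch below compares that expected loss, minus whatever share a recovery program would not prevent, against the program’s annual cost. All names and numbers are hypothetical placeholders; real inputs would come from an organization’s own business impact analysis.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One hypothetical business-interruption scenario."""
    name: str
    loss_per_event: float   # single loss expectancy (SLE), in dollars
    events_per_year: float  # annualized rate of occurrence (ARO)

    def annual_loss(self) -> float:
        """Annualized loss expectancy: ALE = SLE x ARO."""
        return self.loss_per_event * self.events_per_year


def dr_is_justified(scenarios: list[Scenario],
                    annual_program_cost: float,
                    residual_fraction: float) -> bool:
    """Return True if the yearly loss a DR program avoids exceeds its yearly cost.

    residual_fraction is the assumed share of each loss that remains even with
    the program in place (0.0 = every loss avoided, 1.0 = nothing avoided).
    """
    expected_loss = sum(s.annual_loss() for s in scenarios)
    avoided_loss = expected_loss * (1 - residual_fraction)
    return avoided_loss > annual_program_cost


if __name__ == "__main__":
    # Hypothetical inputs: a 48-hour outage at $1M/hour, expected once every 20 years.
    outage = Scenario("48-hour data centre outage", 48 * 1_000_000, 1 / 20)
    print(f"ALE: ${outage.annual_loss():,.0f}")
    print("Program justified:", dr_is_justified([outage], 1_500_000, 0.10))
```

With these placeholder figures the expected annual loss is $2.4 million, and a program costing $1.5 million a year that prevents 90 per cent of that loss pays for itself; with different inputs the same comparison could point the other way, which is why it is worth making explicitly.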
Need for DR Strategy
How important is it to have a sound disaster recovery (DR) strategy? Very important. And keep in mind that the “hotsite” is just one piece of the puzzle. When you determine your organization’s real needs, you may find it impossible to recover all of your clients immediately, but possible to restore your servers more quickly. The question then becomes one of strategy.
Clients can be defined as the human part of the business’s interface with the end customer: the person sitting in the call centre operating a PC and a complex phone system. If your call centre is located in New Brunswick and your servers are located somewhere else, there are two potential points of massive failure: you could lose the call centre, the servers, or both. Each of those scenarios would call for a different disaster recovery strategy.
Dana Cameron, the disaster recovery and business continuity coordinator at Sobeys Inc. in Stellarton, N.S., certainly understands the importance of the client when it comes to having a plan in place. Founded in 1907, Sobeys, which became a public company two years ago as a result of a merger with The Oshawa Group Ltd., is a national retail food distributor with projected sales this year of $11 billion and more than 32,000 employees. The company sells private label and national brand name products through 1,400 corporate and franchised grocery stores, and also runs a one-million-member customer loyalty program that offers electronic coupons, frequent shopper bonus points and other benefits. Sobeys’ lifeline is a complex data centre based in Stellarton.
Sobeys’ Modular Recovery Plan
“Sobeys has a very aggressive and comprehensive disaster recovery and business continuity plan in place that is exercised several times a year,” says Cameron. “Modular space is only one of many in the whole scheme of solutions in a corporate-wide plan. This particular solution would be exercised only in the event that we lost one of our offices, or access to that building, in the Stellarton area due to fire, chemical spill, or vandalism, etc.”
Sobeys has four office buildings in Stellarton, all within one mile of each other. If the company loses access to one or more of them, it would initiate a set process that immediately has the modular space shipped to a designated site. The modular space is simply a group of mobile-home-type structures that can be set up in pods of three or four. These units are shipped out of Halifax and Montreal and provide full access to data processing, telecommunications and all other business functions.
With networks the cornerstone of today’s business, rapid disaster recovery has become a vital part of the disaster-planning matrix. Despite that, according to the Gartner Group, through 2003 only 25 per cent of large enterprises will leverage their year 2000 planning efforts to improve the overall quality of their business continuity programs and plans.
A Frank Warning
In a strategy paper entitled Year 2000 Aftermath: Too Much Contingency Planning, the Stamford, Conn.-based consulting firm issued a frank warning. “Like century-date change preparedness, enterprises should be ready for any type of business interruption including fires, natural disasters, hardware and communications failures, and the failures of key outside service providers and trading partners. These types of interruptions cannot be predicted. Moreover, they can have devastating effects on enterprises, including eventual insolvency.
“With the increased speed of business resulting from electronic commerce, increased competitiveness, innovation and reliance on IT, recovery from a business interruption must be quicker than ever before. Otherwise, the enterprise faces serious consequences, including loss of current and future revenues as well as customer and supplier confidence.”
We live in an age of shrink-wrapped, pre-packaged solutions. They are fast, easy and economical. Unfortunately, we often sacrifice specialization and a true fit for our needs. In your recovery planning, the one factor that remains imperative is the one factor too many organizations are willing to discard: customization.
The term “disaster” calls to mind such calamities as Ice Storm ’98, Hurricane Floyd or the Great Northeast Blackout. There’s no question that these events qualify, but from a business perspective a disaster doesn’t need to be as dramatic. In reality, a disaster is anything that interferes with an organization’s ability to deliver products and services to customers for an unacceptable period of time. When that occurs, there is only one appropriate response: quick action to minimize the impact.
There really is no other choice.
Stan Krupp is Director of Sales for Business Continuity Services for GE Capital IT Solutions.