As CIOs scramble to update their disaster recovery plans in time for the Y2K changeover, many are discovering just how vulnerable their corporate data has become – and how disastrous even a minor data loss could be. Corporate information, once relegated to the secondary role of business support, has become a key business driver and competitive asset, as important as cash and inventory to the day-to-day operations of the business.
While the cost of downtime and data loss continues to rise, the chances of a potentially disastrous failure occurring have never been greater. The Y2K issue may be receiving the lion’s share of media attention, but distributed computing environments remain vulnerable to myriad threats, ranging from natural disasters to physical threats such as fire, water leakage, power surges, wire breaks, or theft. Even errors caused by viruses, programming mistakes and upgrades can be disastrous to a company’s bottom line, and ultimately its valuation.
Despite the obvious importance of ensuring data integrity, safety, and recoverability, IT executives have proven reluctant to devote significant time and resources to what is widely considered a mundane administrative task. In this regard, the Y2K crisis may prove to be a blessing in disguise. Given the critical nature of distributed business processing, storage management can no longer be relegated to the departmental level. IT managers need to lay the groundwork for a centralized approach to enterprise storage management that enables recovery of applications and their associated data, based on business policies.
STORAGE MANAGEMENT ACROSS THE ENTERPRISE
The distributed environment presents a tremendous challenge to IT administrators attempting to protect corporate data and ensure the recoverability of applications. Data from distributed business applications may be stored anywhere on the Enterprise Information Grid, defined as the entire information-carrying infrastructure of the company, including servers, desktops, mobile devices, backup storage devices, and the LAN/WAN connections between them. Moreover, data moves fluidly across platforms – for example, from a PDA used by a field operator to a mainframe-based transaction processor – requiring coordinated backup on both sides of the application in order to recover the application without losing data.
Compounding the challenge, the sheer volume of data stored on the typical enterprise network continues to grow at an exponential rate, fuelled by storage and a host of powerful applications, from ERP systems to groupware, e-mail, multimedia, and Internet-based functions such as e-business. The data is stored in a wide range of formats, and on a variety of media, including hard drives, tape drives, optical storage, and offline media. New storage technologies such as Storage Area Networks (SANs) promise to enhance the storage architecture, but present their own management challenges.
ALIGNING STORAGE MANAGEMENT WITH BUSINESS PRIORITIES
Developing an effective storage management strategy requires a thorough understanding of the business processes that take place across the Enterprise Information Grid. As Rick Ross, a partner in the consultancy Total Solutions Group, explains: “The first thing is to determine your business need for availability. What’s the impact of being down for an hour, day, week, or month? And do it application by application.” Without this understanding, managers have no way of knowing which resources or processes are vital, and which are less critical.
Once the availability requirements for an application have been defined, it’s a relatively simple matter to establish a storage management policy based on these requirements. For example, a critical e-commerce application for which downtime is assessed at a very large amount of money per hour may require a business policy of automatic fail-over to a second version kept online, with no data loss considered tolerable. Such a policy then dictates the technology that will be required. In many cases, the business policy will be based not only on the internal business requirements, but also on externally imposed requirements such as Revenue Canada rules for record retention.
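The step from availability requirement to storage policy can be sketched in a few lines of Python. This is only an illustration: the class name, the function name, the three tiers, and the hour thresholds are all assumptions made for the example, not figures from any product or from the companies quoted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityRequirement:
    application: str
    max_downtime_hours: float    # longest outage the business can tolerate
    max_data_loss_hours: float   # how much recent data may be lost (0 = none)


def choose_policy(req: AvailabilityRequirement) -> str:
    """Map a business requirement to a protection approach (hypothetical tiers)."""
    if req.max_downtime_hours < 1 and req.max_data_loss_hours == 0:
        return "automatic fail-over to an online second copy"
    if req.max_downtime_hours <= 24:
        return "nightly backup to a central location, restore on demand"
    return "weekly backup with offsite archive"


# Illustrative requirements, assessed application by application.
requirements = [
    AvailabilityRequirement("e-commerce", max_downtime_hours=0.0, max_data_loss_hours=0.0),
    AvailabilityRequirement("accounting", max_downtime_hours=8.0, max_data_loss_hours=24.0),
    AvailabilityRequirement("archive-search", max_downtime_hours=72.0, max_data_loss_hours=168.0),
]

for req in requirements:
    print(f"{req.application}: {choose_policy(req)}")
```

The point of the sketch is the direction of the decision: the business requirement is the input and the technology is the output, never the reverse.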
In many cases, the storage management policy for a particular application will include interim steps for moving the data between storage media. Consider Kodak, which is creating a new paradigm in the competitive photo industry. The photo-processing giant guarantees customers that images in its online digital-photo developing service will remain available online for a certain period of time, after which Kodak archives and stores them for an additional period. The storage management policy for such a service would reflect this progression in addition to the backup and recovery requirements.
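A progression of this kind, from online storage to archive to eventual expiry, can be expressed as a simple rule on the age of the data. The sketch below is hypothetical: the function name and stage labels are invented for illustration, and the 30-day and 365-day values are placeholders, not the retention periods of any actual service.

```python
from datetime import date


def lifecycle_stage(stored_on: date, today: date,
                    online_days: int, archive_days: int) -> str:
    """Return where an item should live, based purely on its age in days."""
    age = (today - stored_on).days
    if age < 0:
        raise ValueError("stored_on is in the future")
    if age < online_days:
        return "online"
    if age < online_days + archive_days:
        return "archived"
    return "expired"


# Placeholder retention periods, chosen only for the example.
ONLINE_DAYS = 30
ARCHIVE_DAYS = 365

print(lifecycle_stage(date(1999, 1, 1), date(1999, 1, 15), ONLINE_DAYS, ARCHIVE_DAYS))
print(lifecycle_stage(date(1999, 1, 1), date(1999, 3, 1), ONLINE_DAYS, ARCHIVE_DAYS))
```

Keeping the retention periods as parameters, rather than burying them in the logic, is what lets the same rule be driven by business policy.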
IMPLEMENTING A RECOVERY-BASED BACKUP STRATEGY
Achieving centralized, recovery-based backup requires a clearly articulated and consistently enforced strategy, combined with management tools to simplify administration. Without a clear strategy based on business imperatives, individual departments and users will inevitably undermine the centrally established policy by following their own procedures, which may not be in the best interests of the company. Indeed, they may not even be sufficient to ensure basic data security and integrity.
Robust management technology, however, represents an equally crucial component of a recovery-oriented solution. Without it, administrators cannot even ensure that data on the Enterprise Information Grid has been reliably backed up, much less enforce corporate policy.
David Boonstra, IT Manager for Boeing Canada’s Winnipeg Division, explains: “We used to do all our backup to eight-millimetre tape stand-alone systems, one tape at a time, and we used to run a Unix batch scheduler like CRON. What we found is that by doing it that way, we didn’t have a good handle on whether the backups were good, whether all the data was there, and whether it had finished; there really was no visibility.”
Automation technology becomes even more important when disaster strikes. According to Boonstra, recovery of major applications in the event of system failure (“And we’ve had several,” he notes) required sequential restore of incremental backups from a number of tapes, and typically required days to complete. Such long periods of downtime had a clear impact on the bottom line: “Because we’re a manufacturing facility, any downtime for some of our major systems is downtime to production.”
Enforcement of a centralized management strategy in conjunction with automation technology has reduced his time to recover a major application from several days to, in one recent instance, eight hours. “And that is from the time we contact the supplier of the system, get in the new parts, to the time it is back online running, with new parts installed, integrated, and brought up to speed, with everything loaded back on.”
Finally, automation technology frees administrators by empowering end-users. Intab Ali, Director of Computer Facilities and Operations at Dalhousie University in Halifax, explains: “Users can decide when they want their backups to be done, and when the time comes their machine is backed up. It doesn’t require any human intervention to do any backup or restores.”
At Boeing, programmers now routinely recover their own files, which not only reduces the workload of administrators but also speeds application development, because programmers are no longer forced to wait for files to be recovered from Boeing’s offsite location.
ELEMENTS OF A RECOVERY-BASED STORAGE MANAGEMENT SOLUTION
To implement and enforce a recovery-based storage management strategy, IT needs to consider the following strategic and technology elements:
Application-Focused Data Management: Today, most backup and storage management is handled at the departmental level, with a wide variety of tools employed to back up different hardware and OS platforms. While this ensures that a copy of the data is available, it does not suffice to ensure timely recovery in case of application failure. Distributed business applications tend to run across platforms, across operating systems, and across departments within the organization. In order to recover an application, administrators may have to initiate restores from many different sources, then piece the appropriate files back together to recreate the application.
A recovery-focused management solution enables administrators to automatically recover a business application based on pre-established policy, independent of the underlying point of failure, and independent of the hardware, operating systems, and physical location of the application data. Moreover, by recovering only the data associated with the failed application, rather than requiring a full restore of a set of backups, such a solution speeds recovery while reducing network resource usage.
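One way to picture application-focused recovery is as a backup catalogue in which every file is tagged with the application it belongs to. The Python sketch below assumes such a catalogue exists; the record layout, host names, paths, and the restore_plan function are all hypothetical and do not describe any vendor's product.

```python
from collections import defaultdict
from typing import NamedTuple


class CatalogEntry(NamedTuple):
    application: str   # business application the file belongs to
    host: str          # machine the file was backed up from
    path: str          # location of the file on that machine


def restore_plan(catalog, failed_application):
    """Group only the failed application's files by the host they came from."""
    plan = defaultdict(list)
    for entry in catalog:
        if entry.application == failed_application:
            plan[entry.host].append(entry.path)
    return dict(plan)


# Illustrative catalogue spanning several platforms.
catalog = [
    CatalogEntry("order-entry", "mainframe-1", "/db/orders.dat"),
    CatalogEntry("order-entry", "nt-gateway", "C:/sync/queue.log"),
    CatalogEntry("payroll", "unix-hr", "/var/payroll/ledger.db"),
    CatalogEntry("order-entry", "mainframe-1", "/db/customers.dat"),
]

print(restore_plan(catalog, "order-entry"))
```

Because the plan is keyed by application rather than by machine, the payroll files on other hosts are never touched, which is where the savings in time and network usage come from.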
Centralized Oversight of the Enterprise Information Grid: Because data (and backups) may be located anywhere on the network, administrators need a single view of data across all hardware platforms – MVS, Unix, NT, desktop, PDA, etc. – and all storage media, from SCSI disk and tape storage to multivendor SANs. This administrative view should associate data with applications, making it easy for administrators to view and manage stored files without requiring knowledge of the underlying hardware or operating system. In addition, administrators should be able to logically define and execute storage management policies from the central location – for example, defining rules for archiving and deleting files from the corporate e-mail system.
Fast Backup/Restore Technologies: As business processing extends beyond normal business hours, the window of availability for performing backups has shrunk drastically. Administrators (and storage management utility vendors) must back up ever-greater amounts of data in ever-shorter periods of time, putting tremendous strain on network and storage resources. To address this problem, storage management vendors have developed a range of technologies to maximize the use of network bandwidth and facilitate backup and restore. These include such established technologies as multithreading, compression, differential backup, online database backup, and hierarchical storage management. In addition, several new technologies are now becoming available, including adaptive mapping technologies to facilitate the logical association of data and applications, and byte- and block-level differencing, which will enable differential backups at the level of individual bytes and blocks of data, further reducing bandwidth requirements.
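The idea behind block-level differencing can be shown with a small Python sketch. It is not any vendor's algorithm: it simply splits a file into fixed-size blocks, compares a hash of each block with the previous backup, and keeps only the blocks that changed. The 4,096-byte block size, the choice of SHA-256, and the function names are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Return {block index: block bytes} for blocks that differ from the old copy."""
    old_digests = block_digests(old, block_size)
    changed = {}
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(old_digests) or old_digests[index] != digest:
            changed[index] = block
    return changed


def apply_blocks(old: bytes, changed: dict, new_length: int,
                 block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new version from the old copy plus the changed blocks."""
    blocks = []
    for index, start in enumerate(range(0, new_length, block_size)):
        blocks.append(changed.get(index, old[start:start + block_size]))
    return b"".join(blocks)[:new_length]


# A 12 KB file in which a single byte changes between backups.
old = bytes(3 * BLOCK_SIZE)
new = bytearray(old)
new[5000] = 1
new = bytes(new)

delta = changed_blocks(old, new)
print(sorted(delta))                       # [1]: only the second block is sent
print(apply_blocks(old, delta, len(new)) == new)
```

In this example one block out of three crosses the network instead of the whole file, which is the bandwidth saving the technique is meant to deliver.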
Integration with Systems Management Tools: Part of the difficulty of maintaining effective storage management is the necessity of keeping up with the flow of data in a constantly evolving IT environment. By integrating storage management with other systems-management disciplines, administrators can set up their solution to automatically implement storage-management policy in response to changes. For example, when new desktops are added in the accounting department, they could be automatically included in the policy that dictates that accounting desktops are backed up to a central location every night. Similarly, integration with the help desk could enable support staff to receive automatic notification of the status of recovery jobs, and to automatically request recovery of files at users’ requests.
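The accounting-desktop example might look like the following in outline. This is a hypothetical sketch: the BackupPolicy and PolicyRegistry classes, the on_machine_added hook, and the host names are invented for illustration, and a real systems-management product would expose its own inventory events.

```python
from dataclasses import dataclass, field


@dataclass
class BackupPolicy:
    name: str
    department: str
    schedule: str
    members: set = field(default_factory=set)   # hostnames covered by the policy


class PolicyRegistry:
    def __init__(self, policies):
        self._by_department = {p.department: p for p in policies}

    def on_machine_added(self, hostname: str, department: str):
        """Inventory event hook: enroll the machine in its department's policy, if any."""
        policy = self._by_department.get(department)
        if policy is not None:
            policy.members.add(hostname)
        return policy


accounting = BackupPolicy("accounting-desktops", "accounting",
                          schedule="nightly, to central location")
registry = PolicyRegistry([accounting])

# The inventory system reports a new desktop; no administrator action is needed.
registry.on_machine_added("acct-desk-17", "accounting")
print(sorted(accounting.members))
```

A machine reported for a department with no policy is simply left unenrolled here; a production system would more likely raise an alert so that nothing goes unprotected.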
Data Mining for Decision Support: One of the great advantages of centralizing storage management is the ability for CIOs and other IT managers to perform trend analyses from a comprehensive data set in order to fine-tune decisions on storage resource-allocation and spending. Given the speed with which network storage volume is increasing, CIOs will almost certainly be spending large sums of money not only on storage devices, but also on associated management. (It is estimated that the cost to manage storage is four to six times the cost of acquiring storage.) By mining data gathered centrally, managers can identify where existing storage resources are being underutilized, or whether a backup hangup is related to bandwidth or capacity. This query-and-analysis functionality will become even more important with the widespread deployment of SANs, which must be fine-tuned for optimal resource-sharing and fast recovery.
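As a small illustration of this kind of analysis, the Python sketch below flags storage volumes whose utilization falls below a threshold. The volume names, the figures, and the 30 per cent threshold are invented for the example; a real analysis would draw on data gathered by the central management system.

```python
def underutilized(volumes, threshold=0.30):
    """Return (name, utilization) pairs below the threshold, least used first."""
    report = []
    for name, (used_gb, capacity_gb) in volumes.items():
        utilization = used_gb / capacity_gb
        if utilization < threshold:
            report.append((name, round(utilization, 2)))
    return sorted(report, key=lambda item: item[1])


# Illustrative figures: (gigabytes used, gigabytes of capacity).
volumes = {
    "nt-fileserver": (12, 100),
    "unix-db": (85, 100),
    "mail": (20, 80),
}

for name, utilization in underutilized(volumes):
    print(f"{name}: {utilization:.0%} used")
```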
Testing: Finally, for a storage management solution to be considered effective, its ability to restore must be verified through a certification system that demonstrates end-to-end recovery. Because the enterprise IT environment is constantly changing, testing must occur on an ongoing basis, to maintain readiness, familiarize staff with recovery procedures, and identify potential glitches before they interfere with a real recovery.
The volume and diversity of corporate information on the network will only continue to increase in the coming years. The Y2K changeover offers CIOs an unprecedented opportunity to address storage management in a comprehensive way by looking beyond mere disaster recovery and laying the groundwork for centralized, policy-based storage management that focuses on recovery of business processes, not just files.
By taking this opportunity and making the business case to top management, CIOs can ensure they will be able to meet future storage demands, including those of emerging storage technologies such as SANs, and, much more importantly, recover gracefully from the inevitable failures with minimal impact to the bottom line.
Jim Hilbert is Vice President and General Manager, Storage Division, Tivoli Systems.