IT users’ desires and business executives’ priorities are often at odds. A corporate storage strategy traditionally emphasizes data protection (backup) over performance. Employees increasingly use – or misuse – e-mail systems as evergreen, searchable databases, a practice that is driving 61 per cent of new storage purchases, according to the 2003 InfoWorld Storage Survey. The survey shows that, in general, users’ impatience is winning out: respondents say that 67 per cent of corporate data must be maintained online, whereas only 17 per cent can be moved to offline archives. Yet however much users hate waiting for data to load from an optical library, chief technologists’ budgets are too slim to keep every bit of data ever generated available for immediate access. Can a combination of technology and policy create a storage strategy that preserves data cost-effectively and satisfies users’ need for speed? InfoWorld (U.S.) writers P.J. Connolly and Tom Yager express their views on the subject.
Tom Yager: I don’t think we can bridge the gap between users’ impatience and IT’s business objectives yet. When I looked back at how I’ve used nearline and offline storage over the years, I found plenty of archived data in my storage unit, including stacks of SyQuest and Bernoulli cartridges. I have a big box of hard drives – some labelled, some not – filled with archival data. I could throw dice to guess the OS, file system, and jumper settings needed to read those drives, assuming their motor lubricants haven’t turned to epoxy by now. I filled one five-drive RAID array with a commercial video project, but the vendor that made the array is long gone and there is no driver for its controller card. I also have a 12-inch data LaserDisc – I don’t even know where to start with that.
I’m a zero-patience user, a poster boy for random access. I would have cheered if my employers’ IT departments had decided to replace tape with online media. I have tried every alternative to tape I could find, thinking more about how long backup and restore take than about how long the archive will last. Now I have lots of data that I either can’t recover or wouldn’t bother trying to recover. After years of experimenting with all kinds of media and fancy storage management schemes, I now stream everything I care about to DLT (Digital Linear Tape). I have a sense of what I’ve lost, and I now believe durability trumps convenience in an archiving strategy.
P.J. Connolly: I can relate, Tom. I still have to salvage data from some single-sided, original-Mac-format disks. But we’re not talking about obsessive data-retentive types like you and me – we’re talking about corporate data that must be managed. That implies retiring obsolete data, and by extension, obsolete data formats.
I have nothing against tape, but I can see why it’s not always a practical choice when a company has terabyte upon terabyte that’s not all in the same place. Besides, when survey respondents report that 68 per cent of their data needs to be online for immediate access today (a figure that will cross 70 per cent in the next year), that pretty much rules out tape-based storage.
Pull out of your silo and smell the coffee, dude. Disks and Gigabit Ethernet are so cheap today that there’s no reason admins can’t back up everything in sight. Face it, backup and disaster recovery are two of the three activities most likely to drive a company’s decision to spend scarce cash on storage.
Tom: I do tend to work in a silo, but my botched personal attempts to archive my data were usually the only credible archiving effort being made around me. I’ve worked for only one company that used HSM (hierarchical storage management), and it was deployed on servers housing medical records. All other files and databases were backed up weekly or monthly, and the media were never seen again. Nobody ever checked to make sure tapes were readable, and the legacy devices needed to read retired media types weren’t maintained.
Most of the IT archive warehouses I’ve been in are just grander versions of the mess I found in my U-Stor unit. Admins don’t need to reduce their use of tape or impose restrictive policies on users. Companies could archive solely to tape if they used tape properly, and users would accept waiting for old files if they could get them at all. Until then, they’ll burn their data to CD-Rs (the nearline flavour du jour) because they know the company doesn’t have backup covered.
P.J.: Excuse me, but how many Joe and Jane Users have access to a CD-R drive in the ordinary course of their workday? Still, you’re right about HSM being an unfulfilled promise: fewer than a third of our respondents currently use it, and only another 11 per cent have firm plans to buy HSM this year.
Tom: My advice is to avoid using proprietary tools, weird formats, and faddish media. I wouldn’t bet a nickel on being able to read a DVD-RAM disc or a cartridge hard drive written in a proprietary compressed format 10 years from now. Why does that matter? Companies may get their e-mail, personnel, and transaction archives subpoenaed. Emerging BI (business intelligence) tools will analyze and repurpose seemingly stale data. Programmers can dig gems out of retired source code. Businesses will find uses for data that executives can’t dream of today, but IT’s archive strategies focus on short-term recovery. The likelihood of long-term readability falls off sharply when hacks such as encryption and multi-drive striping are enabled. Archives created by complicated or proprietary automated systems die with the vendor that created the system.
P.J.: Now you’re talking sense. Anything that makes data recovery more complex invariably reduces the likelihood of a successful retrieval. That’s why standards-based file systems are the only way to go for random access devices, be they magnetic or optical in nature. Companies can’t afford to hire humans to replace tape-loading robots, nor should they try to do so. Instead, the best course is to go with the cheap disk/fast network approach, giving users the advantages of native file system access while allowing IT staff to be more than mere data janitors.