Hands up everybody who thought data archiving and backup were the same thing. Not many nowadays, perhaps. But five years ago?
The whole notion of archiving as distinct from backup, and the related issues of records retention and data retrieval, have forced their way into the CIO’s consciousness in recent years, thanks in part to the revolution in corporate governance regulations that began in the U.S., and in part to out-of-control growth in data storage requirements.
Estimates of enterprise data growth range from 30 per cent a year to doubling every 18 months. “I would say 95 per cent-plus of all business information in our organization is now contained in electronic files – in documents, e-mail and flat files,” says Mike Cuddy, CIO of Toronto-based Toromont Industries Ltd.
So what is the difference between archiving and backup?
Backup primarily answers IT department needs for recovery of data and systems in case of disaster, according to Umesh Hari, senior executive and global lead for data management and architecture at Accenture. Archiving and retrieval – the latter a vital but sometimes overlooked concomitant of the former – have more to do with regulatory and legal requirements for data retention and protection, internal business needs for on-demand access to historical data and the economics of data storage.
“Backup technology is not designed for on-demand reproduction of information,” notes Cuddy. “Nor is it designed for any kind of search – and those are key elements of archiving and retrieval.”
PUTTING A STRATEGY IN PLACE
Now hands up everybody whose organization has a full-blown data archiving and retrieval strategy in place. Not so many. And how about all those who have made a start on developing a strategy? All those who plan to start soon? Some day?
CIO Canada talked to several Canadian organizations to find out how they’re meeting the archiving and data retrieval challenge. The Greater Toronto Airports Authority (GTAA) is less than a year into implementing a full-blown strategy. The strategy is driven by the GTAA’s corporate affairs department, but CIO Gary Long is responsible for implementing the systems to support it.
“We look at archiving as involving a policy-based matrix of retention schedules and document characteristics,” says Long. “I’m not saying we didn’t before. The big change is the formalization of these policies.”
Toromont, a $1.8-billion-a-year construction equipment manufacturer, began implementing a similar strategy under Cuddy’s direction about six months ago. “Prior to this project, the mechanism for retrieval was the restoring of backups,” he says. “We used backup data as the archive. Now there is a very specific distinction between those two things.”
Many other organizations, however, have barely begun – in fairness, often for good reason.
But does every organization need an archiving strategy, or dedicated archival storage?
In industries governed by U.S. Securities and Exchange Commission (SEC) regulations or similar Canadian rules on business records retention, the answer is clearly yes. Securities and commodities firms must retain all data about business transactions, including even telephone conversations. And they have to be able to produce it within 48 hours and prove it hasn’t been tampered with. Companies in those industries were pioneers in implementing archiving strategies and installing dedicated infrastructure.
Few organizations are completely exempt from records retention regulations, though. For example, all companies must retain financial data for at least seven years. All must retain human resources (HR) records indefinitely. Some are subject to contractual rules. The GTAA, for example, under its leasing contract with the federal government for Toronto Pearson Airport lands, is required to retain all business records related to the lease for its entire 60-year period. Contravening such regulations can result in financial penalties.
Even where there are no regulated retention periods, it’s prudent to archive and protect data against the possibility of it being required in judicial proceedings. If there is any suggestion you altered or destroyed electronic records requested by the other side in litigation, for example, the court’s judgment will more likely go against your company. Destroying records that might incriminate or reflect negatively on a company – in a merger or acquisition, for example – can earn jail time.
Legal and regulatory obligations to retain data are crucial drivers, but the need for quick access to historical data may ultimately be more important. Many kinds of business intelligence analysis, for example, require ready access to old data. And just being able to lay hands on no-longer-current information and documents can sometimes be crucial to business success. Ensuring easy access to archival data is an area where many organizations are less likely to be doing a good job.
“A lot of companies are doing the minimum required to satisfy [data retention] regulations,” says Accenture’s Hari. “But they’re not spending enough money yet on the retrieval side.”
While regulatory and legal imperatives certainly helped drive Toromont’s archiving initiative, the company was motivated as much or more by the need for quick access to archival data, Cuddy says. “In the past, it might have been acceptable to take two or three days to reproduce information, but not any more.” If you have to wait that long to retrieve data needed for a customer proposal, for example, you could miss the opportunity, he says.
Another key motivator in many organizations is the storage crunch. Primary storage, typically in high-speed storage area networks (SANs), is still expensive, even though prices continue to tumble. Long estimates the GTAA’s data storage requirements are growing at about 30 per cent a year – it currently has 200TB online. Storage prices, meanwhile, are falling at only 20 per cent a year. “So need is still outpacing the improvement in cost,” he says. “That means you’re still looking at a cost wall. It’s hard to see where it will end.”
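Long's two figures can be combined into a rough projection. The sketch below is illustrative only: it assumes both rates stay constant and compound annually, which the article does not claim, and the function name `relative_spend` is made up for the example.

```python
def relative_spend(years, growth=0.30, price_decline=0.20):
    """Cost of provisioning all required capacity, relative to today (1.0).

    Assumes capacity needs grow and unit prices fall at constant annual
    rates that compound. Both are simplifying assumptions.
    """
    capacity = (1 + growth) ** years            # capacity needed vs. today
    unit_price = (1 - price_decline) ** years   # price per TB vs. today
    return capacity * unit_price


for year in (1, 5, 10):
    print(f"year {year}: {relative_spend(year):.2f}x today's cost")
```

Under those assumptions the two rates do not cancel out: 1.30 x 0.80 = 1.04, so the cost of holding everything on the same class of storage still climbs by roughly 4 per cent a year.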
Most organizations can’t afford to save everything indefinitely on primary disk storage. This is the problem that Glendon College, the French-language campus of York University in Mississauga, Ontario, now faces. “We don’t have policies on how long to retain information,” admits director of information technology Mario Therrien. “It’s always been as long as we could afford to keep it. Now we’re finding, with the amount of information we’re generating, that it’s getting to be a problem. We need to be smarter about what information we keep and how we store it.”
GTAA PLOTS STORAGE STRATEGY
So how are Canadian organizations meeting these challenges?
At the GTAA, the process started with planning and policy making. How long do different categories of data need to be retained, given government and other regulations, legal best practices and business process needs? And how quickly do you need to be able to retrieve data later?
The GTAA hired a consultant to help it develop a matrix that lays out about 20 broad categories of document – for example, financial management, business support, engineering and design – and many subcategories under each. Each subcategory has a classification number attached to it which determines the retention/destruction schedule and other characteristics such as security clearance and privacy requirements. “The retention schedule will differ even within major categories,” Long notes.
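A matrix of this kind can be pictured as a lookup table keyed by classification number. The sketch below is not the GTAA's actual scheme: the codes, subcategories, security labels and most retention values are invented for illustration (only the seven-year and 60-year periods echo figures mentioned earlier in this article).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Classification:
    code: str                        # hypothetical classification number
    category: str                    # broad category
    subcategory: str
    retention_years: Optional[int]   # None means retain indefinitely
    security_level: str              # hypothetical label
    contains_personal_data: bool


# Illustrative entries only; a real matrix would hold many subcategories
# under each of roughly 20 broad categories.
MATRIX = {
    c.code: c
    for c in [
        Classification("FIN-010", "Financial management", "Invoices",
                       7, "internal", False),
        Classification("FIN-020", "Financial management", "Lease records",
                       60, "restricted", False),
        Classification("HR-010", "Business support", "Personnel files",
                       None, "confidential", True),
    ]
}


def lookup(code: str) -> Classification:
    """Return the retention and handling rules for a classification code."""
    return MATRIX[code]


print(lookup("FIN-020").retention_years)
```

Note that the two "Financial management" entries carry different retention periods, which is the point Long makes about schedules differing within a major category.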
Classifying and adding metadata to existing documents, including paper documents, will be a labour-intensive job expected to take 18 to 24 months. But the GTAA’s plan is that going forward, document authors will add the classifications and other metadata as they create documents, using document management software from OpenText.
Hari says this is one of the most difficult and expensive parts of implementing an archiving and retrieval system. Many organizations trip up because they can’t enforce policies requiring employees to classify documents as they create them. However, Long says that in some areas, including government and the sciences, classifying documents at creation has become part of employees’ routine. Besides, at the GTAA, non-compliance by employees could ultimately lead to firing.
That said, he adds, “No one should underestimate the amount of behavioural change that is required to make this work.”
Why go to so much trouble? Classifying documents and adding metadata will simplify searching for them later. Equally or more important, it will allow the GTAA to automate some of the data lifecycle management process and the enforcement of retention/destruction policies. When a document reaches the end of its retention period, for example, the system will send an alert to the appropriate authority. “It’s not just going to get destroyed, though,” Long hastens to add. “It gets identified [as expendable], then its destruction has to be signed off on.”
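The two-step process Long describes, flag first and destroy only after sign-off, can be sketched roughly as follows. This is a simplified illustration, not the GTAA's or any vendor's implementation; the `Record` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Record:
    name: str
    created: date
    retention_years: int
    flagged_expendable: bool = False
    signed_off_by: Optional[str] = None


def retention_end(record: Record) -> date:
    """Date on which the record's retention period ends."""
    target_year = record.created.year + record.retention_years
    try:
        return record.created.replace(year=target_year)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        return record.created.replace(year=target_year, day=28)


def flag_expired(records: List[Record], today: date) -> List[str]:
    """Flag records past retention and return the names to alert on."""
    alerts = []
    for record in records:
        if not record.flagged_expendable and today >= retention_end(record):
            record.flagged_expendable = True
            alerts.append(record.name)
    return alerts


def may_destroy(record: Record) -> bool:
    """Destruction needs both the expiry flag and a named sign-off."""
    return record.flagged_expendable and record.signed_off_by is not None


invoice = Record("invoice-0001", date(2000, 3, 1), retention_years=7)
print(flag_expired([invoice], today=date(2007, 6, 1)))
print(may_destroy(invoice))            # flagged, but nobody has signed off
invoice.signed_off_by = "records manager"
print(may_destroy(invoice))
```

Keeping the flag and the sign-off as separate fields means an expired record can never be destroyed by the automated step alone.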
The GTAA is also moving towards a tiered storage system with primary and less expensive but slower secondary disk storage, plus tape archives. Almost all of it will be online, including the tape archives, which will be stored in a robotic system that can mount the required tape and access the requested document without human intervention. The GTAA is using infrastructure equipment and software from Hewlett-Packard, including its Reference Information Storage System (RISS).
The idea with tiered storage – a central concept in data lifecycle management and archiving – is that the value of the data and how quickly it will need to be retrieved determine the type of storage used. The most valuable and highest-demand data goes on primary storage, the least valuable and lowest-demand data on tape. In an automated tiered storage system such as HP’s RISS, the system determines which type of storage to use based on a document’s classification, and automatically moves data off primary storage as it ages.
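In code, a placement rule of this kind might look something like the sketch below. It is a generic illustration and does not describe how HP's RISS works internally; the value classes and day thresholds in `POLICY` are invented.

```python
# Hypothetical policy: for each value class, how many days a document
# stays on primary storage and how many days in total before it moves
# to tape. The numbers are invented for illustration.
POLICY = {
    "high":   (365, 1825),
    "normal": (90, 365),
    "low":    (0, 30),
}


def choose_tier(value_class: str, age_days: int) -> str:
    """Pick a storage tier from a document's classification and age."""
    primary_days, secondary_days = POLICY[value_class]
    if age_days < primary_days:
        return "primary"
    if age_days < secondary_days:
        return "secondary"
    return "tape"


for value_class, age in [("high", 30), ("normal", 200), ("low", 45)]:
    print(value_class, age, "->", choose_tier(value_class, age))
```

Running the same function on a schedule, with each document's current age, is what moves data down the tiers as it gets older.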
Hari estimates that the ratio of costs for tape, least expensive disk and most expensive disk is approximately 1:10:50, so considerable savings are possible from tiering.
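To get a feel for what Hari's ratio implies, consider a hypothetical split of data across the three tiers. The 20/30/50 split below is an assumption chosen purely for illustration; only the 1:10:50 ratio comes from Hari.

```python
# Relative cost per unit of data stored, from Hari's 1:10:50 estimate.
RELATIVE_COST = {"tape": 1, "cheap_disk": 10, "premium_disk": 50}


def blended_cost(shares: dict) -> float:
    """Relative cost per unit of data for a given split across tiers."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must add up to 1")
    return sum(RELATIVE_COST[tier] * share for tier, share in shares.items())


all_premium = blended_cost({"premium_disk": 1.0})
# Assumed split: 20% premium disk, 30% cheaper disk, 50% tape.
tiered = blended_cost({"premium_disk": 0.2, "cheap_disk": 0.3, "tape": 0.5})
saving = 1 - tiered / all_premium

print(f"all premium disk: {all_premium:.1f}")
print(f"tiered:           {tiered:.1f}")
print(f"saving:           {saving:.0%}")
```

With that assumed split, the blended cost works out to 13.5 units against 50 for keeping everything on the most expensive disk, a saving of about 73 per cent. A different split would give a different figure.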
TOROMONT EXPLORES E-MAIL ARCHIVING
Toromont has gone through a similar process of establishing a framework that identifies and classifies different types of data – documents, e-mails, e-mail attachments, data on servers – and categories of business document. The framework defines ownership for each, as well as security/privacy and archiving/retrieval requirements.
The company is focusing first on finding solutions to satisfy archiving and retrieval requirements for e-mail and will turn to other types of data later. Currently, Toromont only archives e-mail in backups. “There may be cases where that’s appropriate,” Cuddy says. “But there are many cases where backups do not meet our requirements. Part of this has to do with how fast we need to get the data back.”
There are basically three different types of e-mail archiving solutions: licensed software implemented on a standard server, all-in-one appliances that include software and integrated storage to which all e-mails are sent for archiving, and outsourced solutions that capture and archive e-mails offsite and provide online access. Toromont looked at 15 vendors. It ultimately discarded the first option as too expensive and is now considering one appliance solution from Jatheon Technologies Inc. and one outsourced solution from Fortiva Inc.
There are a couple of concerns with automated e-mail archiving. One is privacy. E-mails are being sucked out of employees’ Exchange mailboxes almost as they’re being created and stored for as long as ten years. Employees need to understand the implications of this – that sending personal e-mails on a company account may not in some cases be the wisest thing to do.
Another concern is security. In order to comply with governance regulations, it must be possible to prove that no one could have tampered with data. Also, if e-mail is initially encrypted, it needs to be decrypted to make it searchable in an archive, making it more vulnerable. So you need to be very confident about the security in place. No wonder that Fortiva, the archiving outsourcer, chose the name it did and the logo (a castle turret) and boasts in its marketing literature that customer data is kept “Fort Knox secure.”
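One common building block for demonstrating that an archived item has not changed is a cryptographic fingerprint recorded at archive time and rechecked at retrieval. The sketch below shows the general idea using SHA-256 from Python's standard library. It is not a description of how Fortiva or any other vendor does it, and on its own it only helps if the recorded fingerprint is itself kept somewhere that cannot be altered.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """SHA-256 digest of the archived bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, recorded_fingerprint: str) -> bool:
    """Compare the current digest with the one recorded at archive time."""
    return hmac.compare_digest(fingerprint(data), recorded_fingerprint)


original = b"Subject: Q3 proposal\n\nPlease find the figures attached."
recorded = fingerprint(original)        # stored separately at archive time

print(is_unaltered(original, recorded))
print(is_unaltered(original + b" (edited)", recorded))
```

Any change to the message, however small, produces a different digest, so the second check fails.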
GLENDON COLLEGE PONDERS DATA ISSUES
In top-down organizations, like Toromont and the GTAA, implementing a far-reaching archiving and retrieval strategy is by no means a cakewalk, but it’s at least easier to do than in organizations with more complex, less hierarchical structures.
At Glendon College, for example, the main challenge Therrien faces is the growing mountain of data stored by faculty related to their research and teaching, much of it in e-mails. Budget will not permit the current rate of storage capacity growth to continue for long. Also, because so much data is being generated, it’s increasingly difficult for Therrien’s group to get all the backing up done during periods when systems are little used.
Bottom line: the college needs to store less in general and less in primary storage in particular. But what to jettison? It will require a process of consensus, Therrien says. “We’re engaging the community to find out what’s reasonable. We have to try to strike a balance between cost and level of service. We’re not there yet.”
In the meantime, he has laid some of the groundwork, installing a scalable Sun Microsystems StorEdge SAN system to replace server-based storage. He’s also exploring Sun’s virtual tape library solution. It uses lower-speed, less expensive disks in place of tape for archival storage, making data more readily accessible than it is on the current, outdated tape backup system. This might persuade his user base that they can let more of their data move off primary storage.
NS DEPT. OF HEALTH OPTS FOR TIERED STORAGE
The Department of Health in Nova Scotia faces similar organizational challenges and budget constraints. To implement any new archiving and retrieval solution, the department must gain consensus among eight semi-autonomous regional health authorities. Achieving that has taken some education and training, says Leigh Whalen, director of technical services for Health Information Technology Services Nova Scotia (HITS-NS). “A lot of people still think that if it’s backed up, it’s archived,” Whalen says.
The department has so far succeeded in bringing in a province-wide solution for only one category of data – digital medical images (mainly x-rays) generated by the new PACS (Picture Archiving and Communication System). With medical images no longer in hard copy, physicians were concerned images might be lost or inaccessible. Saving all images indefinitely on primary storage was out of the question, so Whalen’s group helped develop an archiving solution using EMC’s Centera content addressed storage (CAS) product.
Now, PACS images are stored on primary storage at hospitals for one year only, then are automatically off-loaded to redundant data centres and stored on lower-cost disk-based storage. But they’re still available online. When physicians go looking for an image, they don’t need to know where it is – this is a feature of automated tiered storage systems in general. According to Whalen, the only difference they see is in the time to retrieve – 20 to 40 seconds, compared to 10 seconds for primary storage. Still more than acceptable.
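The "don't need to know where it is" property can be illustrated with a toy content-addressed store, in which an item's address is derived from its content rather than its location. This is a conceptual sketch only: the class and method names are hypothetical and it does not reflect EMC Centera's actual interface.

```python
import hashlib


class TieredStore:
    """Toy store: callers hold an address, never a storage location."""

    def __init__(self):
        self._tiers = {"primary": {}, "archive": {}}

    def put(self, data: bytes) -> str:
        """Store new data on primary; the address is its SHA-256 digest."""
        address = hashlib.sha256(data).hexdigest()
        self._tiers["primary"][address] = data
        return address

    def migrate(self, address: str) -> None:
        """Move an item from primary to archive storage."""
        self._tiers["archive"][address] = self._tiers["primary"].pop(address)

    def get(self, address: str) -> bytes:
        """Fetch by address, whichever tier currently holds the item."""
        for tier in self._tiers.values():
            if address in tier:
                return tier[address]
        raise KeyError(address)


store = TieredStore()
image_address = store.put(b"chest x-ray, hypothetical image bytes")
store.migrate(image_address)             # e.g. after the first year
print(store.get(image_address) == b"chest x-ray, hypothetical image bytes")
```

The caller uses the same address before and after migration. In a real system the only visible difference would be the retrieval time Whalen mentions.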
Most of the organizations we talked to are still in the early stages of implementing data archiving and retrieval strategies, but all understand the imperatives. They’re fairly clear.
The volume of data organizations generate continues to escalate. New governance standards, legal best practices and business process needs dictate that more of it must be retained – and very securely – for longer periods. And it must be readily accessible. Using backups, especially tape backups, for archiving is no good because it takes too long to retrieve data. Saving everything in expensive SAN storage is not economically feasible.
The answer is to develop clear policies on data retention and retrieval requirements, and implement a tiered storage strategy that sees only the most important and in-demand data stored in primary disk storage, with everything else relegated to cheaper, but still readily accessible, archival storage. Simple enough to state, more difficult to execute.
Gerry Blackwell is a freelance writer specializing in technology and IT management. He is based in London, Ontario.