Data lifecycle management (DLM) is by no means unique to the modern IT industry. Indeed, humans have had to manage information ever since the discovery of “movable media” (early paper prints and books, or going even further back in time, stone tablets and papyrus scrolls, versus “fixed” media like cave walls). Clearly, DLM is independent of the underlying media and technology. It is instead a function of process.
Five steps constitute a modern DLM implementation:
1. Categorizing data types: Most organizations will have dozens of different data types, perhaps defined by the source: transactions, external communications, office documents, e-mail messages, legal documents, et cetera. Although each type could be treated individually, grouping them into categories (e.g., routine, operational, compliance) can simplify IT system design.
2. Relating business rules to data types: After establishing data types, governing business rules can be related to them. Examples of business rules are control (e.g., security), retention periods, protection practices (e.g., backup and recovery schemes), and compliance observance.
3. Determining service levels: Service levels must match the business rules for a given data type; in practice, that means applying different availability requirements to different data-type groups. Aim for two or three data-type groups and service levels, and no more than four.
4. Establishing tiered services: Putting service levels before the storage architecture helps ensure that the product choices meet expectations.
5. Choosing appropriate products: Only after completing the previous four steps should you decide which products to use to build the specified system.
Product selection is the last step in the process, and not the first as some vendors advocate. The advantage of following the DLM process is that the last two steps, architecture planning and product selection, become relatively mechanical, rather than subjective.
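The first three steps can be sketched as a simple lookup, shown here as a minimal Python illustration rather than a definitive design. The data types and category names come from the list above; which type falls into which category, and every availability and retention figure, are hypothetical placeholders that each organization would set for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevel:
    name: str
    availability_pct: float  # hypothetical availability target
    retention_years: int     # hypothetical retention period


# Step 3: keep the number of service levels small (no more than four).
SERVICE_LEVELS = {
    "routine": ServiceLevel("routine", 99.0, 1),
    "operational": ServiceLevel("operational", 99.99, 3),
    "compliance": ServiceLevel("compliance", 99.9, 7),
}

# Step 1: group individual data types into categories.
# These assignments are assumptions made for illustration only.
CATEGORY_OF = {
    "transactions": "operational",
    "external communications": "operational",
    "office documents": "routine",
    "e-mail messages": "routine",
    "legal documents": "compliance",
}


def service_level_for(data_type: str) -> ServiceLevel:
    """Steps 2 and 3: resolve a data type to the rules of its category."""
    return SERVICE_LEVELS[CATEGORY_OF[data_type]]


if __name__ == "__main__":
    for data_type in CATEGORY_OF:
        print(data_type, "->", service_level_for(data_type))
```

Because every data type resolves through a category, adding a new data type means adding one mapping entry, not a new set of rules.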
BEWARE THE VENDORS
Given that the most significant effort associated with DLM involves process determination, either conduct the process-definition steps internally or engage a third party to do so. Although it may be acceptable and desirable to engage a vendor capable of delivering both services and products, very few are capable of adequately doing both.
Beware so-called professional services offerings that are little more than enhanced product implementation teams. You can discern which offerings are legitimate and which are not largely by the degree of independence between the process definition and product deployment phases. Have your processes defined before deciding on a product.
HARD WORK AHEAD
Tape backup/recovery alone is no longer a best practice. Disk-based recovery mechanisms such as mirrors and snapshots are now commonly available and represent current best practices for business continuity.
The infrastructure changes required to implement these solutions are significant. Additional disk storage is needed, and backup applications might need updating so that they recognize both disk and tape. New operational processes and staff retraining might also be required. This effort might be appropriate given increasingly demanding service-level requirements.
Near-line/“disk library” devices provide an intermediate step between the primary storage and the tape system, reducing the impact of tape-related failures as well as permitting data restoration at disk speeds.
Too often, however, companies weigh the cost of the additional disk space needed for these technologies against either the cost of tape (media alone) or the cost of doing nothing. Instead, they must compare the cost of disk recovery technologies against the much greater costs of downtime and personnel required to protect and restore the data.
IDENTIFYING THE PROBLEMS
Relying solely on tape technology for data recovery increases not only an organization's personnel costs, but also the risk of data loss and unplanned downtime. In many ways, risk and labour costs are closely intertwined.
Since backup and recovery consumes nearly two-thirds of total worker hours devoted to storage management, that labour is an obvious target for the quickest, greatest cost reduction.
Smaller, remote offices often do not have staff available to manage such operations. The result can be misplaced media, mislabelled media, failure to execute necessary operations, and an inability to troubleshoot and solve technical problems. Moreover, remote sites rarely seem to create duplicate media and move it off-site regularly. All these factors in combination make lost or unrecoverable data nearly certain.
As a removable medium, tape also represents a data security vulnerability. Although most off-site vault suppliers are bonded and insured, many supplier contracts allow up to a five per cent loss of media elements without penalty (META Group best practice dictates a loss of less than one per cent).
Another problem is that since many companies require 24×7 availability, the backup window has literally disappeared.
Keeping data adequately protected in such an environment is similar to in-flight refuelling of an airplane that must continually circle the globe without touching the ground. Systems must be kept continually operational without negatively affecting the core mission.
Furthermore, backup and recovery systems are surprisingly difficult to optimize and keep in tune. As data volumes continue to grow at a 40 to 45 per cent compounded annual rate, tape systems gradually run out of capacity. Other tuning problems include inadequate tape drive speed and capacity, too little network bandwidth, low backup-server processing power, and a bottleneck at the disk drive controller. Any inadequacy of an individual component can result in backup failures, yet identifying and solving the problem can be time-consuming.
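To see how quickly compounded growth consumes capacity, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the 40 and 45 per cent rates come from the text, while the function names and the 10 TB and 40 TB figures are hypothetical.

```python
import math


def years_until_full(used_tb: float, capacity_tb: float, annual_growth: float) -> float:
    """Years until used_tb * (1 + annual_growth) ** years reaches capacity_tb."""
    if used_tb <= 0 or annual_growth <= 0:
        raise ValueError("used_tb and annual_growth must be positive")
    if used_tb >= capacity_tb:
        return 0.0
    return math.log(capacity_tb / used_tb) / math.log(1 + annual_growth)


def doubling_time(annual_growth: float) -> float:
    """Years for data volume to double at a compounded annual growth rate."""
    return years_until_full(1.0, 2.0, annual_growth)


if __name__ == "__main__":
    for rate in (0.40, 0.45):
        print(f"{rate:.0%} growth: doubles in {doubling_time(rate):.2f} years")
    # Hypothetical site: 10 TB protected today, tape system sized for 40 TB.
    print(f"capacity reached in {years_until_full(10, 40, 0.40):.2f} years")
```

At either rate, data volume doubles in roughly two years, so a tape system sized with a comfortable margin today can be full well within a typical equipment refresh cycle.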
THE DATA PROTECTION CONTINUUM
Fortunately, risk and labour can be reduced simultaneously. This improvement is possible because the root cause of both items is failed backup jobs. Typically, a backup administrator must spend four to six hours per day identifying, diagnosing, and solving backup failures. Any data not protected by a backup obviously increases the risk of significant data loss.
For critical systems, the administrator might be compelled to restart the job during normal operations, potentially affecting production systems. As a best practice, we recommend adopting a continuum of data protection with these key elements:
Snapshot: Snapshots, or logical metadata copies, can be taken frequently, every four hours, for instance, to protect against typical minor data losses such as a lost file or table space. Snapshots require relatively low disk space overhead (between 2 and 20 per cent per snap) but do not protect from disk subsystem failure. Each snapshot typically persists for eight to 24 hours.
Local mirror: Local mirrors are complete physical copies of critical data, such as information stored in a database, kept on a storage array within the data centre. Because mirrors have 100 per cent disk-space overhead, they are usually created only once per day.
Remote mirrors: Remote mirrors are complete physical copies of the data to a separate system or site and are used primarily for disaster recovery. A single image of a remote mirror can persist in perpetuity.
Near-line/disk library: Near-line appliances, or disk libraries, house data images, generally created once per day, which are persistent for a month or more if the images are incremental.
Tape libraries: Tape libraries remain the best automated method for handling magnetic tape for traditional backup practices.
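The snapshot figures above can be combined into a rough disk-space estimate. The following Python sketch is an illustration under simplifying assumptions: one snapshot is taken every interval, each is kept for the full retention period, and the per-snapshot overhead simply adds up. The function names are hypothetical; the intervals, retention periods, and percentages are those quoted for snapshots and local mirrors.

```python
import math


def retained_snapshots(interval_hours: float, retention_hours: float) -> int:
    """Snapshots held at once, assuming one is taken every interval
    and each is kept for the full retention period."""
    if interval_hours <= 0 or retention_hours <= 0:
        raise ValueError("interval and retention must be positive")
    return math.ceil(retention_hours / interval_hours)


def snapshot_overhead_pct(interval_hours: float, retention_hours: float,
                          per_snapshot_pct: float) -> float:
    """Total overhead as a percentage of primary capacity (additive model)."""
    return retained_snapshots(interval_hours, retention_hours) * per_snapshot_pct


MIRROR_OVERHEAD_PCT = 100.0  # a mirror is a full physical copy

# Snapshots every four hours; light case versus heavy case.
low = snapshot_overhead_pct(4, 8, 2.0)     # kept 8 hours, 2 per cent each
high = snapshot_overhead_pct(4, 24, 20.0)  # kept 24 hours, 20 per cent each

if __name__ == "__main__":
    print(f"snapshot overhead: {low:.0f} to {high:.0f} per cent of primary capacity")
    print(f"mirror overhead: {MIRROR_OVERHEAD_PCT:.0f} per cent")
```

Under this additive model the range runs from 4 per cent to 120 per cent, so at the heavy end retained snapshots could consume more disk than a full mirror. Snapshot retention therefore deserves the same sizing attention as mirroring.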
Although the death of magnetic tape has been forecast for nearly three decades, it still plays an essential role in IT operations. Tape remains significantly cheaper than even the least expensive ATA disk ($0.0004 per MB for tape versus $0.005 per MB for ATA disk). Therefore, tape remains the best choice for long-term off-site archival (e.g., more than seven years).
Moreover, it remains wise to use tape to provide a physical disconnection between disk replications and the “last chance” data image on tape to guard against malicious attacks or propagated data corruption. We believe that tape backup will remain part of a data protection best practice through at least 2009 or 2010.
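The per-megabyte prices quoted above for tape and ATA disk can be scaled to a more familiar unit. This short Python sketch assumes a decimal terabyte (1,000,000 MB) and counts media cost only; drives, libraries, and labour are deliberately excluded, so it is no substitute for the fuller cost comparison urged earlier.

```python
TAPE_USD_PER_MB = 0.0004     # from the text
ATA_DISK_USD_PER_MB = 0.005  # from the text
MB_PER_TB = 1_000_000        # assumption: decimal terabyte


def media_cost_usd(terabytes: float, usd_per_mb: float) -> float:
    """Media-only cost of storing the given number of terabytes."""
    return terabytes * MB_PER_TB * usd_per_mb


tape_cost = media_cost_usd(1, TAPE_USD_PER_MB)
disk_cost = media_cost_usd(1, ATA_DISK_USD_PER_MB)
ratio = disk_cost / tape_cost

if __name__ == "__main__":
    print(f"tape: ${tape_cost:,.0f} per TB")
    print(f"ATA disk: ${disk_cost:,.0f} per TB")
    print(f"disk costs {ratio:.1f} times as much as tape")
```

At these prices a terabyte costs about $400 on tape and about $5,000 on ATA disk, a ratio of 12.5 to 1 in media cost alone.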
Helmer is vice-president, strategic solutions at META Group Canada.