With the advent of inexpensive disk-to-disk backup systems that offer faster, easier, and more reliable backups and restores than most tape systems, many administrators would like to abandon tape altogether. However, a standard schedule of one full backup per week plus nightly incremental backups uses up a lot of storage space, the kind of space that only tape traditionally offers at a reasonable per-gigabyte cost.
Data Domain aims to solve this problem with the DD460 Restorer, an appliance that appears on the network as a standard NAS device. As a backup application writes data to it, the DD460 compares the incoming data against data already saved to disk, looking for duplicate patterns. When it finds a duplicate, it stores a pointer to the original block rather than saving the data again.
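The duplicate-elimination idea can be illustrated with a toy block store. This is a minimal sketch under stated assumptions, not Data Domain’s implementation: it splits data into fixed 4KB blocks (the review doesn’t say how the DD460 segments data, and the block size, the `DedupStore` class, and its method names are all hypothetical), identifies each block by its SHA-256 hash, keeps each unique block once, and records every file as a list of pointers to blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size; the review doesn't say how the DD460 segments data


class DedupStore:
    """Toy deduplicating store: each unique block is kept once, and every
    file is recorded as an ordered list of pointers (hashes) to blocks."""

    def __init__(self) -> None:
        self.blocks: dict[bytes, bytes] = {}     # block hash -> block contents
        self.files: dict[str, list[bytes]] = {}  # file name -> block pointers

    def write(self, name: str, data: bytes) -> int:
        """Store a file and return the number of bytes of new block data written."""
        pointers = []
        new_bytes = 0
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            if digest not in self.blocks:
                # First time this block has been seen: save the data itself.
                self.blocks[digest] = block
                new_bytes += len(block)
            # Either way, the file records only a pointer to the stored block.
            pointers.append(digest)
        self.files[name] = pointers
        return new_bytes

    def read(self, name: str) -> bytes:
        """Reassemble a file by following its block pointers."""
        return b"".join(self.blocks[digest] for digest in self.files[name])


# Usage: a 256KB "backup", then a second backup with one byte changed.
original = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(8192))
changed = bytearray(original)
changed[5000] ^= 0xFF  # flip the bits of a single byte

store = DedupStore()
print(store.write("monday", original))         # all 64 blocks are new
print(store.write("tuesday", bytes(changed)))  # only the block holding byte 5000 is new
assert store.read("tuesday") == bytes(changed)
```

The first write stores 262,144 bytes and the second only 4,096. Fixed-size blocks keep the sketch short, but they line up only when changes overwrite data in place; an insertion would shift every later block boundary, which is why production deduplication systems commonly use content-defined, variable-length segments instead.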
I found that this approach yielded a compression ratio as high as 455-to-1 when backing up data that had changed only slightly since the original backup. Ratios that high apply only to nearly unchanged data, so raw capacity can’t simply be multiplied by them; even so, this 4TB appliance can realistically store 85TB worth of backup data, an effective ratio of roughly 21-to-1. That level of compression means administrators could run full backups every night without needing much additional storage space.
The DD460 applies two levels of compression. The first, called global compression, generally yields about 2-to-1 compression. The second, local compression, uses proprietary Data Domain technology to find identical strings of data and yields far higher compression, even on an initial backup.
In my tests at the Data Domain labs, I backed up several types of data, both from a Linux system using tar and from a Windows system using Veritas Backup Exec. The data included a 9.4GB set of Oracle database files, a mix of files typical of a file server, and a large file consisting entirely of zeros.
The 9.4GB of Oracle files shrank to 4.6GB after global compression and then to 216MB on disk after local compression. When I backed up the same 9.4GB of files a second time, the new backup consumed only an additional 20MB of disk space. All of this compression happened in real time while files were being backed up at 70MBps.
The set of mixed files was initially 3.2GB, which became 2.5GB after global compression and 1.3GB after local compression. After some of the files in this group were changed, a second backup used an additional 11MB of disk space.
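The stage-by-stage ratios implied by these figures can be checked with simple division. The sketch below assumes decimal units (1GB = 1,000MB), which the review doesn’t specify, and because the reported sizes are rounded, the results are approximations rather than the appliance’s own figures. The helper name `stage_ratios` is hypothetical. Note that the total ratio is the product of the two stage ratios.

```python
def stage_ratios(original_mb: float, after_global_mb: float,
                 after_local_mb: float) -> tuple[float, float, float]:
    """Return (global, local, total) compression ratios from sizes in MB."""
    global_ratio = original_mb / after_global_mb
    local_ratio = after_global_mb / after_local_mb
    total_ratio = original_mb / after_local_mb  # equals global_ratio * local_ratio
    return global_ratio, local_ratio, total_ratio


# Sizes from the tests above, converted to MB assuming 1GB = 1,000MB.
first_backups = {
    "Oracle files": (9_400, 4_600, 216),
    "Mixed files": (3_200, 2_500, 1_300),
}
# Original size and extra disk space consumed by the second backup of each set.
second_backups = {
    "Oracle files": (9_400, 20),
    "Mixed files": (3_200, 11),
}

for name, sizes in first_backups.items():
    g, l, t = stage_ratios(*sizes)
    print(f"{name}, first backup: global {g:.1f}:1, local {l:.1f}:1, total {t:.1f}:1")

for name, (original_mb, added_mb) in second_backups.items():
    print(f"{name}, second backup: {original_mb / added_mb:.1f}:1")
```

Under these assumptions, the first backup works out to about 43.5-to-1 overall for the Oracle files and about 2.5-to-1 for the mixed files; the second backups come to about 470-to-1 and 291-to-1.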
The 10GB file of all zeros produced the most dramatic result: it occupied only about 1MB on disk, a total compression ratio of 10,334.5-to-1.
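Under a simple block-and-pointer model of deduplication, a file of zeros is the best case: every block is identical, so only one block needs to be stored and the rest of the file becomes pointers to it. The sketch below assumes a hypothetical 4KB block size and uses a 10MB zero-filled buffer as a stand-in for the 10GB test file; it also works backward from the reported 10,334.5-to-1 ratio to the implied on-disk size, which comes out just under 1MB whether 1GB is taken as 1,000MB or 1,024MB.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size for illustration


def count_blocks(data: bytes) -> tuple[int, int]:
    """Return (total blocks, unique blocks) when data is split into fixed-size blocks."""
    hashes = [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
              for i in range(0, len(data), BLOCK_SIZE)]
    return len(hashes), len(set(hashes))


zeros = bytes(10 * 1024 * 1024)  # 10MB of zeros standing in for the 10GB file
total, unique = count_blocks(zeros)
print(f"{total} blocks, {unique} unique")  # every block after the first is a pointer

# Work backward from the reported ratio to the implied on-disk size.
REPORTED_RATIO = 10_334.5
for mb_per_gb in (1_000, 1_024):
    print(f"1GB = {mb_per_gb}MB: about {10 * mb_per_gb / REPORTED_RATIO:.2f}MB on disk")
```

The buffer splits into 2,560 blocks, only one of them unique, and the implied on-disk size is about 0.97MB or 0.99MB depending on the unit convention.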
Across all the data in my tests, the average compression ratio on the first backup was 8.7-to-1; on the second backup of the same data, after some files had changed, the average exceeded 400-to-1.
The system is very well engineered physically, with clean airflow and a nicely designed set of rack-mount rails that should work on pretty much any manufacturer’s rack.
Setting up the system is very simple, and DHCP is supported for the initial network configuration; the quick-start sheet even lists the unit’s MAC (media access control) addresses to ease DHCP setup. Both initial setup and later administrative tasks can be performed from either a serial terminal or an SSH session.
The DD460 runs a Linux kernel that has been stripped of extraneous functions for added speed and security and extended with the necessary drivers and other modifications. The system exposes two partitions: /var, a utility partition that holds boot images, configuration data, and the like; and /backup, the normally accessible partition that stores backup directories.
Once the network is configured, the rest of the process consists of adding shares and authorized users for NFS (for Unix and Linux) and CIFS (for Windows). If you run Windows in workgroup mode, you must also add the Windows host names. Backups then proceed as if the DD460 were any other NAS device.
Given the extremely high compression ratios the DD460 achieves, especially on data that closely resembles previously backed-up data, administrators may want, or need, to change their backup strategies. For example, they may want to replace incremental backups with full ones, because on the DD460 a full backup of mostly unchanged data consumes little more disk space than an incremental backup would.
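A back-of-the-envelope model shows why the trade-off changes. This sketch rests on simplifying assumptions chosen for illustration, not figures from the review: a fixed daily change rate, a single seven-day cycle starting from an empty target, and an idealized deduplicating store that writes only changed data after the first backup, ignoring compression of that data and metadata overhead. The function names and the example figures (1,000GB of data, 1 percent daily change) are hypothetical.

```python
def conventional_gb(dataset_gb: float, daily_change: float,
                    fulls: int, incrementals: int) -> float:
    """Disk used in one seven-day cycle on a conventional target (no deduplication)."""
    return fulls * dataset_gb + incrementals * daily_change * dataset_gb


def dedup_gb(dataset_gb: float, daily_change: float, backups: int) -> float:
    """Disk used in one seven-day cycle on an idealized dedup store: the first
    backup stores everything; every later backup, full or incremental, adds
    only the data that changed."""
    return dataset_gb + (backups - 1) * daily_change * dataset_gb


data, change = 1_000, 0.01  # hypothetical: 1,000GB data set, 1% of it changes per day
print(f"Weekly full + 6 incrementals, conventional: {conventional_gb(data, change, 1, 6):,.0f}GB")
print(f"7 nightly fulls, conventional:              {conventional_gb(data, change, 7, 0):,.0f}GB")
print(f"7 nightly fulls, dedup store:               {dedup_gb(data, change, 7):,.0f}GB")
```

Under these assumptions, nightly fulls on the deduplicating store take about 1,060GB, the same as the conventional weekly-full-plus-incrementals scheme and far less than the 7,000GB nightly fulls would need on conventional storage, while each night’s backup is still a full copy.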
Shops running the DD460 shouldn’t pre-compress data with tools such as gzip before performing a backup, because pre-compression scrambles the data stream and hides the duplicate patterns the DD460 relies on. Dropping the pre-compression step ultimately simplifies and speeds backup and restore processes: Files no longer need to be zipped and unzipped, the data streams are more compressible, and there’s less of a load on the network because the DD460 compresses in real time.
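The effect of pre-compression can be demonstrated with a small experiment. This sketch uses zlib’s deflate (the same algorithm gzip uses) as the pre-compressor and a hypothetical fixed 4KB chunking scheme as a stand-in for the DD460’s proprietary duplicate detection; the sample data and helper names are invented for illustration. It overwrites 100 bytes of a roughly 2.9MB stream, then measures what fraction of the second stream’s chunks already exist in the first, with and without pre-compression.

```python
import hashlib
import zlib

CHUNK = 4096  # hypothetical fixed chunk size for illustration


def chunk_hashes(data: bytes) -> list[bytes]:
    """SHA-256 hash of each fixed-size chunk of data."""
    return [hashlib.sha256(data[i:i + CHUNK]).digest() for i in range(0, len(data), CHUNK)]


def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of the new stream's chunks that already exist in the old stream."""
    seen = set(chunk_hashes(old))
    new_hashes = chunk_hashes(new)
    return sum(h in seen for h in new_hashes) / len(new_hashes)


# Deterministic, moderately compressible sample data: 100,000 text records (2.9MB).
original = b"".join(
    f"record {i:08d} value {(i * 7919) % 100003:06d}\n".encode() for i in range(100_000)
)
# Second backup: overwrite 100 bytes in place, about 10% of the way into the stream.
changed = original[:300_000] + b"X" * 100 + original[300_100:]

raw = shared_fraction(original, changed)
pre = shared_fraction(zlib.compress(original), zlib.compress(changed))
print(f"raw stream:            {raw:.1%} of chunks already stored")
print(f"pre-compressed stream: {pre:.1%} of chunks already stored")
```

Without pre-compression, all but one chunk of the second stream is already stored. After deflate, the small edit alters the compressed output from that point on, so most chunks after it look new to the store.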