When companies make duplicate copies of enterprise data for backup, disaster recovery or other business purposes, they are said to be replicating data.
Such duplicate copies of data can reside locally on the same system or network segment, or they can be placed in remote locations.
Replication can take place at the application level or the storage level. Application replication takes place at the transaction level: Each transaction is captured and duplicated on multiple systems. Storage replication involves copying the data that sits under the application.
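The distinction between the two levels can be sketched in a few lines of code. This is a purely illustrative toy (the `TinyDB` class, function names and block size are assumptions, not any vendor's API): application-level replication replays each transaction on every replica, while storage-level replication copies raw blocks with no knowledge of transactions.

```python
import io

class TinyDB:
    """Toy database that simply records the transactions applied to it."""
    def __init__(self):
        self.log = []
    def execute(self, txn):
        self.log.append(txn)

# Application-level replication: each transaction is captured and
# replayed on every replica, so all copies converge transaction by
# transaction.
def apply_transaction(txn, replicas):
    for db in replicas:
        db.execute(txn)

# Storage-level replication: copy the raw blocks that sit under the
# application; the replication layer never sees a transaction.
def copy_blocks(source, target, block_size=4096):
    while chunk := source.read(block_size):
        target.write(chunk)

replicas = [TinyDB(), TinyDB()]
apply_transaction("INSERT INTO orders VALUES (1)", replicas)

src, dst = io.BytesIO(b"raw device blocks"), io.BytesIO()
copy_blocks(src, dst)
```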
Organizations replicate and mirror data for a variety of reasons. Since Sept. 11, a major driver for data replication has been disaster recovery and business continuity planning. Companies are hoping to bolster their capabilities in these areas by maintaining copies of data and applications at one or more off-site locations.
Corporations also replicate data to enable wider and quicker access to information across the enterprise. It’s quicker to access copies of data stored on local servers than it is to access data stored on a remote server.
Similarly, data is sometimes copied and stored at multiple locations to let multiple business units access it for their individual needs, such as data mining. Development and testing work is also less risky and disruptive when done on a copy rather than on live production data.
“There are a myriad uses for data,” says John Young, an analyst at D.H. Brown Associates Inc. in Port Chester, N.Y. “There are more people [than ever before] wanting access to and using data within a business. When you combine that with the standard requirement to back up and store data, it’s easy to see what’s driving data replication.”
There are a variety of methods for replicating data from a primary source to secondary sites. The choice depends on the level of protection a company’s applications require and the business needs driving the replication effort, says Dianne McAdam, an analyst at Illuminata Inc. in Nashua, N.H.
A financial services company, for instance, is far more likely to need real-time replication than a manufacturing operation, she says. Factors such as cost, complexity and performance impact also affect the choice of replication method, McAdam says.
Synchronous vs. Asynchronous
Companies that require very short recovery times tend to use an approach called synchronous replication. In this method, data is duplicated in real time on the primary system and on the secondary systems, so every copy is updated simultaneously.
Synchronous replication involves a process called a two-phase commit, whereby data that’s being updated on the primary server has to be duplicated on and acknowledged by the secondary sites before the next transaction proceeds. This ensures that data is identical on all copies at all times.
The goals of synchronous replication are near-zero loss of data and very quick recovery times from failures that occur at primary sites. But the two-phase commit process results in performance degradation when the distance between the primary site and secondary site is great.
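The two-phase pattern described above can be sketched as follows. This is a minimal in-process illustration, not a real product's protocol; the class and method names are assumptions. The key point is that the write does not complete until every copy, including the most distant one, has acknowledged it.

```python
class Replica:
    def __init__(self):
        self.committed = []
        self.pending = None
    def prepare(self, update):
        # Phase 1: stage the update and acknowledge it to the primary.
        self.pending = update
        return True
    def commit(self):
        # Phase 2: make the staged update durable.
        self.committed.append(self.pending)
        self.pending = None

def synchronous_write(update, primary, secondaries):
    replicas = [primary] + secondaries
    # The transaction cannot proceed until every site acknowledges,
    # which keeps all copies identical -- at the cost of waiting on
    # the slowest (often the most distant) link.
    if not all(r.prepare(update) for r in replicas):
        raise RuntimeError("a replica failed to prepare; abort")
    for r in replicas:
        r.commit()

primary, remote = Replica(), Replica()
synchronous_write("txn-1", primary, [remote])
```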
Synchronous replication can also be costly because it requires high-bandwidth network connectivity.
“To be really bulletproof, you need to do synchronous replication, but most can’t afford it,” McAdam says.
Another option that’s becoming increasingly popular is asynchronous replication. Here, a copy of each completed transaction on the primary server is captured and then duplicated on the secondary systems. This duplication can happen automatically whenever an update takes place, or it can be programmed to occur at predefined intervals. Replication products can also queue data and send batches of changes when network use is low.
Asynchronous replication doesn’t require as much bandwidth as the synchronous approach and can be applied over greater distances with little performance degradation. It’s also cheaper, but it doesn’t offer the same real-time recovery capabilities.
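The queuing behavior described above can be sketched like this. It is a hypothetical illustration (the `AsyncReplicator` class and its parameters are invented for the example): the primary acknowledges each write immediately and ships queued changes to the secondaries in batches, so replication lags rather than blocking the application.

```python
from collections import deque

class AsyncReplicator:
    def __init__(self, secondaries, batch_size=3):
        self.queue = deque()
        self.secondaries = secondaries
        self.batch_size = batch_size

    def record(self, txn):
        # The primary acknowledges the write immediately; the copy on
        # the secondaries lags behind until the queue is flushed.
        self.queue.append(txn)
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        # Ship queued changes in one batch, e.g. on a timer or when
        # network use is low.
        batch = list(self.queue)
        self.queue.clear()
        for s in self.secondaries:
            s.extend(batch)

secondary = []
rep = AsyncReplicator([secondary], batch_size=2)
rep.record("txn-1")   # queued, not yet replicated
rep.record("txn-2")   # batch is full, so both changes ship together
```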
Companies may want to use a combination of both approaches to overcome technical issues, Young says. For instance, a company may place a replication point midway between the two endpoints. Replication is then performed in hops: first between the original site and the midpoint, then from the midpoint to the endpoint.
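One way the midpoint layout can combine both approaches is sketched below. This is an assumption-laden toy (the `Midpoint` class and its methods are invented for illustration): the primary waits only for the nearby midpoint to acknowledge, while the midpoint forwards changes to the distant endpoint on its own schedule.

```python
class Midpoint:
    def __init__(self, endpoint):
        self.copy = []        # replica kept in lockstep with the primary
        self.backlog = []     # changes awaiting the long second hop
        self.endpoint = endpoint

    def receive(self, update):
        # Hop 1: short distance, so the primary can afford to wait
        # for this acknowledgment on every write.
        self.copy.append(update)
        self.backlog.append(update)
        return True

    def relay(self):
        # Hop 2: forward queued changes to the far endpoint later,
        # without holding up the primary.
        while self.backlog:
            self.endpoint.append(self.backlog.pop(0))

def write(update, midpoint):
    # The write completes once the nearby midpoint acknowledges it.
    assert midpoint.receive(update)

endpoint = []
mid = Midpoint(endpoint)
write("txn-1", mid)   # acknowledged after the short hop
mid.relay()           # later: the change reaches the remote site
```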
Hard or Soft?
Vendors today offer both hardware- and software-based replication. Companies such as EMC Corp. in Hopkinton, Mass.; Hitachi Data Systems Corp. in Santa Clara, Calif.; and IBM offer hardware technologies, while others such as Veritas Software Corp. and Sun Microsystems Inc. enable software replication.
With hardware replication, all the duplication tasks are carried out by specialized controllers, leaving the server free for other tasks. But controllers from one vendor generally don’t work well with controllers from other vendors, so hardware replication tends to tie users to a single vendor.
There are no such limitations with software-based replication, but because the duplication tasks consume server cycles, it can degrade application performance.
Ultimately, analysts say, the way to go depends on the user’s specific business and technology needs.
“More companies are taking replication seriously these days. … Sept. 11 was a wake-up call,” Young says.