Many IT departments are a mishmash of disparate systems cobbled together over the years, brought in through mergers and acquisitions or bought as needed, with no overall plan in mind. While such diversity has its benefits – non-essential data certainly doesn’t need to reside on a we’re-dead-in-the-water-if-we-lose-this business critical system – it does come with its share of headaches for administrators.
Well aware of this, vendors are at the ready with promises of quick solutions. A couple of years ago, the proposed panacea for all storage headaches was storage virtualization – an abstraction layer that gives users a single view of their storage regardless of its physical location. To the administrator, it can appear as if all of a company’s storage systems reside together on one system. Everyone had a “virtualization” solution that would solve the problems of storage administrators trying to cope with heterogeneous environments.
The ability to view a heterogeneous environment as a single entity would theoretically reduce maintenance costs because administrators could manage all of their systems as if they were one. The problem, say analysts, was there were so many definitions of “virtualization” that the term quickly became meaningless. The sheer number of ways to implement such a layer – whether to host it on a server or the network, for instance – didn’t help.
“Virtualization” became the hot buzzword and vendors wanting to benefit from its popularity were describing all types of offerings as virtualization, whether they fit the bill or not. Because so many storage providers were using this ill-defined term, customers were left in a state of confusion. Today, the word has become so overused that even vendors offering actual storage virtualization solutions avoid it.
All of this is happening, said Nancy Marrone, a senior analyst with the Enterprise Storage Group (ESG) in Milford, Mass., just as virtualization technology is maturing. It is no longer touted as a cure-all solution, but as one that will enable users to perform storage services, such as replication, that might once have been too costly to consider.
At its inception FalconStor Software was a self-described virtualization vendor, a term it now tries to distance itself from, said Mira Sharma, the company’s Toronto-based country manager. “FalconStor has been shipping products for about a year-and-a-half, and we’ve come to the conclusion that we don’t want to get caught up in this whole virtualization thing,” she said. FalconStor now describes itself as a “storage pool aggregator.”
“To us that’s what virtualization is – it’s aggregating your storage into a pool so that the servers can share the storage pool.” When a server shares the pool, it appears to that server as though all of the available storage belongs to it alone, she said. Disks appear to an application as if they are locally attached rather than part of a larger pool that is also available to other applications.
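The pooling idea Sharma describes can be illustrated with a short sketch. This is not FalconStor's implementation; the class names, disk labels and sizes below are hypothetical, chosen only to show how a virtual volume might be carved out of several physical devices while the server sees a single volume.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalDisk:
    name: str            # hypothetical label for a physical array
    capacity_gb: int
    used_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


@dataclass
class StoragePool:
    disks: list
    volumes: dict = field(default_factory=dict)

    def free_gb(self) -> int:
        # Total free space across every disk in the pool.
        return sum(disk.free_gb for disk in self.disks)

    def allocate(self, volume: str, size_gb: int) -> list:
        """Carve a virtual volume out of whichever disks have room."""
        if volume in self.volumes:
            raise ValueError(f"volume {volume!r} already exists")
        if size_gb > self.free_gb():
            raise ValueError("not enough free space in the pool")
        extents = []
        remaining = size_gb
        for disk in self.disks:
            take = min(disk.free_gb, remaining)
            if take > 0:
                disk.used_gb += take
                extents.append((disk.name, take))
                remaining -= take
            if remaining == 0:
                break
        # The server only ever sees the volume name; the extent map
        # recording where the data physically lives stays in the pool.
        self.volumes[volume] = extents
        return extents


# Assumed example: two arrays of different sizes behind one pool.
pool = StoragePool([PhysicalDisk("array-a", 300), PhysicalDisk("array-b", 400)])
extents = pool.allocate("billing-test", 500)
```

Here a 500GB request is satisfied from two smaller devices, neither of which could hold the volume alone. The requesting server does not need to know that the space spans both.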
Virtualization is an abstraction layer that makes it easier for end users to manage their storage assets – it creates a virtual pool of storage that’s independent of physical location, said Anders Lofgren, a senior analyst at Giga Information Group in Cambridge, Mass. He added, “I think the word virtualization is pretty much useless at this point.” Giga recommends that customers hold off on making strategic decisions on virtualization until the second half of 2003, when the strategies of larger vendors such as Sun, HP, IBM, Hitachi and EMC will become clearer.
But, like Marrone, he thinks the technology has a role to play in the marketplace. “People are using it on a smaller scale. As much as I sound like a big naysayer, a doomsday person, I think there are tactical reasons why virtualization should be implemented,” he said. He just doesn’t believe that virtualization should be a major part of the storage strategy today, as it isn’t mature enough yet.
One example of a storage function made cheaper by virtualization is replication for the purposes of disaster recovery, Marrone said. Traditionally, companies had to replicate from like disk to like disk, meaning that data residing on an EMC Symmetrix box had to be replicated to another Symmetrix box.
“That is extremely cost prohibitive, and so a lot of disaster recovery plans weren’t necessarily in place,” she said. Virtualization can mask the fact that the company isn’t replicating to another Symmetrix box, thereby letting it replicate to a less-expensive one.
Virtualization is part of a strategy that Brian Tao is counting on to simplify AT&T Canada’s storage problems and reduce costs.
The company grew as Internet service providers joined the fold, each bringing with it its own set of storage solutions, none of which really worked well together despite vendor claims, said Tao, the Toronto-based senior manager of Internet services infrastructure. AT&T wanted to take its multi-vendor environment – it had Sun, EMC, IBM and Network Appliance (Net App) products – and consolidate it into an enterprise-wide, all-encompassing, one- or two-vendor solution. It also wanted to reduce the number of touch points administrators had with the storage solution.
The goal was to take all of the storage islands and consolidate them into four or five discrete domains. Currently, each application has its own domain – there is one for Oracle, one for external customer mail, one for internal corporate e-mail, and so on. Though the company probably has about 30TB of data, it is scattered across islands – 100GB here, 50GB there. Consolidation will allow AT&T to save on maintenance contracts and increase reliability, Tao said.
The company decided to make Net App a major part of its strategy and is moving its storage onto Net App’s higher-density F800 server series, which allows AT&T to get more on a shelf. The company is also using Net App’s virtualization software, which will mean storage is no longer tied to its physical location. This makes it easier, for instance, to allot temporary storage space as needed. So if DBAs need 500GB of disk space to do a billing run test, it will no longer be a time-consuming, arduous process to create the space for them. “(We) no longer have to set aside time to make sure storage is available. It’s like a utility. Turn on the tap, the water is always there,” Tao said.
He hopes once the solution is up and running, it will reduce operational costs by as much as 50 per cent.
Tower of Babel
Part of the problem with virtualization technology is that there are so many different approaches. There’s array-based, host-based and network-based virtualization. To further complicate matters, there are two approaches to network-based virtualization – in-band and out-of-band.
Array-based virtualization means that the virtualization function lies only within a single storage array. Val Bercovici, the chief technical architect at Network Appliance Canada in Ottawa, which offers array-based virtualization, said it allows users to get away from the strict rules that exist in the storage business with respect to how you present disks to servers – and ultimately to users. And, according to Bercovici, it manages to add a layer of abstraction without adding the complexity that accompanies network-based solutions.
The disadvantage, said Paul Giroux, Sun Microsystems Inc.’s Toronto-based senior director of global network storage sales and technology, is that the virtualization features are only available for one vendor’s solution, thereby requiring a homogeneous pool of storage. This means customers can have only one price point, Giroux said. It’s “you can have any colour you want as long as it’s black,” he said.
Lofgren agrees that array-based virtualization has its limitations since it doesn’t give customers the advantage of creating pools of storage across multiple physical subsystems. “I don’t think that the advantage is all that great,” he said. “There’s some things you can do in terms of moving data around and perhaps migrating data and doing things in the background. But certainly the benefits aren’t of the scale you would see if you could do it on a network level.”
In host-based virtualization, the virtualization function runs in the server, allowing that server to see various storage solutions as one. The problem here, Giroux said, is that most companies have more than one type of server and they would need virtualization software for each of the hosts they support. So, host-based virtualization means one server but multiple storage solutions, while array-based virtualization means multiple servers but one storage solution.
Network-based virtualization allows users to have heterogeneous servers and storage solutions, Giroux said. Sun will soon release its network-based product.
With in-band virtualization, an appliance sits in the data path between the servers and various storage solutions, allowing them to appear as one, if that’s what is desired. The obvious disadvantage here is that the box could act as a bottleneck, Lofgren said, though vendors will, of course, deny it.
You have to configure a fairly high-performance solution to prevent choke points, Net App’s Bercovici said.
Sharma at FalconStor, which offers an in-band virtualization solution (though it no longer describes it as such), said there is no bottleneck created because the data passes through at the speed of the bus on the network.
With out-of-band virtualization, nothing sits in the data path. Instead, agents sit on the server to keep track of where data is residing. But customers are concerned about how much CPU utilization those agents represent on the host, Lofgren said. Plus, those agents also need to be managed.
One view to bind them all?
At one point, the goal of virtualization was to allow companies to create one view of all of their storage, but ESG’s Marrone said this isn’t a very desirable aim for most companies. “What’s the point? I just don’t see that people are going to want to treat all of their assets as a singular asset.”
Though vendors tend to disagree with Marrone, AT&T’s Tao doesn’t ultimately want one data pool – whether physical or virtual. Sometimes, the storage just has to be close to the server to prevent network latency.
“On the physical side, I don’t think we’ll ever get to that single consolidated pool of storage,” he said.
Even creating a single virtual view of all of AT&T’s data isn’t desirable, he said. Though the technology may enable it, there’s an advantage to artificially limiting it – you may not want HR to steal all your disk space, for instance, he said.
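Tao’s point about artificially limiting a shared pool can be sketched as a simple per-group quota check. This is an illustration only, not AT&T’s or Net App’s actual mechanism; the class, group names and sizes are hypothetical.

```python
class QuotaPool:
    """A shared pool in which each group has a cap on what it can claim."""

    def __init__(self, capacity_gb, quotas_gb):
        self.capacity_gb = capacity_gb
        self.quotas_gb = dict(quotas_gb)
        self.used_gb = {group: 0 for group in quotas_gb}

    def request(self, group, size_gb):
        # A request succeeds only if it fits both the group's own quota
        # and the space still free in the pool as a whole.
        within_quota = self.used_gb[group] + size_gb <= self.quotas_gb[group]
        within_pool = sum(self.used_gb.values()) + size_gb <= self.capacity_gb
        if within_quota and within_pool:
            self.used_gb[group] += size_gb
            return True
        return False


# Assumed figures: a 1,000GB pool with HR capped at 200GB.
shared = QuotaPool(1000, {"hr": 200, "billing": 600})
hr_ok = shared.request("hr", 150)            # fits within HR's quota
hr_denied = shared.request("hr", 100)        # would push HR to 250GB
billing_ok = shared.request("billing", 500)  # unaffected by HR's usage
```

Even though the pool has plenty of free space, the second HR request is refused, which leaves room for the billing group’s later request.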