Data centre fabrics catching on slowly

FRAMINGHAM, Mass. – When the U.S.-based Government Employees Health Association (GEHA) overhauled its data center to implement a fabric infrastructure, the process was “really straightforward,” unlike many IT projects, says Brenden Bryan, senior manager of enterprise architecture.

“We haven’t had any ‘gotchas’ or heartburn, with me looking back and saying ‘I wish I made that decision differently.’”

GEHA, based in Kansas City, Mo., and the nation’s second largest health plan and dental plan, processes claims for more than a million federal employees, retirees and their families. The main motivator behind switching to a fabric, Bryan says, was to simplify and consolidate and move away from a legacy Fibre Channel SAN environment.

When he started working at GEHA in August 2010, Bryan says he inherited an infrastructure that was fairly typical: a patchwork of components from different vendors with multiple points of failure. The association also wanted to virtualize its mainframe environment and turn it into a distributed architecture. “We needed an infrastructure in place that was redundant and highly available,” explains Bryan. Once the new infrastructure was in place and stable, the plan was to then move all of GEHA’s Tier 2 and Tier 3 apps to it and then, lastly, move the Tier 1 claims processing system.

GEHA deployed Ethernet switches and routers from Brocade. Now, more than a year after the six-month project was completed, Bryan says the association has a high-speed environment and a 20-to-1 ratio of virtual machines to blade hardware.

“I can keep the number of physical servers I have to buy to a minimum and get more utilization out of them,” says Bryan. “It enables me to drive the efficiencies out of my storage as well as my computing.”

Implementing a data centre fabric does require some planning, however. It means having to upgrade and replace old switches with new switching gear because of the different traffic configuration used in fabrics, explains Zeus Kerravala, principal analyst at ZK Research. “Then you have to re-architect your network and reconnect servers.”

A data centre fabric is a flatter, simpler network that’s optimized for horizontal traffic flows, compared with traditional networks, which are designed more for client/server setups that send traffic from the server to the core of the network and back out, Kerravala explains.

In a fabric model, the traffic moves horizontally across the network between virtual machines, “so it’s more a concept of server-to-server connectivity.” Fabrics are flatter and have no more than two tiers, versus legacy networks, which have three or more tiers, he says. Storage networks have been designed this way for years, says Kerravala, and now data networks need to migrate this way.
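The difference between the two designs can be illustrated with a toy model. The sketch below is not drawn from any vendor's product; the switch and server names and the link lists are hypothetical, chosen only to mirror the three-tier and two-tier layouts Kerravala describes. It counts how many switches server-to-server traffic crosses on the shortest path.

```python
from collections import deque

def switch_hops(links, src, dst):
    """Count the switches on the shortest path between two servers (BFS)."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([(src, 0)])
    seen = {src}
    while queue:
        node, edges = queue.popleft()
        if node == dst:
            # Every node between the two servers is a switch.
            return edges - 1
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, edges + 1))
    raise ValueError("no path between servers")

# Hypothetical legacy layout: access -> aggregation -> core and back out.
THREE_TIER = [
    ("srv_a", "acc1"), ("acc1", "agg1"), ("agg1", "core"),
    ("core", "agg2"), ("agg2", "acc2"), ("acc2", "srv_b"),
]

# Hypothetical two-tier fabric: every leaf connects to every spine.
TWO_TIER = [
    ("srv_a", "leaf1"), ("leaf1", "spine1"), ("leaf1", "spine2"),
    ("spine1", "leaf2"), ("spine2", "leaf2"), ("leaf2", "srv_b"),
]

print(switch_hops(THREE_TIER, "srv_a", "srv_b"))  # 5
print(switch_hops(TWO_TIER, "srv_a", "srv_b"))    # 3
```

In this simplified picture, the flatter design removes two switch traversals from each server-to-server exchange. Real networks differ in size and redundancy, so the numbers are illustrative only.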

One factor driving the move to fabrics is that about half of all enterprise data center workloads in Fortune 2000 companies are virtualized, and when companies get to that point, they start seeing the need to reconfigure how their servers communicate with one another and with the network.

“We look at it as an evolution in the architectural landscape of the data center network,” says Bob Laliberte, senior analyst at Enterprise Strategy Group. “What’s driving this is more server-to-server connectivity … there are all these different pieces that need to talk to each other and go out to the core and back to communicate, and that adds a lot of processing and latency.”

Virtualization adds another layer of complexity, he says, because it means dynamically moving things around, “so network vendors have been striving to simplify these complex environments.”

As home foreclosures spiked in 2006, Walz Group, which handles document management, fulfillment and regulatory compliance services across multiple industries, found its data centre couldn’t scale effectively to take on the additional growth required to serve its clients. “IT was impeding the business growth,” says Chief Information Security Officer Bart Falzarano.

The company hired additional in-house IT personnel to deal with disparate systems and management, as well as build new servers, extend the network and add disaster recovery services, says Falzarano. “But it was difficult to manage the technology footprint, especially as we tried to move to a virtual environment,” he says. The company also had some applications that couldn’t be virtualized that would have to be managed differently. “There were different touch points in systems, storage and network. We were becoming counterproductive.”

To reduce the complexity, in 2009 Walz Group deployed Cisco’s Unified Data Center platform, a unified data center fabric architecture that combines compute, storage, network and management into a platform designed to automate IT as a service, across physical and virtual environments. The platform is connected to a NetApp SAN Storage FlexPod platform.

Previously, when they were using Hewlett-Packard Co. technology, Falzarano recalls, one of their database nodes went down, which required getting the vendor on the phone and eventually taking out three of the four CPUs and going through a troubleshooting process that took four hours. By the time they got the part they needed, installed it and returned to normal operations, 14 hours had passed, says Falzarano.

But there’s been a dramatic change since the company installed a fabric.

“Now, for the same [type of failure], if we get a degraded blade server node, we un-associate that SQL application and re-associate the SQL app in about four minutes. And you can do the same for a hypervisor,” he says.

IT has been tracking the data center performance and benchmarking some of the key metrics, and Falzarano reports that they immediately saw a port-density reduction of 8 to 1, meaning less cabling complexity and fewer required cables. Where IT previously saw a low virtualization efficiency of 4 to 1 with the earlier technology, Falzarano says that’s now greater than 15 to 1, and the team can virtualize apps that it couldn’t before.

Other findings include a rack reduction of greater than 50 per cent due to the amount of virtualization the IT team was able to achieve; more centralized systems management — now one IT engineer handles 50 systems — and what Falzarano refers to as “system mean time before failure.”

“We were experiencing a large amount of hardware failures with our past technology; one to two failures every 30 days across our multiple data centers. Now we are experiencing less than one failure per year,” he says.
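To put the two failure figures on the same scale, the earlier rate can be annualized. The short sketch below assumes a 365-day year and takes "less than one failure per year" as an upper bound of one; the function name is hypothetical.

```python
def failures_per_year(failures, period_days):
    """Scale a failure count observed over period_days to a 365-day year."""
    return failures * 365 / period_days

# Earlier rate quoted by Falzarano: one to two failures every 30 days.
before_low = failures_per_year(1, 30)
before_high = failures_per_year(2, 30)

# Current rate quoted: less than one failure per year (1.0 is an upper bound).
after_upper_bound = 1.0

print(round(before_low, 1), round(before_high, 1))
print(round(before_low / after_upper_bound, 1))
```

On those assumptions, the earlier rate works out to roughly 12 to 24 failures a year, so the reported improvement is at least twelvefold.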

IT team leaders at GEHA believed that deploying a fabric model would not only meet the business requirements, but also reduce complexity, cost and staff needed to manage the data center. Bryan says the association also gained economies of scale by having a staff of two people who can manage an all-Ethernet environment, as opposed to needing additional personnel who are familiar with Fibre Channel.

“We didn’t have anyone on our team who was an expert in Fibre Channel, and the only way to achieve getting the claims processing system to be redundant and highly available was to leverage the Ethernet fabric expertise, which we had on staff,” he says.

Bryan says the association has been able to trim “probably a half million dollars of capital off the budget” since it didn’t have to purchase any Fibre Channel switching, and a quarter of a million dollars in operating expenses since it didn’t need staff to manage Fibre Channel. “Since collapsing everything to an Ethernet fabric, I was able to eliminate a whole stack of equipment,” says Bryan.

GEHA used a local managed services provider to help with setting up some of the more complex pieces of the architecture. “But from the time we unpacked the boxes to the time the environment was running was two days,” says Bryan. “It was very straightforward.”

IT has now utilized the fabric for its backup environment with software from CommVault. Bryan says the association is seeing performance of about a terabyte an hour of throughput on the network, “which is probably eight to 10 times greater than before” the fabric was in place.
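For readers who think in link speeds, the backup figure can be converted to gigabits per second. The sketch below assumes a decimal terabyte (10^12 bytes) and a sustained, even rate over the hour; the function name is hypothetical.

```python
def tb_per_hour_to_gbps(tb_per_hour):
    """Convert terabytes per hour to gigabits per second (decimal units)."""
    bits_per_hour = tb_per_hour * 1e12 * 8
    return bits_per_hour / 3600 / 1e9

# Throughput Bryan reports on the fabric.
current = tb_per_hour_to_gbps(1.0)

# "Eight to 10 times greater than before" implies an earlier rate of
# one-eighth to one-tenth of a terabyte per hour.
earlier_high = tb_per_hour_to_gbps(1.0 / 8)
earlier_low = tb_per_hour_to_gbps(1.0 / 10)

print(round(current, 2), round(earlier_low, 2), round(earlier_high, 2))
```

Under these assumptions, a terabyte an hour is a sustained rate of about 2.2 Gbit/s, against roughly 0.22 to 0.28 Gbit/s before the fabric was in place.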

Today, all of GEHA’s production traffic is on the fabric, and Bryan says he couldn’t be more pleased with the infrastructure. He says scaling it out is not an issue, which he counts among the major advantages of a converged fabric, along with speed. GEHA is also able to run a very dense workload of virtual machines on a single blade, he says. “Instead of having to spend a lot of money on a lot of blades, you can increase the ROI on those blades without sacrificing performance,” says Bryan.

Laliberte says he sees a long life ahead for data center fabrics, noting that this type of architecture “is just getting started. If you think about complexity and size, and you have thousands of servers in your environment and thousands of switches, any kind of architecture change isn’t done lightly and takes time to evolve.”

Just as it took time for a three-tier architecture to evolve, it will take time for three-tier networks to be broken down to two tiers, he says, adding that a flat fabric is the next logical step. “These things get announced and are available, but it still takes years to get widespread deployments,” says Laliberte.

(From Computerworld U.S. Esther Shein is a freelance writer and editor.)