FRAMINGHAM, Mass. — Cisco Systems Inc. has three words for network architects looking to grow their data centres: Faster. Flatter. Simpler.
The equipment manufacturer says its FabricPath technology embodies all three qualities by making far better use of connections between data-center switches than the venerable spanning tree protocol (STP).
In this exclusive test, we assessed FabricPath in terms of its ability to boost bandwidth, reroute around trouble, and simplify network management. In all three areas, FabricPath delivered: Cisco’s pre-standard take on the IETF’s forthcoming TRILL (Transparent Interconnection of Lots of Links) specification showed real improvement over STP-based designs.
There is a catch: The latest 1/10-gigabit Ethernet line cards for the Cisco Nexus 7000 data centre switch are the first, and so far only, products to enable FabricPath. That’s likely to change as Cisco expands FabricPath support (the recently announced Nexus 5500 is slated to gain it in a 2011 software release) and more vendors release TRILL implementations.
The test results, combined with the likely appearance of more TRILL solutions in the next few months, suggest a flat future for data centre network design.
In essence, FabricPath is a form of link-layer routing. FabricPath-enabled switches differ from their conventional Ethernet counterparts in two ways: They compute layer-2 paths using control messages carried over IS-IS, the routing protocol, and they encapsulate incoming Ethernet frames with a FabricPath header. This header contains routable source and destination switch addresses and a time-to-live field for loop prevention.
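To make the encapsulation idea concrete, here is a minimal Python sketch. It is not Cisco's actual header layout: the class and field names (`FabricHeader`, `src_switch`, `dst_switch`, `ttl`) and the starting TTL of 32 are assumptions chosen for illustration. The sketch models only the three pieces named above: routable source and destination switch addresses, plus a time-to-live that stops a looping frame.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FabricHeader:
    src_switch: int  # hypothetical ID of the ingress edge switch
    dst_switch: int  # hypothetical ID of the egress edge switch
    ttl: int         # remaining hops before the frame is discarded


@dataclass(frozen=True)
class FabricFrame:
    header: FabricHeader
    payload: bytes   # the original Ethernet frame, carried unchanged


def encapsulate(eth_frame: bytes, src_switch: int, dst_switch: int,
                ttl: int = 32) -> FabricFrame:
    """Wrap an incoming Ethernet frame at the ingress edge switch."""
    return FabricFrame(FabricHeader(src_switch, dst_switch, ttl), eth_frame)


def forward(frame: FabricFrame) -> Optional[FabricFrame]:
    """One hop inside the fabric: decrement the TTL, drop at zero."""
    new_ttl = frame.header.ttl - 1
    if new_ttl <= 0:
        return None  # a frame caught in a loop eventually dies here
    header = FabricHeader(frame.header.src_switch,
                          frame.header.dst_switch, new_ttl)
    return FabricFrame(header, frame.payload)


def decapsulate(frame: FabricFrame) -> bytes:
    """Strip the fabric header at the egress edge switch."""
    return frame.payload
```

A frame created with `ttl=2` survives one call to `forward` and is dropped on the second, which is the loop-prevention behaviour the TTL field exists to provide.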
FabricPath involves minimal additional configuration, and doesn’t require knowledge of IS-IS. It takes two lines to enable FabricPath in the switch configuration; the only other mandatory requirement is to distinguish fabric ports from edge ports (a single command on each fabric-facing interface). In testing we also used optional commands for assigning switch IDs and setting the hashing algorithm used by traffic flows; these required one line apiece.
FabricPath’s biggest advantages over STP are in the areas of bandwidth and design versatility. STP provides redundancy and prevents loops, but it uses an active/standby model to do so. As a result, STP networks offer two potential paths for any flow, only one of which can forward traffic. Spanning tree designs further constrain bandwidth and increase complexity by requiring routers to move traffic between broadcast domains. The routers in turn add latency and require additional connections for redundancy.
In contrast, FabricPath creates a single switch fabric across all participating switches, increasing available bandwidth within a single layer-2 domain. FabricPath uses equal-cost multipath (ECMP) routing to distribute traffic across all available links, making it an active/active technology.
An expanded layer-2 domain also requires fewer layer-3 routing hops, reducing latency, and makes it simpler to run migration services such as VMware’s vMotion. And a larger layer-2 domain simplifies change management, since moving attached hosts no longer requires IP address and/or VLAN configuration changes.
FabricPath also reduces broadcast flooding and media access control (MAC) address table size, both well-known issues with large layer-2 networks. FabricPath switches use multicast rather than flooding to forward frames sent to unknown destinations, and compute routing tables with information learned from the fabric and source MAC addresses learned on each edge switch.
Moreover, using an approach called “conversational learning,” switches populate MAC address tables only for ports actually involved in conversations. This differs from conventional switching, where switches see all flooded traffic within a broadcast domain and put every address into their MAC tables. In contrast, FabricPath switches don’t need gargantuan MAC address tables, even when layer-2 domains encompass tens of thousands of hosts.
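The difference between the two learning styles can be sketched in a few lines of Python. This is a simplified model, not Cisco's implementation: the class names, the string host names and the rule "learn a remote source only when the destination is a known local host" are assumptions made for the sake of the example.

```python
class ConventionalSwitch:
    """Learns the source MAC of every frame it sees, flooded or not."""

    def __init__(self):
        self.mac_table = {}

    def receive(self, src_mac, dst_mac, port):
        self.mac_table[src_mac] = port


class ConversationalEdgeSwitch:
    """Learns remote MACs only when they talk to a locally attached host."""

    def __init__(self):
        self.local = {}   # MAC -> edge port
        self.remote = {}  # MAC -> ID of the switch the host sits behind

    def receive_from_edge(self, src_mac, port):
        # Hosts attached to this switch are always learned.
        self.local[src_mac] = port

    def receive_from_fabric(self, src_mac, dst_mac, src_switch):
        # Assumed rule: keep the remote source only if it is addressing
        # one of our own hosts, i.e. a real conversation is under way.
        if dst_mac in self.local:
            self.remote[src_mac] = src_switch


def demo(n_remote_hosts=1000):
    """Return (conventional table size, conversational table size)."""
    conventional = ConventionalSwitch()
    conversational = ConversationalEdgeSwitch()

    # One locally attached host on each switch.
    conventional.receive("host-A", "remote-0", port=1)
    conversational.receive_from_edge("host-A", port=1)

    # Only remote-0 talks to host-A; the rest send to hosts elsewhere,
    # and their frames reach this switch as flooded traffic.
    for i in range(n_remote_hosts):
        src = f"remote-{i}"
        dst = "host-A" if i == 0 else "elsewhere"
        conventional.receive(src, dst, port=2)
        conversational.receive_from_fabric(src, dst, src_switch=22)

    conversational_size = len(conversational.local) + len(conversational.remote)
    return len(conventional.mac_table), conversational_size
```

With 1,000 remote hosts of which one is in conversation with the local host, the conventional table holds every address it has seen, while the conversational table holds only the two ends of the conversation.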
Cisco claims FabricPath scales up to 256 active paths, with each path comprising up to 16 links using link aggregation (or EtherChannel, in Cisco’s terminology). We did not verify that claim; doing so would require nearly 10,000 test ports. However, we did verify two building blocks used to make this claim — the ability of one switch to support 16 concurrently active paths, and the ability to support up to 16 links per path.
The biggest drawback with FabricPath is its limited availability: At test time in September, it worked only with Cisco Nexus 7000 switches, and only when equipped with F1 32-port 1/10-gigabit Ethernet line cards. Cisco says the recently announced Nexus 5500 will support FabricPath in a 2011 software release. Beyond that, it isn’t yet possible to extend FabricPath to other Cisco switches, let alone those from other vendors.
Also, while FabricPath did extremely well in this first look, it’s hardly the only metric when it comes to assessing data center switching. Because we focused exclusively on FabricPath functionality in this test, we didn’t get into other key concerns such as scalability and latency. We’ve noted these concerns with switches in past tests, and plan to return to these areas in an upcoming Nexus assessment involving an even larger test bed. We also plan to test other TRILL implementations as they become available.
Another issue is pricing, though that’s not necessarily a negative. At first blush, the US$1.5 million list price of the test bed is a jaw-dropper. But that covers a very large setup (six Nexus 7010 chassis and 384 10G Ethernet ports with optics) to showcase scalability. Existing Nexus 7000 customers can add F1 FabricPath-capable line cards for US$35,000 apiece (of course, few if any buyers pay list price).
Our tests examined FabricPath functionality in five ways. All these involved six Nexus 7010 chassis linked to create one FabricPath network connecting 12,800 emulated hosts. Spirent Communications TestCenter traffic generator/analyzers emulated 100 hosts per port, offering traffic to 128 10G Ethernet ports on two of the six switches.
In the first test, we sought to validate that FabricPath would support 16 redundant paths between switches. After configuring both edge switches with 16 EtherChannel interfaces, each with four links, we used Spirent TestCenter to send traffic between all emulated hosts for five minutes.
In this test, the switches forwarded all traffic with zero frame loss, validating FabricPath’s ability to load-share across 16 redundant connections.
While the number of EtherChannel groups is important, so too is the number of links in each group. Applications involving massive data transfers between hosts need fat pipes, so that was the focus here. We used the same physical topology as in the first test, but this time configured each edge switch with four EtherChannels, each comprising 16 10G Ethernet links. Again, the system delivered all frames without loss.
In the same test we also examined how fairly FabricPath dealt with host MAC addresses. We’ve seen issues in previous tests where switches’ hashing algorithms caused very uneven distribution of test traffic across multiple links. Thanks to FabricPath’s use of ECMP, we saw variation of 0.07 per cent or less across EtherChannels. Then we repeated the test using a completely different set of pseudorandom MAC addresses, and obtained virtually identical results. This validates the “equal” in ECMP; the switches should be able to hash any pattern of MAC addresses and distribute them uniformly across all core links.
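The idea of hashing flows onto equal-cost links can be illustrated with a short Python simulation. The hash used here (SHA-256 over the source and destination MAC addresses) is an assumption chosen for the sketch; the article does not describe the algorithm the switches use, and the sketch counts flows rather than frames.

```python
import hashlib
import random


def pick_link(src_mac: bytes, dst_mac: bytes, n_links: int) -> int:
    """Map a flow to one of n_links; the same flow always gets the same link."""
    digest = hashlib.sha256(src_mac + dst_mac).digest()
    return int.from_bytes(digest[:8], "big") % n_links


def random_mac(rng: random.Random) -> bytes:
    """Generate a pseudorandom 6-byte MAC address."""
    return bytes(rng.getrandbits(8) for _ in range(6))


def flows_per_link(n_flows: int, n_links: int, seed: int) -> list:
    """Hash n_flows pseudorandom MAC pairs and count the flows on each link."""
    rng = random.Random(seed)
    counts = [0] * n_links
    for _ in range(n_flows):
        counts[pick_link(random_mac(rng), random_mac(rng), n_links)] += 1
    return counts


if __name__ == "__main__":
    # Two different seeds stand in for two different sets of MAC addresses.
    for seed in (1, 2):
        print(seed, flows_per_link(40_000, 4, seed))
```

Changing the seed swaps in a completely different set of addresses, mirroring the repeat run described above; a well-behaved hash should give near-equal counts either way.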
No multicast performance penalty
Cisco also claims FabricPath load-shares multicast source-receiver trees across multiple spine switches, compared with the single tree formed in STP networks. We put that claim to the test with a very large multicast setup. Spirent TestCenter represented 100 multicast sources on each port on both edge switches (128 ports total); each port on each edge switch also joined 50 groups sourced from all ports on the other edge switch. That’s the layer-2 equivalent of 640,000 multicast routes (128 edge ports times 50 groups times 100 sources per group).
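The scale figures quoted for the test bed follow from simple multiplication, reproduced here in Python as a quick check. The constant and function names are illustrative; the numbers come from the test description.

```python
EDGE_PORTS = 128         # 10G test ports across the two edge switches
HOSTS_PER_PORT = 100     # emulated hosts (and multicast sources) per port
GROUPS_PER_PORT = 50     # multicast groups joined by each port
SOURCES_PER_GROUP = 100  # sources sending to each group


def emulated_hosts(ports: int = EDGE_PORTS,
                   hosts_per_port: int = HOSTS_PER_PORT) -> int:
    """Total emulated hosts attached to the fabric."""
    return ports * hosts_per_port


def multicast_route_equivalents(ports: int = EDGE_PORTS,
                                groups: int = GROUPS_PER_PORT,
                                sources: int = SOURCES_PER_GROUP) -> int:
    """Layer-2 equivalent of (source, group) multicast routes."""
    return ports * groups * sources


if __name__ == "__main__":
    print(emulated_hosts())               # 12800
    print(multicast_route_equivalents())  # 640000
```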
To determine whether FabricPath would load-share multicast traffic, we examined packet counts for each EtherChannel interface after each test. There was more variation than with unicast traffic, but only slightly; at most, packet counts differed by around 2.5 per cent across the various EtherChannels.
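One way to turn per-interface packet counters into a single variation figure is sketched below in Python. The article does not say how its percentages were calculated, so the definition used here (largest deviation from the mean, as a percentage of the mean) is an assumption, and the counter values in the usage note are made up.

```python
def variation_pct(packet_counts):
    """Largest deviation from the mean count, as a percentage of the mean."""
    if not packet_counts:
        raise ValueError("need at least one counter value")
    mean = sum(packet_counts) / len(packet_counts)
    if mean == 0:
        return 0.0  # no traffic at all, so nothing to compare
    worst = max(abs(count - mean) for count in packet_counts)
    return worst / mean * 100.0
```

For example, hypothetical counters of 975, 1,000, 1,000 and 1,025 packets on four EtherChannels have a mean of 1,000 and a worst-case deviation of 25 packets, or about 2.5 per cent.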
We then repeated the same test with a combination of unicast and multicast traffic; again, FabricPath distributed all frames more or less uniformly across links between switches. In fact, variances in unicast and multicast packet counts were exactly the same in mixed-class testing as when we offered only unicast or only multicast. This suggests that adding multicast to a FabricPath network carrying unicast (or vice-versa) won’t have an adverse effect on load-sharing.
Fast fabric failover
For networking in general and data centres in particular, resiliency is an even more important consideration than high performance. With both FabricPath and STP, the key issue is how quickly the network reroutes traffic around a failed link or switch. It takes around 1 to 3 seconds for a network to converge after a failure with rapid spanning tree, or 45 to 60 seconds with standard spanning tree. The natural follow-on question, then, is how fast FabricPath converges after a failure.
To find out, we offered the same traffic as in the four-path, 16-link test described above, but killed power to one of the spine switches and derived failover time from frame loss. We repeated the test four times, powering off the spine switches one at a time.
The results show FabricPath converges far faster than spanning tree. On average, the system rerouted traffic sent to a failed spine switch in 162 milliseconds, a big improvement over rapid spanning tree’s 1 to 3 second convergence time.
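Deriving failover time from frame loss is usually a matter of dividing the number of frames lost by the rate at which frames were offered. The Python sketch below shows that calculation; the article does not give the exact formula or the raw counts behind its figures, so the function name and the numbers in the usage note are hypothetical.

```python
def failover_time_ms(frames_lost: int, offered_fps: float) -> float:
    """Estimate outage duration from loss at a constant offered rate.

    frames_lost: frames offered but never received during the test
    offered_fps: offered load, in frames per second, on the affected path
    """
    if offered_fps <= 0:
        raise ValueError("offered rate must be positive")
    return frames_lost / offered_fps * 1000.0
```

As a hypothetical example, losing 162,000 frames from a stream offered at 1 million frames per second works out to 162 milliseconds of outage. The method assumes the offered rate is constant and that all loss is caused by the failure rather than by congestion.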
We also tested convergence time when adding a switch to the FabricPath network by powering up each downed spine switch one at a time. In this test, convergence time was zero. The IS-IS protocol recognized each new path and began routing traffic over it, but the system dropped no frames during or after each route recalculation.
Data Center Network Manager
Our final set of tests examined the ability of Cisco’s Data Center Network Manager (DCNM) software to configure and monitor FabricPath networks. DCNM uses Simple Object Access Protocol (SOAP), an XML-based method of representing data, which allows it to be called by any third-party Web services application. Cisco demonstrated this with functional tests of XML input and output to a DCNM server.
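For readers unfamiliar with SOAP, the Python sketch below builds a generic SOAP 1.1 request envelope using only the standard library. It does not reproduce DCNM's actual Web services interface: the `urn:example:dcnm` namespace, the operation name and the parameter in the usage note are invented placeholders, and only the SOAP envelope namespace is real.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
EXAMPLE_NS = "urn:example:dcnm"  # hypothetical namespace, not DCNM's own


def build_request(operation: str, params: dict) -> bytes:
    """Wrap an operation and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{EXAMPLE_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(op, f"{{{EXAMPLE_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="utf-8")
```

Calling `build_request("getLinkStatus", {"switchId": 11})` returns an XML document that any SOAP-aware client could send over HTTP; a real integration would take its operation names and namespaces from the service's published WSDL.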
In our tests, we focused on DCNM’s ability to perform common FabricPath management tasks. All the tasks tested are included with the base version of DCNM, supplied free for managing Nexus switches. Some additional functions such as configuration history management are available at extra cost but we did not test these. Also, DCNM is mainly useful for Nexus switch management; while it can discover non-Nexus switches using Cisco Discovery Protocol (CDP), the information it manages is limited to that supplied by CDP. With Nexus devices, the management toolkit is a lot more extensive.
In our first test, we configured DCNM to discover the six Nexus switches and populate its database. Second, we configured DCNM to send text messages and e-mails when traffic on a FabricPath link exceeded 80 per cent utilization. Third, we configured DCNM to display an alarm on link failure (triggered by physically removing a cable between edge and spine switches). Finally, we configured DCNM to apply weighted random early detection (WRED) queuing to all switch configurations, and then to remove the WRED section of all switches’ configurations. DCNM successfully accomplished all these tasks.
While we would like to see FabricPath implemented on more than one switch, there’s no question it represents a significant advancement in the state of the networking art. As these tests showed, FabricPath simplified network design while improving scalability and resiliency. For network architects looking to expand their data centers, flattening the network with FabricPath is now a real option.
(Newman is a member of the Network World Lab Alliance and president of Network Test, an independent test lab and engineering services consultancy. He can be reached at email@example.com.)
(From Network World U.S.)