Cisco enhancing IOS, not starting anew

With all the discussion and announcements recently about resilient routing for service provider networks, accusing fingers have been pointed at Cisco Systems Inc. for its 15-year legacy – some say baggage – of IOS software.

The accusations range from the software’s age and its “feature-richness” (read: excess code) to the dozen or so release images in operation today and its heritage in the enterprise, whose networks bear little if any resemblance to carrier networks in complexity or reliability demands. Some even say that Cisco is rewriting IOS, or writing a new operating system from the ground up, specifically for carriers, to meet their requirements for modularity, protected memory operation and availability.

This new software is referred to as “IOS NG,” where NG stands for “next generation.” But Roland Acra, vice-president and general manager of Cisco’s Internet Routing group, says talk of a new operating system is overstated. “I don’t think a flash cut is the answer,” Acra said last week in an interview with Network World (US). “That work is going on in [the current iterations of] IOS. People always want continuity of features and interfaces. They value every single knob, bell and whistle.”

Cisco is doing a lot of “piecemeal” work, Acra says, to modify the IOS platform software, operating system and applications exclusively for the service provider space. Moreover, the IOS images currently shipping to service providers carry nothing left over from the enterprise, such as AppleTalk or SNA features, he says.

The work currently underway on IOS includes “cleaner” interfaces, increased modularity, uptime and protection against failures, and the ability to shadow redundant processors, Acra says. Cisco is also looking to scale IOS to support “super POPs”, where heavy-duty aggregation and peering involving thousands of peers and tens of thousands of interfaces take place.

“We’re taking what’s there and making it better,” Acra says, referring to IOS as it currently exists.

Cisco did add some resiliency features to IOS a few weeks ago under its Globally Resilient IP (GRIP) rollout. GRIP is intended to enable zero packet loss by continuing to forward traffic while a router’s route processor reconverges. But by Cisco’s own admission, GRIP is more suitable for the network edge, where there are relatively few topology changes. If the network topology changes while GRIP routers are reconverging and forwarding, data may be lost because the router is relying on obsolete route information.

In addition, the network edge uses static routes when the customer is not multihomed, which further minimizes the effect of topology changes on forwarding during reconvergence, Cisco says.
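The trade-off Cisco describes can be illustrated with a toy model. The sketch below is not Cisco's implementation; the `Router` class, its methods and the peer names are all hypothetical. It assumes only what is stated above: the forwarding table is frozen while the route processor reconverges, so packets keep flowing, but a topology change during that window leaves the router forwarding on stale information.

```python
# Hypothetical model: names and structure are illustrative, not Cisco's design.

class Router:
    def __init__(self, routes):
        self.fib = dict(routes)      # forwarding table: prefix -> next hop
        self.reconverging = False

    def begin_switchover(self):
        # The route processor restarts; the forwarding table is frozen, not flushed.
        self.reconverging = True

    def learn_route(self, prefix, next_hop):
        # Route updates cannot be applied while the route processor reconverges.
        if self.reconverging:
            return False
        self.fib[prefix] = next_hop
        return True

    def finish_reconvergence(self, fresh_routes):
        # Reconvergence is complete; install the freshly computed routes.
        self.fib = dict(fresh_routes)
        self.reconverging = False

    def forward(self, prefix):
        return self.fib.get(prefix)


def packet_delivered(router, prefix, reachable_next_hops):
    # A packet gets through only if the chosen next hop is still reachable.
    return router.forward(prefix) in reachable_next_hops


router = Router({"10.0.0.0/8": "peer-A"})
router.begin_switchover()

# Stable topology: the frozen table is still correct, so nothing is lost.
print(packet_delivered(router, "10.0.0.0/8", {"peer-A", "peer-B"}))  # True

# peer-A fails mid-reconvergence: the update is missed and packets are dropped.
router.learn_route("10.0.0.0/8", "peer-B")
print(packet_delivered(router, "10.0.0.0/8", {"peer-B"}))  # False

# Once reconvergence finishes, forwarding recovers.
router.finish_reconvergence({"10.0.0.0/8": "peer-B"})
print(packet_delivered(router, "10.0.0.0/8", {"peer-B"}))  # True
```

In this model a single-homed customer on a static route corresponds to the first case: the next hop never changes, so the frozen table stays valid for the whole reconvergence window.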

GRIP also currently lacks the ability to do in-service, or “hitless,” software upgrades, though it does provide the basis for this in the future. And analysts say users can realize GRIP’s benefits only if they run nothing but Cisco routers, all on the same IOS software image, which is rare in service provider networks.

Regardless, Acra says Cisco has some Tier 1 carrier customers that have already experienced several consecutive quarters of 99.999% reliability with IOS. Cisco has also made progress in reducing the number of route flaps and routing loops in IOS. Indeed, GRIP’s stateful switchover feature, which keeps Layer 2 sessions intact, is designed to mitigate just that.
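For scale, the 99.999% figure can be converted into a downtime budget with simple arithmetic. The sketch below assumes a 365.25-day year; the function name is illustrative and does not come from Cisco.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumption: average year length

def allowed_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Return the downtime budget, in minutes, for a given availability percentage."""
    return period_minutes * (1 - availability_percent / 100)

for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {allowed_downtime_minutes(pct):.2f} minutes per year")
```

Under that assumption, “five nines” leaves roughly 5.26 minutes of downtime per year.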

“Some negative mythology is being debunked,” Acra says about IOS. “I think we’ve come a long way.”

Perhaps. But carriers should not redeploy their redundant routers anytime soon.

All the resiliency improvements to date still won’t account for all that could go wrong in IOS or any other router operating system, Acra says.

“Even if the software is rock-solid, you want to minimize your dependence [on a single system] and also protect against human error,” he says, rationalizing the need for service providers to purchase, configure, install, operate, manage and maintain a separate standalone – redundant – router for backup. “We’re certainly shooting for not having the need for it but we also want to be realistic: Software continues to be imperfect.”

That imperfection is due, in part, to the “wild ride” IP has been on in the past decade. Traditional telephony networks have had the luxury of 100 years of research and development to attain the reliability we’re all accustomed to from dialtone.

The explosive growth of IP and the Internet in less than 10 years exposes shortcomings in the medium and invites, perhaps unfairly, apples-to-apples comparisons with the hardened architecture of the telephone network.

“I understand why there’s been more scrutiny,” Acra says. “Few industries have had zero-to-100 million people in so few years.”

Yet, that growth also reveals the flexibility of IP and the Internet, Acra says.

“We have thousands of ISPs who peer randomly,” he says. “Bandwidth is multiplying by 10 every two years. So kudos go out to the IP architecture and the Internet.”