IBM Corp. and Cisco Systems Inc. want to make it easier to diagnose and solve problems in an enterprise’s IT infrastructure, even to the point where the infrastructure can do so by itself.
Pinpointing the causes of failures and solving the root problems takes up a lot of IT staff time, a resource that has become more scarce as budgets tighten, according to the companies, which announced a drive toward self-diagnostic and self-healing networks on Friday. The combination of networks and systems is also becoming more complex, so simplifying and automating that process is increasingly important, they said.
Self-diagnosis and self-healing are key parts of IBM’s broader autonomic computing initiative, aimed at creating systems and networks that in many respects run themselves, said Ric Telford, director of architecture and technology in the autonomic computing business of Armonk, N.Y.-based IBM. Companies can never remove the human administrator from the picture completely, but Cisco and IBM’s steps should make life easier even when people have to get involved, he said.
For example, if a transaction goes wrong, the cause might lie in any one of many applications or devices that come into play across the infrastructure, Telford said. Narrowing it down can be hard.
“The growing complexity of infrastructure is causing more and more of these hard-to-diagnose problems,” he said.
Initial aims of IBM’s and Cisco’s program include coming up with a common way for parts of the system to log events and providing software for an administrator to see and analyze problems. The two companies plan to offer these and other technologies over time, but they also are making all the technology available openly to other players and will seek to have it adopted by industry standards bodies, Telford said.
Given the technical and political disagreements among vendors, it’s unlikely that all of them will sign on to the two companies’ approach, but even coordination among some systems should save IS departments time and money, said Amy Wohl, an industry analyst at Wohl Associates in Narberth, Pa.
“It’s unrealistic to think that anything new is going to cover every single thing,” Wohl said.
The IBM-developed Common Base Event (CBE) specification defines a standard format for event logs, which devices and software use to keep track of transactions and other activity.
All the components of systems typically have different formats for the information they collect about events, Telford said. For example, if an IS team needs to figure out where something went wrong with an e-business application, they may need to understand 40 different event log formats, he said. Root cause analysis of the problem could require several different administrators – database, network and so on – getting involved.
As a common format, CBE can simplify that process, Telford said. Future products should use CBE as their native log format, but “log adapters” can define mappings between current proprietary log formats and CBE, he said. IBM now has a team of about 24 engineers developing log adapters for core IBM products, including hardware, software and storage products, according to the company.
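The log adapter idea can be illustrated with a minimal Python sketch. It is not IBM's implementation and does not use the real CBE schema: the `CommonEvent` fields, the two proprietary log formats and the function names are all hypothetical, chosen only to show how separate formats might be mapped into one shared structure.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CommonEvent:
    # Hypothetical, simplified fields; NOT the actual CBE schema.
    timestamp: datetime
    source: str
    severity: str
    message: str


# Hypothetical proprietary format 1 (pipe-delimited), e.g.
#   2003-10-10 14:03:22|db01|ERR|Connection pool exhausted
PIPE_LEVELS = {"ERR": "error", "WRN": "warning", "INF": "info"}


def adapt_pipe_log(line: str) -> CommonEvent:
    """Map one pipe-delimited line onto the common structure."""
    ts, source, level, message = line.strip().split("|", 3)
    return CommonEvent(
        # Assumes the device logs in UTC.
        timestamp=datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc),
        source=source,
        severity=PIPE_LEVELS.get(level, "unknown"),
        message=message,
    )


# Hypothetical proprietary format 2 (numeric level, epoch seconds), e.g.
#   <3> router7 1065794602 link down on eth0
ROUTER_RE = re.compile(r"<(\d)> (\S+) (\d+) (.*)")


def adapt_router_log(line: str) -> CommonEvent:
    """Map one router-style line onto the common structure."""
    match = ROUTER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized router log line: {line!r}")
    level, source, epoch, message = match.groups()
    # Assumed convention: lower numbers are more severe.
    if int(level) <= 3:
        severity = "error"
    elif int(level) == 4:
        severity = "warning"
    else:
        severity = "info"
    return CommonEvent(
        timestamp=datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        source=source,
        severity=severity,
        message=message,
    )


events = [
    adapt_pipe_log("2003-10-10 14:03:22|db01|ERR|Connection pool exhausted"),
    adapt_router_log("<3> router7 1065794602 link down on eth0"),
]
for event in events:
    print(event.timestamp.isoformat(), event.source, event.severity, event.message)
```

Once every line has passed through an adapter, a single tool can sort, search and display events from both sources without knowing either original format.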
In August, IBM proposed CBE as a standard to the Organization for the Advancement of Structured Information Standards (OASIS), Telford said.
Another piece of the puzzle is a log and trace analyzer, a visual tool for administrators to study log files in various views. IBM has already made a log and trace analyzer available in its WebSphere Studio application development platform, where developers can use it to work out problems before a product is deployed. For IS administrators using production versions of software, IBM probably will ship a log and trace analyzer as part of its Tivoli system management software, at an undetermined future date, Telford said.
Other things that could be standardized include correlation mechanisms – ways of associating events with one another – and filters for sifting out the events that are relevant, Telford said. Such methods, whether developed at IBM, Cisco or other companies, could be proposed to OASIS or another standards body for approval, he said.
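As a rough illustration of filtering and correlation, the Python sketch below keeps only the events above a chosen severity and then groups them by a shared key. It is an assumption-laden example, not a method proposed by IBM or Cisco: the `Event` fields, the `transaction_id` correlation key and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical fields for illustration only.
    source: str
    severity: str
    transaction_id: str  # assumed correlation key shared across components
    message: str


def filter_events(events, severities=("error", "warning")):
    """Filter: keep only events whose severity is considered relevant."""
    return [e for e in events if e.severity in severities]


def correlate(events):
    """Correlate: group events that belong to the same transaction."""
    groups = defaultdict(list)
    for event in events:
        groups[event.transaction_id].append(event)
    return dict(groups)


# Made-up sample events from a router, middleware and a database.
events = [
    Event("router7", "error", "tx-1", "link down on eth0"),
    Event("middleware", "error", "tx-1", "request timed out"),
    Event("db01", "info", "tx-2", "query completed"),
]

related = correlate(filter_events(events))
for transaction_id, group in related.items():
    print(transaction_id, [e.source for e in group])
```

In this sample the router and middleware errors end up in the same group, which is the kind of cross-component association an administrator would want when tracing a failed transaction.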
In the second phase of the initiative, San Jose-based Cisco will integrate the technologies into its products. The hope is to be able to correlate events in a router with those in middleware, for example.
With large networks involved, one challenge that lies ahead is collecting and working with event data even if one company doesn’t control the whole network or the whole transaction, Telford said.
The kinds of capabilities IBM and Cisco are pushing are likely to become pervasive, but there are sure to be competitive offerings, analyst Wohl said. Hewlett-Packard Co. and Sun Microsystems Inc. already offer some software with capabilities similar to those in IBM’s autonomic computing concept. And as is typical, political battles may loom on the path to industry standards, she said.
However, Wohl sees such technology ultimately benefiting most users, consumer as well as enterprise. Any place with a stretched IS staff or none at all should welcome the idea of a system fixing itself, she said.