Hospital sheds light on net problems

The biggest problem facing network managers at Children’s Hospital was that they were often in the dark — literally and figuratively.

Scattered throughout Boston, the hospital’s 22 campus buildings frequently lose power as a result of changes the city makes to the power grid. Network managers at Children’s often did not learn that power had failed in another building on the network until after the fact.

“Power is our No. 1 source of downtime,” said Jim Hutchinson, Children’s network manager. “Our power grid is always being manipulated, and we’re not always privy to when power changes have been made.”

Worse, network managers often were unaware when batteries were low or dead in the uninterruptible power supplies (UPSs), which the hospital had installed on hubs and other key network equipment to guard against power failures.

Because of the battery problem, when power went out, the network took longer to recover. And managers didn’t know there was a problem until users started calling the help desk.

Earlier this year, Hutchinson decided to take a proactive stance. To avoid further problems and reduce downtime, he focused on two key areas: knowing when a UPS was in danger of failing, and watching link utilization.

The hospital called on Predictive Systems, a consulting firm in New York, for help. Predictive installed Seagate’s NerveCenter, software that collects information about network problems in a central location, lists which users and devices will be affected by those events, and highlights events that could be the root of problems.

The company configured the UPSs to send SNMP messages to NerveCenter when power fails or their batteries are low, said Steve Mastrorilli, regional vice-president at Predictive.

When utilization on a link becomes high — say, above 85 per cent — network managers are notified by e-mail. When a power failure or other critical event happens, network managers are notified by an automatic paging system from Telamon called TelAlert, which Predictive integrated with NerveCenter.
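The notification rules described above can be sketched in a few lines of Python. This is only an illustration, not NerveCenter or TelAlert configuration: the `Event` class, the `choose_channel` function and the event kind names are hypothetical, and treating a low UPS battery as a critical event is an assumption. The 85 per cent threshold is the example figure quoted in the article.

```python
from dataclasses import dataclass
from typing import Optional

# Example threshold from the article ("say, above 85 per cent").
UTILIZATION_THRESHOLD = 85.0

# Hypothetical event kinds treated as critical (an assumption, not product data).
CRITICAL_KINDS = {"power_failure", "ups_battery_low"}


@dataclass
class Event:
    kind: str                            # e.g. "power_failure" or "link_utilization"
    source: str                          # device or link name (illustrative)
    utilization: Optional[float] = None  # per cent, set for link events only


def choose_channel(event: Event) -> Optional[str]:
    """Return "page", "email", or None when no notification is needed."""
    # Critical events go to the pager.
    if event.kind in CRITICAL_KINDS:
        return "page"
    # Busy links are reported by e-mail once they exceed the threshold.
    if event.kind == "link_utilization" and event.utilization is not None:
        if event.utilization > UTILIZATION_THRESHOLD:
            return "email"
    return None
```

Under these assumptions, a power failure would be routed to the pager, a link running at 90 per cent would produce an e-mail, and a link at 40 per cent would produce nothing.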

The whole package cost about US$40,000, which included consulting and software. In addition to TelAlert and NerveCenter, the software includes the Windows NT Server platform on which they run.

“I’m not a big fan of enterprise management platforms,” Hutchinson said. In the past, he had tried platforms such as Hewlett-Packard’s OpenView and IBM’s NetView. But he didn’t think they provided enough features for the money.

“NerveCenter isn’t trying to be everything to everybody,” he added, noting that the software focuses on basic polling and on correlating network events, which is more than enough.

By correlating the events it receives from the network, NerveCenter can point out events that are all symptoms of the same problem. It now takes half as much time to resolve problems as it did without network management software, Hutchinson said.
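The article does not describe how NerveCenter correlates events, so the following is only a rough sketch of the general idea under stated assumptions. It assumes that a power failure in a building explains any other event reported from the same building within a short time window. The `NetEvent` type, the `correlate` function, the building-based grouping and the 300-second window are all hypothetical.

```python
from typing import NamedTuple


class NetEvent(NamedTuple):
    time: float     # seconds since an arbitrary reference point
    building: str   # hypothetical grouping key
    kind: str       # "power_failure" or any symptom kind


def correlate(events, window=300.0):
    """Group symptom events under the power failure that plausibly caused them.

    Returns (grouped, unexplained): grouped maps each power-failure event to
    the list of events it explains; unexplained holds everything else.
    """
    roots = [e for e in events if e.kind == "power_failure"]
    grouped = {r: [] for r in roots}
    unexplained = []
    for e in events:
        if e.kind == "power_failure":
            continue
        # A symptom is attributed to the first failure in the same building
        # that occurred no more than `window` seconds before it.
        cause = next(
            (r for r in roots
             if r.building == e.building and 0 <= e.time - r.time <= window),
            None,
        )
        if cause is None:
            unexplained.append(e)
        else:
            grouped[cause].append(e)
    return grouped, unexplained
```

With a grouping like this, a manager would see one power failure with its symptoms listed beneath it rather than a stream of separate alarms, which is the kind of consolidation the article credits for faster problem resolution.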

Monitoring utilization lets the managers head off user complaints. High link utilization is often caused by a bad network card or a large image transfer, Predictive’s Mastrorilli added.

Managers can start to track down the source of problems and possibly warn users that they can expect a slowdown in the network until a problem is resolved.

Uptime is crucial at Children’s. The metropolitan-area network connecting the hospital buildings uses FDDI for redundancy, with Digital GIGAswitch FDDI switches forming the backbone. Desktop systems connect to the backbone through 10/100Mbps Ethernet switches.

In the future, the network will have to handle desktop video and other imaging applications. Hutchinson said that will prompt the hospital to look at Layer 3 switches to handle that traffic.

In any case, network managers at Children’s are now able to shed a little light on what’s really going on in the network. “We don’t miss anything any more,” Hutchinson said.

— IDG News Service