Kevin Tolly: Feature richness no longer trumped by reliability

“To have the same number of takeoffs and landings and never have my name in the paper.”

I received that well-practiced answer when I asked a commercial wide-body pilot nearing retirement what his goals had been during his nearly 30 years of flying. His credo came to mind when I saw the SQL Slammer worm in the news. It struck me that vendors of key IT infrastructure should have the same goals: no major crashes and no headlines.

My pilot friend understood implicitly that he was part of the transportation infrastructure and that “boring was beautiful.” The design of the aircraft, the flight procedures and even personnel assignments were all centred on maximizing reliability and thus safety. IT infrastructure vendors need to think the same way.

Of course, with Microsoft Corp. as the spiritual leader of the IT software industry, that’s not likely to happen. Despite Bill Gates’ mea culpa and all his fanfare about Trustworthy Computing, he has, over time, lowered the standards Fortune 1000 firms will accept for critical infrastructure so far that, although he still appears to be fighting the battle, he won the war long ago.

According to news reports, Slammer, by exploiting a security hole in Microsoft SQL Server, prevented Continental Airlines from booking reservations and locked up its hub in Newark, N.J. At Bank of America, customers couldn’t access some 13,000 ATMs on the company’s network.

Think back to the mid-’80s when systems like these were running on IBM mainframes. How many times did you see MVS, CICS, VTAM or DB2 in the news? “VTAM bug causes Bank of America ATM network to crash!” Try never.

Had that happened, the Gordon Bethunes of that era (Bethune is Continental’s CEO) would have been on the phone with IBM’s chairman and most likely would have made public statements denouncing IBM for putting their businesses at risk. A Bill Gates “Oops, I’m sorry” wouldn’t have cut it.

But such massive system failures just didn’t happen. IBM’s infrastructure software was never a household name; it just ran. It was built with a very different philosophy: reliability always trumped feature delivery.

I remember visiting IBM’s Networking Division briefing centre in the mid-’80s and seeing, as part of the visit, a working network running IBM’s next release of VTAM for MVS. (For those of you not old enough to remember, VTAM was IBM’s flagship network software.)

Key features we needed appeared to be working quite well, and I was anxious to get the new version installed and running. When I asked how soon that could happen, the answer, which I remember to this day, was “Eighteen months.” I was crestfallen. That was how long it would take to complete the level of integration testing appropriate for a core infrastructure component.

Even IBM’s beta programs were well thought out. Only sites with specific characteristics were invited to participate. This practice contrasts with Microsoft’s “law of large numbers” approach: throw your beta code at enough people, and they’ll likely discover most of the flaws.

Microsoft’s “Service Pack” approach, in which hundreds of modules can be affected, is likewise radically different from IBM’s surgical approach to maintaining key infrastructure software. Service packs are just as likely to break something as to fix it.

Let’s hope this latest outbreak serves as a catalyst for an executive-level barrage on Microsoft. Maybe then Microsoft would finally get serious about product quality.

Tolly is president of The Tolly Group, a strategic consulting and independent testing company in Manasquan, N.J. He can be reached