Intel details Itanium 2 bug

Intel Corp. is working with Itanium 2 server vendors on a bug that has surfaced in the McKinley version of its Itanium processor family, an Intel spokeswoman said Monday.

The “erratum” is confined to a subset of Intel’s McKinley Itanium 2 processors, said Barbara Grimes, an Intel spokesperson. An electrical issue with the processor can cause systems to behave unpredictably or shut down, she said.

Customers can work around the problem by lowering the clock speed of their 900MHz or 1GHz Itanium 2 processors to 800MHz, Grimes said. They should contact their system vendor to determine the best course of action, she said. Intel will swap out older McKinley processors for new ones upon request, but the company is not issuing a recall, she said.

IBM Corp. will stop shipments of the recently announced x450 server, IBM’s first product with Itanium 2, according to Lisa Lanspery, an IBM spokesperson. The company has a handful of customers using the x450 so far, but hasn’t received notice of any problems with those systems, she said. IBM does not currently know when it will resume shipments, she said.

Hewlett-Packard Co., Intel’s main partner with the Itanium processor, is still shipping its Itanium systems, said Jim Dunlap, an HP spokesperson. HP co-developed the processor with Intel, and has introduced the largest number of servers based on the chip. The company is still working with Intel on the problem, and hasn’t determined exactly what the best solution is for its customers, he said.

Unisys Corp. spokesperson Stephen Holzman said the company is satisfied with the measures Intel has prescribed, and there has been no impact on its shipment schedules.

For the system to crash, a particular sequence of operations and events needs to happen in step, Grimes said. Intel discovered the problem in lab testing after a system vendor reported it earlier this year, she said.

The fact that only a subset of processors is affected indicates that it isn’t a problem with the core logic, said Nathan Brookwood, principal analyst with Insight 64 in Saratoga, Calif. Since all the chips have essentially the same logic design, a problem with only a particular set of chips indicates the problem is in how the electrons travel across the chip, he said.

Details as to what types of data or configurations cause a system to shut down are unclear. The problem is not related to a particular batch of processors, or any one instruction or data stream, Grimes said. A field test can determine if a particular McKinley processor is affected by the bug, but problems can occur on systems that test clean in the field, Grimes said. Intel has developed a manufacturing test for processors coming out of its fabs that identifies the problem, she said.

Intel hasn’t found any commercially available software that can cause the problem, said Kevin Krewell, senior editor of the Microprocessor Report in San Jose.

McKinley was introduced last July. It solved many of the performance issues that held back adoption of the first Itanium product, Merced, but customers have still been reluctant to purchase systems with the chip outside of the high-performance computing market.

Ironically, this makes fixing the problem easier, Krewell said. “Since they’ve shipped so few McKinleys, doing a swap-out shouldn’t be a big problem for them,” he said.

System owners will likely look to upgrade to a new processor, especially if Intel makes the new Madison processor available as part of the deal, both analysts said. Madison is expected to be released in the middle of the year and features a larger cache than the McKinley processor.

Customers might opt to lower the clock speed of their processor if they are using McKinley in a software development machine or on some mission-critical applications, Krewell said. System managers can be very conservative when it comes to critical applications, and sometimes it’s better to take a slight performance hit than to risk having that system become unstable on a new processor, he said.

The problem must occur extremely infrequently for it to show up only now, after the chip has been out almost a year, Krewell said. The testing and validation process for server processors is stricter than for desktop processors, and widespread reports of the problem haven’t surfaced, he said.

“There is no such thing as the perfect microprocessor. Intel handles these sorts of situations at least as well as anybody else in the industry,” Brookwood said.