
Intel Vision 2022: Habana Gaudi2 AI training processor arrives to challenge the Nvidia A100

A wafer contains new 3rd Gen Intel Xeon Scalable processors.

Source: Intel

At Intel Vision 2022, Intel unveiled the Gaudi2 AI training processor from its Habana Labs subsidiary, the Israel-based AI accelerator chip company it acquired in 2019.

The chip is purpose-built for AI deep learning applications and is manufactured on a 7nm process node, a significant leap from the 16nm node used for the first generation. Gaudi2 features 24 Tensor processor cores, three times as many as the first Gaudi, for larger deep learning workloads. Additionally, Gaudi2 integrates media processing on-chip and carries 96 GB of high-bandwidth memory (HBM2E) and 48 MB of static RAM (SRAM). Intel says these improvements deliver roughly three times the performance of the first generation.

Gaudi2 also integrates 24 ports of 100 Gigabit Ethernet with RDMA over Converged Ethernet (RoCE) to increase training bandwidth. Because it uses Ethernet instead of a proprietary interconnect, it works with a wide range of standard Ethernet networking equipment, allowing customers to scale their clusters without high costs.
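For context on what scaling a training cluster over a standard Ethernet fabric looks like in practice, the following is a minimal, generic sketch of multi-node data-parallel training using PyTorch's distributed APIs. It is not Habana's SynapseAI software stack, and the backend, model, and launch settings are placeholder assumptions; a launcher such as torchrun would normally supply the rank and address environment variables.

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Rank, world size, and master address are injected by the cluster
    # launcher (e.g. torchrun) through environment variables.
    dist.init_process_group(backend="gloo")  # placeholder backend
    rank = dist.get_rank()

    model = torch.nn.Linear(1024, 1024)          # toy model for illustration
    ddp_model = DDP(model)                       # gradients are all-reduced across nodes
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    data = torch.randn(32, 1024)                 # synthetic batch
    target = torch.randn(32, 1024)
    loss = torch.nn.functional.mse_loss(ddp_model(data), target)
    loss.backward()                              # triggers cross-node gradient sync
    optimizer.step()

    if rank == 0:
        print("one synchronized training step completed")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()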

Intel claims Gaudi2 offers twice the performance of the Nvidia A100 in common AI workloads. Source: Habana

Intel also claims that Gaudi2 leads its competitors. The company backed that claim by pitting it against Nvidia’s A100-80GB GPU on ResNet50 and BERT. ResNet50 is a neural network commonly used for image recognition, while BERT is a neural network model used for natural language processing; both represent prevalent use cases in modern AI applications. In its tests, Intel claims Gaudi2 delivers 1.9 times the training throughput of the A100-80GB on ResNet50 and is roughly twice as fast on BERT. Moreover, Gaudi2 is roughly the same die size as the A100, noted Habana Labs’ chief operating officer Eitan Medina in the press release.
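As a rough illustration of what a ResNet50 training-throughput benchmark involves, here is a minimal, generic PyTorch sketch that measures images per second on synthetic data. It is not Intel's or Habana's benchmark harness; the device selection, batch size, and step count are placeholder assumptions.

import time
import torch
import torchvision

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # placeholder device
model = torchvision.models.resnet50().to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

batch_size, steps = 256, 50
images = torch.randn(batch_size, 3, 224, 224, device=device)        # synthetic inputs
labels = torch.randint(0, 1000, (batch_size,), device=device)       # synthetic labels

model.train()
start = time.time()
for _ in range(steps):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
if device.type == "cuda":
    torch.cuda.synchronize()  # wait for queued GPU work before timing
elapsed = time.time() - start
print(f"Throughput: {batch_size * steps / elapsed:.1f} images/sec")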

For customers, Habana Labs says that Gaudi2 offers a “high-performance deep learning training processor alternative for computer vision workload” and is suited to object detection in autonomous vehicles, medical imaging, and defect detection in manufacturing. It can also accelerate natural language processing for subject matter analysis, helping to prevent identity fraud in insurance claims and grant submissions.

These features, however, somewhat overlap with Intel’s upcoming Ponte Vecchio, a tile-based high-performance computing (HPC) and AI accelerator built on Intel’s Xe graphics architecture. Ponte Vecchio is also positioned as a rival to Nvidia’s upcoming H100 GPU, which is set to arrive in Q3 2022. Raja Koduri, general manager of Intel’s Accelerated Computing Systems and Graphics (AXG) Group, explained the difference between Gaudi2 and Ponte Vecchio during a meeting with the press.

The Ponte Vecchio AI accelerator (bottom is delidded view). Photo by Tom Li

“The AI workflow right now is getting more and more prevalent,” Koduri told the publication. “If you need ultimate programmability, flexibility and everything that you need to do, that today only works on a CPU…a ton of machine learning is still just on CPUs. And then when you need the next level of programmability, GPUs are really good at the programming model.”

Koduri’s remarks suggested that Gaudi2 is a dedicated accelerator that excels at deep learning training but is less programmable, while Ponte Vecchio, based on Intel’s Xe graphics architecture, handles a broader range of AI compute workloads and is more programmable, giving developers more control over their applications.

The Gaudi2 processor is part of Habana Labs’ HLS-Gaudi2 server, which pairs eight Gaudi2 processors with dual-socket Intel Xeon “Ice Lake” processors. The processor will also be available in the Supermicro Gaudi2 Training Server coming to market in the second half of 2022, as well as in a turnkey variant that pairs the Supermicro server with DDN AI400X2 storage.

‘Arctic Sound – M’

In parallel with Gaudi2, Intel also teased a GPU for media-focused data centres codenamed “Arctic Sound – M.” Each Arctic Sound – M card carries up to four Xe media engines, up to 32 Xe cores with ray-tracing units, hardware acceleration for the AV1 codec, and built-in XMX AI acceleration. Intel said a single card can handle more than 30 simultaneous 1080p video streams or more than 40 game streams for cloud gaming, and supports up to 62 virtualized functions for virtual desktop infrastructure workloads.

Intel Arctic Sound – M feature slide at Intel Vision 2022. Photo by Tom Li

Arctic Sound – M will arrive in PCIe Gen 4 cards in Q3 2022.
