Computex When Nvidia first teased its Arm-based Grace CPU back in 2021, many saw it as a threat to Intel and AMD. Four years later, the Arm-based silicon is at the heart of the GPU giant's most powerful AI systems, but it has not yet replaced x86 entirely.
At Computex on Thursday, Intel revealed three new Xeon 6 processors, including one that will serve as the host CPU in Nvidia's DGX B300 platform announced at GTC back in March.
According to Intel, each B300 system will feature a pair of its 64-core Xeon 6776P processors, which will be responsible for feeding the platform's 16 Blackwell Ultra GPUs with numbers to crunch. (In case you missed it, Nvidia is counting GPU dies as individual accelerators now.)
However, the chips announced Thursday aren't your typical Xeons. Unlike the rest of the Xeon 6 lineup, these ones were optimized specifically to babysit GPUs.
While most people associate generative AI with power-hungry graphics cards, there is still plenty of work for the CPU to do. Key-value caches — the model's short-term memory — often need to be shuttled between system memory and HBM as AI sessions spin up and time out, and workloads such as the vector databases used in retrieval-augmented generation (RAG) pipelines commonly run on CPU cores.
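To make the KV-cache point concrete, here's a minimal Python sketch of the sort of two-tier placement an inference server might do, with idle sessions' caches evicted from GPU memory back to host RAM. The class and method names are illustrative inventions, not taken from any real inference stack:

```python
import time


class KVCacheTier:
    """Toy model of two-tier KV-cache placement: "hbm" for active
    sessions, "host" (system RAM) for sessions that have gone idle."""

    def __init__(self, idle_timeout_s=30.0):
        self.idle_timeout_s = idle_timeout_s
        self.location = {}   # session_id -> "hbm" or "host"
        self.last_used = {}  # session_id -> timestamp of last activity

    def touch(self, session_id, now=None):
        # A new or resumed session needs its cache resident in HBM.
        now = time.monotonic() if now is None else now
        self.location[session_id] = "hbm"
        self.last_used[session_id] = now

    def evict_idle(self, now=None):
        # Sessions past the idle timeout get shuttled back to host
        # memory — the CPU-side copy work the article describes.
        now = time.monotonic() if now is None else now
        for sid, ts in self.last_used.items():
            if self.location[sid] == "hbm" and now - ts > self.idle_timeout_s:
                self.location[sid] = "host"
```

A real server would be moving gigabytes of tensors over PCIe rather than flipping a string, but the bookkeeping — and the CPU's role in it — looks much like this.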
The Xeon 6776P is one of three CPUs equipped with Intel's priority core turbo (PCT) and speed select technology turbo frequency (SST-TF) tech.
The idea behind these technologies is that by limiting most of the cores to their base frequencies, the remaining cores can boost higher more consistently even when the chip is fully loaded up.
Milan Mehta, a senior product planner for Intel's Xeon division, told El Reg that the tech will enable up to eight cores per socket to run at 4.6GHz — 700MHz beyond the chip's max rated turbo — while the remaining 56 cores are pinned at their base frequency of 2.3GHz.
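Intel hasn't published the control logic behind PCT, but the arithmetic of the configuration Mehta describes is easy to check. A sketch, using the 6776P figures quoted above (the constant and function names are our own):

```python
# Hypothetical model of the 6776P's priority-core-turbo split.
TOTAL_CORES = 64
PCT_CORES = 8              # cores allowed to boost under PCT
BASE_GHZ = 2.3             # pinned clock for the non-priority cores
MAX_RATED_TURBO_GHZ = 3.9  # the chip's ordinary max turbo
PCT_BOOST_GHZ = 4.6        # target clock for the priority cores


def pct_headroom_mhz():
    # How far PCT pushes past the chip's ordinary max turbo, in MHz.
    return round((PCT_BOOST_GHZ - MAX_RATED_TURBO_GHZ) * 1000)


def core_clocks():
    # Per-core frequency assignment across a fully loaded socket
    # with PCT enabled: a few fast cores, the rest held at base.
    return [PCT_BOOST_GHZ] * PCT_CORES + \
           [BASE_GHZ] * (TOTAL_CORES - PCT_CORES)
```

The trade is deliberate: the 56 pinned cores give up turbo headroom so the eight priority cores — the ones feeding the GPUs — can hold 4.6GHz consistently rather than opportunistically.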
If this sounds familiar, it's because Intel has employed a similar strategy on its desktop chips going back to Alder Lake, which offloaded background tasks to the chips' dedicated efficiency cores, freeing up the performance cores for higher-priority workloads.
But because Intel's 6700P-series chips don't have e-cores, the functionality has to be achieved through clock pinning.
"We found that having this mix of some cores high, some cores low, helps with driving data to the GPUs," Mehta said. "It's not going to make a 3x difference, but it's going to improve overall GPU utilization and overall AI inference and training performance."
While the CPUs have been optimized for AI host duty this generation, the B300 itself follows a fairly standard DGX config. Each CPU connects to four dual-GPU Blackwell Ultra SXM modules via an equal number of ConnectX-8 NICs in a sort of daisy-chain arrangement.
If you're curious, here's a full rundown of the new Xeons announced this week:
| Model | Sockets | Cores | TDP | Base clock | All-core turbo | Max turbo | PCT turbo | L3 cache | PCT cores | Memory | PCIe lanes |
|-------|---------|-------|-----|------------|----------------|-----------|-----------|----------|-----------|--------|------------|
| 6962P | 2S | 72 | 500W | 2.7 GHz | 3.6 GHz | 3.9 GHz | 4.4 GHz | 432MB | 12 | DDR5-6400 / MCRDIMM 8800 | 96 |
| 6776P | 2S | 64 | 350W | 2.3 GHz | 3.6 GHz | 3.9 GHz | 4.6 GHz | 336MB | 8 | DDR5-6400 / MCRDIMM 8000 | 88 |
| 6774P | 1S | 64 | 350W | 2.5 GHz | 3.6 GHz | 3.9 GHz | 4.6 GHz | 336MB | 8 | DDR5-6400 / MCRDIMM 8000 | 136 |
Nvidia's decision to tap Intel for its next-gen DGX boxen isn't all that surprising. Chipzilla's fourth- and fifth-gen Xeons were used in Nvidia's DGX H100, H200, and B200 platforms. The last time Nvidia tapped AMD for its DGX systems was in 2020, when the A100 made its debut.
Nvidia isn't the only one with an affinity for Intel's CPUs, at least when it comes to AI. When AMD debuted its competitor to the H100 in late 2023, the House of Zen wasn't above strapping its chips to an Intel part if it meant beating Nvidia.
Nvidia is also sticking with x86 for its newly launched RTX Pro Servers, but since this is more of a reference design than a first-party DGX offering, it'll be up to Lenovo, HPE, Dell, and others to decide whose CPUs they'd like to pair with the system's eight RTX Pro 6000 Server GPUs.
Meanwhile, back in Arm-land
With the introduction of NVLink Fusion this week, Nvidia will soon extend support to even more CPU platforms, including upcoming Arm-based server chips from Qualcomm and Fujitsu.
As we discussed last week, the tech will enable third-party CPU vendors to use the GPU giant's speedy 1.8 TBps NVLink interconnect fabric to communicate with Nvidia graphics directly. In a more surprising move, Nvidia will also support tying third-party AI accelerators to its own Grace CPUs.
Even as Nvidia opens the door to new CPU platforms, the chip biz continues to invest in its own, homegrown Arm-based silicon.
At GTC in March, Nvidia offered the best look yet at its upcoming Vera CPU platform. Named after American astronomer Vera Rubin, the CPU is set to replace Grace next year.
The chip will feature 88 "custom Arm cores" with simultaneous multithreading pushing the thread count to 176 per socket, along with Nvidia's latest 1.8 TBps NVLink-C2C interconnect.
Despite the chip's higher core count, its 50W TDP suggests those cores may be stripped down to the bare minimum necessary to keep the GPUs humming along. While that might sound like a strange idea, many of these AI systems may as well be appliances that you interact with via an API.
Vera is set to debut alongside Nvidia's 288 GB Rubin GPUs next year. ®