Nvidia is under the hood of the world's best supercomputers; Intel and AMD are talking up future products at the Supercomputing Conference.
Nearly 70% of the world's 500 fastest supercomputers, as announced at this week's Supercomputing 2020 (SC20) conference, are powered by Nvidia, including eight of the top 10.
Among them is Selene, which Nvidia built itself and which placed fifth on the semi-annual TOP500 list of the fastest computers. Top-of-the-line systems require 10,000 or more CPUs and GPUs, so they are incredibly costly, and most of them are owned by governments or research organizations.
That makes Selene all the rarer: it was built by Nvidia and is housed at the company's headquarters in Santa Clara, California. (It is commonly believed that many private-sector supercomputers go unreported for competitive reasons.)
Nvidia’s Big Showing
It is also significant that another Nvidia supercomputer, the DGX SuperPOD, took the top spot on the GREEN500 list, which ranks the TOP500 systems by energy efficiency. Four of the top five systems used Nvidia's A100 Ampere GPU. Fujitsu's Fugaku prototype, with only Arm processors and no DRAM, dropped from first to sixth.
This is a big deal because GPUs have never been known for energy efficiency, but now Nvidia has a new story to tell: performance and energy efficiency in a single product.
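For readers unfamiliar with how an efficiency ranking differs from a raw speed ranking, here is a minimal sketch. It assumes the GREEN500 convention of scoring systems by sustained LINPACK performance per watt (GFLOPS/W). The `System` class, the `rank_by_efficiency` helper, and both sample entries are hypothetical placeholders for illustration, not real list data.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    rmax_gflops: float  # sustained LINPACK performance, in GFLOPS
    power_watts: float  # average power drawn during the run, in watts

    @property
    def gflops_per_watt(self) -> float:
        # Efficiency score: performance delivered per watt consumed.
        return self.rmax_gflops / self.power_watts


def rank_by_efficiency(systems):
    # Most efficient system first, regardless of raw speed.
    return sorted(systems, key=lambda s: s.gflops_per_watt, reverse=True)


# Hypothetical placeholder systems, NOT real TOP500 or GREEN500 entries.
systems = [
    System("big-but-hungry", rmax_gflops=4_000_000, power_watts=400_000),
    System("small-but-lean", rmax_gflops=1_000_000, power_watts=50_000),
]

for s in rank_by_efficiency(systems):
    print(f"{s.name}: {s.gflops_per_watt:.1f} GFLOPS/W")
```

In this made-up example the slower machine ranks first on efficiency, which is why a system can sit lower on the speed list and still lead the efficiency list.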
Nvidia also launched its Mellanox NDR 400Gbps InfiniBand family of interconnect products, due to be available in Q2 of 2021. The new lineup includes adapters, data processing units (DPUs, Nvidia's name for its smart NICs), switches, and cables.
This isn't just a doubling of the bandwidth per port. Mellanox also tripled the number of ports in a single switch system, which could let one switch platform connect an entire data center. Mellanox said NDR 400Gbps InfiniBand would deliver network savings of 1.4x and power savings of up to 1.6x for data centers.
AMD Claws Back
There is good news and bad news for AMD. The number of top supercomputers using its CPUs almost doubled, from 11 on the June TOP500 list to 21 on the current list. The growth came from new systems with second-generation EPYC processors, which come in configurations with as many as 64 cores.
On the downside, AMD has gained no momentum against Nvidia on the GPU side. Just one of the top 500 systems uses AMD Radeon GPUs. Even Intel's Xeon Phi, which has been discontinued, fared better, with three systems on the list.
But AMD isn't giving up. On Monday, it unveiled its latest server GPU, the Instinct MI100, calling it the world's fastest HPC accelerator for scientific research, with more than 10 TFLOPS of double-precision floating-point performance. AMD claims it boosts half-precision floating-point performance for AI training workloads by nearly seven times over the previous generation of accelerators.
The MI100 comes with a technology called Matrix Core, part of AMD's new CDNA architecture, which is designed for HPC and machine learning workloads. Future iterations of the architecture will be used for the next generation of Instinct GPUs.
Intel’s Latest Try at GPUs
Intel is hoping the third time is the charm for GPUs. This time around it hired Raja Koduri, the architect of AMD's Radeon GPUs, to be its chief architect, so it has no excuse for a technological failure.
Its new GPU is called the Xe, showing that Intel has the worst product branding department in Silicon Valley. The biggest news about Xe was the launch of oneAPI Gold, the first release of Intel's programming platform for Xe GPUs.
OneAPI Gold is part of Intel's XPU strategy for heterogeneous computing. Servers are a lot more than x86 chips. They have GPUs, FPGAs, AI accelerators, and network processors, and Intel has products in all of those categories. OneAPI Gold will support all of them, enabling developers to write a single set of highly optimized code and run it optimally on any processor.
Intel promotes oneAPI as an open standard, but it's designed for Intel architecture. So I'm not going to hold my breath for AMD or Nvidia to embrace it any time soon. But for anyone who is all-in with Intel, it could do what CUDA did for Nvidia.
Xe processors are still in the works, with a high-end version, codenamed Ponte Vecchio, expected next year. OneAPI Gold is expected to ship next month.