NVIDIA’s New Ampere Data Center GPU Now Shipping to Customers Worldwide


The first GPU based on the NVIDIA Ampere architecture, the NVIDIA A100, is in full production and shipping to customers worldwide now. Among the first to tap into the power of NVIDIA A100 GPUs is Microsoft, which intends to take advantage of their performance and scalability.

“Microsoft trained Turing Natural Language Generation, the largest language model in the world, at scale using the current generation of NVIDIA GPUs,” said Mikhail Parakhin, corporate vice president, Microsoft. “Azure will enable training of dramatically bigger AI models using NVIDIA’s new generation of A100 GPUs to push the state of the art on language, speech, vision and multi-modality.”


New elastic computing technologies built into A100 make it possible to bring right-sized computing power to every job. A multi-instance GPU capability allows each A100 GPU to be partitioned into as many as seven independent instances for inferencing tasks, while third-generation NVIDIA NVLink interconnect technology allows multiple A100 GPUs to operate as one giant GPU for ever-larger training tasks.

The A100 draws on design breakthroughs in the NVIDIA Ampere architecture, which NVIDIA describes as the company's largest generational leap in performance across its eight generations of GPUs, to unify AI training and inference and boost performance by "up to 20x" over its predecessors. A universal workload accelerator, the A100 is also built for data analytics, scientific computing and cloud graphics.

“The powerful trends of cloud computing and AI are driving a tectonic shift in data center designs so that what was once a sea of CPU-only servers is now GPU-accelerated computing,” said Jensen Huang, founder and CEO of NVIDIA. “NVIDIA A100 GPU is a 20x AI performance leap and an end-to-end machine learning accelerator – from data analytics to training to inference. For the first time, scale-up and scale-out workloads can be accelerated on one platform. NVIDIA A100 will simultaneously boost throughput and drive down the cost of data centers.”

According to NVIDIA, the A100's design breakthrough rests on five key innovations:

NVIDIA Ampere architecture – At the heart of A100 is the NVIDIA Ampere GPU architecture. The A100 chip contains more than 54 billion transistors, making it the world's largest 7-nanometer processor.
Third-generation Tensor Cores with TF32 – NVIDIA’s widely adopted Tensor Cores are now more flexible, faster and easier to use. Their expanded capabilities include new TF32 for AI, which allows for up to 20x the AI performance of FP32 precision, without any code changes. In addition, Tensor Cores now support FP64, delivering up to 2.5x more compute than the previous generation for HPC applications.
Multi-instance GPU – MIG, a new feature, enables a single A100 GPU to be partitioned into as many as seven separate GPU instances, so it can deliver right-sized compute to jobs of different sizes, improving utilization and return on investment.
Third-generation NVIDIA NVLink – Doubles the high-speed GPU-to-GPU bandwidth of the previous generation, providing efficient performance scaling within a server.
Structural sparsity – This new efficiency technique exploits the sparsity inherent in AI math (many weights in trained networks contribute little and can be zeroed) to deliver up to double the performance.
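To make the TF32 idea concrete: TF32 keeps FP32's 8-bit exponent, and therefore its numeric range, but only 10 explicit mantissa bits, the same precision as FP16. The Python sketch below emulates that precision loss by clearing the low 13 of FP32's 23 mantissa bits. It is an illustration under simplifying assumptions: it truncates, whereas the hardware's rounding behavior may differ, and the helper names `to_tf32` and `tf32_dot` are hypothetical, not an NVIDIA API.

```python
import struct


def to_tf32(x: float) -> float:
    """Emulate TF32 precision by truncating an FP32 value to 10 mantissa bits.

    FP32: 1 sign, 8 exponent, 23 mantissa bits.
    TF32: 1 sign, 8 exponent, 10 mantissa bits (same range as FP32).
    Assumption: simple truncation; real hardware rounding may differ.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))  # raw FP32 bit pattern
    bits &= 0xFFFFE000  # clear the low 13 mantissa bits (23 - 10 = 13)
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y


def tf32_dot(a: list[float], b: list[float]) -> float:
    """Round inputs to TF32, multiply, then accumulate at higher precision."""
    # Assumption: accumulation here uses Python floats (FP64);
    # Tensor Cores accumulate TF32 products in FP32.
    return sum(to_tf32(x) * to_tf32(y) for x, y in zip(a, b))


if __name__ == "__main__":
    print(to_tf32(1 + 2**-10))  # 1.0009765625: 10th mantissa bit survives
    print(to_tf32(1 + 2**-11))  # 1.0: 11th mantissa bit is below TF32 precision
    print(to_tf32(3.0e38))      # still about 3e38: exponent range is unchanged
```

Because the exponent field is untouched, FP32 code can switch to TF32 without rescaling values, which is consistent with the "no code changes" claim above; only fine-grained precision is given up.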
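The structural sparsity in A100 uses a fine-grained 2:4 pattern: in every group of four consecutive weights, at most two are non-zero, so the sparse Tensor Cores can skip half of the multiplications. The Python sketch below is a hypothetical illustration, not NVIDIA's tooling or storage format. `prune_2_of_4` keeps the two largest-magnitude weights per group (a common magnitude-pruning heuristic, assumed here). `compress_2_of_4` stores the surviving values with their 0-3 positions (a simplified stand-in for the real metadata layout). `sparse_dot` shows that only two multiplies per four weights are needed.

```python
def prune_2_of_4(weights: list[float]) -> list[float]:
    """Zero the two smallest-magnitude weights in each group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned: list[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest |w| in this group (ties keep the earlier index).
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned


def compress_2_of_4(pruned: list[float]) -> tuple[list[float], list[int]]:
    """Store exactly two values per group of four plus their positions (0-3)."""
    values: list[float] = []
    indices: list[int] = []
    for start in range(0, len(pruned), 4):
        group = pruned[start:start + 4]
        nonzero = [j for j, w in enumerate(group) if w != 0.0]
        if len(nonzero) > 2:
            raise ValueError("group violates the 2:4 pattern")
        # Pad with zero-valued slots so every group stores exactly two entries.
        slots = sorted((nonzero + [j for j in range(4) if j not in nonzero])[:2])
        values.extend(group[j] for j in slots)
        indices.extend(slots)
    return values, indices


def sparse_dot(values: list[float], indices: list[int], x: list[float]) -> float:
    """Dot product over stored entries only: 2 multiplies per 4 weights."""
    total = 0.0
    for k, (v, j) in enumerate(zip(values, indices)):
        total += v * x[4 * (k // 2) + j]  # group k // 2, position j within it
    return total


if __name__ == "__main__":
    w = [0.1, -0.9, 0.5, 0.2, 1.0, 2.0, -3.0, 0.0]
    p = prune_2_of_4(w)  # [0.0, -0.9, 0.5, 0.0, 0.0, 2.0, -3.0, 0.0]
    vals, idx = compress_2_of_4(p)
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    dense = sum(a * b for a, b in zip(p, x))  # 8 multiplies
    print(dense, sparse_dot(vals, idx, x))    # same result with 4 multiplies
```

The halved multiply count is where the "up to double the performance" figure comes from; whether accuracy holds after pruning depends on the model and on retraining, which this sketch does not cover.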

Together, these new features make the NVIDIA A100 well suited to diverse, demanding workloads, including AI training and inference as well as scientific simulation, conversational AI, recommender systems, genomics, high-performance data analytics, seismic modeling and financial forecasting.

Atos Supercomputer with NVIDIA A100 GPU

Atos has equipped its new BullSequana X2415 supercomputer blade with the NVIDIA A100 Tensor Core GPU. According to Atos, the blade delivers unprecedented computing power to boost application performance for HPC and AI workloads and to tackle the challenges of the exascale era.

The Atos BullSequana X2415 blade is designed to more than double computing power while optimizing energy consumption, thanks to Atos' patented, highly efficient, fully water-cooled DLC (Direct Liquid Cooling) solution, which uses warm water to cool the machine.

NVIDIA also says that the following cloud service providers (CSPs) and system builders expect to incorporate A100 GPUs into their offerings: Alibaba Cloud, Amazon Web Services (AWS), Atos, Baidu Cloud, Cisco, Dell Technologies, Fujitsu, GIGABYTE, Google Cloud, H3C, Hewlett Packard Enterprise (HPE), Inspur, Lenovo, Microsoft Azure, Oracle, Quanta/QCT, Supermicro, and Tencent Cloud.