NVIDIA’s third generation DGX system – the DGX A100 – represents a massive improvement in all areas of the underlying architecture. The result is a 6RU beast that can flexibly perform every AI infrastructure task – data analytics, model training, and inference. This is achieved through new acceleration in the GPU and networking, as well as a new flexible architecture. In this post we take a look under the hood to explain what makes the DGX A100 a truly special innovation from NVIDIA.
NVIDIA A100 GPU
The starting point for the DGX A100 is the new A100 GPU, which NVIDIA rates at up to 20x the performance of the previous Volta GPUs in TF32 training and INT8 inference. FP64 performance reaches 19.5 TFLOPS, 2.5x that of the previous Tesla V100. On top of all this extra horsepower, each A100 can be split into 7 separate GPU instances.
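To see where headline ratios like these can come from, here is a rough sanity check. Only the 19.5 TFLOPS figure comes from this post; the V100 numbers and the A100 TF32 number are assumptions taken from NVIDIA's public datasheets, and the "20x" comparison is assumed to be sparse TF32 on the A100 against FP32 on the V100.

```python
# Peak throughput in TFLOPS. Only the A100 FP64 value is from the article;
# the other three are assumptions based on NVIDIA's public datasheets.
V100_FP64 = 7.8            # assumption: V100 SXM2 peak FP64
V100_FP32 = 15.7           # assumption: V100 SXM2 peak FP32
A100_FP64_TENSOR = 19.5    # from the article
A100_TF32_SPARSE = 312.0   # assumption: A100 TF32 Tensor Core with sparsity


def speedup(new: float, old: float) -> float:
    """Return how many times faster `new` is than `old`."""
    return new / old


fp64_gain = speedup(A100_FP64_TENSOR, V100_FP64)
tf32_gain = speedup(A100_TF32_SPARSE, V100_FP32)

print(f"FP64 gain over V100: {fp64_gain:.1f}x")
print(f"TF32 (sparse) vs V100 FP32: {tf32_gain:.1f}x")
```

Under those assumptions the FP64 ratio comes out at 2.5x and the TF32 ratio at just under 20x. These are peak figures, so real workloads will usually see smaller gains.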
Each A100 packs 12 NVLink connections, giving it 600 GB/s of bi-directional bandwidth to any other GPU in the DGX A100. All GPUs are connected through six next-generation NVSwitches, for an overall 4.8 TB/s of bi-directional bandwidth. What does this mean in practice? The system could transfer 426 hours of HD video in a single second!
The DGX A100 also comes with 9 Mellanox ConnectX-6 NICs, each providing 200 Gb/s of network bandwidth.
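The bandwidth figures above follow from simple multiplication, sketched below. The 50 GB/s per-link rate is an assumption based on NVIDIA's published third-generation NVLink specification rather than something stated in this post, and the NIC total is a plain sum that ignores protocol overhead.

```python
# Values marked "article" come from this post. The per-link rate is an
# assumption based on NVIDIA's third-generation NVLink specification.
NVLINKS_PER_GPU = 12       # article
GB_PER_S_PER_LINK = 50     # assumption: bi-directional GB/s per NVLink
GPUS_PER_SYSTEM = 8        # article
NIC_COUNT = 9              # article
NIC_GBIT_PER_S = 200       # article
HD_HOURS = 426             # article

per_gpu_gb_s = NVLINKS_PER_GPU * GB_PER_S_PER_LINK      # NVLink per GPU
system_tb_s = per_gpu_gb_s * GPUS_PER_SYSTEM / 1000     # all GPUs combined
nic_total_gbit_s = NIC_COUNT * NIC_GBIT_PER_S           # raw NIC aggregate
nic_total_gb_s = nic_total_gbit_s / 8                   # bits to bytes

# Video bitrate implied by "426 hours of HD video in a single second".
video_seconds = HD_HOURS * 3600
hd_mbit_s = system_tb_s * 1e12 * 8 / video_seconds / 1e6

print(f"NVLink per GPU: {per_gpu_gb_s} GB/s")
print(f"NVLink system total: {system_tb_s} TB/s")
print(f"NIC aggregate: {nic_total_gbit_s} Gb/s ({nic_total_gb_s} GB/s)")
print(f"Implied HD bitrate: {hd_mbit_s:.0f} Mbit/s")
```

The video comparison works out to roughly 25 Mbit/s per stream, which is a plausible bitrate for high-quality HD video.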
Memory, CPU, and Storage
Cache, system memory and on-board GPU memory have all been increased to support the accelerated computing capabilities of the new DGX A100. Each A100 has 40GB of GPU memory, for a total of 320GB in the DGX. The DGX A100 runs dual 64-core AMD Rome CPUs with 1TB of RAM. Finally, the DGX A100 has 15TB of Gen4 NVMe SSD storage to hold large data sets and feed the data-hungry A100 GPUs.
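As a quick illustration of how these resources balance out, the sketch below totals the figures above and divides the host resources evenly across the GPUs. The even split and the decimal TB-to-GB conversion are assumptions made for the arithmetic, not a description of how the system actually schedules resources.

```python
# All counts and capacities are from the article. Treating 1 TB as 1000 GB
# and splitting host resources evenly per GPU are simplifying assumptions.
GPUS = 8
GPU_MEMORY_GB = 40
CPU_SOCKETS = 2
CORES_PER_CPU = 64
SYSTEM_RAM_GB = 1000     # 1 TB, assuming decimal units
NVME_GB = 15000          # 15 TB, assuming decimal units

total_gpu_memory_gb = GPUS * GPU_MEMORY_GB
total_cores = CPU_SOCKETS * CORES_PER_CPU
cores_per_gpu = total_cores // GPUS
ram_per_gpu_gb = SYSTEM_RAM_GB / GPUS
nvme_per_gpu_gb = NVME_GB / GPUS

print(f"GPU memory: {total_gpu_memory_gb} GB")
print(f"CPU cores: {total_cores} ({cores_per_gpu} per GPU)")
print(f"Host RAM per GPU: {ram_per_gpu_gb} GB")
print(f"NVMe per GPU: {nvme_per_gpu_gb} GB")
```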
MIG – Multi-instance GPU
The DGX A100 packs 8 of these new A100 GPUs, providing a huge boost in processing power at all levels and up to 56 GPU instances to work with, either singly or combined as the workload requires. This flexible multi-instance capability is the real secret sauce that makes the DGX A100 a universal AI engine.
Using MIG, you can optimise GPU utilisation, expand access to more users, and guarantee quality of service and performance. The 7 GPU instances in each A100 can run workloads in parallel, from a few instances all the way up to all 56 at once. This flexibility is what allows the DGX A100 to adapt to the needs of each stage of the AI processing pipeline, from analytics to training to inference.
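To make the instance counts concrete, here is a minimal sketch of how an A100 might be carved up. The profile names and sizes are assumptions based on NVIDIA's MIG documentation for the 40GB A100 and do not appear in this post. The check only adds up compute slices and memory; real MIG placement rules are stricter, so treat a passing result as necessary rather than sufficient.

```python
from typing import Sequence

# Assumption: profile names and sizes follow NVIDIA's MIG documentation for
# the 40 GB A100. Format: name -> (compute slices, memory in GB).
MIG_PROFILES = {
    "1g.5gb": (1, 5),
    "2g.10gb": (2, 10),
    "3g.20gb": (3, 20),
    "4g.20gb": (4, 20),
    "7g.40gb": (7, 40),
}
COMPUTE_SLICES_PER_GPU = 7   # article: up to 7 instances per A100
MEMORY_GB_PER_GPU = 40       # article
GPUS_PER_SYSTEM = 8          # article


def fits_on_one_gpu(layout: Sequence[str]) -> bool:
    """Simplified check: compare slice and memory totals against one GPU."""
    compute = sum(MIG_PROFILES[name][0] for name in layout)
    memory = sum(MIG_PROFILES[name][1] for name in layout)
    return compute <= COMPUTE_SLICES_PER_GPU and memory <= MEMORY_GB_PER_GPU


max_instances = COMPUTE_SLICES_PER_GPU * GPUS_PER_SYSTEM

print(max_instances)                                   # 56
print(fits_on_one_gpu(["1g.5gb"] * 7))                 # True
print(fits_on_one_gpu(["3g.20gb", "4g.20gb"]))         # True
print(fits_on_one_gpu(["4g.20gb", "4g.20gb"]))         # False
```

Seven small instances suit many light inference jobs, while a single full-size instance gives one training job the whole GPU.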
AI workloads evolve over time. In the beginning, there are massive data sets to crunch and analytics work to be done. Then the workload shifts to training the machine learning models. Finally, inference takes centre stage and the workload shifts again. Typically, these three workloads have been run on separate clusters of CPUs and/or GPUs, depending on the workload. This often leaves resources under-utilised as the work moves through the stages, and it can also leave teams short of the resources they need for the job at hand. The new DGX A100 solves this problem through MIG, allowing the infrastructure to be more elastic. With the DGX A100 and MIG you can compose the infrastructure in a way that meets your needs at each stage of the AI workload. This innovation has the potential to replace racks of CPU clusters with one or two DGX A100s.
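The elasticity described above can be pictured as re-dividing the same 56 instances as the project matures. The sketch below is purely illustrative: the `allocate` helper and the demand weights are hypothetical, and it only counts instances, ignoring how they would be placed on individual GPUs.

```python
from typing import Dict

TOTAL_INSTANCES = 56  # article: 8 GPUs x 7 MIG instances


def allocate(demand: Dict[str, float], total: int = TOTAL_INSTANCES) -> Dict[str, int]:
    """Split `total` instances in proportion to demand (largest remainder)."""
    weight = sum(demand.values())
    if weight <= 0:
        raise ValueError("at least one stage must have positive demand")
    exact = {stage: total * d / weight for stage, d in demand.items()}
    shares = {stage: int(x) for stage, x in exact.items()}
    leftover = total - sum(shares.values())
    # Hand the remaining instances to the largest fractional parts first.
    by_remainder = sorted(exact, key=lambda s: exact[s] - shares[s], reverse=True)
    for stage in by_remainder[:leftover]:
        shares[stage] += 1
    return shares


# Hypothetical demand weights at three points in a project's life.
early = allocate({"analytics": 6, "training": 1, "inference": 1})
middle = allocate({"analytics": 1, "training": 5, "inference": 2})
late = allocate({"analytics": 1, "training": 1, "inference": 6})

for label, plan in (("early", early), ("middle", middle), ("late", late)):
    print(label, plan)
```

With these weights the same hardware goes from mostly analytics, to mostly training, to mostly inference, without any change to the physical system.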
Take the Next Step
To learn how the DGX A100 can accelerate your time to insight, contact the XENON team today to talk to a Solutions Architect.
Have a look under the hood in this video from NVIDIA.