The HPC user's dream is to keep stuffing GPUs into a rack-mount box and make everything go faster. Some servers offer up to eight GPUs, but the standard server usually only offers four GPU slots. Fair enough, using four modern GPUs offers a significant amount of HPC heft, but can we go higher?

Before we answer that question, consider a collection of eight servers, each with four GPUs, for a total of 32 GPUs. There are ways to leverage all these GPUs for one application by using MPI across servers, but many times this is not very efficient. In addition, shared computing environments often have GPU nodes that may sit idle because they are restricted to GPU-only jobs, leaving the CPUs and memory unavailable for any work.

In the past, a server with a single-socket processor, a moderate amount of memory, and a single GPU was much more granular than today's systems. This granularity allowed for more effective resource application. As servers have packed in more hardware (i.e., large-memory, multi-core nodes with multiple GPUs), the ability to share resources becomes a bit trickier. A four-GPU server works great, but it may be used exclusively for GPU jobs and otherwise sit idle. The large granularity of this server means an amount of memory and CPUs may be stranded from use. Simply put, packing more memory, cores, and GPUs into a single server may reduce the overall cost, but for HPC workloads it may end up stranding a lot of hardware over time.

The "stranded" hardware situation has not gone unnoticed, and Compute Express Link™ (CXL™) was established to help with this trend. The CXL standard, which is rolling out in phases, is an industry-supported Cache-Coherent Interconnect for Processors, Memory Expansion and Accelerators. CXL technology maintains memory coherency between the CPU memory space and memory on attached devices, which allows resource sharing for higher performance, reduced software stack complexity, and lower overall system cost.

While CXL is not quite available yet, one company, GigaIO, does offer similar capabilities today. Indeed, GigaIO has just introduced a single-node supercomputer that can support up to 32 GPUs. These GPUs are visible to a single host system. There is no partitioning of the GPUs across server nodes; the GPUs are fully usable and addressable by the host node. Basically, GigaIO offers a PCIe network called FabreX™ that creates a dynamic memory fabric that can assign resources to systems in a composable fashion. Using the FabreX technology, GigaIO demonstrated 32 AMD Instinct MI210 accelerators running in a single-node server.
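To make the "one host sees all the GPUs" point concrete, here is a minimal sketch (not GigaIO code) of how an application on the host node could enumerate every composed accelerator. It assumes the ROCm/HIP runtime typically used with AMD Instinct GPUs; the device count and names are simply whatever the fabric exposes to the host.

```cpp
// Minimal sketch, assuming the ROCm/HIP runtime (not GigaIO-specific code).
// On a composed node, each fabric-attached GPU appears as an ordinary local
// device, so standard enumeration is all that is needed.
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        std::fprintf(stderr, "No HIP devices visible\n");
        return 1;
    }
    std::printf("Host sees %d GPU(s)\n", count);  // e.g., 32 on a composed node

    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, i);
        std::printf("  GPU %2d: %s, %.1f GiB\n", i, prop.name,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```

Because the composed GPUs are presented as local PCIe devices, the same single-process, multi-GPU code that runs on a four-GPU box should run unchanged at 32 GPUs, rather than being restructured as MPI ranks coordinating across eight separate servers.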