Press Archive

SDSC’s Expanse Supercomputer Formally Enters Production

National Science Foundation-funded System Now Available to Academic & Industry Researchers

Published December 8, 2020

 Credit: Owen Stanley, SDSC/UC San Diego.

The San Diego Supercomputer Center (SDSC) at UC San Diego announced that its new Expanse supercomputer formally entered service for researchers following a program review by the National Science Foundation (NSF), which awarded SDSC a grant in mid-2019 to build the innovative system.  

At over twice the performance of Comet, SDSC’s current petascale supercomputer, Expanse supports SDSC’s theme of ‘Computing without Boundaries’ with powerful CPUs and GPUs and a data-centric architecture that serves a wide range of scientific workloads and integrates with experimental facilities, edge computing, and public clouds.

“The name of our new system says it all,” said SDSC Director Michael Norman, the principal investigator (PI) for Expanse and a computational astrophysicist. “With innovations in cloud integration and other features such as composable systems, as well as continued support for science gateways and distributed computing via the Open Science Grid (OSG), Expanse will allow researchers to push the boundaries of computing and substantially reduce their times to discovery.”

A key innovation of Expanse is its support for composable systems: the integration of computing elements such as CPUs, GPUs, and other resources into scientific workflows that may include data acquisition and processing, machine learning, and traditional simulation. Expanse also supports integration with public cloud providers, leveraging high-speed networks to ease data movement to and from the cloud and a familiar scheduler-based approach.

The new system has been in early-user testing for the past month, with researchers from various domains and institutions running actual research projects to validate Expanse’s overall performance, capabilities, and reliability (see sidebar).

Like Comet, which is slated to conclude operations as an NSF resource in March 2021 after six years of service, Expanse is designed for modest-scale jobs, from a single core to several hundred cores. This includes high-throughput computing via integration with the OSG, which can involve tens of thousands of single-core jobs. Such modest-scale jobs are often referred to as the ‘long tail’ of science. Virtually every discipline, from emerging areas such as multi-messenger astronomy, genomics, and the social sciences to more traditional ones such as earth sciences and biology, depends on these medium-scale, innovative systems for much of its productive computing.

“Comet’s focus on reliability, throughput, and usability has made it one of the most successful resources for the national research community, supporting tens of thousands of users across all domains,” said SDSC Deputy Director Shawn Strande, a co-PI and project manager for Expanse. “Our approach with Expanse was to assess the emerging needs of the community and then work with our key partners, including Dell, AMD, NVIDIA, Mellanox, and Aeon, to design a system that meets or exceeds those needs.”

Expanse’s standard compute nodes are each powered by two 64-core AMD EPYC 7742 processors and contain 256 GB of DDR4 memory, while each GPU node contains four NVIDIA V100 GPUs connected via NVLink, along with dual 20-core Intel Xeon 6248 CPUs. Expanse also has four 2 TB large-memory nodes. The entire system, integrated by Dell, is organized into 13 SDSC Scalable Compute Units (SSCUs), each comprising 56 standard nodes and four GPU nodes, and connected with 100 Gb/s HDR InfiniBand.

Remarkably, Expanse delivers over 90,000 compute cores in a footprint of only 14 racks. Direct liquid cooling (DLC) of the compute nodes provides these high-core-count processors with a cooling solution that improves system reliability and contributes to SDSC’s energy-efficient data center.
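Reading the SSCU layout as 56 standard nodes per unit, the core-count figure checks out from the numbers above (13 units, two 64-core EPYC 7742 processors per standard node):

```shell
# Back-of-the-envelope check of Expanse's core count from the SSCU layout:
# 13 SSCUs x 56 standard nodes x (2 sockets x 64 cores) per node.
cores=$((13 * 56 * 2 * 64))
echo "$cores"   # prints 93184 -- i.e. "over 90,000" from standard nodes alone
```

The GPU nodes’ host CPUs add further cores on top of this standard-node total.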

Every Expanse node has access to a 12 PB InfiniBand-based Lustre parallel file system (provided by Aeon Computing) that delivers over 140 GB/s of aggregate bandwidth. Local NVMe storage on each node gives users a fast scratch file system that dramatically improves I/O performance for many applications. In 2021, a Ceph-based file system will be added to Expanse to support complex workflows, data sharing, and staging to/from external sources. The Expanse cluster is managed with Bright Computing’s cluster management software, with the Slurm workload manager handling job scheduling.
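As a sketch of what scheduler-based access to such a system looks like, a Slurm batch script for one 128-core standard node might resemble the following; the partition, account, and module names are illustrative assumptions, not Expanse documentation:

```shell
#!/usr/bin/env bash
# Hypothetical Slurm job script for one dual-EPYC-7742 (128-core) node.
# Partition, account, and module names below are assumed for illustration.
#SBATCH --job-name=demo
#SBATCH --partition=compute       # assumed partition name
#SBATCH --account=abc123          # assumed allocation account
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=128     # one MPI task per core
#SBATCH --time=01:00:00

module load gcc openmpi           # assumed module names
srun ./my_mpi_app                 # launch the MPI application across all tasks
```

A user would submit this with `sbatch job.sh` and monitor it with `squeue`.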

A video of SDSC’s presentation on Expanse during last month’s International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC20) is available on the SDSC website, along with further details and full system specifications.

Expanse will serve as a key resource within the NSF's Extreme Science and Engineering Discovery Environment (XSEDE), which comprises the most advanced collection of integrated digital resources and services in the world. The NSF award for Expanse runs from October 1, 2020 to September 30, 2025 and is valued at $10 million for acquisition and deployment, plus an estimated $12.5 million for operations and maintenance.

Expanse Helps Early Users Advance Discovery

More than 30 teams, working in fields ranging from computational biology to astrophysics, accessed Expanse through the Early User program. This provided them an opportunity to test the system and carry out their research before Expanse went into full production.

“With a peak speed of five petaflops and an enormous storage capacity, Expanse has already shown us what a difference this high-performance resource will make in our National Science Foundation user community and well beyond,” said SDSC Director Mike Norman. “Our test runs required more compute power than was available on Comet, our current system, and the results for these studies on Expanse were astonishing.”

“We also leveraged Spack, an open-source package manager for high-performance computing (HPC) software, to deploy a broad range of applications very quickly and optimally,” added Mahidhar Tatineni, lead for SDSC’s HPC User Services. “Our early Expanse users were able to build and test their applications quickly in this environment.”
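As an illustration of that kind of workflow, a Spack deployment targeting AMD EPYC (Zen 2) processors might look like the following sketch; the package choice, compiler, and target are illustrative assumptions, not a record of what SDSC installed:

```shell
# Hypothetical Spack workflow; the package (GROMACS), compiler, and
# microarchitecture target are illustrative assumptions.
spack install gromacs +mpi %gcc target=zen2   # build optimized for EPYC 7742 (Zen 2)
spack load gromacs                            # make the package available in the shell
```

Spack’s `target=` spec lets builds be tuned to the host microarchitecture, which is one reason it suits heterogeneous HPC deployments.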

Norman was referring to a computational biology effort led by UC San Diego Engineering Professor Siavash Mirarab and his team, which used Expanse to run computationally intensive phylogenetic tree inference. The first project used Expanse’s GPU nodes to elucidate the evolutionary relationships among 300 bird species using 150,000 genomic regions.

“Our Bird Tree of Life shows how groups of birds evolved over time and would not have been possible without Expanse and its immense compute power,” said Mirarab. A second project ran a ‘divide and conquer’ algorithm across Expanse’s standard compute nodes to create a tree of 600,000 microbial species from 380 genes.


Another early user, David Radice, an assistant astrophysics professor at Pennsylvania State University, used Expanse to complete several numerical simulations of neutron star mergers. “We’ve only recently detected neutron star mergers,” said Radice, a long-time user of SDSC’s Comet system. “To better understand the impact of quark deconfinement in neutron star mergers, we created a series of complex simulations only able to run on a high-performance system such as Expanse, so we were thrilled when SDSC asked us to participate in the preliminary testing of this supercomputer.”

Another early use case focused on a new simulation of the electronic structure of hybrid organic-inorganic perovskites (HOIPs) by a research team at Duke University. While perovskites have generated a great deal of interest as possible materials for future solar energy components, their instability requires more fine-tuning before they will be readily available for commercial use.

“Our Expanse simulation allowed us to showcase a new way of forming HOIPs that will be helpful in furthering the advancement of this complex research problem,” said Yi Yao, a computational chemist and research scientist at Duke.

 

About SDSC

The San Diego Supercomputer Center (SDSC) is a leader and pioneer in high-performance and data-intensive computing, providing cyberinfrastructure resources, services, and expertise to the national research community, academia, and industry. Located on the UC San Diego campus, SDSC supports hundreds of multidisciplinary programs spanning a wide variety of domains, from astrophysics and earth sciences to disease research and drug discovery.