San Diego Supercomputer Center and IBM Team Wins SCinet Network Bandwidth and StorCloud Challenges at SC 2004 Conference
Team Achieves Record Data Transfer and Data Sorting Rates on TeraGrid Network; Work Enabled Enzo Scientists to Track Early-Universe Star Formation
A team of high-performance computing engineers from the San Diego Supercomputer Center (SDSC) and IBM demonstrated expert management of large-scale data resources using innovative cyberinfrastructure tools at the 2004 Supercomputing Conference in Pittsburgh, Penn. Using StorCloud SAN-attached storage and the General Parallel File System (GPFS) from IBM, along with computation and visualization resources at various TeraGrid sites, the team displayed a new computation and visualization to attendees at the conference. With these tools, Enzo scientists were able to see the process of massive star formation and destruction.
"To achieve the promise of grid computing, high-performance computing applications need coordinated access to the set of resources that comprise cyberinfrastructure – superior compute platforms, on-demand remote data access, visualization tools and access to archival storage," said Dr. Fran Berman, director of SDSC. "The TeraGrid cyberinfrastructure offers these distinctive resources to high-performance applications."
The SDSC/IBM team was recognized for the highest achieved StorCloud bandwidth and I/Os per second for its Enzo submission. As part of the submission, the team also broke a world record by sorting a terabyte of random data in 487 seconds (8 min, 7 sec), more than twice as fast as the previous record (1,057 seconds; 17 min, 37 sec). The bandwidth achieved was 15 GB per second.
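The sorting figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the team's submission: it assumes a decimal terabyte (10^12 bytes), and the function and constant names are purely illustrative. The resulting end-to-end sort rate is a different quantity from the 15 GB per second bandwidth, which the release reports as a separate figure.

```python
# Figures quoted in the release.
NEW_RECORD_S = 487    # new terabyte sort time, in seconds
OLD_RECORD_S = 1057   # previous record, in seconds

# Assumption: "a terabyte" means 10**12 bytes (decimal); the release does not say.
TERABYTE_BYTES = 10**12


def fmt_min_sec(seconds):
    """Format a whole number of seconds as 'M min, S sec'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} min, {secs} sec"


def sort_rate_gb_per_s(total_bytes, seconds):
    """Average end-to-end sort rate in gigabytes (10**9 bytes) per second."""
    return total_bytes / seconds / 1e9


speedup = OLD_RECORD_S / NEW_RECORD_S
new_rate = sort_rate_gb_per_s(TERABYTE_BYTES, NEW_RECORD_S)
old_rate = sort_rate_gb_per_s(TERABYTE_BYTES, OLD_RECORD_S)

print(f"New record: {fmt_min_sec(NEW_RECORD_S)} ({new_rate:.2f} GB/s)")
print(f"Old record: {fmt_min_sec(OLD_RECORD_S)} ({old_rate:.2f} GB/s)")
print(f"Speedup:    {speedup:.2f}x")
```

Under these assumptions the speedup works out to roughly 2.17, consistent with "more than twice as fast."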
The team also received the Best Spirit of the SCinet Bandwidth Challenge Award for enabling a scientific application to achieve 27 Gb per second over the TeraGrid network, utilizing more than 95 percent of the available bandwidth.
This visualization shows the filamentary structure of the early universe in a co-moving box of size 5.6 megaparsecs (roughly 18.3 million light-years) at a redshift of z = 7.4333.
This visualization shows the temperature distribution of the same volume at the same epoch as the density visualization. Pink indicates the highest temperatures associated with the star forming regions.
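The 5.6-megaparsec box size quoted in the first caption can be converted to light-years as a sanity check. This is a minimal sketch using rounded standard values for the parsec and the light-year in meters; the names below are illustrative only.

```python
# Rounded standard values, in meters.
PARSEC_M = 3.0857e16
LIGHT_YEAR_M = 9.4607e15

# Light-years per parsec, about 3.26.
LY_PER_PC = PARSEC_M / LIGHT_YEAR_M


def mpc_to_million_ly(megaparsecs):
    """Convert megaparsecs to millions of light-years.

    One megaparsec is 10**6 parsecs, so the factor of a million carries
    through unchanged and only the parsec-to-light-year ratio is applied.
    """
    return megaparsecs * LY_PER_PC


box_million_ly = mpc_to_million_ly(5.6)
print(f"5.6 Mpc is about {box_million_ly:.1f} million light-years")
```

A coarser factor of 3.2 light-years per parsec gives 17.92 million light-years instead, so the result depends slightly on the precision of the conversion factor used.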
This computation illustrates how a scientist can schedule a computation and visualization in automatic succession at different sites using the Grid Universal Remote metascheduler without moving any files from one site to another. A global parallel file system that spans sites allows data to be shared without duplicating the hardware and data at each site, which makes for a cost-effective, high-performance solution for partner sites. No matter where users go throughout the grid, the files are available at any site that mounts the file system.
Also demonstrated was an important component of cyberinfrastructure. Using the Grid Universal Remote developed by SDSC team members, engineers were able to reserve resources across distributed sites in a coordinated fashion. User-settable reservations at SDSC and Purdue University provided the framework to make this possible.
The Grid Universal Remote allows users direct access to local cluster scheduling, within policy limits. Previously, this was only possible with manual intervention by system administrators.
"Our vision is to provide scientists with an easy-to-use, seamless environment that allows them to utilize all the unique distributed resources available on the grid," said Berman. "The TeraGrid team really stepped up to the plate on this challenge, providing an unprecedented level of team technology coordination."
Resources used included 120 TB of IBM TotalStorage DS4000 (FAStT) storage systems as well as 80 processors serving out storage and data from the showroom floor to NCSA and SDSC. Computation was done on SDSC’s premier high-performance compute system, DataStar.
Founded in 1985, the San Diego Supercomputer Center (SDSC) has spent two decades enabling international science and engineering discoveries through advances in computational science and high-performance computing. Continuing this legacy into the era of cyberinfrastructure, SDSC is a strategic resource to science, industry, and academia, offering leadership in the areas of data management, grid computing, bioinformatics, geoinformatics, and high-end computing. The mission of SDSC is to extend the reach of scientific accomplishments by providing high-end hardware technologies, integrative software technologies, and deep interdisciplinary expertise to the community. SDSC is an organized research unit of the University of California, San Diego, with a staff of more than 400 scientists, software developers and support personnel, and is primarily funded by the National Science Foundation (NSF). For more information, see www.sdsc.edu.
SDSC Communications, 858.534.8314
SDSC Communications, 858.534.8363