Below are some of the publications relevant to Dash, Gordon, and flash-memory-based systems.
He, J., Jagatheesan, A., Gupta, S., Bennett, J., and Snavely, A. Supercomputing 2010, New Orleans, LA, November 2010.
ABSTRACT: Data intensive computing can be defined as computation involving large datasets and complicated I/O patterns. Data intensive computing is challenging because there is a five-orders-of-magnitude latency gap between main memory DRAM and spinning hard disks; as a result, an inordinate amount of time in data intensive computing is spent accessing data on disk. To address this problem we designed and built a prototype data intensive supercomputer named DASH that exploits flash-based Solid State Drive (SSD) technology and also virtually aggregated DRAM to fill the latency gap. DASH uses commodity parts including Intel® X25-E flash drives and distributed shared memory (DSM) software from ScaleMP®. The system is highly competitive with several commercial offerings by several metrics, including achieved IOPS (input/output operations per second), IOPS per dollar of system acquisition cost, IOPS per watt during operation, and IOPS per gigabyte (GB) of available storage. We present here an overview of the design of DASH, an analysis of its cost efficiency, and a detailed recipe for how we designed and tuned it for high data performance; lastly, we show that by running data-intensive scientific applications from graph theory, biology, and astronomy on DASH, we achieved as much as a two-orders-of-magnitude speedup compared to the same applications run on traditional architectures.
He, J., Bennett, J., and Snavely, A. TeraGrid 2010 Annual Meeting, Pittsburgh, PA, July 2010.
ABSTRACT: HPC applications are becoming more and more data-intensive as a result of ever-growing simulation sizes and burgeoning data acquisition. Unfortunately, the storage hierarchy of existing HPC architectures has a five-orders-of-magnitude latency gap between main memory and spinning disks and cannot respond well to this new data challenge. Flash-based SSDs (Solid State Disks) promise to fill the gap, with latency two orders of magnitude lower than that of spinning disks. However, since all existing hardware and software were designed without flash in mind, the question is how to integrate the new technology into existing architectures. DASH is a new TeraGrid resource that aggressively leverages flash technology (and also distributed shared memory technology) to fill the latency gap. To explore the potential and the issues of integrating flash into today's HPC systems, we swept a large parameter space with fast, reliable measurements to investigate varying design options. Here we provide some lessons we learned and suggestions for future architecture design. Our results show that performance can be improved by 9x with appropriate existing technologies and probably further improved by future ones.
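The "five orders of magnitude" and "two orders of magnitude" figures quoted in the two abstracts above can be checked with simple ratios. The latencies below are assumed round numbers chosen for illustration (roughly 100 ns for a DRAM access, 10 ms for a spinning-disk random access, and 100 µs for a flash read); they are not measurements reported in these papers.

```latex
\begin{aligned}
t_{\text{DRAM}} &\approx 10^{-7}\,\text{s}, \qquad
t_{\text{flash}} \approx 10^{-4}\,\text{s}, \qquad
t_{\text{disk}} \approx 10^{-2}\,\text{s} \quad \text{(assumed)} \\[4pt]
\frac{t_{\text{disk}}}{t_{\text{DRAM}}} &\approx \frac{10^{-2}}{10^{-7}} = 10^{5}
\quad \text{(the DRAM--disk gap)} \\[4pt]
\frac{t_{\text{disk}}}{t_{\text{flash}}} &\approx \frac{10^{-2}}{10^{-4}} = 10^{2}, \qquad
\frac{t_{\text{flash}}}{t_{\text{DRAM}}} \approx \frac{10^{-4}}{10^{-7}} = 10^{3}
\end{aligned}
```

Under these assumptions flash sits between DRAM and disk, removing two of the five orders of magnitude, which is the gap that DASH targets with SSDs (and with aggregated DRAM via DSM).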
Jonathan A. Myers*, Mahidhar Tatineni**, Robert S. Sinkovits**
*Large Synoptic Survey Telescope, **San Diego Supercomputer Center
ABSTRACT: Ongoing efforts by the Large Synoptic Survey Telescope (LSST) involve the study of asteroid search algorithms and their performance on both real and simulated data. Images of the night sky reveal large numbers of events caused by the reflection of sunlight from asteroids. Detections from consecutive nights can then be grouped together into tracks that potentially represent small portions of the asteroids' sky-plane motion. The analysis of these tracks is extremely time consuming, and there is strong interest in the development of techniques that can eliminate unnecessary tracks, thereby rendering the problem more manageable. One such approach is to collectively examine sets of tracks and discard those that are subsets of others. Our implementation of a subset removal algorithm has proven to be fast and accurate on modest-sized collections of tracks, but unfortunately it has extremely large memory requirements for realistic data sets and cannot effectively use conventional high performance computing resources. We report our experience running the subset removal algorithm on the TeraGrid Appro Dash system, which uses the vSMP software developed by ScaleMP to aggregate memory from across multiple compute nodes to provide access to a large, logical shared memory space. Our results show that Dash is ideally suited for this algorithm and delivers performance comparable or superior to that obtained on specialized, heavily demanded, large-memory systems such as the SGI Altix UV.
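The subset-removal idea described above (discard any track whose detections all appear in another track) can be sketched as follows. This is a minimal, naive quadratic sketch, not the LSST implementation discussed in the paper; it assumes each track is represented as a `frozenset` of integer detection IDs, and the function name `remove_subset_tracks` is hypothetical.

```python
from typing import FrozenSet, List

# Assumption: a track is modeled as a frozenset of detection IDs.
Track = FrozenSet[int]


def remove_subset_tracks(tracks: List[Track]) -> List[Track]:
    """Return the tracks that are not subsets of any other track.

    Hypothetical helper for illustration; O(n^2) subset checks.
    """
    # Drop exact duplicates first so identical tracks don't eliminate each other.
    unique = list(dict.fromkeys(tracks))
    # A proper subset is strictly smaller, so visiting larger tracks first
    # guarantees every potential superset is examined before its subsets.
    unique.sort(key=len, reverse=True)
    kept: List[Track] = []
    for t in unique:
        # Checking only kept tracks suffices: if t is inside a discarded track u,
        # then u is inside some kept track, so t is too (subset is transitive).
        if not any(t <= k for k in kept):
            kept.append(t)
    return kept


if __name__ == "__main__":
    tracks = [frozenset({1, 2, 3}), frozenset({1, 2}), frozenset({4}),
              frozenset({3, 4}), frozenset({1, 2, 3})]
    print(remove_subset_tracks(tracks))
```

Holding every track in memory and comparing it against the surviving set is what makes this kind of algorithm memory-hungry on realistic data, which is the motivation the abstract gives for running it on a large aggregated shared-memory system such as Dash.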
Robert S. Sinkovits, Pietro Cicotti, Shawn Strande, Mahidhar Tatineni, Paul Rodriguez, Nicole Wolter, and Natasha Balac
San Diego Supercomputer Center, University of California, San Diego
[sinkovit, pcicotti, strande, mtatineni, p4rodriguez, nickel, natashab] @sdsc.edu
ABSTRACT: The Gordon data intensive computing system was designed to handle problems with large memory requirements that cannot easily be solved using standard workstations or distributed memory supercomputers. We describe the unique features of Gordon that make it ideally suited for data mining and knowledge discovery applications: memory aggregation using the vSMP software solution from ScaleMP, I/O nodes containing 4 TB of low-latency flash memory, and a high performance parallel file system with 4 PB capacity. We also demonstrate how a number of standard data mining tools (e.g., MATLAB, WEKA, R) can be used effectively on Dash, an early prototype of the full Gordon system.
Mitesh R. Meswani, Pietro Cicotti, Jiahua He, and Allan Snavely
San Diego Supercomputer Center, University of California, San Diego, CA, USA