(L to R) Chaitan Baru, James Short, and Roger Bohn, co-authors of the How Much Information? Report on Enterprise Server Information. Photo: Ben Tolo/SDSC
The project, called the Center for Large-scale Data Systems Research (CLDS), formally begins operations this fall and will also be home to the ongoing How Much Information? (HMI?) research program, which released a new report this week at the Storage Networking World (SNW) Spring 2011 conference in Santa Clara, Calif.
The latest report by HMI?, a consortium led by UC San Diego and previously based at the university's School of International Relations and Pacific Studies, analyzes the growth of "big data" in companies. The authors found that the world's installed base of computer servers processed almost ten million million gigabytes of information in 2008, or nearly 10 to the 22nd power bytes. Full details are available online.
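The scale of that figure can be checked with a quick unit conversion. The short Python sketch below is illustrative only and is not taken from the report. It assumes decimal (SI) units, where one gigabyte is 10 to the 9th power bytes, and it uses the rounded upper bound of "ten million million" gigabytes rather than the report's exact total.

```python
# Assumption: decimal (SI) units, 1 gigabyte = 10**9 bytes.
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_ZETTABYTE = 10**21

# "Ten million million" gigabytes = 10**7 * 10**6 = 10**13 GB (rounded upper bound).
gigabytes = 10_000_000 * 1_000_000

# Convert to bytes, then to zettabytes for a more compact unit.
total_bytes = gigabytes * BYTES_PER_GIGABYTE
zettabytes = total_bytes // BYTES_PER_ZETTABYTE

print(f"{gigabytes:.0e} GB = {total_bytes:.0e} bytes = {zettabytes} zettabytes")
```

Under these assumptions, ten million million gigabytes corresponds to 10 to the 22nd power bytes, or 10 zettabytes, so the "almost" in the report's figure places the 2008 total just below that bound.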
"We are entering an era of data-intensive computing, where all of us - academia, industry, and government - will be faced with organizing, analyzing, and drawing meaningful conclusions from unprecedented amounts of data, and doing so in a cost- and performance-effective manner," said Michael Norman, SDSC's director.
SDSC recently announced the startup of two data-intensive computing systems, Dash and Trestles. Those systems will be followed later this year by a significantly larger system called Gordon, which will be the first supercomputer to employ large amounts of flash memory to help speed solutions to computing problems now limited by higher-latency spinning-disk technology. When deployed, Gordon should rank among the top 100 supercomputers in the world, capable of performing latency-bound file reads 10 times faster and more efficiently than any high-performance computing system today.
"It is new technology such as SDSC's flash memory-based systems that is changing how science and research will be done in the Information Age," added Norman. "CLDS will serve as a laboratory that will put us on the leading edge of adaptation and integration of technologies such as this, and explore the multi-faceted challenge of working with big data in collaboration with academic and industry partners."
In addition to serving as the host site for ongoing HMI? research, CLDS will test and evaluate new trends in cloud-based storage systems, examining the cloud computing principles of "on-demand, elasticity, and scalability" in the context of large-scale storage requirements. Research will include exploration of new storage architectures and benchmark development.
"Establishing CLDS at SDSC is a natural fit," said Chaitan Baru, an SDSC distinguished scientist and director of the new project, adding that the center will be structured as an industry-university consortium. "SDSC is recognized for its expertise in the development of systems for storing, managing, and analyzing 'big data.' Our goal here is to understand how new technologies will change the way we work in this data-rich age."
Moreover, CLDS will be a key resource to strengthen analytical and research relationships, while fostering industry partnerships and exchanges through individual or group research projects, and providing support for industry forums and other professional education programs.
"Integrating management, economic, and technical analysis is what all companies will need in the world of 'big data' and even bigger analytics," said James Short, research director of the HMI? program and lead scientist for the CLDS project. "SDSC offers a rich environment for integrating management analysis with both applied and theoretical computer science for research in large-scale data systems."
Funding for the new center will come from a combination of industry, foundation, and government grants. Industry inquiries may be directed to Ron Hawkins, SDSC's director of industry relations, at (858) 534-5045 or email@example.com.
As an organized research unit of UC San Diego, SDSC is a national leader in creating and providing cyberinfrastructure for data-intensive research, and celebrated its 25th anniversary in late 2010 as one of the National Science Foundation's first supercomputer centers. Cyberinfrastructure refers to an accessible and integrated network of computer-based resources and expertise, focused on accelerating scientific inquiry and discovery. SDSC is a founding member of TeraGrid, the nation's largest open-access scientific discovery infrastructure.
The HMI? program was established in 2008 with the goal of building a verifiable census of the world's data and information, and of effectively measuring how that information is both created and consumed. An earlier HMI? report focused on how much information was consumed by American households in 2008. The project is sponsored by the Alfred P. Sloan Foundation and industry sponsors including AT&T, Cisco Systems, IBM, Intel, LSI, Oracle, and Seagate Technology.
Chaitan Baru, (858) 945-6716
James Short, (858) 337-7728
Jan Zverina, SDSC Communications
(858) 534-5111 or firstname.lastname@example.org
Warren R. Froelich, SDSC Communications
(858) 822-3622 or email@example.com