03/20/2013
SDSC, Calit2 Create Cyberinfrastructure Network for ‘Big Data’ Transmissions
Viewing Research Bandwidth through a New Prism
After developing one of the most advanced research communications infrastructures on any university campus during the past decade, the University of California, San Diego is taking another leap forward in the name of enabling data-intensive science.
The Prism@UCSD project is building a research-defined, end-to-end cyberinfrastructure on the La Jolla campus that is capable of supporting bursts of data between facilities that might otherwise cripple the main campus network.
“High-performance cyberinfrastructure is a strategic necessity for a research university,” said UC San Diego Chancellor Pradeep K. Khosla. “The Prism network will enable rapid movement of ‘Big Data’ for multiple, diverse disciplines across campus, including science, engineering, medicine, and the arts.”
Calit2 is the hub of the Prism@UCSD network, linking researchers over 10Gbps and 40Gbps fiber paths, including 120Gbps capacity between Calit2 and SDSC, and 80Gbps from the core to, respectively, the Center for Networked Systems and the Physics lab of Prof. Frank Wuerthwein.
With $500,000 in funding from the National Science Foundation (NSF), researchers with the UCSD division of the California Institute for Telecommunications and Information Technology (Calit2) and the university’s San Diego Supercomputer Center (SDSC) are building the network to support researchers in half a dozen data-intensive scientific areas, including genomic sequencing, climate science, electron microscopy, oceanography, and physics.
“We’ve identified a variety of big data users on this campus who need 10 gigabit/s and faster bandwidth to deal with the avalanche of data coming from scientific instruments such as sequencers, microscopes, and computing clusters,” said Philip Papadopoulos, principal investigator on the Prism@UCSD project, who splits his time between Calit2 and SDSC. “We're starting at one Terabit/s of connected capacity through our next-generation modular switch, which is at the center of the Prism network. It can carry 20 times the traffic of our current research network, and it’s 100 times the bandwidth of the main campus network.”
“You can think of Prism as the HOV lane,” added Papadopoulos, “whereas our very capable campus network represents the slower lanes on the freeway.”
“Prism@UCSD is a response to the growing challenge of Big Data,” said Calit2 Director Larry Smarr. “The key innovation in Prism@UCSD is to provide end-to-end dedicated large bandwidth to the end-users on campus.”
Added SDSC Director Michael Norman: “Prism is the answer to how to move massive volumes of instrument data generated on and off campus to SDSC's powerful Big Data computing and storage resources, Gordon and Data Oasis. It will unleash the scientific potential energy of a number of frontier science projects that have been bandwidth limited.”
Smarr and Papadopoulos have previously collaborated on multiple NSF-funded projects to enable cheaper, faster and more energy-efficient scientific computing, storage, and visualization. Their OptIPuter project developed a new computer networking paradigm, with optical networks – not computer processors – at the core.
That led to Quartzite, an experimental network with reconfigurable optical fiber paths, and wavelength selective switching. Prism builds on Quartzite, using a next-generation Arista Networks 7405 switch-router which boasts triple the energy efficiency and four times the capacity of Quartzite’s switch. Prism will also expand the existing Calit2-SDSC optical-fiber connection.
“By the time Prism is built out, we will have expanded the SDSC-Calit2 link from 50 to 120Gbps, and it won’t cost very much to get it to 160Gbps,” said Papadopoulos. “Other campus labs then connect directly to the Prism core at Calit2 with dedicated links of between 20 and 80 Gigabit/s each. The structure allows a Prism-connected lab to saturate any of our external links, no matter where they land on campus. It also enables these labs to share data with each other or utilize high-end resources at SDSC.”
The network will be a hybrid design – part “production” infrastructure for real-world use, and part “experimental” system for researchers to test networking ideas. On the production side, the campus is counting on Prism to reduce congestion on the main UC San Diego network by moving traffic from a few hundred researchers in the most data-intensive fields onto Prism, where they can work with huge data sets that might otherwise clog the campus infrastructure – a state-of-the-art infrastructure that has to serve over 30,000 people.
The Prism Big Data network also creates a high-capacity ‘data freeway’ to campus, national, or international networks. Case in point: UC San Diego Physics Professor Frank Wuerthwein’s lab is the only Open Science Grid (OSG) node on the UC San Diego campus, and the lab’s cluster hosts massive amounts of data from the Large Hadron Collider (LHC).
“We want to expand the presence of OSG on this campus,” said Wuerthwein, who has signed up to use Prism@UCSD. “For the really big data we are holding – petabytes of Large Hadron Collider data, for instance – it is nice to have a network where we can transmit terabytes of data without killing the campus network in the process.”
Prism will also add a trunk line to the Computer Science and Engineering building, to serve users such as the Center for Networked Systems (CNS). CNS research scientist George Porter and his students use the SEED cluster for Big Data analysis. “One graduate student might work on a 100TB to 200TB data set, and there is only room for one of those at a time on that cluster,” said Porter. “So if you wanted to swap data sets, you’d kill the campus network, or you would have to stretch it out over the course of days.”
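To put Porter's "course of days" remark in rough numbers, the sketch below estimates how long a data set of that size would take to move at a given link rate. It is a back-of-the-envelope illustration, not a measurement of Prism or the campus network. The function name and the `efficiency` parameter are hypothetical. The sketch assumes decimal terabytes (10^12 bytes) and a link used at a fixed fraction of its nominal rate. The 10 and 80 gigabit/s rates and the 100TB to 200TB sizes are the figures quoted in this article.

```python
def transfer_time_hours(size_terabytes: float, link_gbps: float,
                        efficiency: float = 1.0) -> float:
    """Estimate the hours needed to move a data set over a network link.

    Assumptions (illustrative, not taken from the Prism project):
    - 1 TB = 10**12 bytes and 1 Gbps = 10**9 bits per second
    - the transfer gets a constant share (`efficiency`) of the nominal rate
    """
    if size_terabytes < 0:
        raise ValueError("size_terabytes must be non-negative")
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    total_bits = size_terabytes * 1e12 * 8            # bytes -> bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return total_bits / usable_bits_per_second / 3600  # seconds -> hours


if __name__ == "__main__":
    # Data set sizes and link rates quoted in the article.
    for size_tb in (100, 200):
        for gbps in (10, 80):
            hours = transfer_time_hours(size_tb, gbps)
            print(f"{size_tb} TB at {gbps} Gbps: about {hours:.1f} hours")
```

Even with a 10 gigabit/s link entirely to itself, a 100TB data set needs roughly 22 hours under these idealized assumptions. Sharing that link with other traffic (an `efficiency` well below 1.0) stretches the same transfer into days, while a dedicated 80 gigabit/s path brings it down to a few hours.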
Another major campus user of Prism will be the National Center for Microscopy and Imaging Research (NCMIR), led by Mark Ellisman. “We run our own facilities that house petabytes of data distributed across three sites on campus, so being able to move around the data to wherever it is needed is extremely important.”
“The most data-intensive scientific applications get the most value out of using dedicated ‘fat’ pipes with the ability to accommodate short, extreme-sized bursts of data,” said Papadopoulos. “We believe Prism will be the forerunner of specialized, Big Data cyberinfrastructures on many research campuses – and beyond.”
According to Calit2’s Smarr, if Prism is a success at UC San Diego, the project will then explore ways to give nearby research labs access to the network, even if they are off campus. “UC San Diego has a symbiotic relationship with nearby biotech firms and research institutions on the Torrey Pines Mesa, institutions such as Salk, The Scripps Research Institute, the Sanford Stem Cell Consortium, and Sanford-Burnham,” he said. “We are entering the era of integrated, personalized ‘omics,’ and for San Diego to be a leader, we need to share biomedical data across the Mesa, regardless of which lab generates it.”