A novel supercomputing resource created by researchers at the San Diego Supercomputer Center at the University of California, San Diego, is allowing scientists to study evolutionary relationships among large populations of living things in significantly less time, and without having to understand how to operate large, complex computer systems.
The new resource, called the CIPRES (Cyber Infrastructure for Phylogenetic RESearch) Gateway, is an Internet portal that allows scientists anywhere in the world to upload their data via a standard Web browser and perform phylogenetic analyses. The most time-consuming analyses use supercomputers, such as SDSC's new Trestles system, that are part of the National Science Foundation's TeraGrid, the world's most powerful collection of high-performance computing resources dedicated to academic research.
"In addition to answering the age-old questions of how all living things are related to each other, understanding evolutionary relationships has some very important practical benefits," said Mark Miller, principal investigator in SDSC's Research, Education and Development group, and leader of the CIPRES Gateway project. "For example, knowing the evolutionary relationships among a group of viruses or bacteria can help doctors understand where an infection came from, effectively treat patients who are infected, and work to contain the spread of disease during an outbreak."
Moreover, understanding how individual species adapt for survival in a specific geographic location can help scientists manage a species for long-term survival in that location, or engineer crops for higher productivity in a particular location.
Evolutionary relationships are uncovered by comparing DNA sequences from the individuals under study. Just as a single DNA sequence can be used to identify a criminal with a very high degree of accuracy, a group of DNA sequences can be used to determine with great precision just how closely related the members of any group of living things are.
"DNA sequences from individuals can be prepared so quickly and cheaply now, we can understand evolutionary relationships more accurately than ever before," according to Miller. "The problem is, the number of computations required grows quickly as the amount of data grows. There are only three possible relationships between any four individuals, but there are more than two million different relationships between 10 individuals. A computer that could analyze a million trees per second would require about 20 billion years to test all the possible relationships for just 22 individuals!"
Solving this problem is where the CIPRES Gateway and TeraGrid supercomputers come in. The power of supercomputers comes from parallel computing, in which large analyses are broken into smaller pieces that are run simultaneously on many processor cores. Under the TeraGrid's Advanced User Support program, Wayne Pfeiffer, a distinguished scientist at SDSC, helped improve the parallel performance of RAxML and MrBayes, two widely used phylogenetics codes.
"Most RAxML analyses submitted to the CIPRES Gateway now run on 60 cores of
Trestles," said Pfeiffer. "With a typical speedup over a single core of about 30, this means that analyses that would require a month on a laptop can be completed in a day via the gateway."
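The arithmetic behind Pfeiffer's figures can be written out as a minimal sketch. It assumes a "month" of 30 days and treats the laptop as equivalent to the single core used as the speedup baseline. Both are simplifying assumptions, and the function names are illustrative.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / cores


def wall_clock_days(serial_days: float, speedup: float) -> float:
    """Run time after parallelization, given the single-core run time."""
    return serial_days / speedup


if __name__ == "__main__":
    # Figures from the quote: 60 cores, speedup of about 30.
    print(parallel_efficiency(30, 60))  # fraction of linear scaling
    print(wall_clock_days(30, 30))      # days for a 30-day serial job
```

A speedup of 30 on 60 cores corresponds to a parallel efficiency of 50 percent, which is why the month-long job finishes in about a day rather than half a day.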
"This is an excellent example of how science is being transformed through new ways of leveraging the capabilities of today's supercomputers," said Richard Moore, SDSC's deputy director. "Significantly reducing the time it takes researchers to run such complex analyses, while freeing them from having to fully understand all the intricacies of today's supercomputers, means greater scientific productivity. This is what makes the CIPRES Gateway such a valuable phylogenetic resource."
Although SDSC's CIPRES Gateway has been in operation for a little more than a year, it has already provided immediate benefits to the scientific community, both in time savings and in new discoveries. To date, more than 2,000 scientists have run more than 35,000 analyses for approximately 100 completed studies. These studies span a broad spectrum of biological and medical research.
One study, recently published in the journal Parasitology, showed that humans are much more likely to infect apes with malaria than the reverse. Details of that study can be found online.
"Without the CIPRES Gateway, this work, and the other projects I am working on, would not go as quickly or as smoothly," said James B. Munro, a researcher from the University of Maryland School of Medicine, who was part of the research team that reported the new insights into the complex relationship between the malarial parasites and their mammalian hosts.
CIPRES was a five-year collaboration among 16 institutions, funded by the NSF from 2003 to 2008. Its goal was to enable phylogenetic reconstructions at a scale that supports analyses of huge data sets containing hundreds of thousands of biomolecular sequences, and to create an infrastructure that continues to support phylogenetic investigations. Ongoing projects include the CIPRES Gateway; TreeBaseII, a repository of user-submitted phylogenetic trees and the data used to generate them; and Crimson, a database that facilitates the extraction of sub-trees from very large phylogenetic trees.
As an organized research unit of UC San Diego, SDSC is a national leader in creating and providing cyberinfrastructure for data-intensive research, and celebrated its 25th anniversary in late 2010 as one of the National Science Foundation's first supercomputer centers. Cyberinfrastructure refers to an accessible and integrated network of computer-based resources and expertise, focused on accelerating scientific inquiry and discovery. SDSC is a founding member of TeraGrid, the nation's largest open-access scientific discovery infrastructure.
Jan Zverina, SDSC Communications
858 534-5111 or email@example.com
Warren R. Froelich, SDSC Communications
858 822-3622 or firstname.lastname@example.org