Presenter: Glenn K. Lockwood, Ph.D.
User Services Consultant, SDSC
The cost of sequencing human genomes has been dropping at a rate that has made genomic studies involving hundreds or thousands of whole human genomes viable. While the benefits to medicine and healthcare are undeniable, the rate of advancement in sequencing has outpaced Moore's law for half a decade. As a result, the cutting edge of genomics research is no longer limited by the number of genomes that can be sequenced, but rather by the computational power available to process the raw data coming off the sequencers.
Gordon, the data-intensive supercomputer at SDSC, has a unique architecture that provides the flexibility, capability, and capacity to process hundreds to thousands of human genomes on a timescale of weeks to months. This talk presents a case study in which 438 human genomes were run through a 9-stage read-mapping pipeline followed by a group variant-calling pipeline. The workflow used Gordon's massive parallel computing capacity, memory-rich design, multi-terabyte flash storage arrays, and high-performance parallel filesystem to transform raw reads into called variants in just a few weeks.
We will also present the computational requirements we observed for the read-mapping and variant-calling pipelines, providing a general breakdown of the time and infrastructure needed to support large-scale genomic studies of this kind.
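The workflow described above has a two-phase shape: read mapping runs independently on each genome (so it parallelizes trivially across nodes), while group variant calling consumes the entire mapped cohort in one joint step. The following is a minimal Python sketch of that structure only; the stage names, functions, and worker count are hypothetical placeholders, not the actual tools or configuration used in the study.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-sample stage names; the talk's 9-stage pipeline may differ.
READ_MAPPING_STAGES = ["align", "sort", "mark_duplicates", "recalibrate"]

def map_reads(sample):
    """Phase 1: run the per-sample stages in order for one genome."""
    return {"sample": sample, "stages": list(READ_MAPPING_STAGES)}

def call_variants_jointly(mapped_samples):
    """Phase 2: group variant calling sees every mapped sample at once."""
    return {"cohort_size": len(mapped_samples)}

def run_cohort(samples, workers=4):
    # Read mapping is embarrassingly parallel across samples, so it can be
    # fanned out across workers (or, on a cluster, across compute nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_reads, samples))
    # Joint calling is a single cohort-wide job over all mapped samples.
    return call_variants_jointly(mapped)

result = run_cohort([f"genome_{i}" for i in range(8)])
```

The key design point this sketch illustrates is the synchronization barrier between the phases: no variant calling can begin until every genome has finished mapping, which is why per-sample throughput dominates the end-to-end time for a large cohort.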
About the Speaker
Glenn K. Lockwood is a user services consultant at the San Diego Supercomputer Center where he provides a full range of support to the users of its supercomputing resources. A materials scientist by training, he now provides researchers with the knowledge and tools they need to make the most productive use of high-performance technology. He specializes in workload analysis and data-intensive optimization.