 
Pavel Pevzner
Dept of Computer Science & Engineering
UCSD
La Jolla, CA 92093
olson@scripps.edu
http://www.scripps.edu/pub/olson-web/
 
Abstract:

Children like puzzles, and they usually assemble them by trying all possible pairs of pieces and putting together the pieces that match. Biologists assemble genomes in a surprisingly similar way, the major difference being that the number of pieces is larger. For the last twenty years, fragment assembly in DNA sequencing has followed the "overlap-layout-consensus" paradigm, which is used in all currently available assembly tools. Trying all pairs of pieces corresponds to the overlap step, while putting the pieces together corresponds to the layout step of fragment assembly. Our new EULER algorithm is very different from this natural approach: we never try to match pairs of fragments, and we have no overlap step at all. Instead we do a very counter-intuitive (some would say childish) thing: we cut the existing pieces of the puzzle into even smaller pieces of regular shape. Although this looks childish and irresponsible, we do it on purpose rather than for the fun of it. This operation leads to the new EULER algorithm, which resolves the 20-year-old "repeat problem" in fragment assembly and provides some important advantages over the Celera assembler.
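The idea of cutting fragments into smaller pieces of regular shape can be illustrated with a toy sketch. It is not the EULER implementation, which the abstract does not describe in detail. It assumes that the "smaller pieces" are substrings of fixed length k, that the reads are error-free, that no substring of length k occurs twice in the genome, and that an Eulerian path exists. The reads and the function names (`kmers`, `de_bruijn`, `eulerian_path`, `spell`) are hypothetical.

```python
from collections import defaultdict

def kmers(read, k):
    """Cut a read into all overlapping substrings of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def de_bruijn(reads, k):
    """Build a graph with one edge per distinct k-mer.

    Each k-mer links its first k-1 letters to its last k-1 letters.
    Assumption: every k-mer occurs once in the genome, so duplicates
    seen in different reads are collapsed.
    """
    distinct = sorted({km for read in reads for km in kmers(read, k)})
    graph = defaultdict(list)
    for km in distinct:
        graph[km[:-1]].append(km[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for successors in graph.values():
        for w in successors:
            in_deg[w] += 1
    # Start where out-degree exceeds in-degree by one, else anywhere.
    start = next((v for v in graph if len(graph[v]) - in_deg[v] == 1),
                 next(iter(graph)))
    remaining = {v: list(ws) for v, ws in graph.items()}
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if remaining.get(v):
            stack.append(remaining[v].pop())  # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: emit vertex
    return path[::-1]

def spell(path):
    """Read the sequence along a path of (k-1)-letter vertices."""
    return path[0] + "".join(v[-1] for v in path[1:])

# Hypothetical error-free reads covering a 12-letter sequence.
reads = ["ATGGCG", "GGCGTG", "CGTGCA", "TGCAAT"]
assembled = spell(eulerian_path(de_bruijn(reads, 4)))
print(assembled)
```

Note that no pair of reads is ever compared: the reads contribute only their k-mers, and the layout comes from a path that uses every edge of the graph once.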

The major improvement of EULER over other algorithms is that it resolves all repeats except long perfect repeats, which are theoretically impossible to resolve without additional PCR experiments. In contrast to the Celera assembler, EULER does not mask such repeats but instead uses them as a powerful fragment assembly tool.
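As a rough illustration of why repeats matter in a graph built from fixed-length pieces, the sketch below lists the vertices at which such a graph branches. It is an assumption-laden toy, not EULER's repeat-resolution procedure: the sequence, the function name `branching_vertices`, and the choice of k are all hypothetical.

```python
from collections import defaultdict

def branching_vertices(sequence, k):
    """Return (k-1)-letter vertices with more than one distinct
    predecessor or successor in the graph built from the k-mers."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        succ[kmer[:-1]].add(kmer[1:])
        pred[kmer[1:]].add(kmer[:-1])
    vertices = set(succ) | set(pred)
    return sorted(v for v in vertices
                  if len(succ[v]) > 1 or len(pred[v]) > 1)

# Hypothetical sequence in which "AGGC" occurs twice.
sequence = "TCAGGCATAGGCTA"
print(branching_vertices(sequence, 4))  # vertices inside the repeat
print(branching_vertices(sequence, 6))  # repeat shorter than k-1 letters
```

In this toy case the two copies of the repeat collapse onto shared vertices when k is small, so the path through the graph is ambiguous there; with a larger k the copies stay separate and no vertex branches.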

This is joint work with Haixu Tang and Michael Waterman at USC.

For more information, see http://www-cse.ucsd.edu/users/ppevzner/
   
   
 
  The San Diego Supercomputer Center (SDSC) is a research unit of the University of California, San Diego, and the leading-edge site of the National Partnership for Advanced Computational Infrastructure. SDSC researchers conduct studies in computational science, develop high-performance computing and networking technologies, and participate in NPACI activities.

SDSC -- UC San Diego, MC 0505 -- 9500 Gilman Drive -- La Jolla, CA 92093-0505 -- 858-534-5000 -- 858-534-5152 (fax)
info@sdsc.edu © 2001, The Regents of the University of California