Date/Time/Place: September 29, 2004, SDSC Auditorium
3-4 pm Seminar
4-5 pm Technical Discussion
Speaker: Samuel Kerrien
EMBL – European Bioinformatics Institute
Title: UniProt 2.0 – Technical Overview & Discussion
Abstract: The annotation of genes and gene products is deeply rooted in a flat-file/free-text tradition. Relational or object-oriented models of the data, and implementations of them, are rare and usually come with access restrictions for end users. Users of the UniProt data set, which encompasses the former Swiss-Prot, TrEMBL and PIR content, also have to deal with these legacy problems. The flat-file format that has been used, extended and maintained over many years is readable by humans but difficult to process algorithmically. With protein information growing exponentially, the amount of data doubling roughly every two years, downloading the whole data sets and parsing out what is actually needed on the end user's side has become cumbersome and CPU-intensive. With UniProt, we are trying to overcome this situation by giving users query access to our data sources. By providing a solid interface both to the data set as a whole and to individual protein entries, we aim to make user-side parsing procedures redundant.
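To illustrate the kind of user-side parsing the abstract describes, here is a minimal Java sketch that reads Swiss-Prot-style flat-file entries. In that layout each line starts with a two-character line code (ID, AC, DE, SQ, ...), the data begins at column 6, sequence lines have a blank line code, and each entry ends with `//`. This is not the EBI library discussed in the talk. The class name `FlatFileSketch`, its methods and the toy entry are all hypothetical, and real entries have many more line types and continuation rules that this sketch ignores.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlatFileSketch {

    // Hypothetical toy entry in the Swiss-Prot-style line-code layout (not real UniProt data).
    static final String SAMPLE =
            "ID   TOY1_TEST   Reviewed;   10 AA.\n"
          + "AC   X00001; X00002;\n"
          + "DE   RecName: Full=Hypothetical toy protein;\n"
          + "SQ   SEQUENCE   10 AA;\n"
          + "     MKTAY IAKQR\n"
          + "//\n";

    // Reads entries terminated by "//"; collects each line's data under its
    // two-character line code, joining repeated codes with a single space.
    static List<Map<String, String>> parse(BufferedReader in) throws IOException {
        List<Map<String, String>> entries = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("//")) {           // end-of-entry marker
                entries.add(current);
                current = new LinkedHashMap<>();
                continue;
            }
            if (line.length() < 2) {
                continue;                           // skip blank or malformed lines
            }
            String code = line.substring(0, 2).trim();
            if (code.isEmpty()) {
                code = "SEQ";                       // sequence lines have a blank line code
            }
            String data = line.length() > 5 ? line.substring(5).trim() : "";
            current.merge(code, data, (a, b) -> a + " " + b);
        }
        return entries;
    }

    // Convenience wrapper for parsing an in-memory string.
    static List<Map<String, String>> parseString(String text) {
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            return parse(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Splits the AC line value ("X00001; X00002;") into individual accessions.
    static List<String> accessions(Map<String, String> entry) {
        List<String> result = new ArrayList<>();
        for (String part : entry.getOrDefault("AC", "").split(";")) {
            String acc = part.trim();
            if (!acc.isEmpty()) {
                result.add(acc);
            }
        }
        return result;
    }

    // Concatenates the sequence lines and strips the spaces between residue blocks.
    static String sequence(Map<String, String> entry) {
        return entry.getOrDefault("SEQ", "").replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        for (Map<String, String> entry : parseString(SAMPLE)) {
            // prints "[X00001, X00002] MKTAYIAKQR"
            System.out.println(accessions(entry) + " " + sequence(entry));
        }
    }
}
```

Even this toy version needs format-specific knowledge (column offsets, terminators, per-line-type splitting rules) on the client side, which is the burden that query access through a library interface is meant to remove.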
This presentation will give an overview of the libraries used at the EBI to present the UniProt data and to process protein entries automatically. These libraries will be made available to users shortly, and additional services are being planned. The speaker will go into some detail on how to use the read-only copy of the UniProt database from within Java code.