Projects

Rotation/PhD Projects: General Topic – Pharmaceutical Sciences – Network Pharmacology, Determining Side Effects, and Drug Repositioning

We have recently developed a method to compute the geometric potential [1], which describes the ligand binding sites of three-dimensional protein structures. We subsequently developed a fast approach to search for these sites across the druggable proteome in a high-throughput mode [2]. The goal of the project is to use these tools to search for competitive binding of major pharmaceuticals, which might explain observed side effects, suggest how an existing drug could be repositioned, or ultimately point the way towards further lead optimization. So far we have been able to offer an explanation for the side effects observed with selective estrogen receptor modulators (SERMs), for example tamoxifen [3], and to potentially reposition an existing drug for the treatment of TB [4]. Most recently we have offered an explanation for why torcetrapib was withdrawn from phase III clinical trials after $800M had been spent [5].
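To make the idea of a per-residue shape descriptor concrete, here is a minimal sketch loosely modeled on the geometric-potential idea of [1]: each Cα receives a score that combines how crowded its own neighborhood is with distance- and direction-weighted contributions from its neighbors, so residues lining concave pockets tend to score higher. The function name `gp_like_descriptor`, the 10 Å cutoff, the use of a plain neighbor count as the "boundary factor", and the angle term are all assumptions for illustration, not the published definition.

```python
import numpy as np


def gp_like_descriptor(ca, cutoff=10.0):
    """Hypothetical geometric-potential-style score for each C-alpha atom.

    ca: (n, 3) array of C-alpha coordinates in Angstroms.
    Higher values suggest a more buried or concave local environment.
    """
    ca = np.asarray(ca, dtype=float)
    diff = ca[None, :, :] - ca[:, None, :]          # diff[i, j] = ca[j] - ca[i]
    dist = np.linalg.norm(diff, axis=-1)
    neighbors = (dist > 0.0) & (dist < cutoff)

    # Assumed "boundary factor": number of neighbors within the cutoff.
    p = neighbors.sum(axis=1).astype(float)
    to_center = ca.mean(axis=0) - ca                # vector from each residue to the centroid

    score = p.copy()
    for i in range(len(ca)):
        js = np.nonzero(neighbors[i])[0]
        if js.size == 0:
            continue
        v = diff[i, js]
        norm = np.linalg.norm(v, axis=1) * (np.linalg.norm(to_center[i]) + 1e-12)
        cos_a = (v @ to_center[i]) / norm           # neighbor direction vs. direction to centroid
        # Close neighbors lying toward the protein interior contribute the most.
        score[i] += np.sum((cos_a + 1.0) / 2.0 * p[js] / (dist[i, js] + 1.0))
    return score
```

Given the Cα coordinates of a chain, `gp_like_descriptor(coords)` returns one value per residue; how such scores are then compared between sites at proteome scale is outside this sketch.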

Projects are available that search for alternative binding sites of a variety of major pharmaceuticals.

[1] L. Xie and P.E. Bourne 2007 A Robust and Efficient Algorithm for the Shape Description of Protein Structures and Its Application in Predicting Ligand Binding Sites. BMC Bioinformatics, 8(Suppl 4):S9.

[2] L. Xie and P.E. Bourne 2008 Detecting Evolutionary Linkages Across Fold and Functional Space with Sequence Order Independent Profile-profile Alignments. PNAS, 105(14) 5441-5446.

[3] L. Xie, J. Wang and P.E. Bourne 2007 In Silico Elucidation of the Molecular Mechanism Defining the Adverse Effect of Selective Estrogen Receptor Modulators. PLoS Comp. Biol., 3(11) e217.

[4] S.L. Kinnings, N. Buchmeier, N. Liu, P.J. Tonge, L. Xie and P.E. Bourne 2009 Discovery of Novel Drug Leads to Treat Multi-drug and Extensively Drug Resistant Tuberculosis by Repositioning Safe Pharmaceuticals: A Chemical Genomics Approach with Subsequent Biological Validation. PLoS Comp. Biol., 5(7) e1000423.

[5] L. Xie, J. Li, L. Xie, and P.E. Bourne 2009 Drug Discovery Using Chemical Systems Biology: Identification of the Protein-Ligand Binding Network To Explain the Side Effects of CETP Inhibitors. PLoS Comp. Biol., 5(5) e1000387.

Primary Mentors: Phil Bourne, Li Xie & Lei Xie

Examples:


Pseudomonas aeruginosa drug discovery

This project will apply our chemical systems biology approach to Pseudomonas aeruginosa drug discovery. P. aeruginosa infections mainly affect people with weakened immune systems. We will apply SMAP and systems biology to identify potential drug targets and drug leads that reduce the virulence of P. aeruginosa. The project consists mainly of the following steps:

  1. Whole-genome reconstruction of P. aeruginosa metabolic pathways.
  2. Prediction of the genome-wide protein-ligand interaction network using SMAP and protein-ligand docking (the computation is ongoing).
  3. Use of the COBRA systems biology toolbox to simulate drug effects on P. aeruginosa survival.
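Step 3 relies on constraint-based modeling. The sketch below shows the core idea of flux balance analysis (FBA) on a hypothetical four-reaction toy network, using SciPy's `linprog` rather than the COBRA toolbox itself: maximize a biomass flux subject to steady-state mass balance (S·v = 0) and flux bounds, and model complete inhibition of a predicted drug target by fixing its reaction flux at zero. The network, the bounds, the reaction indices, and the `max_growth` helper are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Rows: metabolites A, B. Columns: reactions
#   R0: uptake -> A,  R1: A -> B,  R2: A -> B (alternative, low capacity),  R3: B -> biomass
S = np.array([[1.0, -1.0, -1.0, 0.0],
              [0.0, 1.0, 1.0, -1.0]])
BOUNDS = [(0.0, 10.0), (0.0, 1000.0), (0.0, 5.0), (0.0, 1000.0)]
BIOMASS = 3


def max_growth(stoich, bounds, biomass, inhibited=()):
    """Maximize biomass flux with the reactions in `inhibited` blocked (flux fixed at 0)."""
    bounds = [(0.0, 0.0) if j in inhibited else b for j, b in enumerate(bounds)]
    c = np.zeros(stoich.shape[1])
    c[biomass] = -1.0                      # linprog minimizes, so negate the objective
    res = linprog(c, A_eq=stoich, b_eq=np.zeros(stoich.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun


if __name__ == "__main__":
    for target in [(), (1,), (1, 2)]:
        print(target, max_growth(S, BOUNDS, BIOMASS, inhibited=target))
```

In this toy network, blocking R1 alone only halves the maximal growth because the alternative route R2 compensates, while blocking both abolishes it; a genome-scale model can expose the same kind of redundancy when judging whether a predicted target is worth pursuing.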

More information on P. aeruginosa PAO1 and its whole genome is available at www.pseudomonas.com. The predictions will be validated experimentally by Dr. Brinkman’s group at Simon Fraser University.


Structure and ligand binding specificity of human organic anion transporters (OATs)

Human OATs interact with urate, acidic metabolites, and a large number of drugs. Their structures and ligand binding specificities remain unknown. This project will integrate our structure modeling capability with experimental data provided by Dr. Nigam’s lab at UCSD to iteratively model the ligand binding sites of the OATs. The project includes:

  1. Developing homology models of OATs, with special attention to the ligand binding site, by incorporating QSAR data.
  2. Determining ligand binding propensity by applying SMAP to the OAT models. The models will be further refined using high-throughput screening data from Dr. Nigam’s lab.

The following references provide more information on OATs:

  1. Ahsan N. Rizwan and Gerhard Burckhardt, “Organic Anion Transporters of the SLC22 Family: Biopharmaceutical, Physiological, and Pathological Roles”, Pharmaceutical Research, 24(3), p. 450.
  2. David M. Truong, Gregory Kaler, Akash Khandelwal, Peter W. Swaan, and Sanjay K. Nigam, “Multi-level Analysis of Organic Anion Transporters 1, 3, and 6 Reveals Major Differences in Structural Determinants of Antiviral Discrimination”, The Journal of Biological Chemistry, 283(13), pp. 8654–8663.

Bacterial RNA polymerase (RNAP)

Bacterial RNA polymerase (RNAP) is a proven target for broad-spectrum antibacterial therapy. Unfortunately, rifamycin, a first-line anti-tuberculosis drug, has a high propensity to select for resistance mutations in its binding site. Recently, an alternative binding site in the switch region of RNAP has been identified and proposed as an exceptionally attractive target for the discovery of new broad-spectrum antibacterial agents, because this region is distant from the rifamycin binding site and from the binding sites of other characterized inhibitors of bacterial RNAP. We ran SMAP on this region against the druggable proteome and human proteins to identify existing drugs that might interact with the switch region. Following a procedure similar to the one we used for CETP, we found that most of the top SMAP hits are estrogen receptors. Electrostatic potential analysis, structural analysis, and docking results suggest that these off-targets are very promising.

In the absence of experimental validation, we think a rigorous binding free energy calculation may help determine the binding affinity between existing estrogen receptor inhibitors and RNAP. If the binding can be confirmed, we will have some confidence in proposing estrogen receptor inhibitors as potential drugs for TB. There are basically three levels at which binding affinity can be calculated. Docking is the fastest but least accurate. The second level comprises MM/PBSA, MM/GBSA, and the linear interaction energy method. The most accurate methods are thermodynamic integration (TI) and free energy perturbation (FEP). We are considering the most accurate, and most time-consuming, of these, such as thermodynamic integration, which requires a series of MD simulations performed step by step from the free state to the bound state.
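For reference, thermodynamic integration estimates a free energy difference as ΔG = ∫₀¹ ⟨∂U/∂λ⟩_λ dλ, where each ⟨∂U/∂λ⟩_λ is an ensemble average from one MD simulation at coupling parameter λ. The sketch below covers only the final integration step and assumes those averages have already been collected. It uses the trapezoidal rule, and it combines a "complex" leg and a "solvent" leg through a double-decoupling cycle. The function names and the choice of cycle are assumptions, and restraint and standard-state corrections are omitted.

```python
import numpy as np


def ti_free_energy(lambdas, mean_dudl):
    """Trapezoidal-rule estimate of the integral of <dU/dlambda> over lambda."""
    lam = np.asarray(lambdas, dtype=float)
    g = np.asarray(mean_dudl, dtype=float)
    order = np.argsort(lam)                 # windows need not be supplied in order
    lam, g = lam[order], g[order]
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(lam)))


def binding_free_energy(lambdas, dudl_complex, dudl_solvent):
    """Double-decoupling estimate: dG_bind = dG_decouple(solvent) - dG_decouple(complex).

    Both legs decouple the ligand (lambda 0 = fully interacting, 1 = decoupled).
    Restraint and standard-state corrections are deliberately left out of this sketch.
    """
    return ti_free_energy(lambdas, dudl_solvent) - ti_free_energy(lambdas, dudl_complex)
```

A negative result means the ligand is stabilized more in the complex than in solvent, that is, favorable binding. Adding λ windows where ⟨∂U/∂λ⟩ changes sharply is usually the cheapest way to reduce the integration error.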

Rotation Project: Methods for Finding Protein Sectors

The hierarchical organization of primary, secondary, tertiary, and quaternary structure has been a fundamental framework in structural bioinformatics research, but a recent paper by Halabi et al. (Cell 138, 774-786, 2009) suggests that functionally distinct units, called “protein sectors”, defy this well-accepted organizing principle. Furthermore, in several examples of small domains from large sequence families, the sectors are shown to diverge independently in evolution; they are identified by a statistical analysis of correlated evolution among amino acids. If the “sectors” theory proves to be general, it could usher in a paradigm shift in structural bioinformatics research. It would be interesting and important to develop methods to test this theory independently. For example, since these sectors are thought to be involved in allosteric mechanisms and protein folding, do residues within a sector exhibit correlated dynamics distinct from those of a different sector? Can data from molecular dynamics simulations (atomic or, better yet, coarse-grained) be used to find motion-correlated residues? And will these residues correspond to functional (and evolutionary) sectors? More broadly, if one wishes to decompose protein structure without regard to the traditional hierarchy, what would be a reasonable way of doing it?
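One way to approach the molecular dynamics question is the dynamical cross-correlation matrix, C_ij = ⟨Δr_i·Δr_j⟩ / sqrt(⟨|Δr_i|²⟩⟨|Δr_j|²⟩), computed from per-residue (e.g. Cα or coarse-grained bead) fluctuations about the mean structure. The sketch below computes C and then groups residues by single linkage above a correlation threshold. It assumes the trajectory frames have already been superposed onto a reference. The 0.7 threshold, the grouping rule, and the function names are arbitrary illustrative choices, and comparing the resulting groups with sequence-derived sectors would be the actual research question.

```python
import numpy as np


def cross_correlation(traj):
    """Dynamical cross-correlation matrix from a (n_frames, n_residues, 3) trajectory."""
    traj = np.asarray(traj, dtype=float)
    disp = traj - traj.mean(axis=0)                            # fluctuations about the mean structure
    cov = np.einsum("fik,fjk->ij", disp, disp) / len(traj)     # <dr_i . dr_j> averaged over frames
    scale = np.sqrt(np.diag(cov))
    return cov / np.outer(scale, scale)


def correlated_groups(corr, threshold=0.7):
    """Single-linkage groups of residues whose pairwise correlation is >= threshold."""
    unseen = set(range(len(corr)))
    groups = []
    while unseen:
        stack = [unseen.pop()]
        group = set(stack)
        while stack:
            i = stack.pop()
            for j in np.nonzero(corr[i] >= threshold)[0]:
                j = int(j)
                if j in unseen:
                    unseen.remove(j)
                    group.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return sorted(groups)
```

Negative correlations (anti-correlated motion) are ignored by this grouping rule; whether they should join or separate residues is itself a modeling decision.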

Primary mentor: Ming-Jing Hwang

Rotation Project: A Structural Analysis of Allele Frequencies of Non-synonymous SNPs

Non-synonymous SNPs, the single nucleotide mutations that result in an amino acid change, are the focus of a number of structural bioinformatics studies. These studies have mainly aimed to predict the effect of an amino acid change on protein structural stability and function, and in turn its probability of causing disease. Allele frequency, a less-used property of these SNPs, is now abundantly available; it can offer insight into the evolutionary pressure on these coding SNPs and thus may also yield useful information about protein structure. It therefore seems worthwhile to carry out a comprehensive, proteome-scale statistical analysis of the allele frequencies of these SNPs, to see whether non-synonymous SNPs show distinct frequency distributions or characteristics from a structural viewpoint, and whether these correlate with structural and other biophysical properties.
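As one concrete starting point, allele frequencies could be split by a structural property and the two distributions compared with a nonparametric test. The sketch below assumes that each SNP has already been mapped to a residue with a relative solvent accessibility (RSA) value. The 0.25 buried/exposed cutoff, the choice of the Mann-Whitney U test, and the function name are assumptions.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def frequency_by_burial(snps, rsa_cutoff=0.25):
    """Compare minor allele frequencies of buried vs. exposed non-synonymous SNPs.

    snps: iterable of (minor_allele_frequency, relative_solvent_accessibility) pairs.
    Returns (median MAF buried, median MAF exposed, two-sided Mann-Whitney p-value).
    """
    snps = list(snps)
    buried = [maf for maf, rsa in snps if rsa < rsa_cutoff]
    exposed = [maf for maf, rsa in snps if rsa >= rsa_cutoff]
    result = mannwhitneyu(buried, exposed, alternative="two-sided")
    return float(np.median(buried)), float(np.median(exposed)), float(result.pvalue)
```

The same pattern applies to other structural classes, such as secondary structure type or proximity to a ligand binding site. At proteome scale, a correction for multiple comparisons would be needed once many properties are tested.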

Primary mentor: Ming-Jing Hwang

Summer/Rotation/PhD Student Project: From Physical Model of Nucleosome Organization Towards Genome Annotation

The DNA of eukaryotic organisms is packed into nucleosomes, which are involved in gene expression by regulating DNA accessibility to transcription factors and through histone modifications. In attempts to find sequence-dependent signals in DNA that determine the location of nucleosomes, several nucleosome positioning prediction models have been built (Segal et al., 2006; Ioshikhes et al., 2006; Peckham et al., 2007); however, they can explain only about half of the in vivo nucleosome organization. Models built on physical principles are therefore of particular interest, because they can more effectively capture essential positioning factors, such as the experimentally known long-range correlations in nucleosome organization, and potentially lead to improved prediction of nucleosome occupancy. One such model has recently been proposed (Miele et al., 2008). A version of the model has been implemented in Java in our lab by Alexander Scott (MS student, University of York, UK). It was validated on experimentally determined DNase I hypersensitive sites, which correspond to nucleosome-free DNA regions, in S. cerevisiae chromosome III (Yuan et al., 2005) and in the human genome (Crawford et al., 2006), and applied to the analysis of D. melanogaster and C. elegans promoters. While all studied promoters showed a pronounced nucleosome positioning signal, there were notable differences between, and within, the genomes, indicating that nucleosomes might participate in different ways in regulating the expression of different genes. The aim of the project is to further investigate to what extent the structure and dynamics of chromatin have been imprinted in the DNA sequence during evolution, and whether it is possible to construct a gene annotation method based on a physical model of nucleosome occupancy.
It would also be interesting to investigate how nucleosome occupancy in promoters differs from that in other gene regulatory regions, such as introns, enhancers, CpG islands, 3’-terminal regions, and ultraconserved elements, as well as in genes of different types within the same or different genomes.
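Physical models of this kind typically treat nucleosomes as non-overlapping particles with a fixed footprint on the DNA and a sequence-dependent energy for each possible start position. The minimal sketch below shows how per-base-pair occupancy follows from such energies, using forward and backward partition functions computed in log space. It is a generic hard-core lattice model with an assumed chemical potential `mu` (in kT units), not the Miele et al. (2008) model or the lab's Java implementation. How the `energy` array is derived from sequence is left open.

```python
import numpy as np


def nucleosome_occupancy(energy, width, mu=0.0):
    """Start probabilities and per-bp occupancy for non-overlapping nucleosomes.

    energy: energy (kT units) of a nucleosome starting at each allowed position;
            len(energy) = sequence length - width + 1.
    width:  nucleosome footprint in bp (147 for a real nucleosome core).
    mu:     assumed chemical potential (kT units) setting nucleosome density.
    """
    logq = -(np.asarray(energy, dtype=float) - mu)   # log Boltzmann weight per start
    n_starts = len(logq)
    n_bp = n_starts + width - 1

    # log_fwd[i]: log partition function of the first i bp.
    log_fwd = np.full(n_bp + 1, -np.inf)
    log_fwd[0] = 0.0
    for i in range(1, n_bp + 1):
        log_fwd[i] = log_fwd[i - 1]                   # bp i-1 left uncovered
        s = i - width
        if s >= 0:                                    # or a nucleosome ends at bp i-1
            log_fwd[i] = np.logaddexp(log_fwd[i], log_fwd[s] + logq[s])

    # log_bwd[i]: log partition function of bp i..n_bp-1.
    log_bwd = np.full(n_bp + 1, -np.inf)
    log_bwd[n_bp] = 0.0
    for i in range(n_bp - 1, -1, -1):
        log_bwd[i] = log_bwd[i + 1]
        if i + width <= n_bp:
            log_bwd[i] = np.logaddexp(log_bwd[i], logq[i] + log_bwd[i + width])

    starts = np.arange(n_starts)
    start_prob = np.exp(log_fwd[starts] + logq + log_bwd[starts + width] - log_fwd[n_bp])
    occupancy = np.zeros(n_bp)
    for s, p in enumerate(start_prob):
        occupancy[s:s + width] += p                   # a nucleosome at s covers s..s+width-1
    return start_prob, occupancy
```

With `width=147` and energies from any sequence-based model, the returned occupancy profile could be compared across promoters, introns, enhancers, and the other region types listed above.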

Primary Mentors: Julia Ponomarenko and Apostol Gramada

Rotation/Summer/PhD Projects: General Topic – Earth Sciences Meets Life Sciences

There has been a variety of exciting work in the past few years relating geochemistry to genomics (see Cavalier-Smith 2006 Philos Trans R Soc Lond B Biol Sci. 2006 Jun 29;361(1470):969-1006). This is important since it provides new insights into evolutionary events and may shed light on the implications of human impact on our ecosystems, as measured at the fundamental level of DNA. However, analysis at the genome level masks implications seen only at the protein level. Our laboratory specializes in structural bioinformatics and, to our knowledge, is the only laboratory examining this emerging field, which cuts across the geosciences and biosciences, from the perspective of protein structure.

In a recent paper (Dupont et al. 2006 PNAS 103(47) 17822-17827) we explored the impact of geochemical changes (environmental changes over evolutionary time scales) in metal ion concentrations on the proteomes of modern-day organisms and provided the first evidence for a strong correlation. This analysis was possible only by examining the distribution of protein structures known to bind metals across the three superkingdoms of life. Looking at the impact of a changing environment at the level of the superkingdom is a coarse-grained study, which sets the stage for more detailed analyses.

The following are projects we are keen to see pursued that expand on this initial finding. They are based on a premise put forward by Cavalier-Smith that there may be “megaevents”, which in our case would imply the emergence of a new fold, domain combination, or other structural event that led to new phenotypic variance with significant implications.

Rotation/Summer Project: Exploring the Impact of Co and Mo Environments on Life

This project follows on from the PNAS study, which focused primarily on Fe and Zn.

Primary Mentors: Phil Bourne and Ruben Valas

Rotation/Summer/PhD Project: Exploring the Flexibility versus Designability of Protein Folds

Nature uses a very limited number of protein folds; on the order of 1200 are known today. Each protein fold accommodates one or more families of proteins. The more families a fold accommodates, the more designable it is said to be.

Recently we developed a method (Gu et al. 2006 PLoS Comp. Biol. 2(7) e90) that estimates the flexibility of a protein from its amino acid sequence alone. The approach uses the Gaussian Network Model and a normalization procedure to measure the relative flexibility at each amino acid position. By training a support vector machine on sequences of known structures, for which flexibility can be verified experimentally, we obtained a general method for determining flexibility from sequence.
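For readers unfamiliar with the Gaussian Network Model, here is how it works. Residues are nodes joined by identical springs whenever their Cα atoms lie within a cutoff distance. The predicted mean-square fluctuation of residue i is proportional to the i-th diagonal element of the pseudo-inverse of the resulting Kirchhoff (connectivity) matrix. A minimal sketch follows. The 7.3 Å cutoff is a commonly used GNM value, and dividing by the mean is just one possible normalization; it is assumed here rather than taken from Gu et al. (2006).

```python
import numpy as np


def gnm_fluctuations(ca, cutoff=7.3):
    """Relative mean-square fluctuations from a Gaussian Network Model.

    ca: (n, 3) C-alpha coordinates in Angstroms. Returns values normalized to mean 1.
    """
    ca = np.asarray(ca, dtype=float)
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    contact = (dist > 0.0) & (dist < cutoff)
    kirchhoff = -contact.astype(float)                 # -1 for each contacting pair
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))   # number of contacts on the diagonal
    msf = np.diag(np.linalg.pinv(kirchhoff))           # pseudo-inverse drops the zero mode
    return msf / msf.mean()
```

This structure-based profile is the kind of target a sequence-based predictor can be trained to reproduce. The pseudo-inverse costs O(n³), which is fine for single domains but worth keeping in mind for very large proteins.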

Hypothesis: flexibility is correlated with designability; that is, the more designable a protein's fold, the more flexible the protein will be. This is certainly what one would intuitively expect: the more flexible the protein, the more sequences it can be expected to accommodate.

Part of the project will be to develop the appropriate experimental protocol to test this hypothesis, but the basic intent is to use the existing methodology (and code) to determine the flexibility of a range of protein sequences that fall into folds of varying designability. The designability of each fold (its number of families) can be taken directly from the SCOP database.
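Once flexibility profiles are available, the hypothesis reduces to a rank correlation between a per-fold flexibility summary and designability, measured as the number of SCOP families per fold. The minimal sketch below summarizes each fold as the mean of per-sequence mean flexibility values. That choice, the dictionary data layout, and the function name are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def flexibility_vs_designability(fold_flex, fold_families):
    """Spearman correlation between per-fold flexibility and designability.

    fold_flex: dict fold id -> list of per-sequence mean flexibility values.
    fold_families: dict fold id -> number of families in that fold.
    """
    folds = sorted(fold_flex)
    mean_flex = [float(np.mean(fold_flex[f])) for f in folds]
    n_families = [fold_families[f] for f in folds]
    rho, p_value = spearmanr(mean_flex, n_families)
    return float(rho), float(p_value)
```

A rank correlation is used here because family counts per fold are highly skewed. Controlling for confounders such as protein length would be part of designing the actual protocol.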

Primary Mentor: Phil Bourne

Summer/Rotation/PhD Student Project: General Topic – Immunoinformatics – Prediction of Peptide Binding to MHC class II Molecules

MHC class II molecules are central to effective adaptive immune responses. On the surface of antigen-presenting cells, they display a range of peptides for recognition by the T-cell receptors of CD4 T helper cells. In silico prediction of the peptides within a whole antigen that bind to an MHC molecule is extremely helpful for identifying new antigens and pre-screening T-cell epitopes that can be used to develop vaccines and diagnostics (for the most recent review on epitope-based vaccines, see Voskens et al., 2009).

The general approach to predicting peptide-MHC binding is based on generalizing experimental binding data to define a binding sequence pattern for a given MHC molecule. The quality of such methods therefore depends heavily on the amount of experimental training data available, and it remains unsatisfactory for MHC class II binding (Lin et al., 2008; Gowthaman & Agrewala, 2008; Wang et al., 2008). Moreover, since there are thousands of different MHC alleles in the human population and binding data are available for only a small subset of them, it is desirable to develop methods that do not rely heavily on the availability of peptide-MHC binding data.

We have recently applied three structure-based approaches to predicting peptide-MHC class II binding (submitted manuscript) and demonstrated that the performance of ab initio methods is inferior to that of sequence-based methods that rely on binding data alone. However, sequence-based approaches for peptide-MHC class II binding have never matched the accuracy of MHC class I binding predictions. The reason lies in the complex nature of peptide-MHC class II interactions: MHC class II molecules bind longer peptides, and the amino acids flanking the 9-mer binding core of the peptide contribute to MHC-peptide interactions and antigen processing (Godkin et al., 2001) (for structural aspects of peptide-MHC class II binding and their role in human diseases, see the review of Jones et al., 2006). Therefore, the next logical step in improving prediction of peptide-MHC class II binding would be to combine the two approaches, structural and sequence-based.
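The 9-mer core point is where class II prediction departs from class I: a longer peptide has to be scanned for its best-scoring 9-residue register. The minimal sketch below performs that scan with a position-specific scoring matrix (PSSM). The PSSM values would come from binding data or from a structure-based energy model, and all names are hypothetical. Flanking-residue contributions, the very effect discussed above, are not modeled.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
CORE_LEN = 9


def best_binding_core(peptide, pssm):
    """Scan a peptide for its highest-scoring 9-mer binding core.

    pssm: (9, 20) array; pssm[k, a] scores amino acid a at core position k.
    Returns (score, offset of the core in the peptide, core sequence).
    """
    if len(peptide) < CORE_LEN:
        raise ValueError("peptide shorter than the 9-mer core")
    best = (-np.inf, -1, "")
    for offset in range(len(peptide) - CORE_LEN + 1):
        core = peptide[offset:offset + CORE_LEN]
        score = float(sum(pssm[k, AA_INDEX[aa]] for k, aa in enumerate(core)))
        if score > best[0]:                 # ties keep the earliest register
            best = (score, offset, core)
    return best
```

A combined structural and sequence-based model could keep this register scan and add separate terms for the flanking positions on either side of the chosen core.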

The aim of this project is to (1) develop a novel approach to predicting peptide-MHC class II binding by refining our recent ab initio structure-based method with peptide-MHC binding data; (2) infer models for the MHC class II alleles for which both structural and binding data are available; (3) generalize the approach (infer the models) to alleles for which no structural data are available; (4) compare the models’ performance with that of other methods, including the sequence-based IEDB consensus (Wang et al., 2008), NetMHCIIpan (Nielsen et al., 2008), RANKPEP (Reche et al., 2004), DistBoost (Hertz et al., 2006), and SIDT (Shift-Invariant Double Threading) by Zaitlen et al., 2008, which is based on pairwise potentials defined using both 3D structures of peptide-MHC complexes and binding data; and (5) if the method is successful, implement it as a web tool in the IEDB database and analysis resource.

Primary Mentor: Julia Ponomarenko

Summer/Rotation/PhD Student Project: General Topic – Immunoinformatics – From Structural Analysis of MHC class II Immunodominant Epitopes Towards their Prediction

When individuals are immunized with a protein antigen, most of the responding T cells are specific for only a few of the many potential epitopes contained in the antigen. These epitopes are called immunodominant. Currently, the inability to predict immunodominant epitopes is a significant impediment to rational vaccine design (Sette & Peters, 2007).

For CD4 T cells, the phenomenon of immunodominance can only be explained by intracellular events involved in MHC-restricted recognition within antigen-presenting cells. These events include, first, proteolytic processing of the antigen, which depends on the antigen, its 3D structure, and the type of antigen-presenting cell, and, second, loading of the peptide onto MHC (Watts, 2004; Villadangos & Ploegh, 2000). Implicit in these models is the prediction that the molecular context in which an antigenic peptide is contained will significantly affect its immunodominance. There are experimental data that support this hypothesis (see references in Weaver et al., 2008). For example, some studies demonstrated that immunodominant epitopes tend to cluster in limited regions of the antigen, often within solvent-exposed regions or at sites adjacent to protease-sensitive loops (Carmicle et al., 2007; Dai et al., 2002), while others suggested that immunodominant epitopes are preferentially associated with structurally stable regions (Landry, 2000; Melton & Landry, 2008) and with the kinetic stability of the peptide-MHC complex (Sant et al., 2007). At the same time, the experimental study of Weaver et al., 2008, which involved only five proteins containing eight dominant epitopes, “revealed no differences in the immunogenicity of peptides when they are introduced in different molecular contexts, and find the immunodominance of a peptide tracks with the peptide itself, rather than the site in a given protein or a protein in which it is contained.”

So far, no broad analysis has been undertaken of immunodominant epitopes in the context of antigen 3D structure. As the content of the IEDB database has grown significantly, such an analysis has become feasible: more than 500 MHC class II immunodominant epitopes can be mapped to a hundred proteins of known 3D structure representing different structural families. We found that epitopes from bacteria and intracellular endosomal pathogens, such as Mycobacterium tuberculosis and Salmonella, were more solvent exposed than viral epitopes. This project aims to continue that research by providing (1) a more detailed analysis of structural features of the analyzed proteins, including B-factor, secondary structure, residue flexibility, and conservancy; and (2) a detailed comparison of epitope localization in proteins of different types.

If it were shown that the localization of immunodominant epitopes is unrelated to the 3D structure of the antigen and of the regions surrounding the epitope, this would suggest that the immunodominance of a given epitope will be maintained in different types of vaccine constructs, which is promising for efforts seeking to incorporate known pathogen- or tumor-derived epitopes into complex proteins (Sette & Fikes, 2003; Bull et al., 2007). Conversely, revealing structural features associated with epitope immunodominance would suggest that the 3D structure of antigens controls the pathway of antigen processing and epitope presentation. That could make it feasible to develop a novel structure-based method for predicting immunodominant MHC class II epitopes.
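The proposed comparison largely amounts to contrasting per-residue structural features inside and outside mapped epitopes. The minimal sketch below assumes that epitopes have already been mapped to 0-based, end-exclusive residue ranges on a structure, and that features such as relative solvent accessibility or B-factor are available as per-residue arrays. The function and feature names are hypothetical.

```python
import numpy as np


def epitope_feature_contrast(features, epitopes, n_residues):
    """Mean of each per-residue feature inside vs. outside the mapped epitopes.

    features: dict name -> sequence of length n_residues (e.g. RSA, B-factor).
    epitopes: list of (start, end) residue ranges, 0-based and end-exclusive.
    Returns dict name -> (mean inside epitopes, mean outside epitopes).
    """
    inside = np.zeros(n_residues, dtype=bool)
    for start, end in epitopes:
        inside[start:end] = True
    contrast = {}
    for name, values in features.items():
        values = np.asarray(values, dtype=float)
        contrast[name] = (float(values[inside].mean()), float(values[~inside].mean()))
    return contrast
```

Proteins differ in overall exposure and in B-factor scale, so it may be worth normalizing each feature within a protein before pooling results across antigens or comparing pathogen types.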

Primary Mentor: Julia Ponomarenko

Rotation/Summer/PhD Project: What Makes Some Introns’ Positions Ultra-conserved?

The appearance of spliceosomal introns is central to eukaryotic evolution; their origin and evolution are still hotly debated and form one of the most exciting topics in molecular evolution. A small subset of spliceosomal introns in eukaryotic genes exhibits a strikingly high level of conservation across eukaryotic species in terms of their position and phase within the genes. These introns are particularly interesting because they are likely to be very old, dating back to or before the last eukaryotic common ancestor (LECA).

We are interested in finding out what characterizes the genes and proteins that contain such ultra-conserved introns. Do these genes code for proteins that appeared or expanded early in eukaryotic evolution? Do these proteins have a specific subset of functions or structures? Are the splice sites of ultra-conserved introns different in any way from those of less conserved introns? Answering these questions will advance our understanding of the early stages of spliceosomal intron evolution.

We are currently working with the Sm/Lsm multi-gene family, which contains many such ultra-conserved introns, but would like to extend this work further. A dataset of intron positions for 684 orthologous genes in 8 organisms is available (Rogozin et al. (2003) Current Biology 13:1512-1517), which we intend to use to address this problem.
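Comparing intron positions across species requires placing each intron on a common coordinate. The minimal sketch below assumes each gene is given as the coding lengths of its exons (so the coding sequence starts at the first exon), together with its protein row from a multiple alignment. An intron that follows c coding nucleotides has phase c mod 3 and interrupts codon c // 3 (for phase 0, it falls just before that codon). A position counts as conserved when the same alignment column and phase occur in every species. The data layout and function names are hypothetical, and this is not the procedure of Rogozin et al.

```python
def intron_sites(cds_exon_lengths):
    """(codon index, phase) for each intron, from the coding length of each exon."""
    sites, coding_nt = [], 0
    for length in cds_exon_lengths[:-1]:      # no intron after the last exon
        coding_nt += length
        sites.append((coding_nt // 3, coding_nt % 3))
    return sites


def residue_to_column(aligned_protein):
    """Map ungapped residue index -> column index in an aligned protein row."""
    mapping, residue = {}, 0
    for column, char in enumerate(aligned_protein):
        if char != "-":
            mapping[residue] = column
            residue += 1
    return mapping


def conserved_intron_sites(genes):
    """(alignment column, phase) pairs shared by every gene.

    genes: dict species -> (aligned protein row, list of exon coding lengths).
    """
    per_species = []
    for aligned, exon_lengths in genes.values():
        column_of = residue_to_column(aligned)
        per_species.append({(column_of[r], phase)
                            for r, phase in intron_sites(exon_lengths) if r in column_of})
    return set.intersection(*per_species)
```

Requiring an exact column match is strict. Real alignments around intron sites are often uncertain, so allowing a small positional tolerance may be worth exploring.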

Primary Mentors: Stella Veretnik & Phil Bourne

Rotation/Summer/PhD Project: Building a Meta-method for Assignment of Structural Domains in Proteins

Partitioning protein structures into domains is a prerequisite for many types of structure analysis. While many automatic methods for assigning structural domains exist, none of them performs above 85% accuracy. The lack of complete success in domain assignment reflects a fundamental biological problem, the inconsistency of real biological data; ultimately, it is therefore impossible to construct an automatic method that assigns domains correctly 100% of the time. We have analyzed in depth the properties and tendencies of different automatic methods for domain assignment and observed that different methods have complementary strengths and weaknesses (Veretnik et al. (2004) J Mol Biol 339:647-78; Holland et al. (2006) J Mol Biol 361:562-90). We are now beginning to build a meta-method that will incorporate the predictions of several existing methods. A consensus among methods will be sought; when no consensus exists, the method that performs best for the given type of structure (based on the size of the protein and the sizes of the assigned domains, the secondary structure of the protein, and its architecture and topology) will be applied. If developed well, a meta-method will become a sorely needed tool for consistently and precisely partitioning any structure into domains.
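The consensus step can be sketched as clustering the domain boundaries proposed by different methods and keeping those supported by a majority. In this minimal version, boundaries within `tolerance` residues of each other are grouped. A group supported by fewer methods is kept only if it includes a boundary from a caller-chosen `preferred` method, which stands in for "the method that performs best for this type of structure"; choosing that method is left to the caller. The 10-residue tolerance, the median as representative position, and all names are assumptions.

```python
def consensus_boundaries(predictions, tolerance=10, preferred=None, min_support=None):
    """Combine domain-boundary predictions from several methods.

    predictions: dict method name -> list of boundary residue positions.
    preferred: method whose boundaries are kept when a group lacks majority support.
    Returns a sorted list of consensus boundary positions.
    """
    if min_support is None:
        min_support = len(predictions) // 2 + 1          # simple majority
    calls = sorted((pos, method) for method, positions in predictions.items()
                   for pos in positions)
    groups = []
    for pos, method in calls:
        # Single-linkage grouping: join the current group if close to its last member.
        if groups and pos - groups[-1][-1][0] <= tolerance:
            groups[-1].append((pos, method))
        else:
            groups.append([(pos, method)])
    result = []
    for group in groups:
        methods = {m for _, m in group}
        if len(methods) >= min_support:
            positions = sorted(p for p, _ in group)
            result.append(positions[len(positions) // 2])   # median (upper for even counts)
        elif preferred in methods:
            result.extend(p for p, m in group if m == preferred)
    return sorted(result)
```

Deciding whether a protein is single-domain (no boundaries) would need its own vote, because this sketch only reasons about boundaries that at least one method proposed.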

Qualifications: good programming skills (C++, Java, Perl) and a basic understanding of protein structures.

Primary Mentors: Stella Veretnik and Phil Bourne

PhD Project: Looking for Correlation Between Protein and Gene Structure

To what extent is the current repertoire of proteins built from a combinatorial assembly of small structural units? And to what extent are the structural units encoded by individual exons within a gene? Clear evidence exists for recent proteins, in which structural domains frequently coincide with individual exons. The situation is much more blurred for older proteins, because intron gain and loss throughout evolution masks the original correlation (if there was one).

We would like to approach this differently by looking at smaller structural units and investigating whether there is a correlation between individual subdomains (or motifs) and individual exons. Furthermore, we would like to look at structural units with internal symmetry or pseudo-symmetry and at the correlation between symmetry units and exons.

Analysis will be performed separately for proteins of different “ages”, giving a sense as to how ubiquitous exon shuffling was throughout eukaryotic evolution.
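For each protein, the core measurement is how often structural-unit boundaries fall near exon boundaries, compared with what randomly placed boundaries would give. Structural-unit boundaries here means domain, subdomain, or symmetry-unit boundaries mapped to residue positions, and exon boundaries means intron positions mapped to residues. A minimal sketch with a simple permutation test follows. The tolerance, the uniform null model, and the function names are assumptions, and a real analysis would need a null model that respects exon-length distributions.

```python
import numpy as np


def match_fraction(struct_bounds, exon_bounds, tol):
    """Fraction of structural boundaries lying within tol residues of an exon boundary."""
    exon = np.asarray(exon_bounds, dtype=float)
    hits = [exon.size > 0 and bool(np.any(np.abs(exon - b) <= tol)) for b in struct_bounds]
    return float(np.mean(hits))


def exon_structure_test(struct_bounds, exon_bounds, length, tol=5, n_iter=1000, seed=0):
    """Observed match fraction and an empirical p-value against uniformly placed exon boundaries."""
    rng = np.random.default_rng(seed)
    observed = match_fraction(struct_bounds, exon_bounds, tol)
    n_extreme = 0
    for _ in range(n_iter):
        random_bounds = rng.integers(1, length, size=len(exon_bounds))
        if match_fraction(struct_bounds, random_bounds, tol) >= observed:
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_iter + 1)
```

Running this per protein and then summarizing the results by protein "age" would give the separate analyses described above.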

Qualifications: good programming skills in C++, experience with databases, and some knowledge of structural biology.

Primary Mentors: Stella Veretnik and Phil Bourne

Rotation/Summer Project: General Topic – Scholarly Communication

Our laboratory is a strong supporter of open access (OA) to the scientific literature. OA affords opportunities to disseminate and comprehend science in new ways. We are working in two areas using OA content. First, we are integrating OA literature with database content, using PubMed Central and the PDB as our test case (http://biolit.ucsd.edu). Second, we are integrating video content with OA content as part of a video delivery site (http://www.scivee.tv). See L. Fink and P.E. Bourne 2007 Reinventing Scholarly Communication for the Electronic Age. CT Watch, 3(3) 26-31 for further details. Projects are available in the areas of programming and video production.

Primary Mentors: Lynn Fink and Phil Bourne

Undergraduate student project: Visualization of Life Sciences data on a Symbian Smart Phone

Cell phones, in particular so-called Smart Phones that have internet access, have become an important pathway for information retrieval. This kind of use of Smart Phones is likely to increase as their capabilities advance and cellular services become cheaper.

We will program a Java-based Life Sciences data access and visualization application on a state-of-the-art Symbian Smart Phone device that will leverage the visualization and other hardware capabilities of the Nokia N95 cell phone. These capabilities include hardware 3D acceleration, support for Java development and significant data storage capacity.

Qualifications: Good Java programming skills and strong self-motivation; ability to quickly get started using the challenging Symbian and embedded Java development environment. Life Sciences background a plus.

Primary Mentors: Phil Bourne & Greg Quinn

Summer/Rotation/PhD Project – Evolution of Domain Associations in tRNA synthetases

The tRNA synthetases arose early in evolution [1], being essential for establishing the genetic code that relates nucleotide triplets to specific amino acids. Perhaps because of their presence from the beginning, the synthetases have always been available for adaptation and recruitment to emerging cell signaling pathways. Insertions of new motifs, the carving out of novel fragments by alternative splicing or proteolysis, and posttranslational modifications were all possible and provided a way to link translation to cell signaling pathways and biological networks. Viewed from this perspective, the synthetases may have been among the earliest cytokines and cell signaling molecules.

Here we wish to focus on the many domains associated with the tRNA synthetases, using a technique developed in our laboratory for mapping domain associations onto a species tree [2]. In this way we can map specific domain associations in an effort to better understand their functional and disease implications, many of which are currently unknown.
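As a simplified illustration of mapping a domain association onto a species tree, the sketch below places the origin of an association at the most recent common ancestor of all species that contain it. In other words, it assumes a single gain and no losses. This is a deliberately naive stand-in, not the method of [2]. The parent-pointer tree and the clade names are hypothetical.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root of a tree given as child -> parent."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def origin_of_association(species, parent):
    """Most recent common ancestor of the species carrying a domain association."""
    paths = [path_to_root(s, parent) for s in species]
    shared = set(paths[0]).intersection(*paths[1:])
    for node in paths[0]:              # walk upward from the first species
        if node in shared:
            return node                # first shared node = most recent common ancestor
    return None


# Hypothetical species tree, child -> parent.
TREE = {
    "human": "Mammalia", "mouse": "Mammalia", "Mammalia": "Vertebrata",
    "zebrafish": "Vertebrata", "Vertebrata": "Metazoa", "fly": "Metazoa",
    "Metazoa": "Eukaryota", "yeast": "Eukaryota",
}
```

Because losses are ignored, an association present in humans and yeast is placed at the eukaryotic root even if most intermediate lineages lack it. A method that models gains and losses explicitly avoids exactly that bias.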

[1] L. Ribas de Pouplana & P. Schimmel 2001 TIBS 26(10) 591-596.
[2] S. Yang & P.E. Bourne 2008 PLoS Comp. Biol. Submitted.

Primary Mentors: Cheryl Quinn (aTYR Pharma) & Phil Bourne