Rotation/PhD Projects

Our laboratory addresses a number of fundamental questions in biology and in scholarly communication, using structural bioinformatics and systems biology for the former and information technologies for the latter. What follows are some of those questions, a general overview of our approach to answering them, and then specific projects, which are updated periodically. Do not hesitate to contact us for further details.

Question 1: What Really Happens When we Take a Drug?
Question 2: What Does Protein Structure Tell Us About Evolution?
Question 3: Can We Discover New Biology Using Protein Structure?
Question 4: What Can Structural Bioinformatics Contribute to Immunology and Mental Disorders?
Question 5: Can we Improve the Way Science is Communicated?

Question 1: What Really Happens When we Take a Drug?

We have developed methods for characterizing and comparing, on a proteome-wide scale, the 3D characteristics of ligand binding sites. This has profound implications for drug discovery since it supports the notion of polypharmacology. That is, a drug or lead compound does not bind to a single receptor but to multiple receptors, and what we observe physiologically is the collective effect of those interactions. We explore that collective effect using techniques drawn from structural bioinformatics and systems biology. Thus far we have been able to deduce why side effects occur for some drugs, suggest how drugs can be repositioned to treat different diseases, suggest how leads can be further optimized, and suggest how we might combat drug resistance in pathogens associated with neglected diseases. A summary is given here. Example publications are:

  • Review: L. Xie, L. Xie, S.L. Kinnings and P.E. Bourne 2012 Novel Computational Approaches to Polypharmacology as a Means to Define Responses to Individual Drugs, Annual Review of Pharmacology and Toxicology 52: 361-379.
  • L. Xie and P.E. Bourne 2007 A Robust and Efficient Algorithm for the Shape Description of Protein Structures and Its Application in Predicting Ligand Binding Sites BMC Bioinformatics, 8(Suppl 4):S9.
  • L. Xie and P.E. Bourne 2008 Detecting Evolutionary Linkages Across Fold and Functional Space with Sequence Order Independent Profile-profile Alignments. PNAS, 105(14) 5441-5446.
  • L. Xie, J. Wang and P.E. Bourne 2007 In Silico Elucidation of the Molecular Mechanism Defining the Adverse Effect of Selective Estrogen Receptor Modulators. PLoS Comp. Biol., 3(11) e217.
  • S.L. Kinnings, L. Xie, K.H. Fung, R.M. Jackson, L. Xie and P.E. Bourne 2010 The Mycobacterium tuberculosis Drugome and its Pharmaceutical Implications. PLoS Comp. Biol. 6(11): e1000976.
  • L. Xie, J. Li, L. Xie, and P.E. Bourne 2009 Drug Discovery Using Chemical Systems Biology: Identification of the Protein-Ligand Binding Network To Explain the Side Effects of CETP Inhibitors, PLoS Comp. Biol. 5(5) e1000387.
  • R.L. Chang, L. Xie, L. Xie, P.E. Bourne and B.O. Palsson 2010 Drug Off-Target Effects Predicted Using Structural Analysis in the Context of a Metabolic Network Model. PLoS Comp. Biol. 6(9): e1000938.

    Primary Mentors: Phil Bourne, Li Xie & Lei Xie (CUNY).

Project: Pseudomonas aeruginosa Drug Discovery

This project will apply our chemical systems biology approach to Pseudomonas aeruginosa drug discovery. P. aeruginosa infection mainly affects people with a weak immune system. We will apply SMAP and systems biology to identify potential drug targets and drug leads to reduce the virulence of P. aeruginosa. The project consists of the following main steps:

  1. Whole genome reconstruction of P. aeruginosa metabolic pathways.
  2. Prediction of the genome-wide protein-ligand interaction network using SMAP and protein-ligand docking (the computation is ongoing).
  3. Using the COBRA systems biology toolbox to simulate drug effects on P. aeruginosa survival.

More information on P. aeruginosa PAO1 and its whole genome is available online. The predictions will be experimentally validated by Dr. Brinkman's group at Simon Fraser University.

Project: Structure and Ligand Binding Specificity of Human Organic Anion Transporters (OATs)

Human OATs interact with urate, acidic metabolites and a large number of drugs. Their structures and ligand binding specificities remain unknown. This project will integrate our structure modeling capability with experimental data provided by Dr. Nigam's lab at UCSD to iteratively model the ligand binding site of the OATs. The project includes:

  1. Develop homology models of OATs, with special attention to the ligand binding site by incorporating QSAR data.
  2. Determine ligand binding propensities by applying SMAP to the OAT models. The models will be further refined using high-throughput screening data from Nigam's lab.

The following references provide more information on OATs:
  1. Ahsan N. Rizwan and Gerhard Burckhardt, Organic Anion Transporters of the SLC22 Family: Biopharmaceutical, Physiological, and Pathological Roles, Pharmaceutical Research, 24(3), p450.
  2. David M. Truong, Gregory Kaler, Akash Khandelwal, Peter W. Swaan, and Sanjay K. Nigam, Multi-level Analysis of Organic Anion Transporters 1, 3, and 6 Reveals Major Differences in Structural Determinants of Antiviral Discrimination, J. Biol. Chem. 283(13), pp. 8654–8663.

Primary Mentors: Sanjay Nigam, Lei Xie and Phil Bourne

Project: Bacterial RNA Polymerase (RNAP)

Bacterial RNA polymerase (RNAP) is a proven target for broad-spectrum antibacterial therapy. Unfortunately, the first-line anti-tuberculosis therapy rifamycin has a high propensity for generating resistance mutations in the rifamycin binding site. Recently, an alternative binding site in the switch region of RNAP has been identified. It has been suggested as an exceptionally attractive target for the discovery of new broad-spectrum antibacterial agents, since this region is distant from the binding site for rifamycin and from the binding sites of other characterized inhibitors of bacterial RNAP. We ran SMAP on this region against the druggable proteins and human proteins in order to identify existing drugs that could interact with the switch region. Following a procedure similar to the one we used for CETP, we found that most of the top hits from SMAP are estrogen receptors. Electrostatic potential analysis, structure analysis and docking results suggest that these off-targets could be very promising.

In the absence of experimental validation, we think a rigorous binding free energy calculation may help us determine the binding affinity between existing estrogen receptor inhibitors and RNAP. If binding can be confirmed, we will have some confidence that estrogen receptor inhibitors could be used as potential drugs for TB. There are essentially three levels at which binding affinity can be calculated. Docking is the fastest but the least accurate. The second level comprises MM/PBSA, MM/GBSA and the linear interaction energy method. The most accurate methods are thermodynamic integration (TI) and free energy perturbation (FEP). We are considering the most accurate and most time-consuming option, such as thermodynamic integration. To do this, several MD simulations must be performed step by step from the free state to the bound state.
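
The stepwise simulations described above can be summarised numerically. In thermodynamic integration the free energy change is the integral, over a coupling parameter lambda running from 0 to 1, of the ensemble-averaged derivative of the potential energy with respect to lambda. The sketch below is only an illustration of that final integration step: it assumes the per-window averages have already been extracted from the MD runs, and the function name and the example numbers are hypothetical, not results from our RNAP work.

```python
def thermodynamic_integration(lambdas, mean_du_dlambda):
    """Trapezoidal estimate of dG = integral over lambda of <dU/dlambda>.

    lambdas          -- increasing coupling-parameter values, one per MD window
    mean_du_dlambda  -- ensemble average of dU/dlambda in each window
    (both inputs are assumed to come from simulations run beforehand)
    """
    if len(lambdas) != len(mean_du_dlambda) or len(lambdas) < 2:
        raise ValueError("need at least two windows of equal-length input")
    delta_g = 0.0
    for i in range(1, len(lambdas)):
        width = lambdas[i] - lambdas[i - 1]
        if width <= 0:
            raise ValueError("lambda values must be strictly increasing")
        # area of one trapezoid between neighbouring windows
        delta_g += 0.5 * width * (mean_du_dlambda[i] + mean_du_dlambda[i - 1])
    return delta_g


# Made-up averages for three windows, purely for illustration.
example = thermodynamic_integration([0.0, 0.5, 1.0], [0.0, 1.0, 2.0])
print(example)
```

In practice more windows are placed where the integrand changes rapidly, which is one reason the method is so much more expensive than docking.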

Question 2: What Does Protein Structure Tell Us About Evolution?

Protein structure is more conserved than protein sequence and thus, in principle, can be used to look across longer evolutionary distances. Structure also reveals information on metal binding, disulphide linkages, co-factor binding, protein-protein interactions, etc., that is not revealed by sequence. We have exploited the increased proteome coverage by structure in extant organisms to ask fundamental questions in evolutionary biology, for example:

  • How has life been impacted by fundamental shifts in geochemistry over the earth's history?
  • Does the tree of life remain a meaningful concept and, if so, where does the root lie?

Example publications:

  • R.E. Valas & P.E. Bourne 2010 Save the Tree of Life or Get Lost in the Woods Biology Direct, 5:44.
  • C. L. Dupont, A Butcher, R. Valas, P.E. Bourne, and G. Caetano-Anolles 2010 The Impact of Trace Metal Chemistry on the Evolution of Life PNAS, doi: 10.1073/pnas.0912491107.
  • R.E. Valas & P.E. Bourne 2009 Structural Analysis of Polarizing Indels: An Emerging Consensus on the Root of the Tree of Life. Biology Direct. 4(1) 30.

Project: Methods for Finding Protein Sectors

The hierarchical organization of primary, secondary, tertiary, and quaternary structure has been a fundamental framework in structural bioinformatics research, but a recent paper by Halabi et al. (Cell 138, 774-786, 2009) suggests that functionally distinct units, called protein sectors, defy this well-accepted organizing principle. Furthermore, the protein sectors, in several examples of small domains from large sequence families, are shown to diverge independently in evolution. They are identified by a statistical analysis of correlated evolution in amino acids. If the sector theory proves to be general, it could usher in a paradigm shift in structural bioinformatics research. It is therefore both interesting and important to develop methods to test this theory independently. For example, since these sectors are thought to be involved in allosteric mechanisms and protein folding, do within-sector residues exhibit correlated dynamics distinct from those of a different sector? Can data from molecular dynamics simulations (atomic or, better yet, coarse-grained) be used to find motion-correlated residues? And will these residues correspond to functional (and evolutionary) sectors? Going further, if one wishes to decompose protein structure while disregarding the traditional hierarchy, what would be a reasonable way of doing it?
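
One possible starting point for the question about motion-correlated residues is a normalized cross-correlation matrix of residue displacements. The sketch below is an assumption about how such a test could begin, not an established protocol of ours: it expects one coordinate per residue (for example the C-alpha atom) in each frame, assumes the frames have already been superposed, and uses a hypothetical function name.

```python
import math


def cross_correlation(trajectory):
    """Normalized displacement cross-correlation between residues.

    trajectory -- list of frames; each frame is a list of (x, y, z)
                  tuples, one per residue, already superposed (assumed).
    Returns an N x N list of lists with values in [-1, 1].
    """
    n_frames = len(trajectory)
    n_res = len(trajectory[0])
    # mean position of every residue over the trajectory
    mean = [[sum(frame[i][k] for frame in trajectory) / n_frames
             for k in range(3)] for i in range(n_res)]
    # displacement of every residue from its mean, per frame
    disp = [[[frame[i][k] - mean[i][k] for k in range(3)]
             for i in range(n_res)] for frame in trajectory]

    def covariance(i, j):
        return sum(sum(d[i][k] * d[j][k] for k in range(3))
                   for d in disp) / n_frames

    variance = [covariance(i, i) for i in range(n_res)]
    corr = [[0.0] * n_res for _ in range(n_res)]
    for i in range(n_res):
        for j in range(n_res):
            denom = math.sqrt(variance[i] * variance[j])
            # residues that never move get a correlation of zero
            corr[i][j] = covariance(i, j) / denom if denom > 0 else 0.0
    return corr
```

Groups of residues with high mutual correlation could then be compared with the sectors obtained from the sequence-based analysis.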

Mentors: Ming-Jing Hwang & Phil Bourne

Project: Evolution of Metal Utilization and Metalloproteins

Recent work by us and others has started to utilize complete genomes as molecular fossils recording the evolution of both life and the earth (see Cavalier-Smith 2006 Philos Trans R Soc Lond B Biol Sci. 2006 Jun 29;361(1470):969-1006). In two recent papers (PNAS 2006 103(47) 17822-17827; PNAS 2010) we have explored the impact of geochemical changes over the past 4 billion years on the proteomes of modern-day organisms and provided the first evidence for a strong correlation. This analysis was possible only through the disposition of protein structures known to bind metals across the three superkingdoms of life. One of the striking trends revealed during these studies was an early evolution of protein structures that bind multiple metals, whereas those that bind only one metal are relatively young. An emergent hypothesis contends that the metal binding sites in ancient proteins contain mixed hard and soft ligands with long bond lengths, whereas more modern proteins involve a more intimate relationship between the protein backbone and the inorganic ion.

Mentors: Phil Bourne and Chris Dupont (J. Craig Venter Institute)

Project: Exploring the Flexibility versus Designability of Protein Folds

Nature uses a very limited number of protein folds; of the order of 1400 are known today. Each protein fold accommodates one or more families of proteins. The more families a fold accommodates, the more designable it is said to be. Recently we developed a method (Gu et al. 2006 PLoS Comp. Biol. 2(7) e90) that measures the flexibility of a protein from its amino acid sequence alone. The approach uses the Gaussian Network Model and a normalization procedure to measure relative flexibility at each amino acid position. By training a support vector machine on sequences taken from known structures, for which flexibility can be experimentally verified, a generalized method of determining flexibility from sequence is obtained.

Hypothesis - Flexibility is correlated with designability. That is, the more designable a protein, the more flexible it will be. This is intuitively what one would expect: the more flexible the protein, the more sequences it can be expected to accommodate.

Part of the project will be to develop the appropriate experimental protocol to test this hypothesis, but the basic intent is to use the existing methodology (and code) to determine flexibility of a range of protein sequences which fall into given folds of varying designability. The latter can be taken directly from the SCOP database.

Primary Mentors: Andreas Prlic and Phil Bourne

Project: What Makes Some Intron Positions Ultra-conserved?

The appearance of spliceosomal introns is central to eukaryotic evolution; the origins and evolution of spliceosomal introns are still hotly debated and form one of the most exciting topics in molecular evolution. A small subset of spliceosomal introns in eukaryotic genes exhibits a strikingly high level of conservation across eukaryotic species in terms of their position and phase within the genes. These introns are particularly interesting as they are likely to be very old, dating back to or before the last eukaryotic common ancestor (LECA).

We are interested in finding out what characterizes genes and proteins with such ultra-conserved introns. Do these genes code for proteins that appeared or expanded in early eukaryotic evolution? Do these proteins have a specific subset of functions/structures? Are the splice sites of ultra-conserved introns different in any way from those of less conserved introns? Answering these questions will advance our understanding of the early stages of the evolution of spliceosomal introns.

We are currently working with the Sm/lsm multi-gene family, which contains many such ultra-conserved introns, but would like to extend this work further. A dataset of intron positions is available for 684 orthologous genes in 8 organisms (Rogozin et al. (2003) Current Biology 13;1512-1517), which we intend to use to address this problem.
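
As a rough illustration of what "conserved in position and phase" means computationally, the sketch below flags introns that occupy the same aligned position with the same phase in a minimum number of species. The data layout (sets of `(aligned_position, phase)` pairs per species), the function names and the example values are all assumptions made for the illustration; they do not describe the format of the Rogozin et al. dataset.

```python
from collections import defaultdict


def intron_phase(coding_nt_before):
    """Phase of an intron from the number of coding nucleotides before it.

    0 -- the intron lies between two codons
    1 -- after the first nucleotide of a codon
    2 -- after the second nucleotide of a codon
    """
    return coding_nt_before % 3


def conserved_introns(introns_by_species, min_species):
    """Return introns shared (same aligned position and phase) by at
    least min_species species, mapped to the sorted species names."""
    carriers = defaultdict(set)
    for species, introns in introns_by_species.items():
        for intron in introns:
            carriers[intron].add(species)
    return {intron: sorted(names)
            for intron, names in carriers.items()
            if len(names) >= min_species}


# Hypothetical example: one intron is shared by all three species.
example = {
    "human": {(120, 0), (300, 1)},
    "fly": {(120, 0), (450, 2)},
    "yeast": {(120, 0)},
}
print(conserved_introns(example, min_species=3))
```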

Primary Mentors: Stella Veretnik & Phil Bourne

Project: Looking for Correlation Between Protein and Gene Structure

To what extent is the current repertoire of proteins built from a combinatorial assembly of small structural units? And to what extent are the structural units encoded by the individual exons within a gene? Clear evidence exists for recent proteins, in which structural domains frequently coincide with individual exons. The situation is much more blurry for older proteins, because intron gain and loss throughout evolution mask the original correlation (if there was one).

We would like to approach this differently by looking at smaller structural units and investigating whether there is a correlation between individual subdomains (or motifs) and individual exons. Furthermore, we would like to look at structural units with internal symmetry/pseudo-symmetry and the correlation between symmetry units and exons.

Analysis will be performed separately for proteins of different “ages”, giving a sense as to how ubiquitous exon shuffling was throughout eukaryotic evolution.
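
A first, deliberately simple measure of this correlation is the fraction of structural-unit boundaries that fall close to an exon boundary. The sketch below assumes exon lengths are given in coding nucleotides and unit boundaries in residue numbers; the function names and the tolerance of five residues are hypothetical choices for illustration, and a real analysis would also need a random-expectation control.

```python
def exon_boundaries_in_residues(exon_coding_lengths):
    """Convert exon coding lengths (nucleotides) into the 1-based residue
    numbers at which each exon ends. The final exon is skipped because
    its end is the end of the protein, not an internal boundary."""
    boundaries = []
    total = 0
    for length in exon_coding_lengths[:-1]:
        total += length
        # residue containing the last nucleotide of this exon
        boundaries.append((total - 1) // 3 + 1)
    return boundaries


def fraction_near_exon_boundary(unit_boundaries, exon_boundaries, tolerance=5):
    """Fraction of structural-unit boundaries lying within `tolerance`
    residues of any exon boundary."""
    if not unit_boundaries:
        return 0.0
    hits = sum(
        1 for unit in unit_boundaries
        if any(abs(unit - exon) <= tolerance for exon in exon_boundaries)
    )
    return hits / len(unit_boundaries)
```

Running the same measure separately on proteins of different ages would give one number per age class to compare.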

Qualifications: good programming skills in C++, experience with databases, some knowledge of structural biology.

Primary Mentors: Stella Veretnik and Phil Bourne

Question 3: Can We Discover New Biology Using Protein Structure?

Project: Symmetry in Quaternary Structure

Approximately one-half of the structures in the PDB are oligomers. Defining exactly the symmetry relationships between the monomeric units is not always trivial, yet it is important for understanding function and evolutionary principles. The goal is to extend the CE algorithm developed in our laboratory to perform structure alignment of multimeric units. If there are n monomeric units, there are n! combinations to be considered.
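
The n! growth can be made concrete with a brute-force search over chain correspondences. The sketch below is not the CE algorithm: it assumes a hypothetical pairwise scoring function supplied by the caller and simply tries every ordering of the second complex's chains, which is only practical for small n.

```python
from itertools import permutations
from math import factorial


def best_chain_mapping(chains_a, chains_b, pair_score):
    """Try all n! assignments of chains_b to chains_a and return
    (best_total_score, mapping). pair_score(a, b) is a placeholder for
    whatever pairwise alignment score is available (assumption)."""
    if len(chains_a) != len(chains_b):
        raise ValueError("complexes must have the same number of chains")
    best_total, best_map = None, None
    for ordering in permutations(chains_b):
        total = sum(pair_score(a, b) for a, b in zip(chains_a, ordering))
        if best_total is None or total > best_total:
            best_total, best_map = total, dict(zip(chains_a, ordering))
    return best_total, best_map


# Toy score: chains of similar length match better (illustrative only).
len_a = {"A": 100, "B": 200, "C": 300}
len_b = {"X": 300, "Y": 100, "Z": 200}
score, mapping = best_chain_mapping(
    list(len_a), list(len_b), lambda a, b: -abs(len_a[a] - len_b[b])
)
print(factorial(len(len_a)), "orderings tried; best mapping:", mapping)
```

Because the number of orderings grows factorially, symmetry information or a pruning heuristic would be needed for large assemblies.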

Primary Mentors: Peter Rose & Phil Bourne

Project: Understanding and Predicting Epistasis

One of the unsolved problems in genetic variation is to understand and predict epistasis (synergistic or antagonistic gene-gene interaction). This problem has been studied using protein-protein interaction networks and metabolic networks, but not yet using structural information. We hypothesize that epistasis may be related to the protein-ligand binding propensity underlying protein function. Because a large amount of both positive and negative data for epistasis is now available, it is time to study epistasis using structural relationships, with SMAP and other structural analysis tools from our laboratory and others.

Primary Mentors: Lei Xie and Phil Bourne

Project: Investigating the Effect of Alternative Splicing on Protein Structure and Disorder

Alternative splicing (AS) has considerable potential to expand the cellular protein repertoire. The extent to which this potential is realized is difficult to estimate given the limited experimental knowledge about which alternative isoforms code for functional proteins. An important biological question is how alternatively spliced isoforms that differ in coding sequence avoid catastrophic disruption of three-dimensional (3D) protein structure. This question has been partially investigated in several publications (PMIDs: 12615003; 12907725; 15103394), with the conclusion that AS events tend to occur outside annotated protein domains, and that alternative splicing usually removes entire protein domains rather than only parts of domains. Furthermore, using a limited number of available examples, Romero et al. (PMID: 16717195) have shown that protein regions/domains affected by alternative splicing are frequently disordered/unstructured.

We are currently sequencing a large number of novel alternatively spliced (AS) isoforms of brain-expressed autism and schizophrenia candidate genes using 454 GS FLX technology. The impact of AS on protein structure/disorder can be further investigated using this new dataset of AS isoforms. Mapping the new isoforms to the genome/proteome in order to understand the impact of AS on proteins is the main goal of this rotation project. The steps to be completed are: (1) align sequenced transcripts to reference sequences and to known isoforms in order to annotate them in terms of novelty; (2) identify coding regions (i.e. exons) of the transcripts that are lost/gained as a result of AS; (3) structurally annotate isoforms in terms of protein domains and disordered regions; (4) map full-length proteins and their corresponding isoforms to the PDB to define structural elements that are lost/gained as a result of AS. All sequenced AS isoforms of autism and schizophrenia candidate genes are currently being put through the high-throughput Y2H pipeline (in collaboration with the Marc Vidal group at Harvard) to collect protein-protein interaction data. The final goal of this project is to compare the interactomes of the full-length proteins and their AS isoforms to understand how AS influences protein-protein interactions and PPI networks.
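
Step (2) above, identifying exons lost or gained in an isoform, can be sketched as a set comparison once transcripts have been aligned. The sketch assumes each exon is represented by inclusive `(start, end)` coding coordinates and that exons match only when both coordinates agree, so an alternative splice site shows up as one lost and one gained exon; the function name and example coordinates are hypothetical.

```python
def exon_changes(reference_exons, isoform_exons):
    """Compare an isoform with its reference transcript.

    Exons are (start, end) tuples with inclusive coding coordinates
    (an assumption about the upstream alignment step).
    """
    ref, iso = set(reference_exons), set(isoform_exons)
    lost = sorted(ref - iso)
    gained = sorted(iso - ref)
    net = (sum(end - start + 1 for start, end in gained)
           - sum(end - start + 1 for start, end in lost))
    return {
        "lost": lost,
        "gained": gained,
        "net_length_change": net,
        # a net change divisible by 3 keeps the downstream reading frame
        "frame_preserved": net % 3 == 0,
    }


# Hypothetical isoform that skips the middle exon of a three-exon gene.
print(exon_changes([(1, 100), (201, 290), (401, 500)],
                   [(1, 100), (401, 500)]))
```

The lost and gained intervals can then be intersected with domain and disorder annotations for steps (3) and (4).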

Primary Mentors: Lilia Iakoucheva and Phil Bourne

Project: SNPs and Protein Structure

This project will analyse the 400+ SNPs for which 3D structure coordinates are available in the PDB and compare them with the corresponding wild-type (WT) proteins. The aim is to identify what the variant structures have in common with, and how they differ from, the WT structures, and to draw conclusions about what we can learn from them.

Primary Mentors: Andreas Prlic and Phil Bourne

Question 4: What Can Structural Bioinformatics Contribute to Immunology and Mental Disorders?

Project: Structure-based in silico Profiling of the Functional Effect of Mutations Implicated in Autism and Schizophrenia

This project aims to investigate the molecular basis of two psychiatric diseases, autism spectrum disorders (ASD) and schizophrenia (SCZ). We are collecting our own yeast-two-hybrid data and building protein-protein interaction networks centered on the disease-relevant genes. Our goal is to identify molecular pathways/modules that play a role in the pathogenesis of these diseases. We are currently starting a new project that focuses on how de novo mutations (DNMs), such as missense single nucleotide variants and insertions/deletions identified through exome sequencing efforts in patients, perturb or disrupt the networks.

We have assembled a list of the DNMs that have recently been implicated in autism or schizophrenia, with the objective of investigating their impact on protein-protein interactions. We are especially interested in mutations that are predicted to disrupt interactions, such as those mapped to protein complex interfaces. Disruption of interactions due to mutations could have a profound influence on network topology. Having disease-relevant networks already assembled in the lab puts us in a strong position to investigate this question. Once the candidate mutations for network perturbation are identified using in silico structure-based profiling, we will introduce them by site-directed mutagenesis into the wild-type protein sequences and then examine the interaction defects of the mutant proteins using yeast-two-hybrid screens.

Primary Mentors: Lilia Iakoucheva and Phil Bourne

Project: Prediction of Peptide Binding to MHC class II Molecules

MHC class II molecules are central to effective adaptive immune responses. On the surface of antigen-presenting cells, they display a range of peptides for recognition by the T-cell receptors of CD4 T helper cells. In silico prediction of the peptides from a whole antigen that bind to an MHC molecule is extremely helpful for identifying new antigens and pre-screening T-cell epitopes that can be used for the development of vaccines and diagnostics (for the most recent review on epitope-based vaccines see Voskens et al., 2009).

The general approach to predicting peptide-MHC binding is based on generalizing experimental binding data to define a binding sequence pattern for a given MHC molecule. The quality of such methods is therefore highly dependent on the amount of experimental training data available, and remains unsatisfactory for MHC class II binding (Lin et al., 2008; Gowthaman & Agrewala, 2008; Wang et al., 2008). Moreover, since there are thousands of different MHC alleles in the human population, and binding data are available only for a small subset of alleles, it is desirable to develop methods that do not rely heavily on the availability of peptide-MHC binding data.

We have recently applied three structure-based approaches to the prediction of peptide-MHC class II binding (submitted manuscript) and demonstrated that the performance of ab initio methods is inferior to that of sequence-based methods that rely on binding data alone. However, sequence-based approaches for peptide-MHC class II binding have never matched the accuracy of MHC class I binding predictions. The reason lies in the complex nature of peptide-MHC class II interactions: MHC class II molecules bind longer peptides, and amino acids flanking the 9-mer binding core of the peptide contribute to MHC-peptide interactions and antigen processing (Godkin et al., 2001) (for a review of structural aspects of peptide-MHC class II binding and their role in human diseases see Jones et al., 2006). Therefore, the next logical step in improving the prediction of peptide-MHC class II binding would be to combine the two approaches, structural and sequence-based.
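
The difficulty introduced by longer peptides can be seen in a minimal sequence-based scorer: because the 9-mer core may sit anywhere in the peptide, every 9-residue window has to be scored. The sketch below is a generic position-specific scoring illustration, not our method or any of the published tools; the scoring matrix and peptide are invented for the example, and the sketch ignores the flanking residues discussed above.

```python
CORE_LENGTH = 9


def best_binding_core(peptide, position_scores):
    """Slide a 9-residue window along the peptide and return
    (core, start_index, score) for the highest-scoring window.

    position_scores -- list of 9 dicts mapping amino acid -> score;
                       residues absent from a dict score 0.0.
    Returns None if the peptide is shorter than the core.
    """
    if len(peptide) < CORE_LENGTH:
        return None
    best = None
    for start in range(len(peptide) - CORE_LENGTH + 1):
        core = peptide[start:start + CORE_LENGTH]
        score = sum(position_scores[pos].get(res, 0.0)
                    for pos, res in enumerate(core))
        if best is None or score > best[2]:
            best = (core, start, score)
    return best


# Invented matrix with preferences at core positions 1, 4, 6 and 9.
toy_scores = [{} for _ in range(CORE_LENGTH)]
toy_scores[0] = {"F": 2.0, "Y": 1.5}
toy_scores[3] = {"D": 1.0}
toy_scores[5] = {"T": 1.0}
toy_scores[8] = {"L": 1.5}

print(best_binding_core("GSAFKADATSALGKP", toy_scores))
```

A combined method could replace or supplement the per-position scores with structure-derived energies for each candidate core.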

The aims of this project are to (1) develop a novel approach to predicting peptide-MHC class II binding by training our recent ab initio structure-based method on peptide-MHC binding data; (2) infer the models for the MHC class II alleles for which both structural and binding data are available; (3) generalize the approach (infer the models) to alleles for which no structural data are available; (4) compare the models' performance with the methods of others, including the sequence-based IEDB consensus (Wang et al., 2008), NetMHCIIpan (Nielsen et al., 2008), RANKPEP (Reche et al., 2004), DistBoost (Hertz et al., 2006) and SIDT (Shift-Invariant Double Threading) by Zaitlen et al., 2008, which is based on pairwise potentials defined using both 3D structures of peptide-MHC complexes and binding data; and (5) if the method is successful, implement it as a web tool in the IEDB database and analysis resource.

Primary Mentor: Julia Ponomarenko

Project: From Structural Analysis of MHC class II Immunodominant Epitopes Towards their Prediction

When individuals are immunized with a protein antigen, most of the responding T cells are specific for only a few of the many potential epitopes contained in the antigen. These epitopes are called immunodominant. Currently, the inability to predict immunodominant epitopes is a significant impediment to rational vaccine design (Sette & Peters, 2007).

For CD4 T cells, the phenomenon of immunodominancy can only be explained by intracellular events involved in MHC-restricted recognition within antigen-presenting cells. These events include, first, proteolytic processing of the antigen, which depends on the antigen, its 3D structure and the type of antigen-presenting cell and, second, loading of the peptide on MHC (Watts, 2004; Villadangos & Ploegh, 2000). Implicit in these models is the prediction that the molecular context in which an antigenic peptide is contained will impact significantly on its immunodominancy. There are experimental data that support that hypothesis (see references in Weaver et al., 2008). Some studies demonstrated that immunodominant epitopes tend to cluster in limited regions of the antigen, often within solvent-exposed regions or at sites that are adjacent to protease-sensitive loops (Carmicle et al., 2007; Dai et al., 2002), while others suggested that immunodominant epitopes are preferentially associated with structurally stable regions (Landry, 2000; Melton & Landry, 2008) and with kinetic stability of the peptide-MHC complex (Sant et al., 2007). At the same time, the experimental study of Weaver et al., 2008, which involved only five proteins containing eight dominant epitopes, “revealed no differences in the immunogenicity of peptides when they are introduced in different molecular contexts, and find the immunodominance of a peptide tracks with the peptide itself, rather than the site in a given protein or a protein in which it is contained.”

So far, no broad analysis has been undertaken to study immunodominant epitopes in the context of the antigen 3D structure. As the content of the IEDB database has grown significantly, such an analysis has become feasible: more than 500 MHC class II immunodominant epitopes can be mapped to a hundred proteins with known 3D structure that represent different structural families. We found that epitopes from bacteria and intracellular endosomal pathogens, such as Mycobacterium tuberculosis and Salmonella, were more solvent exposed than viral epitopes. This project aims to continue that research by providing (1) a more detailed analysis of the structural features of the analyzed proteins, including B-factor, secondary structure, residue flexibility and conservancy; and (2) a detailed comparison of epitope localization in proteins of different types.

If it is shown that the localization of immunodominant epitopes is unrelated to the 3D structure of the antigen and of the regions surrounding the epitope, this will suggest that the immunodominancy of a given epitope will be maintained in different types of vaccine constructs, which is promising for efforts seeking to incorporate known pathogen or tumor-derived epitopes into complex proteins (Sette & Fikes, 2003; Bull et al., 2007). Otherwise, revealing structural features associated with epitope immunodominancy would suggest that the 3D structure of antigens controls the pathway of antigen processing and epitope presentation. That might make feasible the development of a novel structure-based method for the prediction of immunodominant MHC class II epitopes.

Primary Mentor: Julia Ponomarenko

Question 5: Can we Improve the Way Science is Communicated?

Our laboratory is a strong supporter of open access (OA) to the scientific literature. OA affords opportunities to disseminate and comprehend science in new ways. We are working in two areas using OA content. First, we are integrating OA literature with database content, using PubMedCentral and the PDB as our test case. Second, we are integrating video content with OA content as part of a video delivery site. See J.L. Fink and P.E. Bourne 2007 Reinventing Scholarly Communication for the Electronic Age. CT Watch, 3(3) 26-31 for further details. Projects are available in the areas of semantic tagging, video content searching and literature database integration.

Primary Mentor: Phil Bourne