Pharm 207/Bio 207 Home Page

Using Internet Resources in Molecular Biology - Lecture 7

Protein Physico-Chemical Data Analyses

Lecturer: Christopher M. Smith
Date & Time: 3-5 11/06/01

Table of Contents

 


Lecture Outline

  • Tools for the Analysis of Empirical Physico-Chemical Data
  • Theoretical Calculations of Physico-Chemical Characteristics
  • Compendiums of web-accessible Analytical Tools & Resources

| TOC & Lecture Outline | Introduction | Reading Materials | Lecture Goals and Assignment | Examples |

    Introduction

    Much of present-day protein biotechnology analyzes a protein from the standpoint of its sequence, in most cases generated by one cloning technique or another. Despite the major inroads molecular biology provides into protein research, at some point a "real" protein will need to be purified and characterized. Even in pure protein research, an inordinate amount of emphasis is put on purifying a protein to homogeneity so that it can be identified by peptide sequencing.

    In many of these instances, there is a tendency to overlook the wealth of information that can be derived from experimental data collected as one purifies and characterizes the protein. Sometimes a preliminary identification may be made based simply on empirical physical and chemical data. Preliminary identifications are important because they can aid the purification process by providing some initial indication that the researcher is on the "right" or "wrong" track toward purifying their protein of interest.

    In this lab-lecture we will experiment with a few computational tools that can make a "putative" identification of an unknown protein based on the various physical and chemical data that are typically generated while a protein is being purified and characterized. With these tools, a putative identification of an unknown protein can be made without any knowledge of its sequence.



    Reading Materials

    1. Hobohm U, Houthaeve T, Sander C (1994) Amino acid analysis and protein database composition search as a fast and inexpensive method to identify proteins. Anal Biochem 222:202.
    2. Hobohm U, Sander C (1995) A sequence property approach to searching protein databases. J Mol Biol 251:390.
    3. Pappin DJC, Hojrup P, Bleasby AJ (1993) Rapid Identification of Proteins by Peptide-Mass Fingerprinting. Current Biology 3:327.
    4. Smith CM (1997) The CMS Molecular Biology Resource. Trends in Genetics 13:416.
    5. Wilkins MR, Ou K, Appel RD, Sanchez JC, Yan JX, Golaz O, Farnsworth V, Cartier P, Hochstrasser DF, Williams KL, Gooley AA (1996) Rapid protein identification using N-terminal "sequence tag" and amino acid analysis. Biochem Biophys Res Commun 221:609.



    WEB Information Resources:



    Goals

    To expose you to the variety of web-based analytical tools that can be used to make a putative identification of an unknown protein from the physico-chemical data generated during its purification and/or characterization. To provide opportunities to experiment with, and gain experience in, these analytical tools.

    Lecture Assignment

    In this lecture section of Pharm207/Bio207, you will determine the identity of an unknown protein and create a physico-chemical features profile (mass, pI, aa composition, peptide fragment map, etc.) for your class pet protein. We will emphasize the use of four web-based tools for these exercises, but there are many more tools that can perform the same or similar functions. They are listed in the compendiums cited in the "Web Information Resources" section (above).


    Identify an Unknown Protein based upon Physico-Chemical Data

    For this initial exercise (walk through of the process), the whole class will analyze the same protein (class example). Subsequently each of you will get an unknown that you'll need to analyze separately. Your unknown is HERE.

    Your Data:

    You are having a nightmare, and in your nightmare you are a plant biochemist ... yes, a plant biochemist! You are attempting to purify what you think is a novel protein from maize (Zea mays, "corn" for the uninitiated). It takes you roughly 8 weeks to purify 100 ng of the protein from 1500 g of fresh leaf tissue (about a dozen 6-foot-tall, mature plants). The protein is located in the mesophyll cells and appears to be active in tissue harvested during the day, but inactive in tissue harvested at night. You have ascertained from in vivo studies that your protein is phosphorylated in night tissue and dephosphorylated in day tissue. You are very excited about the regulatory implications of your findings and are extremely anxious to determine the identity of your protein.

    Unfortunately,

    1. you have not purified enough of the protein for further (sequencing) analysis,
    2. you dropped the last of your current sample onto the floor,
    3. your sequencing facility lost your sample, and/or
    4. your sequencing facility has a back-log of work and will not get to your blot for a few weeks.
    And your research advisor wants to know (yesterday!) what the protein is so he can send your manuscript off to Cell, since you are in a very close race with another lab to identify the protein.

    UTILIZE the enormous pool of physico-chemical data acquired during purification and characterization of your protein (assuming that you are a GOOD biochemist and have done these things!).

    Some of the data you have determined for your protein, which you've named Wonderland, are given below:

    The selected enzyme is: Trypsin
    
    Maximum number of missed cleavages (MC): 0
    
    All cysteines in reduced form.
    
    Methionines have not been oxidized.
    
    Displaying peptides with a mass bigger than 500 Dalton.
    
    Using monoisotopic masses of the occurring amino acid residues and giving
    peptide masses as [M+H]+.
    
    ----------------------------------------------------------------------------
    
    The peptide masses from your protein are:
    
          4231.16   
          3739.72   
          3673.84   
          2742.24  
          2395.20   
          2307.11    
          2280.16   
          2224.18    
          2139.99   
          2037.99    
          1879.00    
          1876.97   
          1816.82    
          1799.98    
          1781.79    
          1754.85    
          1734.80    
          1726.91    
          1655.86    
          1605.86  
          1515.81   
          1471.84   
          1469.69    
          1364.75  
          1333.74   
          1313.60   
          1301.71  
          1290.71   
          1281.72    
          1242.67    
          1206.68    
          1171.54    
          1166.55    
          1162.60   
          1133.58    
          1130.55   
          1117.55    
          1113.57   
          1099.56    
          1098.55   
          1085.61   
          1039.52    
          1022.64   
          1022.51   
          1004.52   
           987.53  
           973.53   
           919.43   
           900.37   
           861.44   
           860.43   
           834.45   
           831.50   
           815.44   
           803.38   
           801.45   
           793.50   
           789.43   
           754.38  
           751.42   
           749.33   
           720.36  
           692.31     
           688.39    
           669.31   
           646.38   
           634.28   
           627.38   
           616.37  
           605.31   
           593.37  
           588.37   
           585.37   
           561.29   
           558.26   
           546.30  
           517.31   
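
    The settings listed above (trypsin, no missed cleavages, reduced cysteines, unoxidized methionines, monoisotopic [M+H]+ masses above 500 Da) can be reproduced for any candidate sequence with a short script. The following is only a minimal sketch under stated assumptions, not the method used by the web tools: it applies the simple rule "cleave after K or R unless the next residue is P", uses standard monoisotopic residue masses, and the function names, the 0.5 Da tolerance, and the toy sequence are hypothetical illustrations.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # converts the neutral mass M to [M+H]+


def tryptic_peptides(sequence):
    """Split after K or R, except when the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def peptide_masses(sequence, min_mass=500.0):
    """Masses of zero-missed-cleavage peptides above min_mass, largest first."""
    masses = [round(mh_plus(p), 2) for p in tryptic_peptides(sequence)]
    return sorted((m for m in masses if m > min_mass), reverse=True)


def count_matches(observed, theoretical, tolerance=0.5):
    """Number of observed masses within `tolerance` Da of any theoretical mass."""
    return sum(
        any(abs(o - t) <= tolerance for t in theoretical) for o in observed
    )


if __name__ == "__main__":
    toy = "AAKGGRPLLKMM"  # hypothetical sequence, not a real protein
    print(tryptic_peptides(toy))
    print(peptide_masses(toy))
```

    Comparing the observed list against the theoretical masses of each candidate with count_matches gives a crude ranking; real fingerprinting tools use more elaborate scoring.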
    
    

    Analyzing your Data:

    For these analyses, you'll want to keep this browser window open and work from another. To open another browser window, press the keyboard combination "Alt" "N" or select the "New Web Browser" item from the "File" pull-down menu (upper left-hand corner of this browser window).

    Putative Unknown Protein ID

    Compare your post-results analysis for all three tests. Based on these, what is your best guesstimate as to the identity of your unknown protein? Email your answer. Be sure to include "Pharm207" in the subject line of your email. Please save a copy of ALL your analyses to local disk. I may ask to discuss your answer and/or want to review your analysis results.

    And the Answer Is (Class Example)?


    Physico-Chemical Features Profile of your "pet protein"

    In this exercise, I want you to create a features profile (similar to that of your individual unknowns) of the protein for which you are creating a web page for the course. To create this profile, you will need the protein sequence (in single letter aa code) of your pet protein.
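
    As a rough cross-check on what the web tools report, two entries of such a profile (amino acid composition and average molecular mass) can be computed directly from the single-letter sequence. This is a minimal sketch, assuming an unmodified chain of the 20 standard residues and standard average residue masses; the function names are hypothetical. pI is left out because its value depends on which set of pKa values one chooses.

```python
from collections import Counter

# Average (isotope-weighted) residue masses in Da.
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01528  # one water per chain for the free termini


def composition(sequence):
    """Map each residue present to (count, mole percent)."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: (n, 100.0 * n / total) for aa, n in sorted(counts.items())}


def average_mass(sequence):
    """Average molecular mass (Da) of the unmodified polypeptide."""
    return sum(AVG[aa] for aa in sequence) + WATER_AVG


if __name__ == "__main__":
    toy = "AAKG"  # hypothetical sequence; substitute your pet protein
    for aa, (n, pct) in composition(toy).items():
        print(f"{aa}: {n} ({pct:.1f}%)")
    print(f"Average mass: {average_mass(toy):.2f} Da")
```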

