Pharm 207/Bio 207 Home Page
Using Internet Resources in Molecular
Biology - Lecture 7
Protein Physico-Chemical
Data Analyses
Lecture Outline
Tools for the Analysis of Empirical Physico-Chemical Data
Theoretical Calculations of Physico-Chemical Characteristics
Compendiums of web-accessible Analytical Tools & Resources
Introduction
Much of present-day protein biotechnology analyzes a protein
from the standpoint of its sequence, which in most cases is generated by
one cloning technique or another. Despite the major
inroads molecular biology has made in protein research, at some point
a "real" protein will need to be purified and characterized.
Even in pure protein research, an inordinate amount of emphasis is put on purifying a protein
to homogeneity to ascertain identification via peptide sequencing.
In many of these instances, there
is a tendency to overlook the wealth of information that can be derived
from experimental data collected as one purifies and characterizes the protein.
Sometimes a preliminary identification may be made based simply on empirical
physical and chemical data. Preliminary identifications are important
because they can aid the purification process by providing some initial
indication that the researcher is on the "right" or "wrong" track toward
purifying their protein of interest.
In this lab-lecture we will experiment with a few computational tools
that can make a "putative" identification of an unknown protein from the physical and chemical
data that are typically generated while a protein is being purified and characterized.
With these tools, the identification can be made without any knowledge
of the protein's sequence.
Reading Materials
- Hobohm U, Houthaeve T, Sander C (1994) Amino acid analysis and protein
database composition search as a fast and inexpensive method to identify
proteins. Anal Biochem 222:202.
- Hobohm U, Sander C (1995) A sequence property approach to searching
protein databases. J Mol Biol 251:390.
- Pappin DJC, Hojrup P, Bleasby AJ (1993) Rapid Identification of Proteins
by Peptide-Mass Fingerprinting. Current Biology 3:327.
- Smith CM (1997) The CMS Molecular Biology Resource. Trends in Genetics 13:416.
- Wilkins MR, Ou K, Appel RD, Sanchez JC, Yan JX, Golaz O, Farnsworth
V, Cartier P, Hochstrasser DF, Williams KL, Gooley AA (1996) Rapid protein
identification using N-terminal "sequence tag" and amino acid
analysis. Biochem Biophys Res Commun 221:609.
WEB Information Resources:
Goals
To expose you to the variety of web-based analytical tools that can
be used to make putative identification of unknown proteins based upon
physico-chemical data generated during the purification and/or characterization
of these proteins. To provide opportunities to experiment with and gain
experience with these analytical tools.
Lecture Assignment
In this lecture section of Pharm207/Bio207, you will determine the identity
of an unknown protein and create a physico-chemical features profile (mass,
pI, aa composition, peptide fragment map, etc.) for your class pet protein.
We will emphasize the use of four web-based tools for these exercises,
but there are many more tools that can perform the same or similar functions.
They are listed in the compendiums cited in the "Web Information Resources"
section (above).
Identify an Unknown Protein based upon Physico-Chemical
Data
For this initial exercise (a walk-through of the
process), the whole class will analyze the same protein (class example).
Subsequently each of you will get an unknown that you'll need to analyze
separately. Your unknown is HERE.
Your Data:
You are having a nightmare, and in your nightmare you are a plant biochemist
... yes, a plant biochemist! You are attempting to purify what you think
is a novel protein from maize (Zea mays, "corn" for the
uninitiated). It takes you roughly 8 weeks to purify 100 ng of the protein
from 1500 g of fresh leaf tissue (about a dozen 6-foot-tall, mature plants). The protein is located in the mesophyll
cells and appears to be active in tissue harvested during the day, but
inactive in tissue harvested at night. You have ascertained in in vivo
studies that your protein is phosphorylated and dephosphorylated in night
and day tissues, respectively. You are very excited about the regulatory
implications of your findings and are extremely anxious to determine
the identity of your protein.
Unfortunately,
- you have not purified enough
of the protein for further (sequencing) analysis,
- you dropped the last
of your current sample onto the floor,
- your sequencing facility lost
your sample and/or,
- your sequencing facility has a back-log of work and will not get to your blot
for a few weeks.
And your research advisor wants to know (yesterday!) what
the protein is so he can send your manuscript off to Cell, since you are in a very close race with another lab to identify
the protein.
UTILIZE the enormous pool of physico-chemical data acquired
during purification and characterization of your protein (assuming
that you are a GOOD biochemist and have done these things!). Some
of the things you have determined are:
- Molecular Mass (Mr)
- Isoelectric Point (pI)
- Amino Acid Composition (total no. aa, relative and absolute no. of
each aa)
- Peptide Mass Data from a Tryptic Digest
The data for your protein, which you've named Wonderland, are given
below:
- Molecular Mass: 111 kD (+/- 5kD) [determined by high resolution gel filtration
chromatography, and Native & Denaturing PAGE analysis]
- Isoelectric Point: 5.87 (+/-0.25) [determined by chromatofocusing on
a Pharmacia MonoP column]
- Amino Acid Composition [total protein hydrolysis and product analysis
by high resolution HPLC/CE]
Amino acid   Count   Percent
Ala (A)         68    7.03 %
Arg (R)         73    7.55 %
Asn (N)         31    3.21 %
Asp (D)         65    6.72 %
Cys (C)          8    0.83 %
Gln (Q)         40    4.14 %
Glu (E)         80    8.27 %
Gly (G)         49    5.07 %
His (H)         22    2.28 %
Ile (I)         48    4.96 %
Leu (L)        110   11.38 %
Lys (K)         58    6.00 %
Met (M)         26    2.69 %
Phe (F)         40    4.14 %
Pro (P)         49    5.07 %
Ser (S)         55    5.69 %
Thr (T)         50    5.17 %
Trp (W)         14    1.45 %
Tyr (Y)         28    2.90 %
Val (V)         53    5.48 %
Peptide Mass Data [partial tryptic digestion and product analysis by
high-resolution HPLC]
The selected enzyme is: Trypsin
Maximum number of missed cleavages (MC): 0
All cysteines in reduced form.
Methionines have not been oxidized.
Displaying peptides with a mass bigger than 500 Dalton.
Using monoisotopic masses of the occurring amino acid residues and giving
peptide masses as [M+H]+.
----------------------------------------------------------------------------
The peptide masses from your protein are:
4231.16
3739.72
3673.84
2742.24
2395.20
2307.11
2280.16
2224.18
2139.99
2037.99
1879.00
1876.97
1816.82
1799.98
1781.79
1754.85
1734.80
1726.91
1655.86
1605.86
1515.81
1471.84
1469.69
1364.75
1333.74
1313.60
1301.71
1290.71
1281.72
1242.67
1206.68
1171.54
1166.55
1162.60
1133.58
1130.55
1117.55
1113.57
1099.56
1098.55
1085.61
1039.52
1022.64
1022.51
1004.52
987.53
973.53
919.43
900.37
861.44
860.43
834.45
831.50
815.44
803.38
801.45
793.50
789.43
754.38
751.42
749.33
720.36
692.31
688.39
669.31
646.38
634.28
627.38
616.37
605.31
593.37
588.37
585.37
561.29
558.26
546.30
517.31
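As a quick consistency check on the data above, the residue counts in the amino acid composition can be converted to an approximate molecular mass and compared with the measured 111 kD. The sketch below is illustrative only: it assumes standard average residue masses (these are not part of the handout) and an unmodified, single-chain protein.

```python
# Average residue masses in Da (amino acid minus one water).
# ASSUMPTION: standard textbook values, not taken from this handout.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "Q": 128.1307, "E": 129.1155, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.015  # one water is added back for the free N- and C-termini

# Residue counts copied from the composition data for "Wonderland".
COMPOSITION = {
    "A": 68, "R": 73, "N": 31, "D": 65, "C": 8, "Q": 40, "E": 80,
    "G": 49, "H": 22, "I": 48, "L": 110, "K": 58, "M": 26, "F": 40,
    "P": 49, "S": 55, "T": 50, "W": 14, "Y": 28, "V": 53,
}


def mass_from_composition(counts):
    """Approximate average molecular mass (Da) from residue counts."""
    return sum(AVG_RESIDUE_MASS[aa] * n for aa, n in counts.items()) + WATER


total_residues = sum(COMPOSITION.values())
mass = mass_from_composition(COMPOSITION)
print(f"{total_residues} residues, approx. {mass:.0f} Da")
```

The computed mass should fall inside the 111 kD (+/- 5 kD) window reported for the protein; a large disagreement would suggest a transcription error in the composition.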
Analyzing your Data:
- For these analyses, you'll want to keep this
browser window open and work from another. To open another browser
window, hit the keyboard combination "Alt" "N" or select
the "New Web Browser" item from the "File" pull-down
menu (upper left-hand corner of this browser window).
- Putative identity based upon Molecular Mass
(Mr) and Isoelectric Point (pI). To perform this analysis, we will
use the TagIdent
analytical tool at ExPASy in Switzerland.
TagIdent compares the Mr and pI combination of your unknown protein against
a library of calculated Mr and pI's for known proteins in the SWISS-PROT
database. It returns a list of SWISS-PROT protein entries that have Mr
and pI's that are similar to that of your unknown protein. This analysis
is ONLY based upon two physico-chemical criteria, therefore some of the
results can deceive you if you do not take the time to thoroughly analyze
them. For example, if you are working on a plant protein that catalyzes
reaction X, and at the top of the results list is a protein from an anaerobic
microorganism that catalyzes reaction G, you would usually discount this
and like entries.
Although this seems like a reasonable thing to do, it also has its disadvantages and could lead you down the
wrong road. In some cases you could unknowingly
be copurifying two proteins of similar Mr and pI that catalyze different
reactions (cf. MDH and PPDK-Regulatory Protein from maize chloroplast
stroma). Cases like these are likely to become more prevalent as we try
to purify low-abundance regulatory proteins (phosphatases, protein kinases,
etc.), proteins that can easily be "masked" by other, relatively
high-abundance proteins. Do not discount the obvious, and always use the
results of several different analyses to make decisions.
- In the TagIdent web form page, input the pI, and molecular mass (in Daltons, not kD, and NO commas) for your
unknown. You need not enter any values for the "OS", "keyword", or tagging
windows. Standard default values will suffice for this exercise. Do not select the "Send results via email" button.
Submit
your request.
- It may take anywhere from a few seconds to a few minutes to get a result. The time usually depends on the
number of requests being made of the server. In the meantime,
you may want to start the next analysis. Open a new navigator window to this page and start another analysis.
- Save your results as "Source", or send this file to your local email account. You could also run the analysis
with the "send to email account" selected.
- Based on these results alone, what is your best guesstimate as to the
identification of your unknown? Remember to consider all the information you have in hand, not just the numbers!
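The kind of comparison described above (keep every database entry whose calculated Mr and pI fall inside the experimental error windows) can be sketched in a few lines. This is not TagIdent's actual implementation; the `Entry` class, the function names, and the three database entries are hypothetical placeholders used only to show the filtering logic and the kD-to-Dalton conversion the form expects.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    mass_da: float  # calculated molecular mass in Daltons
    pi: float       # calculated isoelectric point


def kd_to_da(mass_kd):
    """Convert kD to Daltons, e.g. 111 kD -> 111000 (no commas on the form)."""
    return mass_kd * 1000


def within_window(entries, mass_da, mass_tol_da, pi, pi_tol):
    """Return entries whose mass AND pI both lie inside the error windows."""
    return [
        e for e in entries
        if abs(e.mass_da - mass_da) <= mass_tol_da and abs(e.pi - pi) <= pi_tol
    ]


# HYPOTHETICAL entries, invented for illustration; not real SWISS-PROT data.
database = [
    Entry("HYPOTHETICAL_1", 109500, 5.77),  # inside both windows
    Entry("HYPOTHETICAL_2", 112300, 6.40),  # mass fits, pI does not
    Entry("HYPOTHETICAL_3", 98000, 5.90),   # pI fits, mass does not
]

# Experimental values from the handout: 111 kD (+/- 5 kD), pI 5.87 (+/- 0.25).
hits = within_window(database, kd_to_da(111), kd_to_da(5), 5.87, 0.25)
print([e.name for e in hits])
```

Because only two criteria are used, every entry that survives the filter still has to be judged against what you know about the organism, tissue, and activity.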
- Putative identity based upon Molecular Mass
(Mr) and Amino Acid Composition. To make this analysis, we will
use the MultiIdent
analytical tool located at ExPASy
in Switzerland. The MultiIdent server compares the relative amino acid
composition
of your unknown protein against a database of compositions that have been
extracted for each known protein in the SWISS-PROT database.
- In the MultiIdent web form page, input your email address (your results
will be emailed to you; they will not be displayed via this web interface),
then manually enter the relative (%) composition values for each amino acid
in the appropriate window. Note: acid hydrolysis converts the amide forms
(asparagine, glutamine) to the corresponding acids (aspartic acid, glutamic
acid), so it is very difficult to discriminate between the acid and amide
forms of the same amino acid (e.g., aspartic acid vs asparagine), and the
values for both forms are usually summed for compositional analyses.
Therefore, you will need to add the individual
values for the amino acid pairs glutamic acid and glutamine (Glx),
and aspartic acid and asparagine (Asx), and enter these sums in the
appropriate windows (use "Constellation 2"). Input the molecular mass in Daltons (no commas!),
and DO NOT input a pI. For this particular hypothetical analysis,
we are assuming that we do not have pI data available to us. Do not input
anything into the remaining windows or change any other defaults. Scroll
to the bottom of the form and change the default "Number of Hits"
from 100 to 50. Then submit this query by clicking on the "Perform
the Search" button.
- In a few minutes you will receive your results via email. In the meantime,
you may want to start the next analysis.
- Save your email results to local disk. In PINE, select "E"
(export mail file), enter a file name for this mail message, then export
("save").
- Based on these results alone, what is your best guesstimate as to the
identification of your unknown?
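The Asx and Glx sums needed for the MultiIdent form can be worked out by hand, or with a short script such as the one below. The percentages are the ones given for the class example. The `composition_distance` function is only an assumed, simplified score (sum of squared differences) to show how two compositions might be compared; the server's real scoring may differ.

```python
# Relative composition (%) of the class example, from the data section.
PERCENT = {
    "Ala": 7.03, "Arg": 7.55, "Asn": 3.21, "Asp": 6.72, "Cys": 0.83,
    "Gln": 4.14, "Glu": 8.27, "Gly": 5.07, "His": 2.28, "Ile": 4.96,
    "Leu": 11.38, "Lys": 6.00, "Met": 2.69, "Phe": 4.14, "Pro": 5.07,
    "Ser": 5.69, "Thr": 5.17, "Trp": 1.45, "Tyr": 2.90, "Val": 5.48,
}


def merge_amides(percent):
    """Replace Asp/Asn with Asx and Glu/Gln with Glx (summed values)."""
    merged = {
        aa: value for aa, value in percent.items()
        if aa not in ("Asp", "Asn", "Glu", "Gln")
    }
    merged["Asx"] = round(percent["Asp"] + percent["Asn"], 2)
    merged["Glx"] = round(percent["Glu"] + percent["Gln"], 2)
    return merged


def composition_distance(a, b):
    """ASSUMED score: sum of squared differences; smaller means more similar."""
    return sum((a[aa] - b[aa]) ** 2 for aa in a)


merged = merge_amides(PERCENT)
print("Asx:", merged["Asx"], "Glx:", merged["Glx"])
```

For the class example this gives Asx = 9.93 % and Glx = 12.41 %.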
- Putative identity based upon Molecular Mass
and Peptide Fragment Masses. To make this analysis, we will use the
PeptideSearch
analytical tool located at EMBL-Heidelberg.
PeptideSearch makes a comparison between the molecular mass and peptide
masses (of the fragments generated by partial chemical or enzymic digest)
of your unknown protein against a database of peptide fragment maps calculated
(or empirically determined) for known proteins.
- In the PeptideSearch web form page, input the
mass range of your protein, ensure that the cleavage agent is selected
correctly, set the number of peptides required for a match to 10 (the default
is 5; increasing this number increases specificity), and copy and paste your peptide fragment data into the
peptide
masses window. Note: You must first use the cut command to remove
the default fragment masses in the list. The reset button will not clear
this list. The remaining window defaults will suffice for this exercise.
- Submit the request. A results page (web-based interface) will return
in a minute or two.
- Save your results to local disk, using the "Save As" item
in your browser's pull-down "File" menu. Save as "Text"
or "Source"; it's your choice.
- Based on these results alone, what is your best guesstimate as to the
identification of your unknown?
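The matching idea behind a peptide-mass search can be illustrated as follows: count how many observed masses agree with a candidate's theoretical fragment masses within a tolerance, and accept the candidate only if the count reaches the required minimum. This is a simplified sketch, not PeptideSearch's algorithm. The 0.5 Da tolerance and the `candidate` fragment list are assumptions made up for the example; the observed masses are the first five from the class data.

```python
from bisect import bisect_left


def count_matches(observed, theoretical, tol_da=0.5):
    """Count observed masses that have a theoretical mass within tol_da."""
    theo = sorted(theoretical)
    matched = 0
    for mass in observed:
        # Index of the first theoretical mass >= (mass - tol_da).
        i = bisect_left(theo, mass - tol_da)
        if i < len(theo) and theo[i] <= mass + tol_da:
            matched += 1
    return matched


def is_hit(observed, theoretical, min_matches, tol_da=0.5):
    """True if at least min_matches observed masses are explained."""
    return count_matches(observed, theoretical, tol_da) >= min_matches


# First five observed masses from the class example.
observed = [4231.16, 3739.72, 3673.84, 2742.24, 2395.20]
# HYPOTHETICAL theoretical masses for one candidate (invented values).
candidate = [4231.20, 3673.80, 2000.00, 2395.90]

print(count_matches(observed, candidate))
print(is_hit(observed, candidate, min_matches=2))
```

Raising `min_matches` (the exercise uses 10 instead of the default 5) rejects candidates that share only a few masses with the unknown by chance.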
Putative Unknown Protein ID
- Compare your post-results analysis for all three
tests. Based on these, what is your best guesstimate as to the identification
of your unknown protein? Email
your answer. Be sure to include "Pharm207"
in the subject line of your email. Please save a copy of ALL your analyses
to local disk. I may ask to discuss your answer and/or want to review your
analysis results.
- And
the Answer Is (Class Example) ?
Physico-Chemical Features Profile of your "pet protein"
In this exercise, I want you to create a features profile (similar to
that of your individual unknowns) of the protein for which you are creating
a web page for the course. To create this profile, you will need the protein
sequence (in single letter aa code) of your pet protein.
- Calculate the Molecular Mass, Isoelectric Point, and Amino Acid Characteristics
of your protein. We can make a theoretical calculation of these protein
features from a protein sequence using the "MPCT"
features analysis tool at ABIM in
France. This web page is in French, but it's pretty obvious what to do.
- In the MPCT web form page, paste your protein sequence into the big
empty mauve window. Then click on the submit ("Soumettre") button.
If you made a mistake, you can reset the page using the reset ("Effacer")
button.
- You'll get a results page in about a minute or two (sometimes
less). Save this result as "Source" using the "Save As"
item from the "File" pull-down menu. Add this result to your
web page. You may modify it to fit the particular design attributes of
your page.
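To give a feel for how a theoretical isoelectric point can be derived from a sequence, the sketch below sums the Henderson-Hasselbalch charge of each ionizable group and searches for the pH at which the net charge is zero. It is not the MPCT tool's method. The pKa values are one commonly quoted set and are an assumption here; different programs use different sets, so the result will not necessarily match the server's output. The example sequence is a made-up peptide.

```python
from collections import Counter

# ASSUMED pKa values (one commonly quoted set; tools differ).
PKA_POSITIVE = {"K": 10.0, "R": 12.0, "H": 5.98}
PKA_NEGATIVE = {"D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}
PKA_N_TERM = 7.5
PKA_C_TERM = 3.55


def net_charge(sequence, ph):
    """Net charge of a single-letter protein sequence at the given pH."""
    counts = Counter(sequence)
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    positive += sum(
        counts[aa] / (1.0 + 10 ** (ph - pka)) for aa, pka in PKA_POSITIVE.items()
    )
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    negative += sum(
        counts[aa] / (1.0 + 10 ** (pka - ph)) for aa, pka in PKA_NEGATIVE.items()
    )
    return positive - negative


def isoelectric_point(sequence, tol=1e-4):
    """Bisect between pH 0 and 14 for the pH where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


example = "ACDEFGHIKR"  # HYPOTHETICAL peptide, for illustration only
print(f"theoretical pI of {example}: {isoelectric_point(example):.2f}")
```

Bisection works here because the net charge only decreases as the pH rises, so there is exactly one zero crossing between pH 0 and 14.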
- Create a theoretical Peptide Fragment Map for your protein. Peptide
fragment maps can be generated from amino acid sequence data using the
Peptide Mass
tool at ExPASy.
- In the Peptide Mass web form page, paste your amino acid sequence into
the large mauve window. Ensure that the cleavage agent ("enzyme")
selected is Trypsin. Review, but do not alter any of the default values
for the other experimental conditions. The default values will suffice
for this exercise. Submit your query by clicking on the "Perform"
button.
- You should get a web-based response in a few minutes. Save this result
as "Source" using the "Save As" button from the "File"
pull-down menu. Add this result to your web page. You may modify it to
fit the particular design attributes of your page.
- Incidentally, this is how I created the
physico-chemical data for your unknown protein profiles.
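A theoretical peptide fragment map of this kind can be approximated with a short script. The sketch below mirrors the settings listed with the class data (trypsin, no missed cleavages, monoisotopic masses reported as [M+H]+, peptides above 500 Da), but it is not the Peptide Mass tool itself. It assumes the simplified rule that trypsin cuts after K or R unless the next residue is P, and it uses standard monoisotopic residue masses that are not part of the handout. The input sequence is a made-up peptide.

```python
import re

# ASSUMED standard monoisotopic residue masses (Da).
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # added to report the singly protonated ion


def digest(sequence):
    """Tryptic digest, no missed cleavages (simplified K/R-not-before-P rule)."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if p]  # drop the empty piece after a final K/R


def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass of a peptide with reduced cysteines."""
    return sum(MONO_RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def peptide_map(sequence, min_mass=500.0):
    """(peptide, [M+H]+) pairs above min_mass, heaviest first."""
    pairs = [(p, round(mh_mass(p), 2)) for p in digest(sequence)]
    return sorted((x for x in pairs if x[1] > min_mass), key=lambda x: -x[1])


example = "AAKRPGGRCC"  # HYPOTHETICAL sequence, for illustration only
print(digest(example))
print(peptide_map(example))
```

Small fragments drop out of the map because of the 500 Da cutoff, which is why the peptide list for a protein is always shorter than its number of cleavage sites would suggest.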