Exercise 6 - Folding of RNA or ssDNA Molecules
The objectives of this exercise are to examine nucleic acid secondary
structure prediction programs:
- Zuker's MFOLD program is emphasized, along with the GCG programs STEMLOOP and PLOTFOLD and the graphics display programs SQUIGGLES, MOUNTAINS, CIRCLES, DOMES, and DOTPLOT.
- MFOLD and PLOTFOLD are used to investigate suboptimal RNA secondary structures.
- The optimal folding program MFOLD is compared with the simple stem-loop programs GCG STEMLOOP and GCG DOTPLOT.
Main Specific Tasks to Perform in Exercise 6:
- Select an RNA sequence/structure with which to work.
- MFOLD - multiple sub-optimal RNA structures
  - Learn about the GCG program MFOLD.
  - Use PLOTFOLD to examine the squiggles, mountains, circles, domes, and dotplot output from MFOLD.
  - Use PLOTFOLD to examine sub-optimal folds produced by MFOLD.
- MFOLD parameters: constraining the folding pattern.
  - Based on your knowledge of the biology of your molecule, use some of the other parameters of MFOLD: /REMOVE /PREVENT /FORCE.
- Using the GCG programs STEMLOOP and DOTPLOT for RNA structures.
  - Learn about the GCG program STEMLOOP. Analyse your sequence using STEMLOOP and display your results using DOTPLOT.
  - Compare these results with those from MFOLD as displayed using PLOTFOLD.
  - Attempt to find a set of parameters for STEMLOOP which yields output in which you can see the stems predicted by MFOLD.
- Questions
Folding Servers
- Zuker's RNA Mfold Server
- Zuker's DNA Mfold Server
- tRNAScan Server

RNA Structure Databases
- tRNA gene database
- rRNA SSU and LSU
- RNase P Database
- rRNA WWW server
- Signal Recognition Particle Database
- tmRDB
- uRNA Database (e.g. U1 etc.)

Many other links can be found at RNA World at Jena.
{A. Locate an RNA sequence of interest.}
Choose a sequence that has a region of biologically significant intrastrand
structure. Examples include:
- A segment of a larger folded RNA, e.g. a 16S or 23S RNA, whose structure is available.
- tRNA molecules or regions of rRNA molecules.
- Viral RNAs, e.g. the 35S RNA of Cauliflower mosaic virus (a DNA virus that replicates via an RNA intermediate).
- The RNA I molecules that inhibit initiation of DNA replication in ColE1-type plasmids.
- Origins of DNA replication.
- The leader sequences in mRNA species from operons that show attenuation, e.g. the trp operon of E. coli.
Using one of the databases above, or another of your choice, identify a
molecule with known secondary structure.
Based on your knowledge of the biology of your molecule (and/or the annotation
available at the database), choose an appropriate region of the sequence
for intrastrand structure analysis. Because RNA folding is a CPU-intensive
process, limit your analysis to a sequence no longer than 400
bases. If the sequence you are interested in is longer, simply
cut out part of the molecule for analysis.
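The size limit can be made concrete with a small sketch. It is written in Python and assumes that folding time grows roughly as the cube of the sequence length (typical of Zuker-style dynamic programming with a bounded interior-loop size); the helper names `cut_region` and `relative_cost` are hypothetical and are not part of GCG or Mfold.

```python
# Hypothetical helpers for preparing a region for folding.
# Assumption: folding time scales roughly as N**3 for a sequence of length N.

MAX_LEN = 400  # limit suggested in this exercise


def cut_region(seq: str, start: int, end: int) -> str:
    """Return seq[start:end] (0-based, end exclusive) in upper case.

    Raises ValueError if the region is longer than MAX_LEN bases.
    """
    region = seq[start:end].upper()
    if len(region) > MAX_LEN:
        raise ValueError(f"region is {len(region)} bases; limit is {MAX_LEN}")
    return region


def relative_cost(n_long: int, n_short: int) -> float:
    """Approximate CPU-time ratio of folding n_long vs n_short bases,
    under the cubic-scaling assumption."""
    return (n_long / n_short) ** 3


if __name__ == "__main__":
    # A ~1500-base 16S RNA versus a 400-base window:
    print(f"{relative_cost(1500, 400):.1f}x more work")
```

Under this assumption, folding a whole ~1500-base 16S RNA costs roughly 50 times as much as folding a 400-base window.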
{Enter a description of your molecule and the known base-pairing
into your notebook.}
{B. MFOLD - multiple sub-optimal RNA structures.}
{1. Learn about the GCG program MFOLD in GENHELP.} You may
use either the GCG MFOLD program or Zuker's Mfold server to fold your RNAs.
Note that Zuker's server is likely to be somewhat more up-to-date, and may
in fact be faster. If you use Zuker's server, you must copy the "GCG
connect" file back to your local machine in order to view the mountains and
domes plots described below.
{2. Use PLOTFOLD to examine the squiggles, mountains, circles,
domes, and dotplot output from MFOLD.} If you ran your analysis on
Zuker's server, you may use the "GCG connect" file with the programs
SQUIGGLES, DOMES, CIRCLES, and MOUNTAINS to make the plots.
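The mountain representation is simple to compute by hand. In this Python sketch, the height at position k is the number of base pairs (i, j) with i <= k <= j, so the curve climbs through the 5' side of a helix, stays level across loops, and descends through the 3' side. It assumes the structure is given in dot-bracket notation rather than the GCG connect format, and `mountain` and `render` are illustrative names only.

```python
def mountain(dot_bracket):
    """Height at each position = number of base pairs enclosing it (inclusive)."""
    heights, h = [], 0
    for ch in dot_bracket:
        if ch == "(":
            h += 1
            heights.append(h)
        elif ch == ")":
            heights.append(h)
            h -= 1
            if h < 0:
                raise ValueError("unbalanced ')'")
        elif ch == ".":
            heights.append(h)
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if h != 0:
        raise ValueError("unbalanced '('")
    return heights


def render(heights):
    """Draw the mountain as rows of '#', highest level first."""
    top = max(heights, default=0)
    return "\n".join(
        "".join("#" if x >= level else " " for x in heights)
        for level in range(top, 0, -1)
    )


if __name__ == "__main__":
    s = "((((...))..((...)))).."
    print(render(mountain(s)))
```

Two structures drawn this way on the same axis are easy to compare: a helix present in one fold and missing from the other shows up as a plateau at a different height.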
{3. Use PLOTFOLD to examine at least five of the sub-optimal folds
produced by MFOLD. What graphics output is most useful in this comparison?
Which one of these structures, if any, corresponds to the known structure
of your RNA?}
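When comparing a sub-optimal fold against the known structure, it can help to count shared base pairs rather than judge the pictures by eye. This Python sketch assumes both structures are written in dot-bracket notation over the same sequence and reports sensitivity (fraction of known pairs that were predicted) and positive predictive value, PPV (fraction of predicted pairs that are in the known structure); the function names are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) pairs (0-based) encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def compare(predicted, known):
    """Return (sensitivity, ppv) of a predicted structure against a known one."""
    if len(predicted) != len(known):
        raise ValueError("structures must have the same length")
    p, k = base_pairs(predicted), base_pairs(known)
    shared = len(p & k)
    sensitivity = shared / len(k) if k else 0.0
    ppv = shared / len(p) if p else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    known = "((((....))))"
    predicted = "(((......)))"
    print(compare(predicted, known))  # (0.75, 1.0)
```

Here the prediction misses one of the four known pairs but contains no wrong pairs, hence sensitivity 0.75 and PPV 1.0.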
{C. MFOLD parameters: constraining the folding
pattern.}
{1. Based on your knowledge of the biology of your molecule (and/or
the annotation in the sequence file), use some of the parameters of MFOLD:
/REMOVE /PREVENT /FORCE.} These parameters constrain the predicted
folding: by adding constraints that force base pairs you know should
be present, or prevent base pairs you know are absent, you can
obtain the minimum free-energy structure that contains the biologically relevant
structures. For this exercise, use information from the known folded structure
of your RNA. In a real situation, this information might come from laboratory
experiments such as nuclease digestion studies. Note that these options
are available both in the GCG version and on Zuker's server.
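To see how a /PREVENT-style constraint changes a prediction, the sketch below uses Nussinov's base-pair maximization, a much simpler algorithm than MFOLD's free-energy minimization that is swapped in here only for illustration. It assumes GC, AU and GU pairs, a minimum hairpin loop of 3 bases, and 0-based (i, j) positions in a hypothetical `prevent` set; it does not reproduce MFOLD's energies or the exact semantics of its /PREVENT option.

```python
# Nussinov base-pair maximization with a set of prohibited pairs.
# Illustration only: MFOLD minimizes free energy instead of counting pairs.

ALLOWED = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def can_pair(seq, i, j, prevent):
    """True if i and j may pair: complementary, far enough apart, not prohibited."""
    return (seq[i], seq[j]) in ALLOWED and j - i > MIN_LOOP and (i, j) not in prevent


def fold(seq, prevent=frozenset()):
    """Return (number_of_pairs, dot_bracket) for a maximal set of nested pairs."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs in seq[i..j]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # i or j unpaired
            if can_pair(seq, i, j, prevent):
                score = max(score, best[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent parts
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    # Traceback with an explicit stack to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
        elif best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(seq, i, j, prevent) and best[i][j] == best[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if best[i][j] == best[i][k] + best[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return best[0][n - 1], "".join(structure)
```

For example, `fold("GGGAAAUCC")` returns `(3, "(((...)))")`, which includes a GU pair between positions 2 and 6; calling `fold("GGGAAAUCC", prevent={(2, 6)})` removes that option and the best structure drops to two pairs.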
{D. Using the GCG programs STEMLOOP and DOTPLOT
for RNA structures.}
STEMLOOP is a very simplistic RNA structure finder. In this section
you will compare the results from this simplistic approach with the more
accurate analysis you did in section B.
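The idea behind a simple stem-loop search can be sketched in a few lines of Python: look for runs of consecutive complementary pairs (i, j), (i+1, j-1), ... that close a hairpin loop, and keep those that meet length and loop-size thresholds. This is an illustration under stated assumptions, not GCG STEMLOOP's actual algorithm, and the parameters `min_stem`, `min_loop`, `max_loop` and `allow_gu` are hypothetical rather than STEMLOOP options. Unlike MFOLD, nothing here scores energies or checks whether the stems found are compatible with one another.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def find_stems(seq, min_stem=4, min_loop=3, max_loop=30, allow_gu=True):
    """Return (start, end, stem_length, loop_size) tuples with 1-based positions.

    Hypothetical parameters; T is treated as U so ssDNA can be scanned too.
    """
    seq = seq.upper().replace("T", "U")
    ok = WATSON_CRICK | WOBBLE if allow_gu else WATSON_CRICK
    n = len(seq)
    hits = []
    for i in range(n):
        for j in range(n - 1, i, -1):
            if (seq[i], seq[j]) not in ok:
                continue
            # Skip pairs that merely continue a stem starting further out.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in ok:
                continue
            length = 0
            # Extend inward while the next pair is allowed and would still
            # leave at least min_loop unpaired bases in the hairpin loop.
            while ((seq[i + length], seq[j - length]) in ok
                   and (j - length) - (i + length) - 1 >= min_loop):
                length += 1
            loop = j - i - 2 * length + 1
            if length >= min_stem and loop <= max_loop:
                hits.append((i + 1, j + 1, length, loop))
    return hits
```

For example, `find_stems("GGGGAAAACCCC")` returns `[(1, 12, 4, 4)]`: a four-pair stem from position 1 to 12 closing a four-base loop. Raising `min_stem` or tightening `max_loop` trims the list, which mirrors the kind of parameter search asked for in step D.3.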
{1. Learn about the GCG STEMLOOP program using GENHELP.
Analyse your sequence using STEMLOOP and display your results using DOTPLOT.}
{2. Compare these results with those from MFOLD as displayed using
PLOTFOLD.}
{3. Attempt to find a set of parameters for STEMLOOP which yields
output in which you can see the stems predicted using MFOLD.}
{E. Questions.}
- What are the problems with the SQUIGGLES representation for RNA structures?
- Name a representation that corrects these problems.
- What is the difference between an internal loop and a bulge loop?
- What are the advantages of MFOLD relative to STEMLOOP?
- What are the disadvantages of MFOLD relative to STEMLOOP?
- What kinds of structures cannot be found by folding programs such as MFOLD?
- In general, which will destabilize a structure more: an internal loop or a bulge loop? Explain your answer and include energies.
- What are the main energy terms considered in computer prediction of RNA folding?
- List the favorable energy terms (those that stabilize structures) in RNA folding.
- When is a GU base pair less stable than an AU base pair?
- What laboratory methods might one use to test the correctness of a predicted structure?
- What non-laboratory (e.g., computational) methods might one use to test the correctness of a predicted structure?
- What is a P-num plot and why would one use it?