Exercise 2
Key - Created April, 2001
Also please include your email address
in the above information ...
We would like Name and email address ...
Thanks
BI141S
Doug Smith
My comments look like this.
A key to the questions is included at the end of the file.
Score: 365 pts
Summary of Grading
A. Use Netscape and the Web to Find and Extract a SWISS-PROT Sequence 37 pts
B. Searches using Boolean Operators. 64 pts
C. Looking up Sequences with Entrez. 54 pts
D. Introduction to the GCG package. 21 pts
E. Questions. 189 pts
TOTAL 365 pts
did this ...
{A. Use Netscape and the Web to Find and Extract a SWISS-PROT Sequence:}
{1. Turn on Netscape Communicator.}
did this ...
{maneuver via Hypertext links for a while}
did this ...
{2. Bioinformatics Links: the DNASYSTEM Web Page or the CMS MBR Web Page}
{a. Go to the DNASYSTEM or the CMS MBR Web Page}
did this ...
{Spend a little time browsing the DNASYSTEM Web Page
or the CMS MBR Web Page.}
did this ... 3 pts
{3. SWISS-PROT and TrEMBL at the ExPASy Web site}
{From the DNASYSTEM Web Page or from the links above,
click on "General Sites"}
did this ...
{Access the ExPASy site by clicking on 'ExPASy'.}
did this ...
{Browse a bit to see what's at ExPASy ...}
did this ... 3 pts
{4. Find your Protein Sequence in the SWISS-PROT and TrEMBL Protein Databases}
{In "Access to SWISS-PROT and TrEMBL", click on
"by description or identification"}
did this ...
{Now enter TWO keywords in the box shown and click "Submit"}
show the two keywords used here ...
{Try a keyword search with TWO words and record your
findings; Use your own keywords; do NOT use lambda repressor !
also try REVERSING the two keywords.}
words are the same here ... 3 pts
{Do the same keyword search but with each keyword separately;
record your findings}
record findings for:
1) two words            dnaa AND coli      27
2) two words reversed   coli AND dnaa      27
3) word 1               dnaa              245
4) word 2               coli            14762
Exercise does not explicitly ask to explain these results ... 8 pts, 2 for each
{5. Copy your Sequence}
{Choose one of the Protein Sequences you found and have
a look at it by clicking on the hypertext-linked SwissProt name
for the sequence.}
did this ...
{Note the links present to visualize the protein and
to learn more about its structural elements. Browse through some
of these links.}
should include some statement of links looked at ... 3 pts
{Obtain a copy of the sequence entry}
ok ...
{Save copies both as "Text" and "Source"
from both "NiceProt View" and "old SP format"}
did this ...
{Include some or all of the "Text" version
of the "old SP format" in your Notebook}
ID DNAA_ECOLI STANDARD; PRT; 467 AA.
AC P03004; P78122;
DT 21-JUL-1986 (Rel. 01, Created)
DT 01-JUL-1993 (Rel. 26, Last sequence update)
DT 01-OCT-2000 (Rel. 40, Last annotation update)
DE CHROMOSOMAL REPLICATION INITIATOR PROTEIN DNAA.
GN DNAA OR B3702.
OS Escherichia coli.
OC Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
5 pts
{Note the First line in the Entry and the Last line in
the Entry. What are these?}
First line is the ID line and last line is the // line after the sequence. Both lines serve as delimiters of the sequence entry. 6 pts, 3 pts each
{Note also the First Two Letters in each line of the
"old SP format".}
Some statement about what these are should be included ... the first two letters define the nature of the field. 6 pts, 3 pts each
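As a rough illustration of how the two-letter line codes and the ID and // delimiter lines structure an entry, the sketch below groups the lines of one flat-file entry by their code. The function name parse_sp_entry is hypothetical, and the sample text is a shortened copy of the DNAA_ECOLI excerpt above with a // line appended for the demonstration; it is an assumption-laden sketch, not the way ExPASy itself reads entries.

```python
from collections import defaultdict

def parse_sp_entry(text):
    """Group the lines of one 'old SP format' entry by two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):      # terminator line: end of the entry
            break
        code, _, value = line.partition(" ")
        if len(code) == 2:             # skips sequence lines, which start with spaces
            fields[code].append(value.strip())
    return dict(fields)

# Shortened sample; the '//' line is added here only to close the entry.
sample = """\
ID DNAA_ECOLI STANDARD; PRT; 467 AA.
AC P03004; P78122;
DT 21-JUL-1986 (Rel. 01, Created)
DE CHROMOSOMAL REPLICATION INITIATOR PROTEIN DNAA.
GN DNAA OR B3702.
OS Escherichia coli.
//
"""

entry = parse_sp_entry(sample)
print(entry["ID"][0].split()[0])   # entry name taken from the ID line
print(entry["AC"][0])              # accession numbers from the AC line
```

Keying on the first two letters is what makes the format easy to process line by line: each code says what kind of field the rest of the line holds.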
{Use COPY-PASTE to COPY the Sequence directly from the
Netscape window and PASTE it into your Notebook file for Exercise
1.}
????? why have this here ??? ... Exercise
1 ??? ... already did a COPY-PASTE operation above ...
{B. Searches using Boolean Operators}
{1. Go to the SWISS-PROT full text search service.}
did this ...
{2. Enter the two keywords into the search box and hit
SUBMIT to start the search. How many matches were reported?}
coli AND dnaa 26 hits (25 + 1) 3 pts
{Now reverse the order of the TWO keywords, i.e. "repressor
lambda" instead of "lambda repressor."}
same result ... 25 + 1 = 26 hits 3 pts
{Now go back to the SWISS-PROT full text search page
and try again using EACH of the two keywords. How many matches
were reported?}
3) word 1   dnaa     242   3 pts
4) word 2   coli   12900   3 pts
{Do the above searches with the "append * before
and after" option selected}
record findings for:
1) two words            dnaa AND coli      27
2) two words reversed   coli AND dnaa      27
3) word 1               dnaa              245
4) word 2               coli            14762
4 pts
{Construct a table of your results as per the following}
record findings for:
                                description-ID search       full-text search
keywords            words used  SwissProt  TrEMBL  total    SwissProt  TrEMBL  total
------------------  ----------  ---------  ------  -----    ---------  ------  -----
1) two words        dnaa coli           0       0      0            0       0      0
   with * appended              * not supported                     0       0      0   4 pts
2) two words        coli dnaa           0       0      0            1       0      1
   reversed
   with * appended              * not supported                     2       0      2   4 pts
3) word 1           dnaa               40      38     78          170      72    242
   with * appended              * not supported                   171      74    245   4 pts
4) word 2           coli             4974    4070   9044         7568    5332  12900
   with * appended              * not supported                  8193    6569  14762   4 pts
Search for the two words did NOT yield the same results when they were reversed because the search assumes the two words are adjacent to each other.
{3. Do a search with the same two keywords using the
AND operator. How many matches did you get? What are the results
when you reverse the order of the keywords?}
record findings for:
                                   description-ID search       full-text search
keywords            words used     SwissProt  TrEMBL  total    SwissProt  TrEMBL  total
------------------  -------------  ---------  ------  -----    ---------  ------  -----
1) two words        dnaa AND coli          0       0      0           25       1     26
   with * appended                 * not supported                    26       1     27   4 pts
2) two words        coli AND dnaa          0       0      0           25       1     26
   reversed
   with * appended                 * not supported                    26       1     27   4 pts
Same results are obtained independent of
order of the two keywords because the two words can be anywhere.
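The difference between the two behaviours noted above (two bare words treated as an adjacent phrase, versus two words joined by AND) can be sketched on a toy data set. The three records below are invented for illustration and the matching functions are hypothetical; the assumption that the server treats two bare words as an adjacent phrase comes from the explanation given in this key, not from the server's documentation.

```python
def words(text):
    """Lower-case a record and split it into words."""
    return text.lower().split()

def match_and(record, a, b):
    """AND search: both words occur anywhere in the record."""
    w = set(words(record))
    return a in w and b in w

def match_phrase(record, a, b):
    """Phrase search: word a is immediately followed by word b."""
    w = words(record)
    return any(x == a and y == b for x, y in zip(w, w[1:]))

# Invented records, for illustration only.
records = [
    "coli dnaa replication initiator",
    "dnaa protein of escherichia coli",
    "coli lacz beta-galactosidase",
]

and_fwd = sum(match_and(r, "dnaa", "coli") for r in records)
and_rev = sum(match_and(r, "coli", "dnaa") for r in records)
phrase_fwd = sum(match_phrase(r, "dnaa", "coli") for r in records)
phrase_rev = sum(match_phrase(r, "coli", "dnaa") for r in records)
print(and_fwd, and_rev, phrase_fwd, phrase_rev)   # prints: 2 2 0 1
```

The AND counts are equal in both orders, while the phrase counts differ, which is the same pattern as in the recorded full-text results.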
{Do the same for both the OR and NOT boolean operators,
and construct a table for your Notebook similar to the one below.}
                   AND operator                 OR operator                  NOT operator
Search words       SWISS-PROT  TrEMBL  total    SWISS-PROT  TrEMBL  total    SWISS-PROT  TrEMBL  total
-----------------  ----------  ------  -----    ----------  ------  -----    ----------  ------  -----
dnaa coli                  25       1     26          7713    5403  13116           145      71    216   4 pts
with * appended            26       1     27          8338    6642  14980           145      73    218   4 pts
coli dnaa                  25       1     26          7713    5403  13116          7543    5331  12874   4 pts
with * appended            26       1     27          8338    6642  14980          8167    6568  14735   4 pts
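One way to sanity-check the AND, OR and NOT counts is to treat each keyword's hits as a set: the OR count should equal the two single-word counts minus the AND count, and each NOT count should equal a single-word count minus the AND count. The sketch below applies that check to the full-text totals recorded in this key; the dictionary layout and the function name consistent are hypothetical.

```python
# Full-text totals (SWISS-PROT + TrEMBL) copied from the tables in this key.
# The last NOT total is 8167 + 6568 = 14735.
counts = {
    "plain":    {"dnaa": 242, "coli": 12900, "and": 26, "or": 13116,
                 "dnaa_not_coli": 216, "coli_not_dnaa": 12874},
    "wildcard": {"dnaa": 245, "coli": 14762, "and": 27, "or": 14980,
                 "dnaa_not_coli": 218, "coli_not_dnaa": 14735},
}

def consistent(c):
    """Check the set identities |A or B| = |A| + |B| - |A and B|, |A not B| = |A| - |A and B|."""
    return (c["or"] == c["dnaa"] + c["coli"] - c["and"]
            and c["dnaa_not_coli"] == c["dnaa"] - c["and"]
            and c["coli_not_dnaa"] == c["coli"] - c["and"])

for label, c in counts.items():
    print(label, consistent(c))   # both lines print True
```

This also shows why NOT, unlike AND and OR, depends on keyword order: "dnaa NOT coli" and "coli NOT dnaa" subtract the same overlap from two very different totals.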
{Try also parts of keywords. For example, for "lambda
repressor", we would try "lam AND rep". Note that
you must check the "Prefix and append wildcard '*' to words."
box with this server. Try the equivalent to "lam* AND *rep*"
for your keywords.}
8 pts
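The effect of prefixing and appending the wildcard can be sketched with Python's standard fnmatch module, using the "lambda repressor" example from the instructions. The description string and the function name matches_all are hypothetical, and the real server's matching rules may differ in detail.

```python
from fnmatch import fnmatchcase

def matches_all(words, patterns):
    """True if every pattern matches at least one word (an AND of wildcard terms)."""
    return all(any(fnmatchcase(w, p) for w in words) for p in patterns)

# Invented description text, for illustration only.
description = "lambda repressor protein ci".split()

with_wildcards = matches_all(description, ["lam*", "*rep*"])
without_wildcards = matches_all(description, ["lam", "rep"])
print(with_wildcards, without_wildcards)   # prints: True False
```

Without the wildcards, "lam" and "rep" must match whole words, so the partial keywords find nothing.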
{C. Looking up Sequences with NCBI Entrez}
{1. Go to NCBI Entrez and read some of the "Entrez
Help" information.}
did this ...
{2. Return to the Entrez Home page using the Netscape
"Back" operation and click on "Proteins" to
search for a protein sequence.}
did this ...
{3. Enter one of the keywords you used at ExPASy in the
search box and click the "Go" button (or just do a RETURN)
to start the search.}
should say what keyword was used and results obtained ... 3 pts
keyword 'coli', 74384 hits ...
{4. If the number of matches is greater than 200, use
a second and/or third keyword in the <keyword(s)> section
together with appropriate Boolean Operators.}
ok ... coli AND dnaA yielded 105 hits ...
{5. Click in turn on each of the four links "Limits",
"Preview/Index", "History", "Clipboard".
Answer the Exercise Questions on these four links. Redo the AND
query with two of your keywords using "Limits" to limit
your search to new GenPept entries within the past 5 years. Compare
these results with those above.}
did this ... coli AND dnaA, limited to mod
dates of 1996-01-01 to 2002-01-01 yielded 90 hits 3 pts
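For readers who want to script a comparable query, the sketch below only builds a request URL for NCBI's E-utilities esearch service; it does not contact the server. This is an assumption about how one might reproduce the search today, not the Web form used in the exercise, and current hit counts will not match the 2001 figures recorded here. The helper name esearch_url is hypothetical.

```python
from urllib.parse import urlencode

# Base address of the E-utilities esearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(term, mindate=None, maxdate=None):
    """Build (but do not send) a protein-database search URL."""
    params = {"db": "protein", "term": term}
    if mindate and maxdate:
        # datetype=mdat restricts by modification date; dates are YYYY/MM/DD.
        params.update({"datetype": "mdat", "mindate": mindate, "maxdate": maxdate})
    return ESEARCH + "?" + urlencode(params)

url = esearch_url("coli AND dnaA", "1996/01/01", "2002/01/01")
print(url)
```

The date window here mirrors the 1996-01-01 to 2002-01-01 limit described above.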
{6. In the "Display <menu>" section,
choose different items from the <menu> and click the "Display"
button. Briefly describe what the different items are in the <menu>,
particularly the "Neighborhood" and "Link"
items.}
did this ...
1) different display formats, e.g. GenPept, ASN.1, Brief 3 pts
2) links to PubMed articles (abstracts), Taxonomy, Structure, Genome, OMIM, etc. 3 pts
3) neighbors ... homologues to proteins, nucleotide sequences 3 pts
{7. Return to the <menu> item termed "Summary"
and click the "Display" button. Select the boxes of
any five (5) of the sequences by clicking on each of the boxes.
Select "GenPept" from the <menu> and click on
the "Display" button. Record in your Lab Notebook what
this has done.}
did this ... this brings up the GenPept entries of the five chosen sequences ... 3 pts
{8. The GenPept sequences are Displayed as HTML. Click
on the HTML button, choose "Plain Text", and click on
the "Display" button. Record in your Lab Notebook what
this has done.}
did this ... this now displays the five chosen seqs as Text with the GenPept annotation ... 3 pts
{9. Examine the contents of the GenPept annotation for
the first of your Sequences. What are the Fields? Save this GenPept
entry to your Lab Notebook.}
did this ... Fields are each type of annotation, e.g. LOCUS, DEFINITION, ACCESSION, REFERENCE, etc. 3 pts
ID RNPA_PROMI STANDARD; PRT; 119 AA.
AC P22835;
DT 01-AUG-1991 (Rel. 19, Created)
DT 01-AUG-1991 (Rel. 19, Last sequence update)
DT 01-OCT-2000 (Rel. 40, Last annotation update)
DE RIBONUCLEASE P PROTEIN COMPONENT (EC 3.1.26.5) (PROTEIN C5) (RNASE P).
GN RNPA.
OS Proteus mirabilis.
OC Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
OC Proteus.
OX NCBI_TaxID=584;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=LM1509;
RX MEDLINE=91033012; PubMed=2172087; [NCBI, ExPASy, EBI, Israel, Japan]
RA Skovgaard O.;
RT "Nucleotide sequence of a Proteus mirabilis DNA fragment homologous
RT to the 60K-rnpA-rpmH-dnaA-dnaN-recF-gyrB region of Escherichia
RT coli.";
RL Gene 93:27-34(1990).
CC -!- FUNCTION: RIBONUCLEASE P GENERATES MATURE TRNA MOLECULES BY
CC CLEAVING THEIR 5' ENDS. IT CAN CLEAVE ALSO THE 4.5S RNA (BY
CC SIMILARITY).
CC -!- CATALYTIC ACTIVITY: ENDONUCLEOLYTIC CLEAVAGE OF RNA, REMOVING
CC 5'-EXTRA-NUCLEOTIDE FROM TRNA PRECURSOR.
CC -!- MISCELLANEOUS: RNASE P CONSISTS OF A RNA MOIETY (M1, RNPB) AND THE
CC PROTEIN COMPONENT. BOTH ARE NECESSARY FOR FULL ENZYMATIC ACTIVITY.
CC HOWEVER, IT IS THE RNA THAT CARRIES THE CATALYTIC SITE.
CC -!- SIMILARITY: BELONGS TO THE RNPA FAMILY.
CC --------------------------------------------------------------------------
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC --------------------------------------------------------------------------
DR EMBL; M58352; AAA83956.1; -. [EMBL / GenBank / DDBJ] [CoDingSequence]
DR PIR; JQ0731; JQ0731.
DR InterPro; IPR000100; Ribonuclease_P.
DR InterPro; Graphical view of domain structure.
DR Pfam; PF00825; Ribonuclease_P; 1.
DR PROSITE; PS00648; RIBONUCLEASE_P; 1.
DR ProDom [Domain structure / List of seq. sharing at least 1 domain]
DR BLOCKS; P22835.
DR DOMO; P22835.
DR PROTOMAP; P22835.
DR PRESAGE; P22835.
DR DIP; P22835.
DR SWISS-2DPAGE; GET REGION ON 2D PAGE.
KW Hydrolase; Nuclease; Endonuclease; tRNA processing.
SQ SEQUENCE 119 AA; 14059 MW; 80323E9611F89891 CRC64;
MVKLAFPREL RLLTPKHFNF VFQQPQRASS PEVTILGRQN ELGHPRIGLT IAKKNVKRAH
ERNRIKRLAR EYFRLHQHQL PAMDFVVLVR KGVAELDNHQ LTEVLGKLWR RHCRLAQKS
//
{10. Return to the display of your GenPept sequences
as HTML, and select from <menu> the "Graphics"
option and click on the "Display" button. Describe what
you see. What options does the User have? The Graphics display
is only of the first of your five sequences; how would you observe
the graphics for the fourth sequence?}
did this ...
1. The "Graphics" option (when it works !!!) shows a map of the sequence, with its annotated features displayed on the map ... The sequence is also shown, with annotated features along the sequence ... 3 pts
2. To observe graphics for the fourth sequence, return to the Summary display, choose only the fourth sequence, display this as GenPept, and select the Graphics display ... 3 pts
{11. Return to a "Display GenPept as HTML"
and click on the "Add to Clipboard" button. Describe
what happened.}
This places the GenPept version of all 5 sequences into a Clipboard, available for downloading or saving in a file on your local computer ... 3 pts
{12. Choose "Summary" from the <menu>
and click on the "Display" button. Note the links to
the far right for each of your five GenPept sequences. Briefly
describe what each of these links do.}
These provide links to relevant references in PubMed, to Related Sequences (homologues), and to Taxonomy information for each of the organisms from which the sequences were obtained ... 3 pts
{13. Click on "Related Sequences" for one of
your five GenPept Sequences. Briefly describe what this does.}
did this for the E. coli dnaK entry, returned 1983 related sequences ...
all are heat shock proteins, presumably homologues of the E. coli dnaK protein ... 3 pts
{14. Use the Netscape Back command to return to the "Display
Summary" page for the five GenPept sequences, and click on
"PubMed" for the GenPept Sequence that you saved for
your Lab Notebook above. How do the PubMed References that come
up compare with those in the GenPept annotation for this sequence?
Now click on "Related Articles" for one of these References.
How does this compare with the "Related Sequences" for
the GenPept Sequence itself?}
1) the initial PubMed refs that come up are the same as annotated in the GenPept entry ... 3 pts
2) those that come up under "Related Articles" are those that have similar keywords to the original articles, i.e. are "related" ... 3 pts
{15. Use the Netscape Back command to return to the "Display
Summary" page for the five GenPept sequences, and do the
same for the "Nucleotide", "Genome", and "Taxonomy"
links if present. Briefly describe what each does.}
Each brings up related GenBank entries in each of these categories (Nucleotide - the cognate DNA sequences to the protein sequences; Genome - cognate DNA sequences found in completely sequenced genomes; Taxonomy - info from the Taxonomy database on the organism encoding each of the protein sequences) ... 9 pts, 3 pts for each of the three types of links ...
{D. Introduction to the GCG Package}
{1. Log into the GCG server machine and set up the GCG package. }
Did this as follows:
at %, did: telnet y4306-su-1
(ssh y4306-su-1 also works ... and is a 'more secure' connection ...)
... logged in
... then did at %: prep gcg
Some description of what was done should be included here ... 3 pts
{2. Familiarize yourself with the capabilities of the
GCG package. }
Used the Web-based GCG help facilities ...
Also tried using GCG GenHelp on y4306-su-1
... ... 3 pts
{3. Initialize the GCG graphics system. }
To set graphics output to go to a file in png format, did command at %: png
To set graphics output to go to the screen, did command at %: xwindows
did this ... 3 pts
{4. Make a test plot and save it to your lab report.}
And then did at %: plottest
Then inserted results of the plottest here:
... ... 3 pts

{5. Format a sequence for use with the GCG package.}
Retrieved the E. coli dnaA gene sequence entry ECODNAAOP encoding dnaA, dnaN, and rpmH, AccNum J01602 in GenBank format, linked here to NCBI and here to html file in account.
Also retrieved a FASTA-formatted version as text of the 3873 bp sequence entry, linked here.
did this ... 3 pts
{6. Format a sequence using a text editor and the reformat
command.}
Did this by editing the FASTA-formatted version of the ECODNAAOP file, inserting a .. line between the first (header) line and the sequence lines.
did this ... 3 pts
This file "dnaaop.edited" was used as input for REFORMAT, which yielded a file with the header line as annotation, followed by the GCG .. line, followed by the sequence with the usual GCG formatting (nucleotide numbers, spaces, residues in groups of 10); this file is here.
did this ... 3 pts
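The manual editing step (placing a .. line between the FASTA header and the sequence lines) can also be scripted. The sketch below mimics only that text edit, not what REFORMAT does afterwards; the function name insert_gcg_divider is hypothetical, and the header text and short sequence are placeholders, not the real ECODNAAOP data.

```python
def insert_gcg_divider(fasta_text):
    """Return the text with a '..' line placed after the FASTA header line."""
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    # Everything before the '..' line is treated as annotation by GCG.
    return "\n".join([lines[0], "..", *lines[1:]]) + "\n"

# Placeholder header and sequence, for illustration only.
fasta = ">ECODNAAOP placeholder header text\nACGTACGTAC\nGGTTAACC\n"
edited = insert_gcg_divider(fasta)
print(edited)
```

The output would then be saved to a file (such as the "dnaaop.edited" file named above) and passed to REFORMAT.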
The following was not asked for ... ...
Did the same with a SwissProt entry for DnaA protein from Bacillus
subtilis: entry DNAA_BACSU
Used FROMEMBL to convert the entry to GCG format, set the graphics
to deliver PNG-formatted output to file DNAA_BACSU.PEPPLOT, and
ran PEPPLOT. The output graphics file is as follows:
{E. Questions:}
{Answer all of the following questions:}