Exercise 5 - BIMM 141 - Spring 2001
Doug Smith ... KEY
- Score: ... 244 pts
- A. 15 pts
- B. 9 pts
- C. 24 pts
- D. 18 pts
- E. 178 pts
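As a quick arithmetic check of the score summary, the section totals and the per-question points of Part E can be added up. Both lists of numbers are copied from this key; nothing else is assumed.

```python
# Point values copied from the score summary at the top of this key.
section_points = {"A": 15, "B": 9, "C": 24, "D": 18, "E": 178}

# Part E: points for Questions 1-46, in order, as given in each question.
question_points = [
    3, 3, 5, 6, 3, 3, 3, 3, 10, 3,  # Q1-10
    3, 3, 6, 6, 3, 3, 3, 3, 3, 4,   # Q11-20
    6, 3, 3, 3, 8, 5, 6, 6, 4, 3,   # Q21-30
    3, 3, 3, 3, 6, 3, 3, 3, 4, 3,   # Q31-40
    3, 3, 3, 3, 3, 3,               # Q41-46
]

total = sum(section_points.values())
part_e = sum(question_points)
# Prints: Total: 244 pts; Questions 1-46 sum to 178 pts
print(f"Total: {total} pts; Questions 1-{len(question_points)} sum to {part_e} pts")
```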
{Answer the Questions at the end as you proceed through
the Exercise}
{A. TIGR - The Institute for Genomic Research} [15 pts]
{1. Peruse the TIGR site to see what is there and get
a feel for what TIGR does}
Comments on "What's New",
"Hot Links", etc. [3 pts]
{2. TIGR Microbial Database: Completed Genomes}
Info on one of the TIGR organisms
[3 pts]
{3. TIGR Microbial Database: Incomplete Genomes}
Examine one or two incomplete
eukaryotic genomes [3 pts]
{4. Other TIGR Databases}
Types of info in TIGR database,
eg Arabidopsis [3 pts]
{5. TIGR Gene Indices}
Comments on Gene Indices [3 pts]
{6. TIGR Software Tools}
Answer Questions 12-13 below.
{B. NCBI Genomic Resources} [9 pts]
{1. Entrez Genomes}
Comments on 3 columns of Main
Web page for NCBI Entrez Genome [3 pts]
{2. NCBI Microbial Genomes}
Comments on same organism
as for TIGR A3 above - compare [3 pts]
{3. Examine the "Graphical view" and "Coding
regions" of a region of the chosen Genome}
Comments on these views [3 pts]
{4. NCBI Specific Organism Resources}
Answer Questions 19-20 below.
{C. Specific Organism Resources} [24 pts]
{1. NCBI Yeast Genome Site}
Answer Questions 21-22
{2. Stanford Saccharomyces Genome Database - SGD}
Comments on SGD Home page
[3 pts]
Answer Question 23
{3. Munich Information center for Protein Sequences-
MIPS - for Yeast}
Comments on MIPS site [3 pts]
Answer Questions 24-25
{4. NCBI Fly Genome Site}
Answer Questions 26-27
{5. Examine Features of a Fly Chromosome using the Map
View Graphics}
Comments on NCBI MapView [3 pts]
Answer Questions 28-29
Click on "Display Settings" and try some other combinations.
Select a region of "9M" in upper box, "10M"
in lower box of "Select Region:" options
Comments on Display Settings
[3 pts]
{6. Select a Fly Gene and Examine Information Available}
Answer Question 30
{Click on "LocusID" for your Fly Gene}
Comments on Info available
[3 pts]
{7. Examine your Fly Gene in FlyBase}
Comments on FlyBase [3 pts]
Answer Question 31
{Find the link to "The Interactive Fly" and
examine this site}
Comments on 'Interactive Fly'
[3 pts]
{8. Examine your Fly Gene in GadFly at BDGP}
Answer Questions 32-33
{Try using the "GeneSeen" Map Viewer}
Comments on what happened
[3 pts]
{D. Information on a Human Gene} [18 pts]
{1. Find a Human Gene to work with ... Gene Ontology
terms}
Comments on use of GO to find
a human gene [3 pts]
Answer Questions 34, 35, 36
{2. NCBI Resources available for your Human Gene}
Comments on LocusLink info
for your gene, including GO terms and MapViewer (mv) links [3 pts]
Answer Questions 37-39
{3. GeneCard for your Human Gene}
Answer Questions 40-41
{4. Ensembl Resources available for your Human Gene}
Comments on Ensembl [3 pts]
{a. Search Ensembl for your Gene using the keyword previously
used}
Comments on finding same Gene
at Ensembl as at NCBI [3 pts]
Answer Questions 42-43
{b. View your Gene at Ensembl}
Answer Question 44
{5. UCSC Resources available for your Human Gene}
Comments on UCSC graphic interface
[3 pts]
{a. Search UCSC for your Gene using the keyword previously
used}
Comments on use of keyword
at UCSC - same genes found? [3 pts]
{b. View your Gene at UCSC}
Answer Question 45
{c. Direct links between Ensembl and UCSC}
Answer Question 46
{E. Questions} [178 pts]
- 1. What is the mission of TIGR? [ 3 pts]
- The mission of TIGR: continued expansion
of genome sequence info and application of this info to basic
bio research, medicine and agriculture
-
- 2. Who is Chairman of the Board of Trustees of TIGR? [ 3 pts]
- The Chair: J. Craig Venter
-
- 3. From the TIGR "What's New" page, what is Anopheles
gambiae and why is its sequence interesting? [ 5 pts]
- Anopheles gambiae is the mosquito species most important
in the spread of malaria. Its sequence may provide gene information
leading to control of the malaria disease cycle, which requires the mosquito.
-
- 4. What are the three Microbial "Domains" at TIGR?
[ 6 pts]
- These are Archaea, Eubacteria, and Eukaryotes
-
- 5. What is the size range of genomes present in the TIGR
complete microbial genome list? [ 3 pts]
- The size range: 0.58 Mb (Mycoplasma
genitalium) to 13 Mb (S. cerevisiae)
-
- 6. What is the TIGR Comprehensive Microbial Resource (CMR)?
[ 3 pts]
- This is a tool allowing access to
all microbial genome sequences completed to date, providing both TIGR
and other annotation.
-
- 7. Briefly describe three main features of the TIGR Genome
Browser. [ 3 pts]
- The Genome Browser, reached via the chromosome link on the
main page of any completely sequenced organism, is a horizontal
genome browser. Functions: zoom, locus display, download
capability, a choice of which options are shown, color-coded
evidence for locus annotation of 6 different types, etc.
- Loaded slowly in IE and did not display everything; loaded
fine in Netscape
- Very nice Browser ...
-
- 8. Why is the TIGR Genome Browser NOT used with organisms
whose genomes are incompletely sequenced? [ 3 pts]
- The Genome Browser is used only
for organisms in the CMR, which includes only completed genomes.
-
- 9. Comment on five of the Eukaryotes whose genomes are currently
being sequenced. [10 pts]
- Many here: Aspergillus (3 species),
Candida, Dictyostelium, Entamoeba, Encephalitozoon, Giardia, Leishmania,
Neurospora, Plasmodium, Pneumocystis, S. pombe, Trypanosoma
-
- 10. What is the purpose of the TIGR Gene Indices? [ 3 pts]
- The Gene Indices integrate data
from EST sequencing and gene research projects, providing an
analysis of the transcribed sequences in public EST data.
-
- 11. What is TIGR EGAD? [ 3 pts]
- TIGR EGAD is the Expressed Gene
Anatomy Database: non-redundant set of transcript sequences,
human (HT) or other (ET), compiled and annotated from GenBank
-
- 12. What are "software tools"? [ 3 pts]
- Software tools are computer programs
written to facilitate computerized execution of tasks common
in bioinformatics or other areas.
-
- 13. What is the general focus of the TIGR Software Tools?
[ 6 pts]
- TIGR focuses on software tools for
assembling sequence contigs from initial sequence 'reads', for
automated annotation of genes in new DNA sequence, and for
sequence analysis.
-
- 14. For the Microbial Genomes, what is the TaxPlot?
the TaxTable? the ProtTable? [ 6 pts]
- These are at NCBI.
- TaxPlot is a 2D plot comparing, for the proteins of a given
query organism, their cognate proteins in two different organisms,
plotted according to BLAST score.
- TaxTable provides a graphic display of the cognate proteins
found for all proteins in an organism, color-coded to show the
kingdom of greatest similarity.
- Links are provided to the cognate proteins found.
- ProtTable is a table of all proteins encoded by a given
organism, with links to GenPept, COG, etc.
-
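To make the TaxPlot idea concrete, here is a small Python sketch. It assumes each protein of the query organism has already been BLASTed against two comparison organisms and that only the best score per organism is kept; the protein names, organism labels, and scores are hypothetical, and scoring a missing hit as 0.0 is an assumption. Applying the same best-score rule per kingdom instead of per organism is one plausible reading of how TaxTable picks the "kingdom of greatest similarity", not a description of NCBI's actual code.

```python
# Hypothetical best BLAST scores of query-genome proteins against two other
# organisms (made-up numbers for illustration only).
best_scores = {
    "prot1": {"org_x": 250.0, "org_y": 40.0},
    "prot2": {"org_x": 95.0, "org_y": 310.0},
    "prot3": {"org_x": 180.0},  # no hit recorded in org_y
}


def taxplot_points(scores, x_org, y_org):
    """Return (protein, x, y) triples for a 2D plot; a missing hit counts as 0.0."""
    return [(p, hits.get(x_org, 0.0), hits.get(y_org, 0.0))
            for p, hits in scores.items()]


def group_of_greatest_similarity(hits):
    """Return the group (organism or kingdom) with the highest best score."""
    return max(hits, key=hits.get)


for protein, x, y in taxplot_points(best_scores, "org_x", "org_y"):
    print(protein, x, y, group_of_greatest_similarity(best_scores[protein]))
```

Each printed row is one point of the plot plus the group it scores best against; proteins near the diagonal are about equally similar to both organisms.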
- 15. What is the purpose of the Microbial Genome "List
of projects" web page? [ 3 pts]
- This lists all completed and ongoing public-domain
sequencing projects throughout the world.
-
- 16. What are three DNA entities that are shown in the NCBI
Genome "Graphical View"? [ 3 pts]
- Three entities: 1) CDS with gene
and mRNA; 2) Gene, tRNA, promoter, ...; 3) Other features
-
- 17. How does the "Graphical View" differ in its
presentation from that of the "Coding Region"? [ 3 pts]
- The 'Coding Region' is the table
display called ProtTable; it is not graphical, and provides info
only on proteins encoded.
-
- 18. How does the NCBI "Prominent Organisms" Genome
Web page differ from the NCBI "All Organisms" Web page?
[ 3 pts]
- 'Prominent Organisms' are NIH model
organisms, whereas 'All Organisms' has info on all organisms
currently being sequenced.
-
- 19. Comment on similarities and differences between the NCBI
Organism Web sites for Arabidopsis, C. elegans, D. melanogaster,
and S. cerevisiae. [ 3 pts]
- All have chromosomes as vertical
sticks, with links; a standard search facility at top; links to
organism-specific resources and sites, including related databases,
FTP sites, and Sequencing Projects; and a BLAST search of the specific genome.
-
- 20. What are the primary types of information present on
an NCBI Organism Chromosome page, such as that for your chosen
Yeast Chromosome? [ 4 pts]
- Protein encoding genes, color coded
according to functional class; BLAST homologs; Feature table;
References
-
- 21. What is the NCBI RefSeq project? What is its purpose?
What is the Accession Format that is used? [ 6 pts]
- RefSeq provides reference sequence
standards for the chromosomes, mRNAs, and proteins of a given organism;
these are thus a foundation for functional annotation, mutation analysis,
gene expression studies, and polymorphism discovery.
- The Accession Format is two capital
letters, an underscore, and a six-digit number; the two capital
letters define the type of reference sequence, eg NC_****** for a complete
genome or chromosome, NT_****** for a genomic contig, NM_****** for an
mRNA, and NP_****** for a protein. XM and XP are for model (predicted)
mRNAs and proteins.
-
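The accession format described above is regular enough to check mechanically. Below is a minimal Python sketch. It covers only the six prefixes named in this answer (RefSeq uses other prefixes that this table leaves out). The function name `describe_refseq` is hypothetical, and the optional `.N` version suffix it accepts is an addition not discussed in the key.

```python
import re

# Prefix meanings as listed in the answer to Question 21.
REFSEQ_PREFIXES = {
    "NC": "complete genome or chromosome",
    "NT": "genomic contig",
    "NM": "mRNA",
    "NP": "protein",
    "XM": "model mRNA",
    "XP": "model protein",
}

# Two capital letters, an underscore, then digits; an optional ".N" version
# suffix is also accepted (an assumption beyond the format described above).
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.\d+)?$")


def describe_refseq(accession: str) -> str:
    """Return the sequence type implied by a RefSeq-style accession prefix."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix = match.group(1)
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"prefix {prefix!r} is not in this table")
    return REFSEQ_PREFIXES[prefix]


print(describe_refseq("NM_123456"))    # made-up number; prints "mRNA"
print(describe_refseq("XP_000001.1"))  # made-up number; prints "model protein"
```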
- 22. At SGD, briefly describe the links to "Sequence
Analysis and Tools", "Maps", and "Sacch3D".
[ 3 pts]
- Link to 'Sequence Analysis and Tools'
brings up a page with links to tools and displays associated
with seq analysis: BLAST, FASTA, Genomic View, Sacch3D, etc
- 'Maps' link provides links to several
types of chromosome maps, as well as a way to submit new info
- 'Sacch3D' is the yeast protein 3D
structure database, with tools and information
-
- 23. At MIPS, briefly describe the links to "Chromosome
Display", "Pathways", and "Functional Analysis
Projects". [ 3 pts]
-
- 24. What is the single most annoying feature of the MIPS
Web site? [ 3 pts]
-
- 25. What are the four levels of detail for Entrez Map Viewer?
Briefly describe each. [ 8 pts]
- 1) Home Page for an organism - describes
and links to resources available
- 2) Genome View - graphically display
complete genome via chrom ideograms
- 3) Map View - present vertical display
of one or more maps for given chrom
- 4) Sequence View - show sequence
and display graphically features of given chrom region.
-
- 26. What organisms are currently represented in the Entrez
Map Viewer? [ 5 pts]
- Arabidopsis, Drosophila, Homo sapiens,
Mus musculus, Zea mays (corn)
-
- 27. What are the primary options and features present in
the NCBI Map View graphics display for a Fly Chromosome? [ 6 pts]
- 1) Vertical display of up to six
maps chosen from different types in the 'Display Settings' window;
2) links to other resources via 2-letter code links; 3) zoom
capabilities either via Zoom In/Zoom Out or via chrom coords;
4) links to other chroms; 5) standard NCBI search facility
-
- 28. What do some of the 2-letter symbols to the right in
the Entrez Map Viewer stand for? [ 6 pts; any three]
- sv - Seq Viewer; gb - GenBank; gr - GenBank of region;
fr - FASTA of DNA region; gp - GenPept; bl - BLAST;
ll - LocusLink; fb - FlyBase; gf - GadFly
-
- 29. What is LocusLink and its function? What organisms are
currently in LocusLink? [ 4 pts]
- LocusLink provides links and information
about a given locus in a given organism
- Current organisms are Drosophila,
Human, mouse, rat, and zebrafish.
-
- 30. What is FlyBase and where is it located? [ 3 pts]
- 'FlyBase' is the current database
of the Drosophila genome, originating in the ACeDB FlyBase database.
It is maintained primarily at Indiana University.
-
- 31. What is "The Interactive Fly"? [ 3 pts]
- "The Interactive Fly" is a
text-based database providing overviews of various developmental
and cellular processes in the fly, integrating information from
vertebrates, with links to FlyBase.
-
- 32. What is GadFly? [ 3 pts]
- GadFly is the LBL version of FlyBase,
called the "Genome Annotation Database of Drosophila".
-
- 33. Briefly describe the layout of the NCBI "Human Genome"
Home Page. [ 3 pts]
- This page is in standard NCBI layout,
with Search at top followed by 3 columns.
- Left column is for NCBI Web resources
for human genome analysis.
- Middle column is about 'What's New',
tips, etc, eg working draft, MapViewer tips
- Right column is for browsing chromosomes,
other genomes, etc
-
- 34. What is the Gene Ontology (GO) Consortium, and what is
its purpose? [ 3 pts]
- This is a Consortium of academic
and biotech personnel, whose goal is to produce a dynamic controlled
biological (gene, protein, etc) vocabulary that can be applied
to all eukaryotes.
-
- 35. What are the types of information LocusLink returns for
the Loci found in response to your keyword search? What are the
one-letter links shown? [ 6 pts; any three letter links]
- LocusID, Organism, Gene Symbol,
Brief description, chrom position, and links via a 1-letter code. These
are: red P - PubMed; O - OMIM; R - RefSeq; G - GenBank; P - Protein;
H - HomoloGene; U - UniGene; V - Variation
-
- 36. What does "Variation" mean? [ 3 pts]
- Variation means naturally occurring
polymorphisms, in particular SNPs
-
- 37. What is OMIM, and what information not in LocusLink does
it provide about your gene? [ 3 pts]
- OMIM, Online Mendelian Inheritance
in Man, from Johns Hopkins, provides text information, links, and
references for the given gene, with an emphasis on disease-related
information.
-
- 38. What is NCBI UniGene, and what information not in LocusLink
does it provide about your gene? [ 3 pts]
- UniGene is a system for automatically
partitioning GenBank sequences into a non-redundant set of gene-oriented
clusters. Each cluster contains the sequences representing a unique
gene, plus related information such as the tissue types from which
those sequences were obtained.
-
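UniGene's actual clustering pipeline is not described in this key, so the following is only a loose illustration of what "automatically partitioning sequences into gene-oriented clusters" can mean. It uses single-linkage clustering, where two sequences are joined whenever they share an identical stretch of k bases, tracked with a union-find structure. The function name, the choice of k = 12, and the toy sequences are all assumptions for illustration.

```python
from collections import defaultdict


def cluster_by_shared_kmers(seqs: dict[str, str], k: int = 12) -> list[set[str]]:
    """Single-linkage clusters: sequences sharing any exact k-mer are merged."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first sequence containing each k-mer; link any later one to it.
    first_owner: dict[str, str] = {}
    for name, seq in seqs.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_owner:
                union(first_owner[kmer], name)
            else:
                first_owner[kmer] = name

    clusters: dict[str, set[str]] = defaultdict(set)
    for name in seqs:
        clusters[find(name)].add(name)
    return list(clusters.values())


# Toy "ESTs": est1 and est2 overlap by 16 bases; est3 shares nothing with them.
ests = {
    "est1": "ATGGCGTACGTTAGCCTAGGA",
    "est2": "GTACGTTAGCCTAGGATTTCA",
    "est3": "CCCCGGGGAAAATTTTCCCCGG",
}
print(cluster_by_shared_kmers(ests))
```

Single linkage lets A and C fall into the same cluster through B even when A and C share nothing directly. A production clustering system would therefore likely need stricter overlap and quality criteria than this sketch.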
- 39. What is HomoloGene, and what information is present here
that is not in LocusLink? [ 4 pts]
- HomoloGene is a homology resource
which includes both curated and calculated orthologs and homologs
for genes in UniGene and LocusLink. This homology information
is not present in LocusLink.
-
- 40. What is GeneCard, and where is it? [ 3 pts]
- GeneCard is another resource page
for genes, located at the Weizmann Institute in Israel, containing
information found at several NCBI sites.
-
- 41. How does the GeneCard information compare with the NCBI
information about your gene? [ 3 pts]
- GeneCard information is very similar
to that found at NCBI in terms of links available, but is all
found on a single page rather than spread across several 'systems'.
- It is less 'complete', and less redundant, than the totality
found at NCBI
-
- 42. How does the Gene information at Ensembl compare with
that at NCBI LocusLink? [ 3 pts]
- Info is similar, but Ensembl is
more EMBL / EBI related: InterPro, Sequence shown, Exon information,
Splice information ...
-
- 43. What is the Ensembl "Disease Browser", and
how is it related to NCBI? [ 3 pts]
- Ensembl 'Disease Browser' is a text-based
plus links database on human disease with info on genetic basis.
It is rather similar in principle to OMIM at NCBI.
-
- 44. How does the UCSC Graphic Viewer compare with the NCBI
MapView and the EBI ContigView? [ 3 pts]
- UCSC Graphic Viewer and Ensembl
ContigView are both horizontal graphics systems that show several
types of maps and information found along chromosomes of genomes.
NCBI MapView is vertical. All have zoom capability.
- All can show different types of
information, but use different methods for this.
- Specific information shown varies
some between the three systems.
-
- 45. If you use the direct links between Ensembl and UCSC,
do you get the same graphic views as when you find the gene independently
at each of the Web sites? If not, what is the difference? [ 3 pts]
- Sometimes there are differences,
depending on the page one starts from when linking to the graphics pages. The
differences are mainly 1) level of zoom, and 2) what specific
information is shown by default.
-
- 46. What is RepeatMasker and what is it used for? [ 3 pts]
- RepeatMasker is a program used to
find different types of repeats in DNA sequence, and can be used
to "mask" such sequences for further analysis. UCSC
Graphic Viewer shows results from RepeatMasker, and can further
show each of the major types of repeats found.
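Masking, as mentioned above, means replacing the repeat positions so that further analysis (for example, similarity searches) ignores them. The sketch below does not call RepeatMasker itself. It assumes the repeat intervals are already known (here a hypothetical (CA)n run) and given as 0-based, half-open coordinates, and it shows two common conventions: hard-masking with N and soft-masking with lowercase letters. The function name `mask_repeats` is hypothetical.

```python
def mask_repeats(seq: str, intervals: list[tuple[int, int]], soft: bool = False) -> str:
    """Mask each [start, end) interval: 'N' (hard) or lowercase (soft)."""
    chars = list(seq)
    for start, end in intervals:
        # Clip the interval to the sequence so out-of-range values are ignored.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


dna = "ACGTCACACACACAGGTT"   # toy sequence; positions 4-13 are a (CA)n repeat
repeats = [(4, 14)]          # hypothetical interval, not real RepeatMasker output
print(mask_repeats(dna, repeats))             # ACGTNNNNNNNNNNGGTT
print(mask_repeats(dna, repeats, soft=True))  # ACGTcacacacacaGGTT
```

Soft-masking keeps the original bases recoverable (just uppercase them again), while hard-masking discards them. Which one is appropriate depends on whether later steps need to see the repeat sequence at all.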