Exercise 5 - BIMM 141 - Spring 2001

Doug Smith ... KEY

Score: ... 244 pts
A. 15 pts
B. 9 pts
C. 24 pts
D. 18 pts
E. 178 pts

 

{Answer the Questions at the end as you proceed through the Exercise}

{A. TIGR - The Institute for Genomic Research} [15 pts]

{1. Peruse the TIGR site to see what is there and get a feel for what TIGR does}
Comments on "What's new", "Hot Links", etc [3 pts]

{2. TIGR Microbial Database: Completed Genomes}
Info on one of the TIGR organisms [3 pts]

{3. TIGR Microbial Database: Incomplete Genomes}
Examine one or two incomplete Eucaryotic genomes [3 pts]

{4. Other TIGR Databases}
Types of info in TIGR database, eg Arabidopsis [3 pts]

{5. TIGR Gene Indices}
Comments on Gene Indices [3 pts]

{6. TIGR Software Tools}
Answer Questions 12-13 below.

 

{B. NCBI Genomic Resources} [9 pts]

{1. Entrez Genomes}
Comments on 3 columns of Main Web page for NCBI Entrez Genome [3 pts]

{2. NCBI Microbial Genomes}
Comments on same organism as for TIGR A3 above - compare [3 pts]

{3. Examine the "Graphical view" and "Coding regions" of a region of the chosen Genome}
Comments on these views [3 pts]

{4. NCBI Specific Organism Resources}
Answer Questions 19-20 below.

 

{C. Specific Organism Resources} [24 pts]

{1. NCBI Yeast Genome Site}
Answer Questions 21-22

{2. Stanford Saccharomyces Genome Database - SGD}
Comments on SGD Home page [3 pts]
Answer Question 23

{3. Munich Information center for Protein Sequences- MIPS - for Yeast}
Comments on MIPS site [3 pts]
Answer Questions 24-25

{4. NCBI Fly Genome Site}
Answer Questions 26-27

{5. Examine Features of a Fly Chromosome using the Map View Graphics}
Comments on NCBI MapView [3 pts]
Answer Questions 28-29

Click on "Display Settings" and try some other combinations.
Select a region of "9M" in upper box, "10M" in lower box of "Select Region:" options
Comments on Display Settings [3 pts]

{6. Select a Fly Gene and Examine Information Available}
Answer Question 30
{Click on "LocusID" for your Fly Gene}
Comments on Info available [3 pts]

{7. Examine your Fly Gene in FlyBase}
Comments on FlyBase [3 pts]
Answer Question 31
{Find the link to "The Interactive Fly" and examine this site}
Comments on 'Interactive Fly' [3 pts]

{8. Examine your Fly Gene in GadFly at BDGP}
Answer Questions 32-33
{Try using the "GeneSeen" Map Viewer}
Comments on what happened [3 pts]

 

{D. Information on a Human Gene} [18 pts]

{1. Find a Human Gene to work with ... Gene Ontology terms}
Comments on use of GO to find a human gene [3 pts]
Answer Questions 34,35,36

{2. NCBI Resources available for your Human Gene}
Comments on LocusLink info for your gene, including GO words and MapViewer (mv) links [3 pts]
Answer Questions 37-39

{3. GeneCard for your Human Gene}
Answer Questions 40-41

{4. Ensembl Resources available for your Human Gene}
Comments on Ensembl [3 pts]

{a. Search Ensembl for your Gene using the keyword previously used}
Comments on finding same Gene at Ensembl as at NCBI [3 pts]
Answer Questions 42-43

{b. View your Gene at Ensembl}
Answer Question 44

{5. UCSC Resources available for your Human Gene}
Comments on UCSC graphic interface [3 pts]

{a. Search UCSC for your Gene using the keyword previously used}
Comments on use of keyword at UCSC - same genes found? [3 pts]

{b. View your Gene at UCSC}
Answer Question 45

{c. Direct links between Ensembl and UCSC}
Answer Question 46

 

{E. Questions} [178 pts]

1. What is the mission of TIGR? [ 3 pts]
The mission of TIGR: continued expansion of genome sequence info and application of this info to basic bio research, medicine and agriculture
 
2. Who is Chairman of the Board of Trustees of TIGR? [ 3 pts]
The Chair: J. Craig Venter
 
3. From the TIGR "What's New" page, what is Anopheles gambiae and why is its sequence interesting? [ 5 pts]
This is the mosquito most important in spread of malaria. Sequence may provide gene info leading to control of malaria disease cycle which requires the mosquito.
 
4. What are the three Microbial "Domains" at TIGR? [ 6 pts]
These are Archaea, Eubacteria, and Eukaryotes
 
5. What is the size range of genomes present in the TIGR complete microbial genome list? [ 3 pts]
The size range: 0.58 Mb (Mycoplasma genitalium) - 13 Mb (S. cerevisiae)
 
6. What is the TIGR Comprehensive Microbial Resource (CMR)? [ 3 pts]
This is a tool allowing access to all bacterial genome sequences completed to date, providing TIGR and other annotation.
 
7. Briefly describe three main features of the TIGR Genome Browser. [ 3 pts]
The Genome Browser, accessed via the chromosome for any completely sequenced organism from the organism main page, is a horizontal genome browser. Functions: zoom, locus display, download capability, change options shown, evidence for locus annotation of 6 different types (color coded), etc
Loaded slowly in IE and did not display everything; loaded fine in Netscape
Very nice Browser ...
 
8. Why is the TIGR Genome Browser NOT used with organisms whose genomes are incompletely sequenced? [ 3 pts]
The Genome Browser is used only for organisms in the CMR, which includes only completed genomes.
 
9. Comment on five of the Eukaryotes whose genomes are currently being sequenced. [10 pts]
Many here: Aspergillus (3 species), Candida, Dictyostelium, Entamoeba, Encephalitozoon, Giardia, Leishmania, Neurospora, Plasmodium, Pneumocystis, S. pombe, Trypanosoma
 
10. What is the purpose of the TIGR Gene Indices? [ 3 pts]
The Gene Indices integrate data from EST sequencing and gene research projects, providing an analysis of the transcribed sequences from public EST data.
 
11. What is TIGR EGAD? [ 3 pts]
TIGR EGAD is the Expressed Gene Anatomy Database: non-redundant set of transcript sequences, human (HT) or other (ET), compiled and annotated from GenBank
 
12. What are "software tools"? [ 3 pts]
Software tools are computer programs written to facilitate computerized execution of tasks common in bioinformatics or other areas.
 
13. What is the general focus of the TIGR Software Tools? [ 6 pts]
TIGR focuses on software tools for forming sequence contigs from initial sequence 'reads', for automated annotation of genes in new DNA sequence, and for sequence analysis.
 
14. For the Microbial Genomes, what is the TaxPlot?  the TaxTable?  the ProtTable? [ 6 pts]
These are at NCBI.
TaxPlot is a 2D plot of cognate proteins in two different organisms for a given query organism, plotted according to BLAST score.
TaxTable provides a graphic display of cognate proteins found for all proteins in an organism, color coded to show kingdom of greatest similarity. Links are provided to cognate proteins found.
ProtTable is a table of all proteins encoded by a given organism, with links to GenPept, COG, etc
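The TaxPlot idea can be sketched in a few lines. This is only an illustration, not NCBI's implementation: the function name, the protein IDs and the scores are all made up, and a missing hit is arbitrarily given a score of 0. Each protein of the query organism becomes one (x, y) point, where x is its best BLAST score against organism A and y its best score against organism B.

```python
def taxplot_points(best_vs_a, best_vs_b):
    """Return {protein: (score_vs_A, score_vs_B)} for every query protein.

    best_vs_a / best_vs_b are hypothetical dicts mapping a query protein ID
    to its best BLAST score against organism A / organism B.
    A protein with no hit in one organism gets 0 on that axis (an assumption).
    """
    points = {}
    for protein in sorted(set(best_vs_a) | set(best_vs_b)):
        points[protein] = (best_vs_a.get(protein, 0), best_vs_b.get(protein, 0))
    return points


# Made-up scores for three query proteins.
vs_a = {"prot1": 250, "prot2": 40}
vs_b = {"prot1": 245, "prot3": 90}
points = taxplot_points(vs_a, vs_b)
for protein, (x, y) in points.items():
    print(protein, x, y)
```

Points near the diagonal are about equally similar in both organisms; points far from it are much closer to one organism than the other.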
 
15. What is the purpose of the Microbial Genome "List of projects" web page? [ 3 pts]
This lists all completed and ongoing sequencing projects throughout the world, occurring in the public domain.
 
16. What are three DNA entities that are shown in the NCBI Genome "Graphical View"? [ 3 pts]
Three entities: 1) CDS with gene and mRNA; 2) Gene, tRNA, promoter, ...; 3) Other features
 
17. How does the "Graphical View" differ in its presentation from that of the "Coding Region"? [ 3 pts]
The 'Coding Region' is the table display called ProtTable; it is not graphical, and provides info only on proteins encoded.
 
18. How does the NCBI "Prominent Organisms" Genome Web page differ from the NCBI "All Organisms" Web page? [ 3 pts]
'Prominent Organisms' are NIH model organisms, whereas 'All Organisms' has info on all organisms currently being sequenced.
 
19. Comment on similarities and differences between the NCBI Organism Web sites for Arabidopsis, C. elegans, D. melanogaster, and S. cerevisiae. [ 3 pts]
All have chromosomes as vertical sticks, with links; standard search facility at top; links to organism specific resources and sites, including related databases, FTP sites, Sequencing Projects; BLAST search of the specific Genome.
 
20. What are the primary types of information present on an NCBI Organism Chromosome page, such as that for your chosen Yeast Chromosome? [ 4 pts]
Protein encoding genes, color coded according to functional class; BLAST homologs; Feature table; References
 
21. What is the NCBI RefSeq project? What is its purpose? What is the Accession Format that is used? [ 6 pts]
RefSeq provides reference sequence standards for chroms, mRNA, proteins for a given organism; they are thus a foundation for functional annotation, mutation analysis, gene expression studies, and polymorphism discovery.
Accession Format is two capital letters, an underscore, and a 6-digit number; the two capital letters define the type of ref seq, eg NC_****** for complete genome or chromosome, NT_****** for genomic contig, NM_****** for mRNA, and NP_****** for protein. XM and XP are for models.
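The prefix scheme can be checked mechanically. A minimal sketch, assuming only the prefixes listed in the answer above: the function name is made up, the number of digits is deliberately not checked, and tolerating a trailing ".version" suffix is an added assumption.

```python
import re

# Prefix meanings as given in the answer above; XM/XP are the model records.
REFSEQ_PREFIXES = {
    "NC": "complete genome or chromosome",
    "NT": "genomic contig",
    "NM": "mRNA",
    "NP": "protein",
    "XM": "model mRNA",
    "XP": "model protein",
}

# Two capital letters, an underscore, then digits.
# The optional ".N" version suffix is an assumption of this sketch.
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.\d+)?$")


def classify_refseq(accession):
    """Return the record type for a RefSeq-style accession, or None if the
    string does not have the letters_digits shape at all."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    return REFSEQ_PREFIXES.get(match.group(1), "unknown RefSeq type")


for acc in ["NM_000546", "NC_001133.1", "U12345"]:
    print(acc, "->", classify_refseq(acc))
```

A plain GenBank accession such as U12345 has no underscore, so it is rejected rather than misclassified.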
 
22. At SGD, briefly describe the links to "Sequence Analysis and Tools", "Maps", and "Sacch3D". [ 3 pts]
Link to 'Sequence Analysis and Tools' brings up a page with links to tools and displays associated with seq analysis: BLAST, FASTA, Genomic View, Sacch3D, etc
'Maps' link provides links to several types of chromosome maps, as well as submitting new info
'Sacch3D' is the yeast protein 3D structure database, with tools and information
 
23. At MIPS, briefly describe the links to "Chromosome Display", "Pathways", and "Functional Analysis Projects". [ 3 pts]
 
24. What is the single most annoying feature of the MIPS Web site? [ 3 pts]
 
25. What are the four levels of detail for Entrez Map Viewer? Briefly describe each. [ 8 pts]
1) Home Page for an organism - describes and links to resources available
2) Genome View - graphically display complete genome via chrom ideograms
3) Map View - present vertical display of one or more maps for given chrom
4) Sequence View - show sequence and display graphically features of given chrom region.
 
26. What organisms are currently represented in the Entrez Map Viewer? [ 5 pts]
Arabidopsis, Drosophila, Homo sapiens, Mus musculus, Zea mays (corn)
 
27. What are the primary options and features present in the NCBI Map View graphics display for a Fly Chromosome? [ 6 pts]
1) Vertical display of up to six maps chosen from different types in the 'Display Settings' window; 2) links to other resources via 2-letter code links; 3) zoom capabilities either via Zoom In/ Zoom Out or via chrom coords; 4) links to other chroms; 5) standard NCBI search facility
 
28. What do some of the 2-letter symbols to the right in the Entrez Map Viewer stand for? [ 6 pts; any three]
sv - Seq Viewer; gb - GenBank; gr - GenBank of region; fr - FASTA of DNA region; gp - GenPept; bl - BLAST; ll - LocusLink; fb - FlyBase; gf - GadFly
 
29. What is LocusLink and its function? What organisms are currently in LocusLink? [ 4 pts]
LocusLink provides links and information about a given locus in a given organism
Current organisms are Drosophila, Human, mouse, rat, and zebrafish.
 
30. What is FlyBase and where is it located? [ 3 pts]
'FlyBase' is the current Database of the Drosophila Genome, originating in the ACeDB FlyBase database. It is maintained primarily at Indiana University.
 
31. What is "The Interactive Fly"? [ 3 pts]
"Interactive Fly" is a text based database providing overviews of various developmental and cellular fly processes, with integration of information from vertebrates, and with links to FlyBase.
 
32. What is GadFly? [ 3 pts]
GadFly is the LBL version of FlyBase called "Genome Annotation Database of Drosophila".
 
33. Briefly describe the layout of the NCBI "Human Genome" Home Page. [ 3 pts]
This page is in standard NCBI layout, with Search at top followed by 3 columns.
Left column is for NCBI Web resources for human genome analysis.
Middle column is about 'What's New', tips, etc, eg working draft, MapViewer tips
Right column is browse chroms, Other genomes, etc
 
34. What is the Gene Ontology (GO) Consortium, and what is its purpose? [ 3 pts]
This is a Consortium of academic and biotech personnel, whose goal is to produce a dynamic controlled biological (gene, protein, etc) vocabulary that can be applied to all eukaryotes.
 
35. What are the types of information LocusLink returns for the Loci found in response to your keyword search? What are the one-letter links shown? [ 6 pts; any three letter links]
LocusID, Organism, Gene Symbol, Brief description, chrom position, Links via 1-letter code. These are: Red P - PubMed, O-OMIM, R-RefSeq, G-GenBank, P-Protein, H-HomoloGene, U-UniGene, V-variation
 
36. What does "Variation" mean? [ 3 pts]
Variation means naturally occurring polymorphisms, in particular, SNPs
 
37. What is OMIM, and what information not in LocusLink does it provide about your gene? [ 3 pts]
OMIM, Online Mendelian Inheritance in Man from Johns Hopkins, provides text information, links, and references for the given gene, with an emphasis on disease related information.
 
38. What is NCBI UniGene, and what information not in LocusLink does it provide about your gene? [ 3 pts]
UniGene is a system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each cluster contains sequences representing a unique gene plus related cell-based information.
 
39. What is HomoloGene, and what information is present here that is not in LocusLink? [ 4 pts]
HomoloGene is a homology resource which includes both curated and calculated orthologs and homologs for genes in UniGene and LocusLink. This homology information is not present in LocusLink.
 
40. What is GeneCard, and where is it? [ 3 pts]
GeneCard is another resource page for genes, located at the Weizmann Institute in Israel, containing information found at several NCBI sites.
 
41. How does the GeneCard information compare with the NCBI information about your gene? [ 3 pts]
GeneCard information is very similar to that found at NCBI in terms of links available, but is all found on a single page rather than across several 'systems'.
It is less 'complete', but also less redundant, than the totality found at NCBI
 
42. How does the Gene information at Ensembl compare with that at NCBI LocusLink? [ 3 pts]
Info is similar, but Ensembl is more EMBL / EBI related: InterPro, Sequence shown, Exon information, Splice information ...
 
43. What is the Ensembl "Disease Browser", and how is it related to NCBI? [ 3 pts]
Ensembl 'Disease Browser' is a text-based plus links database on human disease with info on genetic basis. It is rather similar in principle to OMIM at NCBI.
 
44. How does the UCSC Graphic Viewer compare with the NCBI MapView and the EBI ContigView ? [ 3 pts]
UCSC Graphic Viewer and Ensembl ContigView are both horizontal graphics systems that show several types of maps and information found along chromosomes of genomes. NCBI MapView is vertical. All have zoom capability.
All can show different types of information, but use different methods for this.
Specific information shown varies some between the three systems.
 
 
45. If you use the direct links between Ensembl and UCSC, do you get the same graphic views as when you find the gene independently at each of the Web sites? If not, what is the difference? [ 3 pts]
Sometimes there are differences, depending on from where one links to the graphics pages. The differences are mainly 1) level of zoom, and 2) what specific information is shown by default.
 
46. What is RepeatMasker and what is it used for? [ 3 pts]
RepeatMasker is a program used to find different types of repeats in DNA sequence, and can be used to "mask" such sequences for further analysis. UCSC Graphic Viewer shows results from RepeatMasker, and can further show each of the major types of repeats found.
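What "masking" means can be shown with a small sketch. This does not run RepeatMasker itself; it assumes the repeat positions are already known and passed in as 0-based, half-open (start, end) intervals. The function name and the example sequence are made up for illustration.

```python
def mask_repeats(sequence, repeats, soft=False):
    """Mask the given intervals of a DNA sequence.

    repeats: list of 0-based, half-open (start, end) intervals (hypothetical
    input; a real run would take these from a repeat-finding program).
    soft=False replaces masked bases with 'N' (hard masking);
    soft=True lowercases them instead (soft masking).
    """
    chars = list(sequence)
    for start, end in repeats:
        # Clip each interval to the sequence so bad coordinates cannot crash.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


seq = "ACGTACGTAC"
hard = mask_repeats(seq, [(2, 5)])
soft = mask_repeats(seq, [(2, 5)], soft=True)
print(hard)
print(soft)
```

Hard masking hides the repeat from later searches entirely, while soft masking keeps the bases visible so a later program can decide how to treat them.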