Gribskov & Smith

 BIMM 141 Laboratory

Spring 2001

Introduction to Bioinformatics
 

Exercise 5


Genomes and Organisms


Exercise 5 focuses on the use of Web facilities to learn more about organisms that have been extensively studied and annotated: the genome sequence has been determined, organismal databases have been established, and genetic disease is under study.

The focus will be on Human, since the first draft of the genome sequence was recently released and a variety of Web applications have been developed to permit user access to this information. Other organisms, e.g. yeast, Drosophila, Dictyostelium, and the archaeon Methanococcus jannaschii, will also be included.

Much of this Exercise will permit you to examine the Web resources discussed in Lecture 3. The Web pages for Lecture 3 have many links that may be useful in carrying out Exercise 5.

The Main objectives in Exercise 5 are:

If you have any problems understanding what is asked or carrying out a task, please send email with your questions to Michael Gribskov, Doug Smith, or Hiren Patel, TA for BIMM 140 / 141.
 

Relevant Articles from the BIMM 140 Course Reader for Exercise 5:

Baxevanis-Ouellette, 2nd Edition, Textbook Relevant Chapters:

One could argue several others are relevant as well ...



BIMM 140: | Main | 140_Info | Syllabus | Lectures | Exams | DNASYSTEM | CMS MBR |
BIMM 141: | Main | 141_Info | Syllabus | Exercises | DNASYSTEM | CMS MBR |



Main Specific Tasks to Perform in Exercise 5:

A. TIGR - The Institute for Genomic Research
1. Peruse the TIGR site
2. TIGR Microbial Database: Completed Genomes
3. TIGR Microbial Database: Incomplete Genomes
4. Other TIGR Databases
5. TIGR Gene Indices
6. TIGR Software Tools
B. NCBI Genomic Resources
1. Entrez Genomes
2. NCBI Microbial Genomes
3. "Graphical View" and "Coding Regions" for a given Genome
4. NCBI Specific Organism Resources
C. Specific Organism Resources
1. NCBI Yeast Genome Site
2. Stanford Saccharomyces Genome Database - SGD
3. Munich Information Center for Protein Sequences - MIPS - for Yeast
4. NCBI Fly Genome Site
5. Fly Chromosome Features via MapView Graphics
6. NCBI Information Available for a Fly Gene
7. Use of FlyBase for a Fly Gene
8. Use of GadFly at BDGP for a Fly Gene
D. Information on a Human Gene
1. Find a Human Gene using Gene Ontology (GO) terms
2. NCBI Resources Available for your Human Gene
3. GeneCard for your Human Gene
4. EBI Ensembl Resources for your Human Gene
a. Search Ensembl using the GO keyword
b. View your Gene at Ensembl
5. UCSC Resources available for your Human Gene
a. Search UCSC using the GO keyword
b. View your Gene at UCSC
c. Direct links between Ensembl and UCSC
E. Questions

 


{Answer the Questions at the end as you proceed through the Exercise}

{A. TIGR - The Institute for Genomic Research}
TIGR has determined the sequence of many prokaryotic organisms, including the first three to be sequenced. TIGR has also been a major player in other sequencing efforts, including those of the mustard plant Arabidopsis thaliana, rice, the fungus Aspergillus, the malaria parasite Plasmodium falciparum, and the Human Genome.

{1. Peruse the TIGR site to see what is there and get a feel for what TIGR does}
Examine links from the TIGR Home page. See "What's new". Note the "Hot Links". Note the types of efforts TIGR is involved with: Databases, Gene Indices, Software, etc. Include a few comments about what you did in your Notebook. Answer Exercise Questions 1-3.

{2. TIGR Microbial Database: Completed Genomes}
Go to the TIGR Microbial Database, choose one of the organisms whose genome is completely sequenced, and include some information here about the organism: what it is, why it is interesting, what kingdom it is from, who did the sequencing, size of the genome, a few comments about the genome.
Spend some time working with the Genome Browser from the TIGR CMR (Comprehensive Microbial Resource).
Answer Exercise Questions 4-8.

Note: the TIGR Genome Browser is new since Lecture 3 of 6 April 2001!
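
As you collect genome sizes for your Notebook, a short script can help summarize them. The following is a minimal Python sketch and is not part of the TIGR site. The dictionary name, the function name, and the approximate sizes (in megabases) are illustrative values typed in by hand; replace them with the figures you read from the TIGR pages.

```python
# Approximate genome sizes in megabases (Mb), typed in by hand for illustration.
# Replace these with the values you record from the TIGR Microbial Database.
genome_sizes_mb = {
    "Mycoplasma genitalium": 0.58,
    "Methanococcus jannaschii": 1.66,
    "Haemophilus influenzae": 1.83,
    "Escherichia coli K-12": 4.64,
}

def size_range(sizes):
    """Return (smallest_name, smallest_size, largest_name, largest_size)."""
    smallest = min(sizes, key=sizes.get)
    largest = max(sizes, key=sizes.get)
    return smallest, sizes[smallest], largest, sizes[largest]

small_name, small_mb, large_name, large_mb = size_range(genome_sizes_mb)
print(f"Smallest: {small_name} ({small_mb} Mb)")
print(f"Largest:  {large_name} ({large_mb} Mb)")
```

Keeping the sizes in one table like this makes it easy to add organisms as you browse and to recompute the range.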

{3. TIGR Microbial Database: Incomplete Genomes}
Examine one or two of the eukaryotic genomes whose sequencing is incomplete and for which links are provided. State in your Notebook which organisms you examined.

{4. Other TIGR Databases}
Go to one of the other TIGR databases, for example, the TIGR Arabidopsis thaliana database, and briefly describe in your Notebook the types of information available. Answer Exercise Question 9.

{5. TIGR Gene Indices}
Go to the TIGR Gene Indices web pages, and check out some of the organisms for which Gene Indices are available. Comment in your Notebook on what you did. Answer Exercise Questions 10-11.

{6. TIGR Software Tools}
Go to the TIGR Software Tools web pages, and check out some of the programs available. Answer Exercise Questions 12-13.
 
 

{B. NCBI Genomic Resources}

NCBI has taken a lead role in providing Web page access to all information related to Genome Sequences. The completed "Genome Sequences" include those of true organisms (eukaryotes, eubacteria, archaea), as well as viruses, organelles, and plasmids. Graphic resources are available as well as text, with links provided between NCBI sources of information and to sites elsewhere.

{1. Entrez Genomes}
Go to the NCBI Entrez Genome site and examine some of what is there. Comment on how the three columns or tables of the Web page are used. Answer Exercise Question 14.

{2. NCBI Microbial Genomes}
Go to the NCBI Microbial Genome Web page and find the organism you used in A2 above. Peruse some of the information and Web pages available. Comment on how the information and display at NCBI compare with those at TIGR. Answer Exercise Questions 15-16.

{3. Examine the "Graphical view" and "Coding regions" of a region of the chosen Genome}
Choose appropriate links (in the left table) to do this.  Comment on these views. Answer Exercise Questions 17-18.

{4. NCBI Specific Organism Resources}
From the NCBI Entrez Genome Web page, navigate to the NCBI "Prominent Organisms" Web page. Look at the Home Page for a few of these organisms, e.g. Arabidopsis, C. elegans, D. melanogaster, and S. cerevisiae. Answer Exercise Questions 19-20.


 

{C. Specific Organism Resources}

Both NCBI and dedicated specific Web sites elsewhere have developed resources for viewing and obtaining information on the sequence and genomics of the major "model" organisms, in addition to Human.

{1. NCBI Yeast Genome Site}
From the NCBI Entrez Genome Web page, navigate to the NCBI Yeast Genome Site and click on one of the Yeast Chromosomes. Answer Exercise Question 21. Examine the REFSEQ link and answer Exercise Question 22.

{2. Stanford Saccharomyces Genome Database - SGD}
Connect from the NCBI Yeast Genome Site to SGD and examine some of what is there. Comment on how the three columns or tables of the Web page are used. Answer Exercise Question 23.

{3. Munich Information Center for Protein Sequences - MIPS - for Yeast}
Link to the Yeast MIPS site in Germany; this is the other main Yeast Web site. Examine some of what is there. Answer Exercise Questions 24 and 25.

{4. NCBI Fly Genome Site}
Go to the NCBI Drosophila melanogaster Genome Site. Note the "Entrez Map Viewer"; answer Exercise Questions 26 and 27.

{5. Examine Features of a Fly Chromosome using the Map View Graphics}
Click on one of the Fly Chromosomes. Examine some of what is present on the resulting Graphic of the Chromosome; answer Exercise Questions 28 and 29. Note the Vertical Maps displayed.
Click on "Display Settings" and try some other combinations.
Select a region from "9M" (upper box) to "10M" (lower box) in the "Select Region:" options, and briefly describe what you get.
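
To get a feel for how large the selected region is, the box values can be converted to base pairs. The following is a minimal Python sketch, assuming that "M" stands for one million base pairs and "K" for one thousand. The function name `parse_position` is hypothetical and is not part of the Map Viewer.

```python
def parse_position(text):
    """Convert a position such as '9M', '250K', or '1200' to base pairs.

    Assumption: 'K' means 1,000 bp and 'M' means 1,000,000 bp.
    """
    multipliers = {"K": 1_000, "M": 1_000_000}
    text = text.strip().upper()
    if text and text[-1] in multipliers:
        # Allow fractional values such as '2.5M'.
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)

start = parse_position("9M")
end = parse_position("10M")
print(f"Region: {start:,} to {end:,} bp ({end - start:,} bp long)")
```

Under these assumptions the selection spans one million base pairs of the chromosome.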

{6. Select a Fly Gene and Examine Information Available}
Select a Fly gene either via a Search box or by displaying the "Genes_seq" vertical map display in the Fly Map View graphic, zooming in, and clicking on a Gene symbol. Answer Exercise Question 30.
{Click on "LocusID" for your Fly Gene} This brings you to the LocusLink page for your gene. Briefly describe the types of information present.

{7. Examine your Fly Gene in FlyBase}
Link to FlyBase from the LocusLink page for your fly gene. Examine and briefly describe some of the information available. Answer Exercise Question 31.
{Find the link to "The Interactive Fly" and examine this site}

{8. Examine your Fly Gene in GadFly at BDGP}
Find a link for your gene to GadFly, and then go to the original site at Berkeley (BDGP); answer Exercise Questions 32 and 33.
{Try using the "GeneSeen" Map Viewer} What happened?

 

{D. Information on a Human Gene}

With the publication in Science and Nature of the First Draft of the Human Genome Sequence (from the Celera and Consortium efforts, respectively), the tip of an iceberg of information on the Human Genome has been exposed. This genomic information is currently found at three main sites: NCBI, EBI via Ensembl, and UC Santa Cruz. Here we examine some of these information sources for a human gene of your choice.

{1. Find a Human Gene to work with ... Gene Ontology terms}
Go to NCBI, choose the Genomic Biology link, and go to Human; answer Exercise Question 34.
1) Now click on the LocusLink link, and then on the Gene Ontology link under "New Features", to take you to the Home Page for the Gene Ontology Consortium. This is a good source of "keywords" that you can use to find a gene of interest.
Examine the "TEXT" links from the "Molecular Function", "Biological Process", and "Cellular Component" sections; answer Exercise Question 35.
Use these lists to find a keyword or two with which to find a Human Gene to work with. Return to the LocusLink Home Page and do a Search from the Search box using your keyword.
2) If you have a favorite keyword or two in mind, e.g. adrenergic, you may of course use these.
3) Alternatively, if no words are attractive, go to the "Map Viewer" and link to one of the chromosomes.
Find a gene of interest via descriptive words available.
LocusLink returns a list of genes satisfying your keyword criteria; answer Exercise Question 36.
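
A keyword search of this kind can be pictured as a simple text match against gene descriptions. The following is a minimal Python sketch of that idea and is not how LocusLink itself is implemented. The records are a few hand-typed examples rather than LocusLink output, and the function name `keyword_search` is hypothetical.

```python
# A few hand-typed records for illustration; this is not LocusLink output.
genes = [
    {"symbol": "ADRB1", "description": "adrenergic, beta-1-, receptor"},
    {"symbol": "ADRB2", "description": "adrenergic, beta-2-, receptor, surface"},
    {"symbol": "INS", "description": "insulin"},
]

def keyword_search(records, keyword):
    """Return records whose description contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in records if needle in r["description"].lower()]

for hit in keyword_search(genes, "adrenergic"):
    print(hit["symbol"], "-", hit["description"])
```

A broad keyword returns many loci, so a more specific term usually makes it easier to pick out a single gene.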

{2. NCBI Resources available for your Human Gene}
Click on the LocusID link to go to the LocusLink description for your chosen gene.
Briefly describe some of the information available for your Gene. Note the GO words used. Note the MapViewer (mv) links. Answer Exercise Questions 37-39.

{3. GeneCard for your Human Gene}
Go to the GeneCard link for your gene. Answer Exercise Questions 40-41.

{4. Ensembl Resources available for your Human Gene}
The Ensembl resources are those created at EBI in Europe for annotation and information retrieval on the Human Genome Project. Examine some of the links and information available at the Ensembl site, and record what you did in your Notebook.

{a. Search Ensembl for your Gene using the keyword previously used}
In the Ensembl Main Page search facility, select "All" and search with the keyword you used previously.
Does Ensembl report the same genes as did NCBI LocusLink? What information did you need to use to find the same gene as used above? Answer Exercise Questions 42-43.
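
One way to organize this comparison in your Notebook is to treat each site's results as a set of gene symbols. The following is a minimal Python sketch, assuming both sites report a common gene symbol (which may not always be the case). The two lists are hypothetical placeholders and the function name `compare_hits` is an illustration only.

```python
def compare_hits(first, second):
    """Compare two collections of gene symbols, ignoring case."""
    a = {s.upper() for s in first}
    b = {s.upper() for s in second}
    return {
        "shared": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }

# Hypothetical lists; paste in the symbols you actually obtained.
locuslink_hits = ["ADRB1", "ADRB2", "ADRA1A"]
ensembl_hits = ["adrb2", "ADRA1A", "ADRA2B"]

result = compare_hits(locuslink_hits, ensembl_hits)
for label, symbols in result.items():
    print(label, symbols)
```

The same function can be reused later for the list returned by the UCSC Genome Browser.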

{b. View your Gene at Ensembl}
In the "Genome Location" part of the "Ensembl Gene Report", click on the sequence Accession Number given, e.g. an AP****** number. This brings up a Graphic view of your Gene using the ContigView viewer; answer Exercise Question 44.
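
If you are unsure which string on the page is the Accession Number, its layout is a useful clue. The following is a minimal Python sketch, assuming the classic nucleotide accession layouts of one letter plus five digits or two letters plus six digits (as in AP******), optionally followed by a version suffix. The example strings are made up to show the format only, and the function name is hypothetical.

```python
import re

# Classic nucleotide accession layouts: 1 letter + 5 digits, or 2 letters + 6 digits,
# optionally followed by a version suffix such as ".1".
ACCESSION_RE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?$")

def looks_like_accession(text):
    """Return True if the text matches one of the assumed accession layouts."""
    return bool(ACCESSION_RE.match(text.strip().upper()))

# Made-up strings that only illustrate the format.
for candidate in ["AP000001", "U12345", "AP000001.2", "my_gene"]:
    print(candidate, looks_like_accession(candidate))
```

A match only means the string has the right shape; it does not confirm that the record exists in the database.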

{5. UCSC Resources available for your Human Gene}
UC Santa Cruz is the third primary source of Human Genome Project information at the current time, due largely to the efforts of Jim Kent in the Haussler group in developing contig assembly and Web viewer tools for the HGP Consortium effort. Click on the link above to go to the UCSC Human Genome Project home page. Comment on how this Web page compares with the main Human Genome Home pages at NCBI and EBI.

{a. Search UCSC for your Gene using the keyword previously used}
Search in the UCSC "Genome Browser" facility using the keyword you previously used.
Does this "Genome Browser" report the same genes as did NCBI LocusLink? What information did you need to use to find the same gene as used above?

{b. View your Gene at UCSC}
Once you have located the same gene, click on the Gene ID number. You should see the UCSC Graphic Viewer for your gene on its chromosome, with many other objects and information; answer Exercise Question 45.

{c. Direct links between Ensembl and UCSC}
Note the direct links between displays of a given human gene using EBI Ensembl and the UCSC Graphic Viewer; answer Exercise Question 46.

 

{E. Questions}

  1. What is the mission of TIGR?
  2. Who is Chairman of the Board of Trustees of TIGR?
  3. From the TIGR "What's New" page, what is Anopheles gambiae and why is its sequence interesting?
  4. What are the three Microbial "Domains" at TIGR?
  5. What is the size range of genomes present in the TIGR complete microbial genome list?
  6. What is the TIGR Comprehensive Microbial Resource (CMR)?
  7. Briefly describe three main features of the TIGR Genome Browser.
  8. Why is the TIGR Genome Browser NOT used with organisms whose genomes are incompletely sequenced?
  9. Comment on five of the Eukaryotes whose genomes are currently being sequenced.
  10. What is the purpose of the TIGR Gene Indices?
  11. What is TIGR EGAD?
  12. What are "software tools"?
  13. What is the general focus of the TIGR Software Tools?
  14. For the Microbial Genomes, what is the TaxPlot?  the TaxTable?  the ProtTable?
  15.  What is the purpose of the Microbial Genome "List of projects" web page?
  16. What are three DNA entities that are shown in the NCBI Genome "Graphical View"?
  17. How does the "Graphical View" differ in its presentation from that of the "Coding Region"?
  18. How does the NCBI "Prominent Organisms" Genome Web page differ from the NCBI "All Organisms" Web page?
  19. Comment on similarities and differences between the NCBI Organism Web sites for Arabidopsis, C. elegans, D. melanogaster, and S. cerevisiae.
  20. What are the primary types of information present on an NCBI Organism Chromosome page, such as that for your chosen Yeast Chromosome?
  21. What is the NCBI RefSeq project? What is its purpose? What is the Accession Format that is used?
  22. At SGD, briefly describe the links to "Sequence Analysis and Tools", "Maps", and "Sacch3D".
  23. At MIPS, briefly describe the links to "Chromosome Display", "Pathways", and "Functional Analysis Projects".
  24. What is the single most annoying feature of the MIPS Web site?
  25. What are the four levels of detail for Entrez Map Viewer? Briefly describe each.
  26. What organisms are currently represented in the Entrez Map Viewer?
  27. What are the primary options and features present in the NCBI Map View graphics display for a Fly Chromosome?
  28. What do some of the 2-letter symbols to the right in the Entrez Map Viewer stand for?
  29. What is LocusLink and its function? What organisms are currently in LocusLink?
  30. What is FlyBase and where is it located?
  31. What is "The Interactive Fly"?
  32. What is GadFly?
  33. Briefly describe the layout of the NCBI "Human Genome" Home Page.
  34. What is the Gene Ontology (GO) Consortium, and what is its purpose?
  35. What are the types of information LocusLink returns for the Loci found in response to your keyword search? What are the one-letter links shown?
  36. What does "Variation" mean?
  37. What is OMIM, and what information not in LocusLink does it provide about your gene?
  38. What is NCBI UniGene, and what information not in LocusLink does it provide about your gene?
  39. What is HomoloGene, and what information is present here that is not in LocusLink?
  40. What is GeneCard, and where is it?
  41. How does the GeneCard information compare with the NCBI information about your gene?
  42. How does the Gene information at Ensembl compare with that at NCBI LocusLink?
  43. What is the Ensembl "Disease Browser", and how is it related to NCBI?
  44. How does the UCSC Graphic Viewer compare with the NCBI MapView and the EBI ContigView?
  45. If you use the direct links between Ensembl and UCSC, do you get the same graphic views as when you find the gene independently at each of the Web sites? If not, what is the difference?
  46. What is RepeatMasker and what is it used for?





Latest modification: 26 April, 2001

If you have problems or questions, send email to Michael Gribskov or Doug Smith or Hiren Patel