Math/Stats 547-8, Winter 2003: Bio Sequence Analysis.
547 (Lecture): MWF, 9:00-10:00 AM, Room 1060 East Hall
Instructor: Dan Burns
Office: 5834 East Hall
Phone: 763-0152
E-mail: dburns@umich.edu
This is the temporary home page for the course; I hope to put together something more interesting in a little while. For now, this page provides convenient access to the class events being scheduled, the assignments and group rosters, and a set of links to (online) papers and web sites that will be of use in the course. Here is the first day handout summarizing the course, as well as the course announcement, which gives a bit more syllabus detail. Note, however, that I am thinking of modifying a few things over the course of the term due to recent developments.
Group assignments will be available below. The first Group Problems groups will be set in class on Wednesday, January 22.
There will be no examinations in the class. There will be a final project, which will consist of studying a particular subject and making a presentation to the class. This will be a twenty to twenty-five minute PowerPoint presentation, done in teams of two. There will be a page of suggested topics, though you are free to choose a topic of your own (subject to approval).
Link here to the page for the final project, including suggested topics, etc.
January 6-10 |
Read: DE, Chap. 1; secs. 11.1-11.2. |
Review of linearity of macromolecules; probability background. |
How do we model randomness of sequences versus biological meaning of sequence data? |
January 13-17 |
Read: DE, Chap. 2.1-2; Chap. 11.2. |
Entropy; scoring matrices; PAM matrices as Markov models. |
Entropy as a measure of "interesting" sequence location. |
January 22-24
(No meeting Jan 20: MLK) |
Read: DE, Chap. 11.3; Chap. 2.3-2.7. |
Dynamic programming algorithms; significance of scores. |
Needleman-Wunsch, Smith-Waterman and variants; extreme value statistics. |
February 17-21 |
Read: Krogh et al., 1995; Krogh 2, 1997; Krogh 3, 1998; Burge-Karlin, 1997; Burge-Karlin, 1998 |
Gene finders: HMM models for locating genes in genomic sequence data. |
Krogh et al. gives an E. coli parser; Krogh 2 and 3 describe HMMgene; Burge-Karlin 1 and 2 describe Genscan. |
March 3-7 |
Read: Durbin and Eddy, Chapters 7 & 8 (as much as possible) |
Phylogeny, especially for protein families. |
Many approaches to phylogeny, which is, again, a computationally hard problem. |
March 7-10 |
Kahn, Qian & Goldstein 2000,
Qian, Goldstein 2002. |
Phylogeny for protein families, in particular its use in making multiple sequence alignments more accurate. (The preprint links -- right -- are more directly relevant than the reprint links -- left.) |
Tree based HMM's for m.s.a. and classification of GPCR's (= G-protein coupled receptors). |
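For those who want to experiment before the dynamic programming lectures (January 22-24), here is a minimal sketch of the Needleman-Wunsch recursion for the optimal global alignment score. The function name and the scoring parameters (match, mismatch, and a linear gap penalty) are illustrative assumptions, not values taken from the course or from DE; in practice one would substitute a scoring matrix such as a PAM matrix.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Assumed scoring (illustrative only): a fixed match/mismatch score
    and a linear gap penalty.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]

    # Boundary conditions: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the table row by row.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] against a gap
                score[i][j - 1] + gap,      # b[j-1] against a gap
            )
    return score[n][m]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

This sketch returns only the score; recovering the alignment itself requires a traceback through the table. The Smith-Waterman local variant differs mainly in clamping each cell at zero and taking the maximum over the whole table.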
GPS #1 | Basic modelling and counting | Due: February 5 |
GPS #2 | TBA | Due: TBA |
GPS #3 | TBA | Due: TBA |
GPS #4 | TBA | Due: TBA |
GPS #5 | TBA | Due: TBA |
Entrez | The main database/server center at NIH |
USC Comp Bio Group | Waterman Server |
Expasy | Home of SwissProt |
SAM at UCSC | A suite of programs for HMM's. |
HMMer online (Inst. Pasteur) | A suite of programs for HMM's. |
Pfam at Washington Univ. |
Other sites in UK, etc. The most recent published description of Pfam. |
GENSCAN at MIT |
This site includes a server for GENSCAN as well as a lot of useful documentation about limitations, etc. |
HMMgene (Copenhagen) |
This site includes a server for HMMgene, trained for vertebrates and C. elegans. Not as much documentation. |
Burset-Guigo Data Sets |
Useful data sets, and their successors, for calibrating gene finders. |
PHYLIP Phylogeny Programs |
A server supported by the Institut Pasteur, Paris. Felsenstein commends them on their "bravery", since some of these programs are computationally intensive. |
PDB | Protein Data Bank |
Workbench | Mainly structural tools |
Search CPAN |
DB of Perl Modules for downloading; check, especially, BioPerl. NOTE: BioPerl 1.2 is available in the 548 resource directory. You can examine it in some detail there and download individual scripts you think you can use. For the full release, go through CPAN. |