Math/Stats 547-8, Winter 2003. Resource Page.

Math/Stats 547-8, Winter 2003: Bio Sequence Analysis.

547 (Lecture): MWF, 9:00-10:00 AM, Room 1060 East Hall;
548 (Lab): Tu, 9-10 AM, Room 5631 Medical Sciences II (BICC)

Instructor: Dan Burns

Office: 5834 East Hall

Phone: 763-0152

There is a new class home page, linked here. Not all sidebar links are active yet, however.

This is the temporary home page for the course; I hope to put together something more interesting before long. For now, this page provides convenient access to the class events being scheduled, the assignments and group rosters, and a set of links to online papers and web sites that will be useful in the course. For your convenience, here is the first day handout summarizing the course, as well as the course announcement, which gives a bit more syllabus detail. Note, however, that I am thinking of modifying a few things over the course of the term due to recent developments.

Group assignments will be available below. The groups for the first Group Problems will be set in class on Wednesday, January 21.

There will be no examinations in the class. There will be a final project, which will consist of studying a particular subject and presenting it to the class. This will be a twenty to twenty-five minute PowerPoint presentation, done in teams of two. There will be a page of suggested topics, though you are free to choose a topic of your own (it must be approved, however).

Link here to the page for the final project, including suggested topics, etc.

Schedule of Readings and Detailed Syllabus:

Click on a section subject for a relevant link, if available. DE means Durbin and Eddy, et al., "Biological Sequence Analysis".


January 6-10	Read: DE, Chap. 1; secs. 11.1-11.2.	Review of linearity of macromolecules; probability background.	How do we model randomness of sequences versus biological meaning of sequence data?
January 13-17	Read: DE, Chap. 2.1-2; Chap. 11.2.	Entropy; scoring matrices; PAM matrices as Markov models.	Entropy as a measure of "interesting" sequence location.
January 22-24 (No meeting Jan 20: MLK)	Read: DE, Chap. 11.3; Chap. 2.3-2.7.	Dynamic programming algorithms; significance of scores.	Needleman-Wunsch, Smith-Waterman and variants; extreme value statistics.
February 17-21	Read: Krogh et al., 1995; Krogh 2, 1997; Krogh 3, 1998; Burge-Karlin, 1997; Burge-Karlin, 1998.	Gene finders: HMM models for locating genes in genomic sequence data.	Krogh et al. gives an E. coli parser; Krogh 2 and 3 describe HMMgene; Burge-Karlin 1 and 2 describe Genscan.
March 3-7	Read: Durbin and Eddy, Chapters 7 & 8 (as much as possible)	Phylogeny, especially for protein families.	Many approaches to phylogeny, which is, again, a computationally hard problem.
March 7-10	Read: Kahn, Qian & Goldstein 2000, Qian, Goldstein 2002.	Phylogeny for protein families, in particular its use in making multiple sequence alignments more accurate. (The preprint links -- right -- are more directly relevant than the reprint links -- left.)	Tree-based HMMs for multiple sequence alignment (m.s.a.) and classification of GPCRs (= G-protein coupled receptors).

Downloadable Group Problem Sets and Schedule of Group HW Assignments:

Click on a section subject for a relevant link, if available.


GPS #1	Basic modelling and counting	Due: February 5
GPS #2	TBA	Due: TBA
GPS #3	TBA	Due: TBA
GPS #4	TBA	Due: TBA
GPS #5	TBA	Due: TBA

Some useful web links for the course:


Entrez	The main database/server center at NIH
USC Comp Bio Group	Waterman Server
Expasy	Home of SwissProt
SAM at UCSC	A suite of programs for HMM's.
HMMer online (Inst. Pasteur)	A suite of programs for HMM's.
Pfam at Washington Univ.	Other sites in UK, etc. The most recent published description of Pfam.
GENSCAN at MIT	This site includes a server for GENSCAN as well as a lot of useful documentation about limitations, etc.
HMMgene (Copenhagen)	This site includes a server for HMMgene, trained for vertebrates and C. elegans. Not as much documentation.
Burset-Guigo Data Sets	Useful data sets, and their successors, for calibrating gene finders.
PHYLIP Phylogeny Programs	A server supported by the Institut Pasteur, Paris. Felsenstein commends them on their "bravery", since some of these programs are computationally intensive.
PDB	Protein Data Bank
Workbench	Mainly structural tools
Search CPAN	DB of Perl Modules for downloading; check, especially, BioPerl. NOTE: BioPerl 1.2 is available in the 548 resource directory. You can examine it in some detail there and download individual scripts you think you can use. For the full release, go through CPAN.

Link with UM Math Department.

Link with UM Stats Department.
