Math/Stats 547-8, Winter 2003: Bio Sequence Analysis.

547 (Lecture): MWF, 9:00-10:00 AM, Room 1060 East Hall
548 (Lab): Tu, 9:00-10:00 AM, Room 5631 Medical Sciences II (BICC)

Instructor: Dan Burns

Office: 5834 East Hall

Phone: 763-0152

E-mail: dburns@umich.edu

There is a new class home page, linked here. Not all sidebar links are active yet, however.

This will be the temporary home page for the course; I hope to have time to put together something more interesting before long. For now, this page provides convenient access to the class events as they are scheduled, the assignments and group rosters, and a set of links to online papers and web sites that will be of use in the course. For your convenience, here is the first day handout summarizing the course, as well as the course announcement, which gives a bit more syllabus detail. Note, however, that I am thinking of modifying a few things during the term in light of recent developments.

Group assignments will be available below. The first Group Problems groups will be set in class on Wednesday, January 22.

There will be no examinations in the class. Instead, there will be a final project, which will consist of studying a particular subject and making a presentation to the class. This will be a twenty to twenty-five minute PowerPoint presentation, done in teams of two. There will be a page of suggested topics, though you are free to choose a topic of your own (it must be approved, however).

Link here to the page for the final project, including suggested topics, etc.


Schedule of Readings and Detailed Syllabus:

Click on a section subject for a relevant link, if available. DE means Durbin, Eddy, et al., "Biological Sequence Analysis".

January 6-10
Read: DE, Chap. 1; secs. 11.1-11.2.
Review of linearity of macromolecules; probability background.
How do we model randomness of sequences versus biological meaning of sequence data?

January 13-17
Read: DE, Chap. 2.1-2; Chap. 11.2.
Entropy; scoring matrices; PAM matrices as Markov models.
Entropy as a measure of "interesting" sequence location.

January 22-24 (No meeting Jan 20: MLK)
Read: DE, Chap. 11.3; Chap. 2.3-2.7.
Dynamic programming algorithms; significance of scores.
Needleman-Wunsch, Smith-Waterman and variants; extreme value statistics.

February 17-21
Read: Krogh et al., 1995; Krogh 2, 1997; Krogh 3, 1998; Burge-Karlin, 1997; Burge-Karlin, 1998.
Gene finders: hidden Markov models (HMMs) for locating genes in genomic sequence data.
Krogh et al. gives an E. coli parser; Krogh 2 and 3 describe HMMgene; Burge-Karlin 1 and 2 describe Genscan.

March 3-7
Read: DE, Chaps. 7 & 8 (as much as possible).
Phylogeny, especially for protein families.
There are many approaches to phylogeny, which is, again, a computationally hard problem.

March 7-10
Read: Kahn, Qian & Goldstein, 2000; Qian & Goldstein, 2002.
Phylogeny, especially for protein families, and in particular its use for making multiple sequence alignments more accurate. (The preprint links, on the right, are more directly relevant than the reprint links, on the left.)
Tree-based HMMs for multiple sequence alignment (m.s.a.) and classification of GPCRs (= G-protein coupled receptors).

Downloadable Group Problem Sets and Schedule of Group HW Assignments:

Click on a section subject for a relevant link, if available.

GPS #1 Basic modelling and counting Due: February 5
GPS #2 TBA Due: TBA
GPS #3 TBA Due: TBA
GPS #4 TBA Due: TBA
GPS #5 TBA Due: TBA

Some useful web links for the course:

Entrez: The main database/server center at NIH.
USC Comp Bio Group: Waterman Server.
Expasy: Home of SwissProt.
SAM at UCSC: A suite of programs for HMMs.
HMMer online (Inst. Pasteur): A suite of programs for HMMs.
Pfam at Washington Univ.: Other sites in UK, etc. The most recent published description of Pfam.
GENSCAN at MIT: This site includes a server for GENSCAN as well as a lot of useful documentation about limitations, etc.
HMMgene (Copenhagen): This site includes a server for HMMgene, trained for vertebrates and C. elegans. Not as much documentation.
Burset-Guigo Data Sets: Useful data sets and their successors for calibrating gene finders.
PHYLIP Phylogeny Programs: A server supported by the Institut Pasteur, Paris. Felsenstein commends them on their "bravery", since some of these programs are computationally intensive.
PDB: Protein Data Bank.
Workbench: Mainly structural tools.
Search CPAN: DB of Perl modules for downloading; check, especially, BioPerl.
NOTE: BioPerl 1.2 is available in the 548 resource directory. You can examine it in some detail there and download individual scripts you think you can use. For the full release, go through CPAN.

  • Link with UM Math Department.
  • Link with UM Stats Department.