BIOLOGICAL SEQUENCE ANALYSIS

Math/Stats 547 (Lecture): MWF, 9-10 (1084 East Hall); Math/Stats 548 (Lab): Fri, 10-11 (B743 East Hall)

 

Syllabus

Assignments

Lab Worksheets

548 Resources

Web Resources

Term Project

Speaker Schedule

Outside Seminars

Contact Instructor

Instructor: Dan Burns

Office: 5834 East Hall

Phone: 763-0152

E-mail: dburns@umich.edu


Schedule of Readings and Detailed Syllabus: latter part of Winter 2003.

Click on a section subject for a relevant link, if available. DE means Durbin, Eddy, et al., "Biological Sequence Analysis". For the most current syllabus, use the navigation bar (left).

March 3-7, 2003
Read: Durbin and Eddy, Chapters 7 & 8 (as much as possible).
Phylogeny, especially for protein families.
There are many approaches to phylogeny, which is, again, a computationally hard problem.
March 7-10, 2003
Kahn, Qian & Goldstein 2000; Qian & Goldstein 2002.
Phylogeny, especially for protein families, and in particular its use in making multiple sequence alignments more accurate. (The preprint links, on the right, are more directly relevant than the reprint links, on the left.)
Tree-based HMM's for multiple sequence alignment (m.s.a.) and classification of GPCR's (G-protein coupled receptors).
March 12-19, 2003 (March 14 canceled)
Read: DE, Chap. 7 (parsimony); Chap. 8 (ML, i.e. maximum likelihood); HMM: Felsenstein-Churchill 1996, Mitchison-Durbin 1995 (not linkable; will be distributed in class).
Phylogeny: ML; parsimony; HMM's in phylogeny.
Parsimony is the most used method for tree estimation; HMM's model variation in substitution rates across sequence positions.
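To make the parsimony criterion concrete, here is a minimal sketch of Fitch's small-parsimony count on a fixed rooted binary tree. The tree encoding (nested pairs of leaf names), the function names, and the toy alignment are assumptions made for illustration; they are not taken from DE or from the course materials.

```python
def fitch(tree, leaf_states):
    """Return (candidate states at this node, minimum number of changes below it).

    tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch cost over all columns of an ungapped alignment."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


# Hypothetical toy data: four aligned sequences and two candidate topologies.
alignment = {"A": "ACGT", "B": "ACGA", "C": "GCTA", "D": "GCTA"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))

score_ab_cd = parsimony_score(tree_ab_cd, alignment)
score_ac_bd = parsimony_score(tree_ac_bd, alignment)
```

On this toy alignment the topology grouping A with B needs fewer changes than the one grouping A with C, so parsimony prefers it. Searching over all topologies, rather than scoring a given one, is the computationally hard part.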
March 21-24, 2003
Reference: Brian Ripley, "Pattern Recognition and Neural Networks", Cambridge University Press (1995), Ch. 5: Feedforward Neural Nets.
Basics of NN's: feedforward nets, supervised training, backpropagation algorithm; gradient descent minimization.
The "vanilla" settings for NN's. The complete literature is vast, with few rigorous arguments; it is a very heuristic field.
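The "vanilla" setting can be sketched in a few lines: one hidden layer of sigmoid units, squared-error loss, gradients computed by backpropagation, and plain batch gradient descent. The network size, learning rate, toy data (the logical OR function), and all function names below are assumptions chosen for illustration, not settings from Ripley.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return (hidden activations, output) for one input vector x."""
    W1, b1, W2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output


def loss(params, data):
    """Squared-error loss summed over the training set."""
    return sum(0.5 * (forward(params, x)[1] - y) ** 2 for x, y in data)


def backprop(params, data):
    """Gradient of the loss with respect to every weight and bias."""
    W1, b1, W2, b2 = params
    gW1 = [[0.0] * len(row) for row in W1]
    gb1 = [0.0] * len(b1)
    gW2 = [0.0] * len(W2)
    gb2 = 0.0
    for x, y in data:
        hidden, out = forward(params, x)
        # Error signal at the output unit (chain rule through the sigmoid).
        delta_out = (out - y) * out * (1.0 - out)
        gb2 += delta_out
        for j, h in enumerate(hidden):
            gW2[j] += delta_out * h
            # Error signal propagated back to hidden unit j.
            delta_h = delta_out * W2[j] * h * (1.0 - h)
            gb1[j] += delta_h
            for i, xi in enumerate(x):
                gW1[j][i] += delta_h * xi
    return gW1, gb1, gW2, gb2


def gradient_step(params, grads, rate):
    """One step of gradient descent: move against the gradient."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    return (
        [[w - rate * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)],
        [b - rate * g for b, g in zip(b1, gb1)],
        [w - rate * g for w, g in zip(W2, gW2)],
        b2 - rate * gb2,
    )


random.seed(0)
n_in, n_hidden = 2, 3  # hypothetical sizes
params = (
    [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
    [0.0] * n_hidden,
    [random.uniform(-1, 1) for _ in range(n_hidden)],
    0.0,
)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # logical OR

initial_loss = loss(params, data)
for _ in range(500):
    params = gradient_step(params, backprop(params, data), rate=0.1)
final_loss = loss(params, data)
```

A useful sanity check for any backpropagation code is to compare one component of the gradient against a finite-difference estimate of the loss.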
March 26-28, 2003
Neural nets in promoter recognition: NNPP; M. Reese, Computers & Chemistry, 26 (1998) 51-56.
Time delay NN's; application to eukaryotic promoter site recognition.
Typical use of an NN for pattern recognition, modified so that the "same" signal can be recognized at a flexible location.
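The idea of recognizing the "same" signal at a flexible location can be sketched as a single feature detector whose weights are shared across all window positions. This is only an illustration of weight sharing, not the NNPP architecture: the hand-set weights, the TATA motif, the bias, and the use of a maximum over positions are all assumptions.

```python
import math

ALPHABET = "ACGT"


def one_hot(base):
    return [1.0 if base == letter else 0.0 for letter in ALPHABET]


def window_activation(window, weights, bias):
    """Sigmoid unit applied to one window; weights[k] covers offset k."""
    z = bias
    for base, row in zip(window, weights):
        z += sum(w * v for w, v in zip(row, one_hot(base)))
    return 1.0 / (1.0 + math.exp(-z))


def time_delay_score(sequence, weights, bias):
    """Apply the same unit at every offset and keep the strongest response."""
    width = len(weights)
    return max(window_activation(sequence[i:i + width], weights, bias)
               for i in range(len(sequence) - width + 1))


# Hypothetical hand-set detector for the motif TATA (a trained net would learn these).
MOTIF = "TATA"
weights = [[2.0 if letter == target else -2.0 for letter in ALPHABET]
           for target in MOTIF]
bias = -6.0

early = time_delay_score("TATAGGGGGG", weights, bias)
late = time_delay_score("GGGGGTATAG", weights, bias)
absent = time_delay_score("GGGGGGGGGG", weights, bias)
```

Because the weights are shared, the detector responds identically whether the motif sits at the start or near the end of the sequence, and stays low when the motif is absent.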
March 31, 2003
Probabilistic version of promoter recognition: McPromoter. Ohler et al., 1999.
Interpolated Markov chains; application to eukaryotic promoter site recognition.
Use of higher-order Markov chains for pattern recognition, modified to make flexible use of the available data: shorter and longer context sequences are both used, with weights (not enforced probabilistically) that favor more commonly occurring context sequences.
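A minimal sketch of the interpolation idea follows: the estimate for a long context is blended with the estimate for the next shorter context, and contexts seen more often get more weight. The specific weight `total / (total + pseudo)`, the constant `pseudo`, the function names, and the training strings are assumptions for illustration; Ohler et al. use their own weighting scheme.

```python
from collections import Counter, defaultdict

ALPHABET = "ACGT"


def count_contexts(sequences, max_order):
    """counts[context][symbol] for every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for order in range(min(i, max_order) + 1):
                counts[seq[i - order:i]][symbol] += 1
    return counts


def interpolated_prob(symbol, context, counts, max_order, pseudo=10.0):
    """Blend estimates from the empty context up to the longest seen context."""
    context = context[-max_order:] if max_order > 0 else ""
    prob = 1.0 / len(ALPHABET)  # uniform starting point
    for order in range(len(context) + 1):
        ctx = context[len(context) - order:]
        seen = counts.get(ctx)
        if not seen:
            break  # longer contexts cannot have been seen either
        total = sum(seen.values())
        # Assumed weighting: frequently observed contexts are trusted more.
        weight = total / (total + pseudo)
        prob = weight * seen.get(symbol, 0) / total + (1.0 - weight) * prob
    return prob


# Hypothetical training sequences containing a TATA-like signal.
training = ["TATAAAGG", "CTATAAAT", "GGTATATA"]
counts = count_contexts(training, max_order=3)

p_a = interpolated_prob("A", "TAT", counts, max_order=3)
p_g = interpolated_prob("G", "TAT", counts, max_order=3)
total_prob = sum(interpolated_prob(s, "TAT", counts, max_order=3) for s in ALPHABET)
```

Each blending step is a convex combination of two distributions, so the result is still a probability distribution over the four bases, and a context never seen in training simply falls back to the shorter contexts.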
April 2-4, 2003
Improvements in McPromoter; Ohler et al., 2001.
Incorporating biophysical properties of sequences.
Ohler's extension of McPromoter to include DNA physics; intro to duplex stress and gene promoters (after Benham et al.).
April 4-14, 2003
Term Project Presentations: Good luck!
Great variety of topics. Visitors welcome: see the schedule of speakers.
April 14-17, 2003
End of Term: Sonnhammer et al., 1998; Martelli et al., 2002.
Trans-membrane proteins: recognizing helices and beta barrels.
Using HMM's for structural feature recognition. (Papers taken from the suggested project topics.)
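As a rough illustration of HMM-based structural feature recognition, the sketch below labels each residue as "loop" or "helix" with a two-state HMM and the Viterbi algorithm. It is far simpler than the models in the papers above: the two states, the hydrophobic residue set, and every probability are assumed values chosen for illustration, and the input sequence is assumed to be non-empty.

```python
import math

# All values below are hypothetical, for illustration only.
HYDROPHOBIC = set("AILMFVWC")
STATES = ("loop", "helix")
START = {"loop": 0.5, "helix": 0.5}
TRANS = {"loop": {"loop": 0.9, "helix": 0.1},
         "helix": {"loop": 0.1, "helix": 0.9}}
EMIT = {"loop": {"h": 0.3, "p": 0.7},
        "helix": {"h": 0.8, "p": 0.2}}


def residue_class(residue):
    """Reduce the amino acid alphabet to hydrophobic (h) or polar (p)."""
    return "h" if residue in HYDROPHOBIC else "p"


def viterbi(sequence):
    """Most probable state path for a non-empty sequence, in log space."""
    obs = [residue_class(r) for r in sequence]
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[best_prev] + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][o]))
            pointers[s] = best_prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical sequence: polar flanks around a hydrophobic stretch.
flank = "KDEKDEKDEK"
core = "LIVLLAVFLILLAVIVLL"
path = viterbi(flank + core + flank)
```

With these numbers the decoded path labels the hydrophobic stretch as helix and the flanks as loop; the self-transition probabilities keep the path from switching state on a single stray residue.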