BIOLOGICAL SEQUENCE ANALYSIS

Math/Stats 547 (Lecture) and 548 (Lab) MWF, 9-10 (1084 East Hall); Fri, 10-11 (B743 East Hall)

 

Syllabus

Assignments

Lab Worksheets

548 Resources

Web Resources

Term Project

Speaker Schedule

Outside Seminars

Contact Instructor

Instructor: Dan Burns

Office: 5834 East Hall

Phone: 763-0152

E-mail: dburns@umich.edu

This page provides convenient access to the class events being scheduled, the assignments and group rosters, and a set of links to (online) papers and web sites that will be of use in the course. For your convenience, here is the first day handout summarizing the course, as well as the course announcement, which gives a bit more syllabus detail. Note, however, that I am thinking of modifying a few things over the course of the term due to recent developments.

Problem set assignments will be available below (follow the link in the left side bar). The first problem sets will be due Wednesday, February 4.

There will be no examinations in the class. There will be a final project, which will consist of studying a particular subject and making a presentation to the class. This will be a twenty to twenty-five minute PowerPoint presentation, done in teams of two. There will be a page of suggested topics, though you are free to choose a topic of your own (it must be approved, however).

Link here to the page for the final project, including suggested topics, etc. For previous semesters' syllabi, click here.


Schedule of Readings and Detailed Syllabus:

Click on a section subject for a relevant link, if available. DE means Durbin, Eddy, et al., "Biological Sequence Analysis".

September 7-9 Read: DE, Chap. 11, secs. 11.1-11.5. Review of linearity of macromolecules; probability background. How do we model randomness of sequences versus biological meaning of sequence data?
September 12-16 Read: Lecture Notes; DE, Chap. 11.2. Bayes and priors; Entropy. Dirichlet priors; Entropy as a measure of "interesting" sequence location.
1) Dirichlet details (In the appendices, pp. 24-33.)
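
As a rough illustration of entropy as a measure of "interesting" sequence location, here is a minimal Python sketch. It is not taken from the lecture notes: the function names and the toy alignment are made up for illustration, and it assumes the sequences are already aligned and of equal length. Columns with low entropy are highly conserved, which is one simple sense in which a location can be "interesting".

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    # Sum of p * log2(1/p) over the observed symbols.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_profile(sequences):
    """Entropy of every column of a set of equal-length, pre-aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment (not course data): four sequences, four columns.
toy_alignment = ["ACGT", "ACGA", "ACCT", "ATGC"]
profile = entropy_profile(toy_alignment)

for position, h in enumerate(profile):
    print(f"column {position}: {h:.3f} bits")
```

The first column is perfectly conserved, so its entropy is zero; the last column is the most variable. With real data one would normally smooth the counts (for instance with the Dirichlet priors discussed this week) before taking logarithms.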

September 19-23

Read: Lecture Notes. Entropy; Alignment; Scoring Matrices.
1) Entropy details
2) Entropy Data Example (DE)
3) BLOSUM 50 Scoring Matrix (more matrices available in 548 Resources)
September 26-30 Read: Lecture Notes. (They are broken into three parts this week.) DP algorithms; significance levels of scores.
Needleman-Wunsch, Smith-Waterman and variants; extreme value statistics.
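
For orientation, here is a minimal Python sketch of the global (Needleman-Wunsch) dynamic programming recursion. It is an illustration under stated assumptions, not the version in the lecture notes: the function name is made up, the scores (match = 1, mismatch = -1) stand in for a real scoring matrix such as BLOSUM 50, and the gap penalty is linear (-2 per gap symbol). It returns only the optimal score, not the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b with a linear gap penalty.

    The scoring parameters are illustrative defaults, not values from the course.
    """
    # Row 0 of the DP table: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning the first i symbols of a against nothing.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # a[i-1] against a gap
                curr[j - 1] + gap,           # b[j-1] against a gap
            ))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "AGT"))
```

Only two rows of the table are kept, which is enough for the score; recovering the alignment requires the full table (or traceback pointers). The local (Smith-Waterman) variant differs mainly in adding 0 as a fourth option in the maximum and taking the best cell anywhere in the table.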
October 3-7 Read: Lecture Notes. Extreme values, Karlin-Altschul and Arratia-Waterman Statistics.
Significance
October 10-14 Read: Lecture Notes. Hidden Markov Models (HMM) 1: Parsing and Training.
HMMs I
October 19-21 Read: Lecture Notes. HMM and Multiple Sequence Alignment (MSA).
HMMs II
October 24-26 Read: Lecture Notes. Finding MSAs; Examples: ClustalW; Protein Family Profiles.
October 28   Perl tutorial  
October 31-November 4 Read: Lecture Notes. Non-probabilistic methods of phylogeny: clustering, distance, parsimony.
Notes: Phylogeny I
March 8-12 (Mar 8 make-up TBA) Read: Lecture Notes, Felsenstein-Churchill. Probabilistic phylogeny (ML estimation); HMMs and variable site rates of evolution.
Notes: Phylogeny II; FastDNAML Man Pages; F-C: Variable Site Rates & HMMs.
March 15-19 Read: Lecture Notes; Qian-Goldstein; Qian et al. (GPCRs). Fusing HMMs and phylogeny; distant homologies.
Tree-HMMs, T-HMMs and GPCRs (pp. 95-99 of link); reversed HMM as null model.
March 22-26 Read: Krogh et al., 1995
Krogh 2, 1997
Krogh 3, 1998
Burge-Karlin, 1997
Burge-Karlin, 1998
Haussler Review (unpublished).
Gene finders: HMM models for locating genes in genomic sequence data.
Krogh et al. gives an E. coli parser;
Krogh 2 and 3 describe HMMgene;
Burge-Karlin 1 and 2 describe Genscan;
Krogh 3 and Haussler are good surveys.
March 29-31 Reference: Brian Ripley, "Pattern Recognition and Neural Networks", Cambridge UP (1995), Ch. 5: Feedforward Neural Nets.
Basics of NN's: feedforward nets, supervised training, backpropagation algorithm; gradient descent minimization.
The "vanilla" settings for NN's; the complete literature is vast, with few rigorous arguments; a very heuristic field.
April 2
Neural nets in promoter recognition: NNPP;
M. Reese, Comps. & Chem., 26 (1998) 51-56.
Time delay NN's;
application to eukaryotic promoter site recognition.
Typical use of an NN for pattern recognition, modified so that the "same" signal can be recognized at flexible locations.
April 5 Probabilistic version of promoter recognition: McPromoter.
Ohler et al., 1999.
Improvements in McPromoter;
Ohler et al., 2001.
Recognizing promoter and regulatory networks, Church Lab, 2002.
Interpolated Markov chains;
application to eukaryotic promoter site recognition.
Biophysical improvements.
The regulatory network aspect.
Use of higher order Markov chains for pattern recognition, with modification to allow for flexible use of available data: weighted use of shorter and longer context sequences, with (non-probabilistically enforced) weighting of more commonly occurring context sequences.
Ohler's extension of McPromoter to include DNA physics
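
To make the idea of weighting shorter and longer contexts concrete, here is a minimal Python sketch of an interpolated Markov chain. It is a generic illustration, not McPromoter's actual scheme: the function names, the count-based weight total / (total + threshold), and the threshold value are all assumptions chosen for simplicity. The only point it is meant to show is that contexts seen more often get more weight, and rarely seen long contexts fall back on shorter ones.

```python
from collections import defaultdict


def train_counts(sequences, max_order):
    """Count, for every context of length 0..max_order, which symbol follows it."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for k in range(max_order + 1):
                if i - k < 0:
                    break
                counts[seq[i - k:i]][symbol] += 1
    return counts


def interpolated_prob(counts, context, symbol, alphabet="ACGT", threshold=5.0):
    """P(symbol | context), blending estimates from shorter to longer contexts.

    The weighting rule is a hypothetical, illustrative choice.
    """
    prob = 1.0 / len(alphabet)  # start from the uniform distribution
    for k in range(len(context) + 1):
        table = counts.get(context[len(context) - k:])
        if not table:
            break  # this context (and any longer one) was never observed
        total = sum(table.values())
        weight = total / (total + threshold)  # frequent contexts are trusted more
        prob = weight * (table.get(symbol, 0) / total) + (1 - weight) * prob
    return prob


# Hypothetical toy training data.
model = train_counts(["ACACACAC"], max_order=2)
p_c = interpolated_prob(model, "A", "C")
p_a = interpolated_prob(model, "A", "A")
print(p_c, p_a)
```

Because each level is a convex combination of two distributions over the alphabet, the resulting values still sum to one over the alphabet, provided the training sequences use only symbols from that alphabet.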
April 7 Dr. Eric Fauman,
Pfizer Global Research
(Ann Arbor)
Structural Bioinformatics:
Sequence to Structure
Protein residue types,
hierarchy of structure,
structure prediction.
Suggested reading:
Bourne & Weissig
(2003), Chap. 2.
Useful link for today:
RPI/Wadsworth Motifs.
April 9 Dr. Eric Fauman,
Pfizer Global Research
(Ann Arbor)
Drug Targetability. Examples: protein kinases, GPCRs.
Suggested reading:
B&W, Chap. 23;
Assessment:
Hopkins & Groom,
Nat Rev Drug Disc 2002.

Commentary (2004).
April 12 Intro to DNA duplex stress and gene promoters. New types of transcription regulatory mechanisms. Example: ilvGMEDA operon in E. coli.
Suggested reading:
Sheridan, Benham and Hatfield.
April 14 Prof. Jens-Christian Meiners,
UM Physics and Biophysics.
DNA duplex: Statistical mechanics,
partition function.
Read: Doi/Edwards: The theory of polymer dynamics, Chapter 2.
If you have time, (heavier duty than usual):
Marko & Siggia, 1995
(caution: huge, PDF image file).
April 16 Stress induced duplex destabilization models and computations. Computing DNA destabilization profiles from sequence data. Suggested reading:
Benham: quick survey 2001, or details: Fye-Benham 1999.
April 19 SIDD profiles,
sequence specific energy functions.
Using theory to predict gene regulation. Nice profiles, showing global effects of SIDD:
Benham, J. Mol. Bio. 255 (1996), 425-434.
April 21 S/MARS Further applications:
large scale structural regions
Benham, et al., J. Mol. Bio. 274 (1997), 181-196.
Goetze, et al., Biochemistry 42 (2003), 154-166.