[March 7, 2000]
Dear colleagues in decision support:

The first of four meetings of a seminar series on data mining will take place on
Thursday at 10am, in the Meeting Room (C1-217).  The series will emphasize
biological data mining and knowledge discovery.  Biological data mining seems 
to be a more compact area of research than text mining, and some of the
techniques should carry over.

My plan for March 9 is to introduce biological data mining (and, more
generally, computational biology or bioinformatics) and present two algorithms:
a dynamic programming algorithm for the sequence alignment problem 
(a simplified version of what is known in bioinformatics as the Smith-Waterman 
algorithm), and a dynamic programming algorithm for parsing gene structure due 
to Thomas Wu.
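A minimal sketch of the textbook local-alignment recurrence, in Python, may help as preparation. It is not necessarily the simplified version to be presented on March 9: it assumes a linear gap penalty and illustrative scores (match +2, mismatch -1, gap -1), and the function name smith_waterman is only a label for this sketch. Each cell H[i][j] holds the best score of a local alignment ending at a[i-1] and b[j-1], floored at 0 so that an alignment may start anywhere; the traceback begins at the highest-scoring cell and stops when it reaches a 0.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Sketch only: linear gap penalty, illustrative default scores.
    """
    n, m = len(a), len(b)
    # H[i][j]: best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # gap in b
            left = H[i][j - 1] + gap  # gap in a
            H[i][j] = max(0, diag, up, left)  # 0 = start a fresh alignment
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("ACGT", "ACT"))  # (5, 'ACGT', 'AC-T')
```

With these scores, ACGT aligned against AC-T (three matches and one gap) scores 2 + 2 - 1 + 2 = 5, which beats the ungapped match of AC alone (score 4).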

Next week, I would like to discuss hidden Markov models and their application
to profiling protein families and parsing genes.  In our third meeting, 
I would like to discuss the use of Bayesian networks in modeling biological 
structures.  As for the fourth meeting, we'll see how things go!
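As a preview of the HMM material, the Viterbi algorithm (the dynamic programming solution to what Rabiner's tutorial calls the second basic HMM problem, finding the most likely hidden state sequence) can be sketched as follows. The two-state "AT-rich"/"GC-rich" model and all of its probabilities are invented for illustration; it is not a gene-parsing or protein-profile model from the readings. The sketch works in log space to avoid numeric underflow on long sequences and assumes every probability is nonzero.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its log-probability) for obs.

    Works in log space to avoid underflow; assumes every probability is > 0.
    """
    # V[t][s]: best log-probability of any state path that ends in s at time t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: best predecessor of state s at time t
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, V[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            V[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Pick the best final state, then follow back-pointers to recover the path.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, V[-1][last]


# Hypothetical toy model: "AT" = AT-rich region, "GC" = GC-rich region.
STATES = ("AT", "GC")
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

path, logp = viterbi("AAAAGGGG", STATES, START, TRANS, EMIT)
print(path)  # ['AT', 'AT', 'AT', 'AT', 'GC', 'GC', 'GC', 'GC']
```

The decoder switches state once, where the composition changes: a single switch costs a factor of 0.1/0.9 in transition probability, but staying in one state would pay a factor of 0.1/0.4 on each of four mismatched emissions.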

References for March 9 include:
The first ten pages of the (U.S.) DOE Human Genome Project "Primer on Molecular
Genetics," http://www.ornl.gov/hgmis/publicat/primer/intro.html.
Two papers from the November/December 1999 issue of _IEEE Intelligent Systems_
(available by following the appropriate links from the Digital Article Database
Service, http://dads.aub.auc.dk): 
"Datascope: Mining Biological Sequences," by Simon Kasif (pp. 38-43), and 
"Gene Discovery in DNA Sequences," by Steven Salzberg (pp. 44-48).
The following paper by Thomas D. Wu (in preprint form):
"A Segment-based Dynamic Programming Algorithm for Parsing Gene Structure,"
available at http://cmgm.stanford.edu/~brutlag/Papers/wu96b.pdf.

References for March 16 tentatively include:
The first ten pages of "A Tutorial on Hidden Markov Models and Selected
Applications in Speech Recognition," by Lawrence R. Rabiner, _Proceedings
of the IEEE_, 77, 2 (February 1989), pp. 257-286 (available from the Digital
Article Database Service).
"An Introduction to Hidden Markov Models for Biological Sequences,"  by
Anders Krogh, chapter 4 (pp. 45-63) in: _Computational Methods in Molecular
Biology_, S.L. Salzberg, D.B. Searls, S. Kasif (Eds.), Elsevier, 1998.

References for March 23 tentatively include:
"Modeling Biological Data and Structure with Probabilistic Networks," by
Simon Kasif and Arthur L. Delcher, chapter 15 (pp. 335-352) in:
_Computational Methods in Molecular Biology_, S.L. Salzberg, D.B. Searls, 
S. Kasif (Eds.), Elsevier, 1998.

Camilla has (or will soon have) copies of the papers.  I look forward to your 
participation!

Cheers,
				Marco