[March 7, 2000]

Dear colleagues in decision support:

The first of four meetings of a seminar series on data mining will be held on Thursday, March 9, at 10am in the Meeting Room (C1-217). The series will emphasize biological data mining and knowledge discovery. Biological data mining seems to be a more compact area of research than text mining, and some of the techniques should carry over.

My plan for March 9 is to introduce biological data mining (and, more generally, computational biology or bioinformatics) and present two algorithms: a dynamic programming algorithm for the sequence alignment problem (a simplified version of what is known in bioinformatics as the Smith-Waterman algorithm), and a dynamic programming algorithm for parsing gene structure due to Thomas Wu.

Next week, I would like to discuss hidden Markov models and their application to profiling protein families and parsing genes. In our third meeting, I would like to discuss the use of Bayesian networks in modeling biological structures. As for the fourth meeting, we'll see how things go!

References for March 9 include:

The first ten pages of the (U.S.) DOE Human Genome Project "Primer on Molecular Genetics," http://www.ornl.gov/hgmis/publicat/primer/intro.html.

Two papers from the November/December 1999 issue of _IEEE Intelligent Systems_ (available by following the appropriate links from the Digital Article Database Service, http://dads.aub.auc.dk): "Datascope: Mining Biological Sequences," by Simon Kasif (pp.38-43), and "Gene Discovery in DNA Sequences," by Steven Salzberg (pp.44-48).

The following paper by Thomas D. Wu (in preprint form): "A Segment-based Dynamic Programming Algorithm for Parsing Gene Structure," available at http://cmgm.stanford.edu/~brutlag/Papers/wu96b.pdf.

References for March 16 tentatively include:

The first ten pages of "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," by Lawrence R.
Rabiner, _Proceedings of the IEEE_, 77, 2 (February 1989), pp.257-286 (available from the Digital Article Database Service).

"An Introduction to Hidden Markov Models for Biological Sequences," by Anders Krogh, chapter 4 (pp.45-63) in: _Computational Methods in Molecular Biology_, S.L. Salzberg, D.B. Searls, S. Kasif (Eds.), Elsevier, 1998.

References for March 23 tentatively include:

"Modeling Biological Data and Structure with Probabilistic Networks," by Simon Kasif and Arthur L. Delcher, chapter 15 (pp.335-352) in: _Computational Methods in Molecular Biology_, S.L. Salzberg, D.B. Searls, S. Kasif (Eds.), Elsevier, 1998.

Camilla has (or will soon have) copies of the papers.

I look forward to your participation!

Cheers,
Marco