COLLOQUIUM
Department of Computer Science and Engineering
University of South Carolina

Association Pattern Discovery: Algorithms and Applications in Bioinformatics

Hui Xiong
Department of Computer Science and Engineering
University of Minnesota

Date: March 4, 2005
Time: 3:30-4:30 PM
Place: Swearingen 1A03 (Faculty Lounge)

Abstract

The problem of association pattern mining is to develop techniques for finding groups of highly correlated objects in massive data. This problem is important for various application domains, such as bioinformatics, market basket analysis, and medical data analysis. A large body of association mining work was motivated by the difficulty of efficiently identifying highly correlated objects using traditional statistical correlation measures. This has led to the use of alternative interest measures, such as support and confidence, despite the lack of a precise relationship between these new interest measures and statistical correlation measures. However, this approach tends to generate too many spurious patterns involving objects that are poorly correlated.

In this talk, I provide a precise relationship between Pearson's correlation coefficient and the support measure. I also present an efficient algorithm called TAPER, which identifies highly correlated pairs of objects using two algorithmic ideas: a monotonic upper bound of Pearson's correlation coefficient, and a novel pruning of candidates based on the ordering of object pairs that contain a common object.

While TAPER can efficiently identify highly correlated pairs of objects, it has difficulty identifying highly correlated groups of objects beyond pairs. For this purpose, I introduce a framework for mining hyperclique patterns, which are groups of strongly correlated objects. Indeed, every pair of objects within a hyperclique pattern is guaranteed to have an uncentered Pearson's correlation coefficient above a certain level. Finally, I demonstrate an application of the hyperclique pattern discovery approach to identifying functional modules from protein complexes.

Hui Xiong is a Ph.D. candidate in the Department of Computer Science and Engineering at the University of Minnesota. He received the B.E. degree in Automation from the University of Science and Technology of China in 1995 and the M.S. degree in Computer Science from the National University of Singapore in 2000. His general areas of research are data mining, databases, and statistical computing, with applications in bioinformatics and database security. He has published research papers in refereed journals and conferences, such as TKDE, SIGKDD, SDM, ICDM, and PSB. He is a student member of the IEEE Computer Society and the ACM.