COLLOQUIUM
Department of Computer Science and Engineering
University of South Carolina
 
Association Pattern Discovery: Algorithms and Applications in Bioinformatics

Hui Xiong

Department of Computer Science and Engineering
University of Minnesota

Date: March 4, 2005
Time: 3:30-4:30PM
Place: Swearingen 1A03 (Faculty Lounge)

Abstract

The problem of association pattern mining is to develop techniques for
finding groups of highly correlated objects in massive data. This
problem is important in many application domains, such as
bioinformatics, market basket analysis, and medical data analysis. A
large body of association mining work was motivated by the difficulty
of efficiently identifying highly correlated objects using traditional
statistical correlation measures. This difficulty has led to the use of
alternative interest measures, such as support and confidence, even
though no precise relationship between these measures and statistical
correlation measures has been established. However, this approach tends
to generate too many spurious patterns involving objects that are
poorly correlated.

In this talk, I present a precise relationship between Pearson's
correlation coefficient and the support measure. I also present TAPER,
an efficient algorithm for identifying highly correlated pairs of
objects, which rests on two algorithmic ideas: a monotone upper bound
on Pearson's correlation coefficient and a novel pruning of candidates
based on the ordering of object pairs that share a common object.
While TAPER efficiently identifies highly correlated pairs of objects,
it has difficulty identifying groups of highly correlated objects
larger than pairs. For this purpose, I introduce a framework for mining
hyperclique patterns, which are groups of strongly correlated objects:
every pair of objects within a hyperclique pattern is guaranteed to
have an uncentered Pearson's correlation coefficient above a given
threshold. Finally, I demonstrate an application of the hyperclique
pattern discovery approach to identifying functional modules from
protein complexes.
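The two quantitative claims above can be illustrated on 0/1 data. The following is a hypothetical Python sketch, not the speaker's TAPER or hyperclique implementation. It assumes that each object is a binary item, that Pearson's correlation between two items is the phi coefficient computed from supports, and that hypercliques are defined by the h-confidence measure used in the hyperclique literature (the abstract does not name the measure). For supp(A) >= supp(B), replacing supp(A,B) by its largest possible value supp(B) gives the bound sqrt(supp(B)/supp(A)) * sqrt((1 - supp(A))/(1 - supp(B))), which depends only on single-object supports. All function names and the toy transactions are made up for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each transaction is the set of objects present in one record.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"C", "D"},
    {"D"},
    {"A", "B", "D"},
]

def support(transactions, items):
    """Fraction of transactions that contain every object in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def phi(transactions, a, b):
    """Pearson's correlation of the 0/1 indicators of a and b (the phi coefficient).
    Assumes 0 < supp(a), supp(b) < 1 so the denominator is non-zero."""
    sa, sb = support(transactions, [a]), support(transactions, [b])
    sab = support(transactions, [a, b])
    return (sab - sa * sb) / sqrt(sa * (1 - sa) * sb * (1 - sb))

def phi_upper_bound(sa, sb):
    """Upper bound on phi from single-object supports only: replace supp(a, b)
    in the phi formula by its largest possible value, min(sa, sb)."""
    hi, lo = max(sa, sb), min(sa, sb)
    return sqrt(lo * (1 - hi) / (hi * (1 - lo)))

def correlated_pairs(transactions, theta):
    """All pairs with phi >= theta, using the bound as a filter before exact phi."""
    sup = {i: support(transactions, [i]) for i in set().union(*transactions)}
    # Sort by decreasing support; sorting names first makes tie order deterministic.
    order = sorted(sorted(sup), key=lambda i: -sup[i])
    result = {}
    for k, a in enumerate(order):
        for b in order[k + 1:]:
            # For fixed a, the bound shrinks as supp(b) shrinks, so every later b fails too.
            if phi_upper_bound(sup[a], sup[b]) < theta:
                break
            r = phi(transactions, a, b)
            if r >= theta:
                result[tuple(sorted((a, b)))] = r
    return result

def h_confidence(transactions, pattern):
    """supp(pattern) divided by the largest single-object support in the pattern."""
    return support(transactions, pattern) / max(support(transactions, [i]) for i in pattern)

def cosine(transactions, a, b):
    """Uncentered Pearson correlation (cosine) of the 0/1 indicators of a and b."""
    return support(transactions, [a, b]) / sqrt(
        support(transactions, [a]) * support(transactions, [b]))

if __name__ == "__main__":
    print(correlated_pairs(transactions, theta=0.5))
    pattern = ["A", "B", "C"]
    h = h_confidence(transactions, pattern)
    # Every pair inside the pattern has cosine at least h.
    for a, b in combinations(pattern, 2):
        print(a, b, round(cosine(transactions, a, b), 3), ">=", h)
```

Because the bound needs only single-object supports, many pairs can be discarded without counting their joint support, and a brute-force scan over all pairs should return the same set. The cosine guarantee follows from supp(a, b) >= supp(P) and sqrt(supp(a) * supp(b)) <= max supp over P for any pair a, b in a pattern P.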

Hui Xiong is a Ph.D. candidate in the Department of Computer
Science and Engineering at the University of Minnesota. He received
the B.E. degree in Automation from the University of Science and
Technology of China in 1995 and the M.S. degree in Computer Science
from the National University of Singapore in 2000. His general areas
of research are data mining, databases, and statistical computing with
applications in bioinformatics and database security. He has published
research papers in refereed journals and conferences, such as TKDE,
SIGKDD, SDM, ICDM, and PSB. He is a student member of the IEEE
Computer Society and the ACM.