Monday, November 12, 2012 - 12:30 pm
SWGN 3A75
PhD Defense: Yan Zhang

Frequent Itemset Mining (FIM) is a data-mining task used to find frequently occurring subsets within a database of itemsets. FIM is used in many applications, including machine learning and computational biology. In this work, we explore novel hardware architectures that accelerate FIM using multiple Field Programmable Gate Arrays (FPGAs) as application-specific coprocessors. FIM is a challenging application to accelerate because it is data intensive and its performance is limited by the available memory bandwidth; previous work in this area has yielded disappointing results. We develop an efficient hardware accelerator based on the ECLAT algorithm. Our approach offers three key advantages over previous efforts. First, we achieve high scalability by dynamically scheduling tasks onto multiple accelerators. Second, we develop a compression scheme for intermediate results that stores them in an on-chip scratchpad memory, significantly reducing the number of off-chip memory accesses. Third, we develop a second compression scheme for the input data to reduce the total volume of data exchanged over the off-chip memory interface. This scheme leverages the bitvector data representation by using a lossless, logic-minimization-based compression technique that enables single-cycle decoding with a novel hardware decoder. Our FPGA coprocessor achieves a 29-38X speedup over an optimized x86 implementation. Intermediate-result compression on the scratchpad provides an additional 2X speedup, and source bitvector compression provides a further 20-30% speedup.
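For readers unfamiliar with ECLAT, the core operation the abstract refers to is straightforward: each item is represented by the set of transactions containing it, and with a bitvector encoding the support of a candidate itemset is the popcount of the bitwise AND of its items' vectors. The sketch below is a minimal software illustration of that operation only, not the FPGA design described in the talk; the item names, toy transaction data, and support threshold are hypothetical.

#include <stdint.h>
#include <stdio.h>

/* ECLAT-style support counting with bitvectors (illustrative sketch).
 * Bit t of an item's vector is set if transaction t contains the item.
 * The support of an itemset is the popcount of the AND of its items'
 * vectors -- the operation a hardware accelerator would perform on
 * wide bitvectors streamed from memory. */

static int support(uint64_t a, uint64_t b)
{
    return __builtin_popcountll(a & b);  /* intersect tidsets, count hits */
}

int main(void)
{
    /* Hypothetical items A, B, C over 8 transactions (bit 0 = transaction 0). */
    uint64_t A = 0xB7;  /* 10110111: A appears in transactions 0,1,2,4,5,7 */
    uint64_t B = 0x96;  /* 10010110: B appears in transactions 1,2,4,7     */
    uint64_t C = 0x0C;  /* 00001100: C appears in transactions 2,3         */

    int min_support = 3;  /* illustrative minimum-support threshold */

    int sup_AB = support(A, B);
    printf("support(A,B) = %d -> %s\n", sup_AB,
           sup_AB >= min_support ? "frequent" : "infrequent");

    int sup_AC = support(A, C);
    printf("support(A,C) = %d -> %s\n", sup_AC,
           sup_AC >= min_support ? "frequent" : "infrequent");
    return 0;
}

Because the mining loop reduces to wide AND-and-popcount operations over long bitvectors, its throughput is bounded by how fast those vectors can be fetched, which is why the abstract emphasizes on-chip scratchpad storage and compression of both intermediate and source bitvectors.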