158 / 2016-11-18 16:54:46
DFPS: Distributed FP-growth Algorithm Based on Spark
frequent itemset mining; association rules mining; FP-growth; Spark; RDD; Big Data; distributed algorithm
Full paper accepted
Shaozong Chen / Donghua University
Xiujin Shi / Donghua University
Hui Yang / Donghua University
Frequent Itemset Mining (FIM) is the most important and time-consuming step of association rules mining. As data scale grows, many efficient single-machine FIM algorithms, such as FP-growth and Apriori, cannot complete their computing tasks within a reasonable time. To overcome the limitations of single-machine methods, researchers have proposed distributed algorithms based on MapReduce and Spark, such as PFP and YAFIM. Nevertheless, the heavy disk I/O cost of each MapReduce operation makes PFP insufficiently efficient. YAFIM must generate candidate frequent itemsets in every iteration, which makes it time-consuming. Moreover, the candidate itemsets must be held in memory, and their number becomes very large on massive data, so YAFIM fails once the data is large enough to exhaust memory. In this work, we propose a distributed FP-growth algorithm based on Spark, which we call DFPS. DFPS partitions the computing tasks in such a way that each computing node builds its conditional FP-tree and applies a pattern fragment growth method to mine frequent itemsets independently. DFPS does not need to pass messages between nodes while mining frequent itemsets. Our performance study shows that DFPS outperforms YAFIM, especially when transactions are long, the number of items is large, and the data is massive. DFPS also scales well. The experimental results show that DFPS is more than 10 times faster than YAFIM on the T10I4D100K and Pumsb_star datasets.
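The partitioning idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: it assumes one shard per frequent item, uses plain Python lists as conditional pattern bases instead of a compressed FP-tree, and replaces Spark RDD operations with ordinary loops. The function names `conditional_bases`, `mine_shard`, and `frequent_itemsets` are hypothetical.

```python
from collections import Counter, defaultdict


def conditional_bases(transactions, min_support):
    """Build one conditional pattern base (a list of prefix paths) per frequent item."""
    counts = Counter(item for t in transactions for item in set(t))
    # Frequent items ordered by descending support, ties broken by item name.
    frequent = [i for i, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
                if c >= min_support]
    rank = {item: r for r, item in enumerate(frequent)}
    bases = defaultdict(list)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        for pos, item in enumerate(ordered):
            # The shard for `item` receives the items ranked before it (possibly none).
            bases[item].append(ordered[:pos])
    return bases


def mine_shard(suffix, base, min_support):
    """Pattern growth inside one shard; needs no data from any other shard."""
    results = {frozenset(suffix): len(base)}  # every path in `base` co-occurs with `suffix`
    counts = Counter(i for path in base for i in path)
    for item, c in counts.items():
        if c >= min_support:
            # Conditional base for suffix + item: the part of each path before `item`.
            sub = [path[:path.index(item)] for path in base if item in path]
            results.update(mine_shard(suffix + [item], sub, min_support))
    return results


def frequent_itemsets(transactions, min_support):
    """Mine every shard and merge the results into {itemset: support}."""
    results = {}
    # Each iteration is independent, so in a distributed setting
    # each shard could be handed to a different worker.
    for item, base in conditional_bases(transactions, min_support).items():
        results.update(mine_shard([item], base, min_support))
    return results
```

For example, `frequent_itemsets([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]], 3)` returns the three single items with support 4 and the three pairs with support 3, while `{a, b, c}` (support 2) is pruned. Because every shard already holds all the prefix paths it needs, no shard has to consult another during mining, which is the property the abstract attributes to DFPS.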
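Important dates
  • Conference dates: March 25-26, 2017
  • Draft submission deadline: November 10, 2016
  • Draft acceptance notification: November 20, 2016
  • Final version deadline: November 30, 2016
  • Registration deadline: March 26, 2017
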
Organizer
IEEE Beijing Section