Distributed Mining of Frequent Patterns in Big Data by Hybrid Strategies








Abstract

Frequent pattern mining is an important problem in data mining research. With the arrival of the big data era, database sizes have increased sharply, and mining frequent patterns efficiently from large transaction databases remains a challenge. One approach is to parallelize the mining algorithm. However, traditional parallel algorithms have difficulty balancing workloads and recovering from failures. To address this, a novel MapReduce-based parallel algorithm is proposed with three contributions. First, a hybrid mining strategy is proposed that automatically shifts from breadth-first mining to depth-first mining and can perform both simultaneously. Second, a hybrid vertical data format, namely mixset, is applied in breadth-first mining, and a new method is proposed for transforming a mixset back to a horizontal data representation, which facilitates depth-first mining. Third, strategies are proposed to reduce the number of candidates in breadth-first mining and to support depth-first mining without candidate generation, which saves both space and time. Experiments show that the proposed algorithm outperforms existing MapReduce-based algorithms and scales well.
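
The idea of a vertical data format, and of converting it back to a horizontal one, can be illustrated with a minimal single-machine sketch. This is not the mixset format or the MapReduce algorithm described above. It assumes a plain tidset representation (each item mapped to the set of transaction ids that contain it), and the function names to_vertical, to_horizontal and frequent_pairs are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Build a tidset per item: item -> set of transaction ids containing it.

    Assumption: a plain tidset layout, not the mixset format of the abstract.
    """
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def to_horizontal(tidsets):
    """Rebuild the horizontal layout: transaction id -> sorted list of items."""
    rows = defaultdict(set)
    for item, tids in tidsets.items():
        for tid in tids:
            rows[tid].add(item)
    return {tid: sorted(items) for tid, items in sorted(rows.items())}


def frequent_pairs(tidsets, min_support):
    """One breadth-first level: count 2-itemsets by intersecting tidsets."""
    # Only items that are frequent on their own can form frequent pairs.
    items = sorted(i for i, t in tidsets.items() if len(t) >= min_support)
    result = {}
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            common = tidsets[a] & tidsets[b]
            if len(common) >= min_support:
                result[(a, b)] = len(common)
    return result


if __name__ == "__main__":
    transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
    vertical = to_vertical(transactions)
    print(frequent_pairs(vertical, min_support=2))
    print(to_horizontal(vertical))
```

In the vertical layout the support of an itemset is the size of a set intersection, which suits level-by-level (breadth-first) counting, while the horizontal layout is the usual input for depth-first, pattern-growth style mining.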


Modules


Algorithms

Data Mining algorithms


Software And Hardware

• Hardware:
  Processor: i3, i5
  RAM: 4 GB
  Hard disk: 16 GB
• Software:
  Operating system: Windows 2000/XP/7/8/10
  Tools: Anaconda, Jupyter, Spyder, Flask, Hadoop
  Frontend: Python
  Backend: MySQL