A New Approach for Scheduling Tasks and/or Jobs in Big Data Cluster









Abstract

Around 400 million tweets are sent each day, about 4.75 billion pieces of content are shared on Facebook every day, and an estimated several hundred hours of video are uploaded to YouTube every minute. Moreover, IoT devices (RFID and Wi-Fi wearable devices) generate huge volumes of data every second. About 90% of all the data created so far was generated during the last two years. Processing all this data requires clusters of computers with high specifications. Since computer prices keep dropping from year to year, almost all companies have started their own Big Data projects, and the return on investment (ROI) of such projects is beneficial to companies in terms of business. Since the advent of Big Data, a lot of work has been done to optimize the usage of resources (especially RAM) and to reduce the time required to run Big Data projects. Still, effort is needed on the scheduler side to efficiently schedule tasks across the DataNodes of a Big Data cluster. In this paper, we propose a new approach for scheduling tasks and/or jobs in a Big Data cluster, which mainly focuses on optimizing the assignment of tasks to the DataNodes by the NameNode. The results obtained show that our task scheduler outperforms the traditional task schedulers, the FIFO Scheduler and the Capacity Scheduler, available in Big Data open source distributions such as Cloudera [19] and HortonWorks [20].
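To make the idea of NameNode-side task assignment more concrete, the following is a minimal, illustrative Python sketch of a locality- and load-aware greedy placement heuristic. The class names, the locality weight, and the scoring formula are assumptions made for this example only; they are not the exact scheduler proposed in this paper.

```python
# Illustrative sketch only: a simplified, locality- and load-aware assignment
# heuristic of the kind a NameNode-side task scheduler could use.
# The scoring formula and weights below are assumptions for the example.

from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class DataNode:
    name: str
    block_ids: Set[str]      # HDFS blocks stored locally on this node
    running_tasks: int = 0   # current load on this node


@dataclass
class Task:
    task_id: str
    block_id: str            # block that the task needs to read


def score(node: DataNode, task: Task, locality_weight: float = 10.0) -> float:
    """Higher score = better placement: prefer data-local, lightly loaded nodes."""
    locality = locality_weight if task.block_id in node.block_ids else 0.0
    return locality - node.running_tasks


def assign_tasks(tasks: List[Task], nodes: List[DataNode]) -> Dict[str, str]:
    """Greedy assignment: each task goes to the best-scoring DataNode."""
    placement = {}
    for task in tasks:
        best = max(nodes, key=lambda n: score(n, task))
        placement[task.task_id] = best.name
        best.running_tasks += 1  # update load so later tasks spread out
    return placement


if __name__ == "__main__":
    nodes = [
        DataNode("dn1", {"b1", "b2"}),
        DataNode("dn2", {"b2", "b3"}),
        DataNode("dn3", {"b3", "b1"}),
    ]
    tasks = [Task("t1", "b1"), Task("t2", "b2"), Task("t3", "b3"), Task("t4", "b1")]
    print(assign_tasks(tasks, nodes))
    # Unlike plain FIFO placement, tasks are steered toward nodes that hold
    # their input block and are currently less loaded.
```

This is only a sketch under the stated assumptions; the scheduler evaluated against the FIFO and Capacity Schedulers in this paper may use different criteria and weights.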


Modules


Algorithms


Software and Hardware

• Hardware: Processor: i3, i5; RAM: 4 GB; Hard disk: 16 GB
• Software: Operating System: Windows 2000/XP/7/8/10; Tools: Anaconda, Jupyter, Spyder, Flask, Hadoop; Frontend: Python; Backend: MySQL