Federated Query processing for Big Data in Data Science









Abstract

As the number of databases continues to grow data scientists need to use data from different sources to run machine learning algorithms for analysis. Data science results depend upon the quality of data been extracted. The objective of this research paper is to implement a federated query processing framework which extracts data from different data sources and stores the result datasets in a common in-memory data format. This helps data scientists to perform their analysis and execute machine learning algorithms using different data engines without having to convert the data into their native data format and improve the performance.


Modules


Algorithms

Machine learning algorithms


Software And Hardware

• Hardware: Processor: i3 ,i5 RAM: 4GB Hard disk: 16 GB • Software: operating System : Windws2000/XP/7/8/10 Anaconda,jupyter,spyder,flask,hadoop Frontend :-python Backend:- MYSQL