Microsoft Azure Data Lake Store: an introduction

What is a data lake? A data lake is a storage repository that holds a vast amount of raw data in its original format so that analytics and big data analysis can be run on it. A data lake handles the three Vs of big data (Volume,…

Performance improvement of MapReduce through a new Hadoop block placement algorithm

HDFS estimates the network bandwidth between two nodes by their distance. The distance from a node to its parent node is assumed to be one. A shorter distance between two nodes means greater bandwidth that they can utilize to…
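
The distance rule in the excerpt above can be illustrated with a minimal sketch. It assumes that the distance between two nodes is the sum of their hop counts to their closest common ancestor in the topology tree, with each hop to a parent counting as one. The function name `network_distance` and the `/datacenter/rack/node` path strings are hypothetical and chosen only for illustration; this is not Hadoop's own code.

```python
def network_distance(path_a: str, path_b: str) -> int:
    """Count the tree edges between two nodes.

    Each node is given as a slash-separated path from the root,
    for example '/d1/r1/n1' (hypothetical datacenter/rack/node names).
    """
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")

    # Length of the shared prefix, i.e. depth of the closest common ancestor.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1

    # Each step from a node up to its parent adds a distance of one.
    return (len(a) - common) + (len(b) - common)


if __name__ == "__main__":
    print(network_distance("/d1/r1/n1", "/d1/r1/n1"))  # same node
    print(network_distance("/d1/r1/n1", "/d1/r1/n2"))  # same rack
    print(network_distance("/d1/r1/n1", "/d1/r2/n3"))  # same datacenter, other rack
    print(network_distance("/d1/r1/n1", "/d2/r3/n4"))  # other datacenter
```

Under these assumptions, nodes on the same rack come out closer than nodes on different racks, which in turn come out closer than nodes in different datacenters.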