Azure Databricks Overview

Microsoft Azure cloud services enable enterprises to manage data at scale in the cloud. This also opens up massive possibilities for predictive analytics, AI, and real-time applications. Apache Spark has become the platform of choice for building these applications, but deploying…

Performance improvement of MapReduce through new Hadoop block placement algorithm

HDFS estimates the network bandwidth between two nodes by their distance. The distance from a node to its parent node is assumed to be one. A shorter distance between two nodes means greater bandwidth they can utilize to…
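
The distance rule in this excerpt can be illustrated with a minimal sketch. It assumes the cluster is modeled as a tree and that each node is identified by a slash-separated path such as `/dc1/rack1/node1`. The function name `network_distance` and the example paths are hypothetical, chosen only for illustration. The distance between two nodes is taken as the number of parent links from each node up to their closest common ancestor, one per link.

```python
def network_distance(path_a: str, path_b: str) -> int:
    """Count parent links between two nodes in a tree topology.

    Each node is named by a hypothetical path like "/dc1/rack1/node1".
    Every link from a node to its parent counts as distance one.
    """
    a = [part for part in path_a.split("/") if part]
    b = [part for part in path_b.split("/") if part]

    # Length of the shared prefix = depth of the closest common ancestor.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1

    # Links from each node up to the common ancestor.
    return (len(a) - common) + (len(b) - common)


if __name__ == "__main__":
    print(network_distance("/dc1/rack1/node1", "/dc1/rack1/node1"))  # same node
    print(network_distance("/dc1/rack1/node1", "/dc1/rack1/node2"))  # same rack
    print(network_distance("/dc1/rack1/node1", "/dc1/rack2/node3"))  # same data center
    print(network_distance("/dc1/rack1/node1", "/dc2/rack3/node4"))  # different data centers
```

Under these assumptions, two nodes on the same rack are closer than two nodes on different racks, which matches the idea that a shorter distance implies more available bandwidth.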

Amazon Kinesis – Managed service for real-time data processing

Amazon Kinesis Firehose is a managed service for loading real-time streaming data into Amazon Simple Storage Service (S3), Redshift, or Elasticsearch Service (ES). Firehose is part of the Amazon Kinesis streaming data platform, along with Amazon Kinesis Streams and Amazon Kinesis Analytics. There…