Amazon Kinesis – Managed service for real-time data processing

Amazon Kinesis Firehose is a managed service that loads real-time streaming data into Amazon Simple Storage Service (S3), Amazon Redshift, or Amazon Elasticsearch Service (ES). Firehose is part of the Amazon Kinesis streaming data platform, along with Amazon Kinesis Streams and Amazon Kinesis Analytics. There is no need to write applications or manage resources with Firehose. Applications can easily be configured to send data to Firehose, and it automatically loads the data into the specified destination.

It enables near real-time analytics with existing business intelligence tools and dashboards. It automatically scales to match the throughput of your data and requires no ongoing administration. It can even compress and encrypt the data before loading, thus minimizing the amount of storage used at the destination and increasing security.

A Firehose delivery stream can easily be created and configured from the AWS Management Console, and you can start sending data to it in just a few minutes.
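Besides the console, a delivery stream can also be created programmatically. The sketch below uses boto3 (the AWS SDK for Python); the bucket ARN, role ARN, and stream name are placeholder assumptions, not values from this article, and the buffering/compression settings are just one reasonable configuration.

```python
def s3_destination_config(bucket_arn, role_arn, buffer_mb=5, buffer_seconds=300):
    """Build an S3 destination configuration for a delivery stream.

    BufferingHints controls how much data (MB) or how long (seconds)
    Firehose buffers before delivering a batch to S3.
    """
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        "BufferingHints": {
            "SizeInMBs": buffer_mb,
            "IntervalInSeconds": buffer_seconds,
        },
        # Compress before loading to reduce storage used at the destination.
        "CompressionFormat": "GZIP",
    }


def create_stream(firehose_client, stream_name, bucket_arn, role_arn):
    """Create the delivery stream (requires AWS credentials and permissions).

    Pass in a client created with boto3.client("firehose").
    """
    return firehose_client.create_delivery_stream(
        DeliveryStreamName=stream_name,
        S3DestinationConfiguration=s3_destination_config(bucket_arn, role_arn),
    )
```

In a real session you would call `create_stream(boto3.client("firehose"), "my-stream", "arn:aws:s3:::my-bucket", "arn:aws:iam::123456789012:role/firehose-role")` with your own resource ARNs.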


With Amazon Kinesis Firehose, you only pay for the amount of data you transmit through the service with no minimum fee or setup cost required.

Key Concepts and Terminology:

The following terms help in understanding and using Amazon Kinesis Firehose.

  1. Firehose delivery stream

Users create a Firehose delivery stream and send data to it.

  2. Record

A record is the data of interest submitted by the user to the delivery stream. It can be up to 1,000 KB in size.

  3. Data producers

Data producers are the applications that generate streaming data and send it to a delivery stream. For example, a web application sending its log data, or a web crawler sending crawled pages.

  4. Buffer Size and Buffer Interval

Firehose buffers incoming streaming data to a certain size or for a certain period of time before delivering it to the destination. Buffer Size is specified in MB and Buffer Interval in seconds.

  5. Amazon Kinesis Agent

The Amazon Kinesis Agent is a Java application for Linux-based servers that monitors files, such as log files, and continuously collects and sends data to your delivery stream.
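Putting the terms above together, a data producer submits records to the delivery stream with a `PutRecord` call. The sketch below is again in Python with boto3; the stream name and the log-event shape are hypothetical, and the 1,000 KB check mirrors the record size limit mentioned above.

```python
import json

# Maximum record size mentioned in the text: 1,000 KB.
MAX_RECORD_BYTES = 1000 * 1024


def encode_record(event):
    """Serialize one event as a newline-delimited JSON record.

    Firehose concatenates records at the destination, so the trailing
    newline keeps individual records separable.
    """
    data = (json.dumps(event) + "\n").encode("utf-8")
    if len(data) > MAX_RECORD_BYTES:
        raise ValueError("record exceeds the 1,000 KB limit")
    return data


def send_event(firehose_client, stream_name, event):
    """Send one event to the delivery stream (a data producer's role).

    Pass in a client created with boto3.client("firehose").
    """
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": encode_record(event)},
    )
```

A web application acting as a producer might call `send_event(boto3.client("firehose"), "my-stream", {"path": "/index.html", "status": 200})` for each request it logs.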

Example usage of the configurations for Amazon S3, Redshift, and Elasticsearch, along with the argument reference, is available here.
