What is Apache Beam?


Apache Beam is an open source, unified programming model that helps you process both stream-based and batch-based data.

Let's understand each of these terms.

Unified programming model: This means you can write your pipeline code once and run it on different runners. A runner is the execution engine that actually runs the pipeline, for example a Spark cluster or a Flink cluster.

Stream-based data: This is data that flows continuously. A simple example is cars running on a road: if you want to count them, there is no fixed window for it, because they are running on the road day and night.

Examples include sensor data, IoT device data, and continuous feeds from messaging systems such as Solace, MQ, or Kafka.

Batch-based data: This is finite data that can easily be grouped by count or by time period, e.g. the number of cars passing in one hour. This kind of data can be generated by any system on a daily basis and sent in batches, for example the trading data for an entire day.
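To make the cars example concrete, here is a minimal sketch in plain Java (it does not use the Beam API) of how a continuous flow of events can be cut into fixed one-hour windows and counted per window. The class name HourlyCarCount, the method names, and the sample timestamps are all made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain Java, not Apache Beam code.
public class HourlyCarCount {
    // Length of one window: 1 hour in milliseconds.
    static final long WINDOW_MILLIS = 60L * 60L * 1000L;

    // Returns the start time of the one-hour window that contains the timestamp.
    static long windowStart(long timestampMillis) {
        return Math.floorDiv(timestampMillis, WINDOW_MILLIS) * WINDOW_MILLIS;
    }

    // Counts how many car sightings fall into each one-hour window.
    static Map<Long, Long> countPerWindow(List<Long> timestampsMillis) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : timestampsMillis) {
            counts.merge(windowStart(ts), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sightings at minutes 5, 20, 65, 130 and 170.
        List<Long> sightings = List.of(
                5 * 60_000L, 20 * 60_000L, 65 * 60_000L, 130 * 60_000L, 170 * 60_000L);

        countPerWindow(sightings).forEach((start, count) ->
                System.out.println("window starting at minute " + start / 60_000L
                        + ": " + count + " cars"));
    }
}
```

With these sample values, the windows starting at minute 0, 60 and 120 hold 2, 1 and 2 cars. A real stream never ends, so a stream processor has to emit each window's count as that window closes instead of waiting for all the data to arrive.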

Different Runners Supported by Apache Beam

  1. Spark Runner
  2. Flink Runner
  3. Samza Runner
  4. Nemo Runner
  5. Google Cloud Dataflow Runner
  6. Hazelcast Jet Runner

You can code with Beam in different languages

Below are the languages supported by Beam

  1. Java
  2. Python
  3. Go

You need to download the SDK for the language you want to code in. We will use the Java SDK in our tutorials.
