
What is Apache Beam?

Apache Beam is an open-source, unified programming model for processing both stream-based and batch-based data.

Let's understand each term.

Unified programming model: This means you can write your pipeline once and run it on different runners. A runner is the engine that executes the pipeline, for example a Spark cluster or a Flink cluster.

Stream-based data: This is data that flows continuously. A simple example is cars running on a road: if you want to count them, there is no fixed window for it, because they are running on the road day and night.

Typical examples are sensor data, IoT device data, and continuous feeds from Solace, MQ, or Kafka.

Batch-based data: This is finite data that can easily be grouped by count or by time period, e.g. the number of cars passing in one hour. This kind of data can be generated by any system on a daily basis and sent in batches, for example the trading data for an entire day.
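To make the "cars in one hour" idea concrete, here is a minimal sketch in plain Java. It does not use the Beam SDK; the class name `HourlyCarCount`, the method `countPerHour`, and the sample timestamps are all made up for illustration. It only shows how finite, timestamped data can be grouped into fixed one-hour windows.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Apache Beam.
class HourlyCarCount {

    // Assigns each sighting to the start of its one-hour window
    // and counts how many sightings fall into each window.
    static Map<Instant, Long> countPerHour(List<Instant> carSightings) {
        Map<Instant, Long> counts = new TreeMap<>();
        for (Instant seenAt : carSightings) {
            Instant windowStart = seenAt.truncatedTo(ChronoUnit.HOURS);
            counts.merge(windowStart, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A finite (batch) input: three sample sightings.
        List<Instant> sightings = List.of(
                Instant.parse("2024-01-01T08:05:00Z"),
                Instant.parse("2024-01-01T08:40:00Z"),
                Instant.parse("2024-01-01T09:10:00Z"));

        countPerHour(sightings).forEach((start, n) ->
                System.out.println(start + " -> " + n + " car(s)"));
    }
}
```

With this sample input, two cars fall into the window starting at 08:00 and one into the window starting at 09:00. With stream-based data the input never ends, so the same per-window counting has to be done as the events arrive rather than over a complete list.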

Different Runners Supported by Apache Beam

  1. Spark Runner
  2. Flink Runner
  3. Samza Runner
  4. Nemo Runner
  5. Google Cloud Dataflow Runner
  6. Hazelcast Jet Runner

You can write Beam pipelines in different languages.

Below are the languages supported in Beam:

  1. Java
  2. Python
  3. Go

You need to download the SDK for the language you want to code in. We will use the Java SDK in our tutorials.
