Apache Beam is an open source, unified programming model for processing both stream-based and batch-based data.
Let's go through each of these terms.
Unified programming model: This means you write your pipeline code once and can run it on different runners. A runner is the execution engine that actually runs the pipeline, for example a Spark cluster or a Flink cluster.
Stream-based data: Data that flows continuously, with no fixed end. A simple example is cars driving on a road: if you want to count them, there is no fixed window to count within, because cars are on the road day and night.
Examples of this kind of data include sensor data, IoT device data, and continuous feeds from Solace, MQ, or Kafka.
Batch-based data: This is finite data that can easily be grouped by count or by time period, e.g. the number of cars passing in 1 hour. This kind of data can be generated by any system on a daily basis and sent in batches, for example trading data for an entire day.
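To make the "number of cars in 1 hour" idea concrete, here is a minimal sketch in plain Java. It does not use the Beam SDK; it only illustrates how a finite set of timestamped records can be grouped by time period. The class name HourlyCarCounter, the method countPerHour, and the sample timestamps are hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; this is NOT the Beam API.
// All names here (HourlyCarCounter, countPerHour) are hypothetical.
final class HourlyCarCounter {
    private static final long SECONDS_PER_HOUR = 3600L;

    // Groups car sightings (given as epoch seconds) into one-hour buckets
    // and counts the sightings in each bucket.
    // The map key is the start of the hour, in epoch seconds.
    static Map<Long, Long> countPerHour(List<Long> sightingEpochSeconds) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long timestamp : sightingEpochSeconds) {
            // Round the timestamp down to the start of its hour.
            long hourStart = Math.floorDiv(timestamp, SECONDS_PER_HOUR) * SECONDS_PER_HOUR;
            // Add one to the count for that hour, starting from 1 if absent.
            counts.merge(hourStart, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample: three cars in the first hour, one in the second.
        List<Long> sightings = List.of(10L, 1800L, 3599L, 3600L);
        System.out.println(countPerHour(sightings)); // prints {0=3, 3600=1}
    }
}
```

This works because the input is finite: every record is available before counting starts. With stream-based data the list never ends, so a result has to be produced per time window as the data arrives rather than once at the end.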
Different Runners Supported by Apache Beam
- Spark Runner
- Flink Runner
- Nemo Runner
- Google Cloud Dataflow Runner
- Hazelcast Jet Runner
You can write Beam pipelines in different languages.
Beam provides SDKs for several languages, including Java, Python, and Go.
You need the SDK for whichever language you want to code in. We will use the Java SDK in our tutorials.