How to Install Apache Flink: Step-by-Step Tutorial

Apache Flink is an open-source framework for stateful stream processing, built for real-time data applications. It runs distributed across a cluster, recovers from failures through checkpointing, and sustains high throughput at low latency. This tutorial walks through installing Apache Flink on a Linux machine and starting a single-node standalone cluster.

Prerequisites

Linux-based operating system (Ubuntu, CentOS, Debian, etc.)
Java Development Kit (JDK) 8 or later installed and configured (Java 11 is a safe choice for Flink 1.18; Java 8 still works but is deprecated)
Basic knowledge of command line and shell usage
Internet connection to download Flink binaries
Optional: Hadoop or other cluster management tools if running Flink on a cluster

Step 1: Verify Java Installation

Apache Flink requires Java to run. First, check whether Java is installed and which version it is:

java -version

You should see output indicating Java version 8 or newer. If Java is not installed or the version is older, install or update it:

On Ubuntu/Debian: sudo apt install openjdk-11-jdk
On CentOS/RHEL: sudo yum install java-11-openjdk-devel
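The version check can also be scripted. The sketch below is a minimal example, not part of Flink: java_major_version is a hypothetical helper name, and it assumes the version string follows either the legacy "1.8.0_392" scheme or the modern "11.0.21" scheme.

```shell
# Hypothetical helper: extract the major Java version from a version string.
# Handles the legacy "1.8.0_392" scheme (major 8) and the modern "11.0.21" scheme.
java_major_version() (
  v="$1"
  v="${v#1.}"            # drop the legacy "1." prefix: "1.8.0_392" becomes "8.0_392"
  echo "${v%%[!0-9]*}"   # keep only the leading digits
)

# If Java is on the PATH, read the quoted version from "java -version"
# (which prints to stderr) and report the major version.
if command -v java >/dev/null 2>&1; then
  raw=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  echo "Java major version: $(java_major_version "$raw")"
fi
```

A reported major version of 8 or higher satisfies the prerequisite above.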

Step 2: Download Apache Flink

Download the latest stable release tarball from the official Apache Flink website (https://flink.apache.org/downloads/), or fetch it directly with wget:

wget https://downloads.apache.org/flink/flink-1.18.1/flink-1.18.1-bin-scala_2.12.tgz

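Replace the URL with the link for the latest version as needed. Note that downloads.apache.org carries only current releases; older versions are moved to https://archive.apache.org/dist/flink/.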
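It is worth verifying the download before extracting it. The following is a minimal sketch: flink_url and verify_sha512 are hypothetical helper names, and the commented usage assumes that a .sha512 file is published next to the tarball and begins with the hex digest.

```shell
# Hypothetical helper: build the download URL from a Flink version ($1)
# and a Scala version ($2).
flink_url() {
  echo "https://downloads.apache.org/flink/flink-$1/flink-$1-bin-scala_$2.tgz"
}

# Hypothetical helper: compare the SHA-512 digest of file $1 with the
# expected hex string $2. Returns 0 on a match, non-zero otherwise.
verify_sha512() (
  actual=$(sha512sum "$1" | awk '{print $1}')
  [ "$actual" = "$2" ]
)

# Example usage (uncomment to run; requires network access):
# url=$(flink_url 1.18.1 2.12)
# wget "$url" "$url.sha512"
# verify_sha512 flink-1.18.1-bin-scala_2.12.tgz \
#   "$(awk '{print $1}' flink-1.18.1-bin-scala_2.12.tgz.sha512)" \
#   && echo "checksum OK"
```

If the check fails, delete the file and download it again rather than extracting it.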

Step 3: Extract the Archive

Extract the tarball to your preferred directory, for example /opt:

sudo tar -xzvf flink-1.18.1-bin-scala_2.12.tgz -C /opt

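This unpacks the Flink binaries and libraries into /opt/flink-1.18.1. Flink writes its logs under this directory, so the user who starts the cluster needs write access to it. If you extracted the archive with sudo, adjust the ownership, for example with sudo chown -R "$USER" /opt/flink-1.18.1.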

Step 4: Set Environment Variables

To run Flink easily, add it to your PATH. You can do this temporarily or permanently.

Temporary (in current terminal session):

export FLINK_HOME=/opt/flink-1.18.1
export PATH=$PATH:$FLINK_HOME/bin

Permanent (add to ~/.bashrc or ~/.bash_profile):

echo "export FLINK_HOME=/opt/flink-1.18.1" >> ~/.bashrc
echo "export PATH=\$PATH:\$FLINK_HOME/bin" >> ~/.bashrc
source ~/.bashrc
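Running the echo commands above more than once leaves duplicate lines in ~/.bashrc. One way to avoid that is sketched below; append_once is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: append line $1 to file $2 only if that exact line
# is not already present. Creates the file if it does not exist.
append_once() {
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Example usage (uncomment to apply to your own shell profile):
# append_once 'export FLINK_HOME=/opt/flink-1.18.1' ~/.bashrc
# append_once 'export PATH=$PATH:$FLINK_HOME/bin' ~/.bashrc
```

The single quotes keep $PATH and $FLINK_HOME unexpanded, so they are evaluated each time the profile is loaded, which matches the escaped form used in the echo commands.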

Step 5: Validate Flink Installation

To check that the Flink command-line interface is working, run:

flink --version

This should print the Flink version.

Step 6: Start Flink

To run Flink in standalone mode on a single node, start a local cluster:

start-cluster.sh

Verify Flink is running by opening the web UI at http://localhost:8081 in your browser.
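The same check can be done from the terminal, which helps on servers without a browser. This is a minimal sketch that assumes curl is installed; wait_for_url is a hypothetical helper name, and /overview is the cluster overview endpoint of Flink's REST API, served on the same port as the web UI.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# $1 = URL, $2 = maximum attempts, $3 = seconds to sleep between attempts
wait_for_url() (
  i=0
  while [ "$i" -lt "$2" ]; do
    # -f treats HTTP errors as failures; --max-time bounds each attempt
    curl -fsS --max-time 2 -o /dev/null "$1" 2>/dev/null && exit 0
    i=$((i + 1))
    sleep "$3"
  done
  exit 1
)

# Example usage (uncomment after running start-cluster.sh):
# wait_for_url http://localhost:8081/overview 30 1 && echo "Flink web UI is up"
```

The cluster can take a few seconds to come up after start-cluster.sh returns, which is why the sketch retries instead of checking once.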

Stopping Flink

To stop the Flink cluster:

stop-cluster.sh

Troubleshooting Tips

If the flink command is not found, ensure that your FLINK_HOME and PATH variables are set correctly.
Check Java version compatibility if Flink fails to start.
Examine logs in $FLINK_HOME/log for errors or warnings.
If port 8081 is in use, change the web UI port by setting rest.port in $FLINK_HOME/conf/flink-conf.yaml.
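The last two tips can be sketched as small shell helpers. Both function names are hypothetical, the sed call assumes GNU sed (the default on Linux), and the configuration file is assumed to use the flat "key: value" layout of flink-conf.yaml.

```shell
# Hypothetical helper: print ERROR lines and exceptions, with file name and
# line number, from every *.log file in directory $1.
show_log_errors() {
  grep -H -n -E 'ERROR|Exception' "$1"/*.log
}

# Hypothetical helper: set rest.port to $2 in configuration file $1.
# Replaces an existing (possibly commented-out) entry, or appends one.
set_rest_port() (
  conf=$1
  port=$2
  if grep -qE '^#? *rest\.port:' "$conf"; then
    sed -i -E "s/^#? *rest\.port:.*/rest.port: $port/" "$conf"
  else
    printf 'rest.port: %s\n' "$port" >> "$conf"
  fi
)

# Example usage (uncomment to run against your installation):
# show_log_errors "$FLINK_HOME/log"
# set_rest_port "$FLINK_HOME/conf/flink-conf.yaml" 8082
```

Restart the cluster with stop-cluster.sh and start-cluster.sh after changing the port so that the new value takes effect.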

Summary Checklist

Installed Java 8 or newer
Downloaded and extracted Apache Flink binaries
Set environment variables for convenience
Verified Flink CLI and started Flink cluster
Accessed Flink web UI on port 8081

For advanced setups and configuration, consider deploying Flink on a Hadoop YARN cluster or on Kubernetes. Learn more about cluster setup in our related tutorials, such as How to Configure Apache ZooKeeper, which covers a coordination service that Flink can use for high availability.

Apache Flink empowers developers to build stateful streaming applications with ease. With this installation guide, you’re ready to explore its powerful data processing capabilities.