How to Install Apache Flink: Step-by-Step Guide
Apache Flink is an open-source stream-processing framework designed for real-time data processing applications. It runs at scale across clusters and provides fault tolerance and high throughput. This tutorial walks you through installing Apache Flink on a Linux machine.
Prerequisites
- Linux-based operating system (Ubuntu, CentOS, Debian, etc.)
- Java Development Kit (JDK) 8 or later installed and configured
- Basic knowledge of command line and shell usage
- Internet connection to download Flink binaries
- Optional: Hadoop or other cluster management tools if running Flink on a cluster
Step 1: Verify Java Installation
Apache Flink requires Java to run. First, check whether Java is installed and which version it is:
java -version
You should see output indicating Java version 8 or newer. If Java is not installed or the version is older, install or update it:
- On Ubuntu/Debian:
sudo apt install openjdk-11-jdk
- On CentOS/RHEL:
sudo yum install java-11-openjdk-devel
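The version check above can also be scripted. The following is a minimal sketch, assuming `java -version` prints a quoted version string such as "1.8.0_392" or "11.0.21" on its first line. The helper name `java_major` is invented for this example and is not part of Java or Flink.

```shell
#!/bin/sh
# Hypothetical helper: map a Java version string to its major version number.
# Handles the legacy "1.x" scheme (1.8.0_392 -> 8) and the newer one (11.0.21 -> 11).
java_major() {
  jm_v="$1"
  jm_v="${jm_v#1.}"         # drop the legacy "1." prefix used up to Java 8
  jm_v="${jm_v%%[._-]*}"    # keep everything before the first '.', '_' or '-'
  printf '%s\n' "$jm_v"
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; extract the quoted version from the first line
  raw="$(java -version 2>&1 | head -n 1 | sed -E 's/^[^"]*"([^"]*)".*$/\1/')"
  major="$(java_major "$raw")"
  if [ "$major" -ge 8 ] 2>/dev/null; then
    echo "Java $major detected: OK"
  else
    echo "Java version '$raw' is too old or could not be parsed"
  fi
else
  echo "java not found on PATH"
fi
```

If the script reports an old or unparsable version, install a JDK with the package manager commands shown above and run it again.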
Step 2: Download Apache Flink
Visit the official Apache Flink website to download the latest stable release tarball. Use wget for convenience:
wget https://downloads.apache.org/flink/flink-1.18.1/flink-1.18.1-bin-scala_2.12.tgz
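Replace the URL with the link for the latest version as needed. The main Apache download servers keep only current releases, so an older version may have moved to https://archive.apache.org/dist/flink/.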
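Apache releases are published alongside checksum files, so it can be worth verifying the tarball before extracting it. The sketch below assumes GNU `sha512sum` is installed and that you have copied the expected SHA-512 digest from the Flink downloads page. `verify_sha512` is a hypothetical helper name, and the digest in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE's SHA-512 digest equals EXPECTED.
verify_sha512() {
  vs_file="$1"
  vs_expected="$2"
  # sha512sum prints "<digest>  <filename>"; keep only the digest
  vs_actual="$(sha512sum "$vs_file" | awk '{print $1}')"
  [ "$vs_actual" = "$vs_expected" ]
}

# Usage (paste the published digest in place of the placeholder):
# verify_sha512 flink-1.18.1-bin-scala_2.12.tgz "<published digest>" \
#   && echo "checksum OK" || echo "checksum MISMATCH, do not extract"
```

The comparison is case-sensitive, so a digest published in uppercase would need to be lowercased first.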
Step 3: Extract the Archive
Extract the tarball to your preferred directory, for example /opt:
sudo tar -xzvf flink-1.18.1-bin-scala_2.12.tgz -C /opt
This unpacks the Flink binaries and libraries into /opt/flink-1.18.1.
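A quick way to confirm the extraction worked is to look for the files this guide relies on later. This is a rough sketch: `check_flink_layout` is a made-up helper, and the list of paths it checks (the `flink`, `start-cluster.sh`, and `stop-cluster.sh` scripts plus the `conf` and `lib` directories) is an assumption about what a standard binary distribution contains.

```shell
#!/bin/sh
# Hypothetical helper: check that DIR looks like an unpacked Flink distribution.
check_flink_layout() {
  cfl_dir="$1"
  for cfl_path in bin/flink bin/start-cluster.sh bin/stop-cluster.sh conf lib; do
    if [ ! -e "$cfl_dir/$cfl_path" ]; then
      echo "missing: $cfl_dir/$cfl_path"
      return 1
    fi
  done
  echo "layout OK: $cfl_dir"
}

# Usage:
# check_flink_layout /opt/flink-1.18.1
```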
Step 4: Set Environment Variables
To run Flink easily, add it to your PATH. You can do this temporarily or permanently.
- Temporary (in current terminal session):
export FLINK_HOME=/opt/flink-1.18.1
export PATH=$PATH:$FLINK_HOME/bin
- Permanent (add to ~/.bashrc or ~/.bash_profile):
echo "export FLINK_HOME=/opt/flink-1.18.1" >> ~/.bashrc
echo "export PATH=\$PATH:\$FLINK_HOME/bin" >> ~/.bashrc
source ~/.bashrc
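Running the permanent `echo ... >> ~/.bashrc` commands more than once appends duplicate lines. One possible way around that is to add each line only if it is not already present. `append_once` below is a hypothetical helper written for this guide, not a standard command.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE only if FILE does not already contain it.
append_once() {
  ao_line="$1"
  ao_file="$2"
  touch "$ao_file"
  # -F: fixed string, -x: match the whole line, -q: no output
  grep -Fxq -- "$ao_line" "$ao_file" || printf '%s\n' "$ao_line" >> "$ao_file"
}

# Usage (single quotes keep $PATH and $FLINK_HOME unexpanded in the file):
# append_once 'export FLINK_HOME=/opt/flink-1.18.1' ~/.bashrc
# append_once 'export PATH=$PATH:$FLINK_HOME/bin' ~/.bashrc
```

After adding the lines, reload the file with `source ~/.bashrc` or open a new terminal.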
Step 5: Validate Flink Installation
To check that the Flink command-line interface works, run:
flink --version
This should print the Flink version.
Step 6: Start Flink
To start Flink in standalone mode (single node), start the cluster:
start-cluster.sh
Verify Flink is running by opening the web UI at http://localhost:8081 in your browser.
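The cluster can take a few seconds to come up, so a scripted check may need to retry. The sketch below polls a URL with `curl` until it responds. It assumes `curl` is installed and uses the `/overview` path of Flink's REST API as the probe target. The helper name `wait_for_flink_ui` and the retry counts are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: poll URL until it answers successfully or ATTEMPTS run out.
wait_for_flink_ui() {
  wf_url="$1"
  wf_attempts="${2:-10}"
  wf_i=0
  while [ "$wf_i" -lt "$wf_attempts" ]; do
    # -f: treat HTTP errors as failures, -s: silent, --max-time: cap each try at 2s
    if curl -fs --max-time 2 -o /dev/null "$wf_url"; then
      return 0
    fi
    wf_i=$((wf_i + 1))
    sleep 1
  done
  return 1
}

# Usage:
# wait_for_flink_ui http://localhost:8081/overview 15 \
#   && echo "Flink web UI is up" || echo "Flink did not respond"
```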
Stopping Flink
To stop the Flink cluster:
stop-cluster.sh
Troubleshooting Tips
- If the flink command is not found, ensure your FLINK_HOME and PATH variables are correctly set.
- Check Java version compatibility if Flink fails to start.
- Examine logs in $FLINK_HOME/log for errors or warnings.
- If port 8081 is in use, configure the web UI port in $FLINK_HOME/conf/flink-conf.yaml.
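When chasing a port conflict, it helps to know which port the configuration actually sets. The sketch below reads the `rest.port` key from a flat `key: value` style flink-conf.yaml and falls back to 8081 when the key is absent or commented out. `get_rest_port` is a hypothetical helper, and taking the last matching line is a choice made for this sketch, not documented Flink behavior.

```shell
#!/bin/sh
# Hypothetical helper: print the rest.port value from CONF_FILE, or 8081 if unset.
get_rest_port() {
  grp_conf="$1"
  # match uncommented "rest.port: <number>" lines; this sketch takes the last one
  grp_port="$(sed -n -E 's/^[[:space:]]*rest\.port:[[:space:]]*([0-9]+).*$/\1/p' "$grp_conf" 2>/dev/null | tail -n 1)"
  printf '%s\n' "${grp_port:-8081}"
}

# Usage:
# get_rest_port "$FLINK_HOME/conf/flink-conf.yaml"
```

To move the web UI to another port, set a line such as `rest.port: 8082` in that file and restart the cluster.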
Summary Checklist
- Installed Java 8 or newer
- Downloaded and extracted Apache Flink binaries
- Set environment variables for convenience
- Verified Flink CLI and started Flink cluster
- Accessed Flink web UI on port 8081
For advanced setups and configuration, consider deploying Flink on a Hadoop YARN cluster or Kubernetes. Learn more about cluster setup in our related tutorials, such as How to Configure Apache Zookeeper, which covers the coordination services that complement distributed systems like Flink.
Apache Flink empowers developers to build stateful streaming applications with ease. With this installation guide, you’re ready to explore its powerful data processing capabilities.
