How to Configure Apache Zookeeper: A Step-by-Step Guide

Apache Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. It is a core component of many distributed systems, including Hadoop and Kafka (in Kafka versions that still rely on it rather than KRaft mode). In this tutorial, we’ll walk you through installing and configuring Zookeeper on a single node, and then setting up an ensemble for high availability.

Prerequisites

Basic understanding of distributed systems concepts.
Linux-based server (Ubuntu/Debian/CentOS) or compatible environment.
Java installed on the server, as Zookeeper runs on the JVM.
Root or sudo access for installation and configuration.
Optional: Multiple servers if setting up a Zookeeper ensemble cluster.

Step 1: Installing Java

Zookeeper requires Java 8 or later. To install OpenJDK 11 (recommended) on Ubuntu/Debian, run:

sudo apt update
sudo apt install openjdk-11-jdk
java -version

For CentOS/RHEL use:

sudo yum install java-11-openjdk-devel
java -version
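If you script your setup, a version check can catch an unsupported JDK early. The sketch below is one way to do it in bash: it parses the quoted version string printed by java -version, which Java 8 reports as 1.8.0_xxx and newer JDKs report as 11.0.x, 17, and so on. The function names java_major and check_java are hypothetical helpers, not part of Zookeeper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the Java major version from a version string.
# Java 8 and earlier use "1.<major>..." (e.g. 1.8.0_392); Java 9+ start with the major number.
java_major() {
  local ver="$1" first rest
  first="${ver%%.*}"
  if [[ "$first" == "1" ]]; then
    rest="${ver#1.}"
    echo "${rest%%.*}"
  else
    # Strip suffixes such as "-ea" from strings like "21-ea"
    echo "${first%%[^0-9]*}"
  fi
}

# Hypothetical helper: fail unless a Java 8+ runtime is on the PATH.
check_java() {
  local ver major
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH" >&2
    return 1
  fi
  # java -version writes to stderr, e.g.: openjdk version "11.0.20" 2023-07-18
  ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
  major=$(java_major "$ver")
  if [[ -z "$major" ]] || (( major < 8 )); then
    echo "Unsupported Java version: ${ver:-unknown}" >&2
    return 1
  fi
  echo "Java $major detected ($ver)"
}

check_java || echo "Install a supported JDK before continuing." >&2
```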

Step 2: Download and Install Apache Zookeeper

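Go to the official Apache Zookeeper download page and download the latest stable release, making sure to pick the -bin tarball (the other archive contains only source code). Alternatively, use wget on your server. The example below pins version 3.8.1; older releases are moved from downloads.apache.org to archive.apache.org, so adjust the version and URL to match the release you want:

wget https://archive.apache.org/dist/zookeeper/zookeeper-3.8.1/apache-zookeeper-3.8.1-bin.tar.gz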
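Apache publishes a .sha512 checksum file alongside each release artifact, at the tarball URL with .sha512 appended. The following bash sketch compares a downloaded tarball against it before you extract anything. The helper name verify_sha512 is hypothetical, and it assumes the checksum file contains the hex digest, possibly upper-cased or split by whitespace, which it normalizes before comparing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file's SHA-512 digest appears in the checksum file.
# Whitespace is removed and case is lowered so both "digest  filename" and
# grouped upper-case digest layouts are accepted.
verify_sha512() {
  local file="$1" sumfile="$2" actual published
  actual=$(sha512sum "$file" | awk '{print $1}')
  published=$(tr -d '[:space:]' < "$sumfile" | tr '[:upper:]' '[:lower:]')
  [[ -n "$actual" && "$published" == *"$actual"* ]]
}

# Example: run only when both the tarball and its .sha512 file are in the current directory.
tarball="apache-zookeeper-3.8.1-bin.tar.gz"
if [[ -f "$tarball" && -f "$tarball.sha512" ]]; then
  if verify_sha512 "$tarball" "$tarball.sha512"; then
    echo "Checksum OK: $tarball"
  else
    echo "Checksum MISMATCH: do not extract $tarball" >&2
  fi
fi
```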

Extract the archive:

tar -xvzf apache-zookeeper-3.8.1-bin.tar.gz
sudo mv apache-zookeeper-3.8.1-bin /opt/zookeeper

Step 3: Configure Zookeeper

Zookeeper’s configuration lives in the conf directory, and zkServer.sh reads conf/zoo.cfg by default. Create it from the bundled sample file:

cd /opt/zookeeper
cp conf/zoo_sample.cfg conf/zoo.cfg

Edit conf/zoo.cfg using your favorite editor:

nano conf/zoo.cfg

The main properties to configure:

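tickTime=2000: The basic time unit, in milliseconds, used for heartbeats; the other timeouts below are expressed in multiples of it.
dataDir=/var/lib/zookeeper: Directory where Zookeeper stores its snapshots and, by default, its transaction logs. The sample file sets this to /tmp/zookeeper, which many systems clear on reboot, so change it to a persistent path. Ensure the directory exists and is writable by the user running Zookeeper.
clientPort=2181: The port clients use to connect to Zookeeper.
initLimit=10 and syncLimit=5: The number of ticks a follower may take to connect to and sync with the leader at startup (initLimit), and the number of ticks a follower may fall behind the leader before it is dropped (syncLimit). Both only take effect in ensemble mode.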
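Instead of editing the file by hand, you could generate the same settings from a script. This bash sketch writes a minimal standalone zoo.cfg containing the values listed above. The function name write_standalone_cfg is hypothetical, and note that it overwrites the target file, including the sample’s comments and any other defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal standalone zoo.cfg.
# Usage: write_standalone_cfg <path-to-zoo.cfg> [dataDir]
write_standalone_cfg() {
  local cfg="$1" datadir="${2:-/var/lib/zookeeper}"
  cat > "$cfg" <<EOF
# Basic time unit in milliseconds (heartbeats; other timeouts are multiples of it)
tickTime=2000
# Persistent directory for snapshots and transaction logs (avoid /tmp)
dataDir=${datadir}
# Port that clients connect to
clientPort=2181
# Ticks a follower may take to sync at startup, and may lag afterwards (ensemble mode only)
initLimit=10
syncLimit=5
EOF
}
```

For example, write_standalone_cfg /opt/zookeeper/conf/zoo.cfg /var/lib/zookeeper produces a file equivalent to the settings described in this step.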

Create the data directory:

sudo mkdir -p /var/lib/zookeeper
sudo chown -R $(whoami) /var/lib/zookeeper

Step 4: Running Zookeeper as a Standalone Server

Use the built-in scripts to start Zookeeper quickly for testing and development:

bin/zkServer.sh start
bin/zkServer.sh status

To stop the server later:

bin/zkServer.sh stop

Step 5: Setting Up a Zookeeper Ensemble (Cluster Configuration)

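Zookeeper’s real power comes from high-availability setups called ensembles, which usually have an odd number of servers (3, 5, or 7). An ensemble keeps serving requests only while a strict majority of its servers (a quorum) is running, so an even-sized ensemble tolerates no more failures than the next smaller odd size.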
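The quorum arithmetic behind the odd-number recommendation is simple enough to tabulate. This bash sketch computes the majority size, floor(n/2) + 1, and the number of server failures an n-server ensemble can survive. The helper names quorum_size and tolerated_failures are illustrative only.

```shell
#!/usr/bin/env bash
# Quorum for n voting servers is a strict majority: floor(n/2) + 1.
quorum_size()        { echo $(( $1 / 2 + 1 )); }
# The ensemble stays available while a quorum is up, so it survives n - quorum failures.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5 6 7; do
  printf 'servers=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum_size "$n")" "$(tolerated_failures "$n")"
done
```

Running it shows that 4 servers tolerate one failure, the same as 3, and 6 servers tolerate two, the same as 5.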

Modify `zoo.cfg` for Ensemble Mode

Add one server.N line per ensemble member, giving its numeric ID N, its hostname or IP address, and two ports. The same server lines must appear in zoo.cfg on every node. Example for a 3-node cluster:

server.1=zookeeper1.example.com:2888:3888
server.2=zookeeper2.example.com:2888:3888
server.3=zookeeper3.example.com:2888:3888

The first port (2888) is used by followers to connect to the leader, and the second (3888) is used for leader election.
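If you manage several nodes, a small script helps keep the server list identical everywhere. This bash sketch appends the example server.N lines to zoo.cfg only when they are not already present, so it is safe to re-run. The helper name add_ensemble_servers is hypothetical, and the hostnames are the placeholders used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append each given line to the config unless an identical line exists.
# Usage: add_ensemble_servers <zoo.cfg> <line>...
add_ensemble_servers() {
  local cfg="$1" line
  shift
  for line in "$@"; do
    grep -qxF "$line" "$cfg" 2>/dev/null || echo "$line" >> "$cfg"
  done
}

# Run the same command on every node so all members share one server list.
cfg="/opt/zookeeper/conf/zoo.cfg"
if [[ -w "$cfg" ]]; then
  add_ensemble_servers "$cfg" \
    "server.1=zookeeper1.example.com:2888:3888" \
    "server.2=zookeeper2.example.com:2888:3888" \
    "server.3=zookeeper3.example.com:2888:3888"
fi
```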

Assign Each Node Its Unique ID

On each node, create a file named myid in the dataDir directory containing only that node’s ID, the N from its server.N line (1, 2, or 3 in this example):

echo 1 > /var/lib/zookeeper/myid

Use 2 on zookeeper2.example.com and 3 on zookeeper3.example.com.
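To avoid typing the wrong ID on a node, you could derive it from zoo.cfg. This bash sketch finds the server.N line whose hostname matches the local host and writes N to the myid file. The helper names id_for_host and write_myid are hypothetical, and it assumes the server lines use hostnames or IPv4 addresses that match what hostname -f returns on each node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print N from the server.N line whose host part equals $2.
# Handles both "host:2888:3888" and "host:2888:3888;2181" forms (IPv6 literals are not handled).
id_for_host() {
  local cfg="$1" host="$2"
  awk -F'=' -v h="$host" '
    /^server\.[0-9]+=/ {
      split($2, addr, ":")              # addr[1] is the hostname or IP
      if (addr[1] == h) { print substr($1, 8); exit }
    }' "$cfg"
}

# Hypothetical helper: write this host's ID to <dataDir>/myid.
# Usage: write_myid <zoo.cfg> <dataDir> [hostname]
write_myid() {
  local cfg="$1" datadir="$2" host="${3:-$(hostname -f)}" id
  id=$(id_for_host "$cfg" "$host")
  if [[ -z "$id" ]]; then
    echo "No server.N entry for $host in $cfg" >&2
    return 1
  fi
  echo "$id" > "$datadir/myid"
}

if [[ -r /opt/zookeeper/conf/zoo.cfg && -w /var/lib/zookeeper ]]; then
  write_myid /opt/zookeeper/conf/zoo.cfg /var/lib/zookeeper
fi
```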

Step 6: Firewall and Network Configuration

Make sure the necessary ports are open on all ensemble members:

2181 – Client connections
2888 – Follower connections
3888 – Leader election

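Port 2181 must be reachable from your clients, while 2888 and 3888 only need to be reachable between ensemble members. Adjust firewall rules with iptables, firewalld, or your cloud provider’s security groups accordingly.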

Step 7: Starting the Ensemble Cluster

Start Zookeeper on each node:

bin/zkServer.sh start
bin/zkServer.sh status

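Use the status command to confirm each node’s role (Mode: leader or Mode: follower). Until a majority of the nodes is running, no leader can be elected, so status may report an error on the nodes you start first.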
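To check all roles at once, you could collect the Mode line from each node. This bash sketch assumes passwordless SSH access to every node and the /opt/zookeeper install path used in this guide; zk_mode, summarize_modes, and collect_modes are hypothetical helpers. A healthy ensemble reports exactly one leader, with every other node a follower.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the role from `zkServer.sh status` output on stdin.
zk_mode() {
  sed -n 's/^Mode: //p'
}

# Hypothetical helper: print one mode per host ("unknown" if no Mode line was found).
collect_modes() {
  local host mode
  for host in "$@"; do
    mode=$(ssh -n "$host" /opt/zookeeper/bin/zkServer.sh status 2>&1 | zk_mode)
    echo "${mode:-unknown}"
  done
}

# Hypothetical helper: count modes read from stdin; succeed only with exactly one leader.
summarize_modes() {
  awk '{ count[$1]++ }
       END {
         for (m in count) printf "%s=%d\n", m, count[m]
         exit (count["leader"] == 1 ? 0 : 1)
       }'
}
```

For example: collect_modes zookeeper1.example.com zookeeper2.example.com zookeeper3.example.com | summarize_modes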

Troubleshooting Tips

If the server fails to start, check the log files in /opt/zookeeper/logs (the default location when started with zkServer.sh).
Check that each node’s myid matches its server.N entry in zoo.cfg.
Confirm the Java version is supported and that java is on the PATH or JAVA_HOME is set.
Validate network connectivity between ensemble nodes and ports.
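For the connectivity check, the following bash sketch uses bash’s built-in /dev/tcp redirection with the coreutils timeout command, so no extra tools are needed. Run it while Zookeeper is running on the peers: it tests the client port 2181 and the election port 3888, which every member listens on, and skips 2888 because only the current leader accepts follower connections there. The helper names port_open and check_peers are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port opens within 3 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Hypothetical helper: check each peer's client and election ports; return 1 on any failure.
# Port 2888 is not checked because only the current leader listens on it.
check_peers() {
  local host port failed=0
  for host in "$@"; do
    for port in 2181 3888; do
      if port_open "$host" "$port"; then
        echo "OK    $host:$port"
      else
        echo "FAIL  $host:$port"
        failed=1
      fi
    done
  done
  return "$failed"
}
```

For example, run check_peers zookeeper1.example.com zookeeper2.example.com zookeeper3.example.com from each node in turn.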

Summary Checklist

Installed Java 8 or later.
Downloaded and extracted Zookeeper binaries.
Configured zoo.cfg and created dataDir.
Tested standalone Zookeeper server startup.
Configured ensemble nodes and the myid files for clustering.
Opened required ports for inter-node communication.
Started the ensemble and verified the status of each node.

If you are building a big data environment, our guide on how to install HBase, which relies on Zookeeper for coordination, may also be useful.

Following these steps will put you on the path to confidently configuring and managing Apache Zookeeper to meet your distributed system needs.