How to Monitor Cassandra: A Complete Guide
Apache Cassandra is a powerful distributed NoSQL database designed for handling large amounts of data across many commodity servers. Monitoring Cassandra effectively is crucial to ensure it performs optimally and remains reliable in production environments. This tutorial guides you through the essential steps and tools to monitor Cassandra databases.
Prerequisites
- A running Cassandra cluster (any recent version).
- Basic understanding of Cassandra architecture and terms like nodes, keyspaces, and tables.
- Administrative access to servers where Cassandra nodes run.
- Installed tools such as Prometheus and Grafana for metrics visualization (optional but recommended).
Step 1: Understand Cassandra Metrics to Monitor
Cassandra exposes many metrics via Java Management Extensions (JMX). Key metric categories to monitor include the following (a few representative MBean names are sketched after this list):
- Node health: Uptime, load, and status
- Read/write latency: Average and percentile latencies for read and write operations
- Compaction: Number and time of compactions, pending compactions
- Garbage collection (GC) activity: Frequency and duration of GC pauses that impact performance
- Pending tasks: Reads, writes, hints, and repair tasks queued
- Thread pool metrics: Active, pending, and completed tasks for read and write pools
- Error metrics: Timeouts, failures, dropped messages
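For orientation, the sketch below lists a few representative JMX MBeans behind those categories; exact names can vary across Cassandra versions, so verify them with jconsole against your own cluster.
# Representative Cassandra metric MBeans (verify against your version)
org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency
org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency
org.apache.cassandra.metrics:type=Compaction,name=PendingTasks
org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION,name=Dropped
org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=PendingTasks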
Step 2: Enable JMX and Access Cassandra Metrics
Cassandra uses JMX to expose metrics. By default, JMX listens on port 7199 and, in recent versions, accepts connections from localhost only. To interact with these metrics:
- Ensure JMX is enabled on your Cassandra nodes (usually enabled by default).
- Use tools like jconsole or nodetool to connect to the JMX port.
nodetool commands like nodetool info, nodetool compactionstats, and nodetool tpstats provide quick insights from the command line.
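A quick command-line health pass might look like the following; exact output fields vary by Cassandra version.
# Cluster membership and per-node load/status (UN = Up/Normal)
nodetool status
# Uptime, heap usage, and basic details for the local node
nodetool info
# Thread pool activity: active, pending, and dropped tasks
nodetool tpstats
# Running and pending compactions
nodetool compactionstats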
Step 3: Use Monitoring Tools to Collect and Visualize Metrics
For production environments, automated collection, alerting, and visualization are key. Common setups include:
- Prometheus + JMX Exporter: Use the JMX Exporter to expose JMX metrics as Prometheus metrics. Prometheus then scrapes these metrics periodically.
- Grafana: Connect Grafana to Prometheus to create dashboards visualizing metrics such as latency, compaction backlog, and node health.
- DataStax OpsCenter: A commercial monitoring tool offering detailed Cassandra monitoring dashboards, alerts, and management.
Example: Setting up JMX Exporter with Prometheus
# 1. Download JMX exporter jar
wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.16.1/jmx_prometheus_javaagent-0.16.1.jar
# 2. Create a configuration YAML for metrics you want to scrape
# For example, cassandra.yml
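#    A minimal cassandra.yml sketch (keys valid for JMX Exporter 0.16.x;
#    widen or narrow the whitelist pattern to fit your needs):
#      lowercaseOutputName: true
#      whitelistObjectNames:
#        - "org.apache.cassandra.metrics:*"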
# 3. Add the Java agent in conf/cassandra-env.sh so it loads at startup
JVM_OPTS="$JVM_OPTS -javaagent:/path/to/jmx_prometheus_javaagent-0.16.1.jar=7070:/path/to/cassandra.yml"
# 4. Restart Cassandra node
# 5. Configure Prometheus to scrape metrics from the JMX Exporter's endpoint
scrape_configs:
  - job_name: 'cassandra'
    static_configs:
      - targets: ['cassandra-node-ip:7070']
This setup exposes Cassandra metrics at port 7070, which Prometheus scrapes regularly.
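Before building dashboards, it is worth confirming the exporter answers; a plain curl against the agent's port should return text-format metrics (cassandra-node-ip is the same placeholder used in the scrape config above):
# Expect Prometheus text-format metric lines in the response
curl http://cassandra-node-ip:7070/metrics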
Step 4: Set Alerts Based on Critical Metrics
Monitoring is incomplete without alerting. Define alert thresholds based on your workload and SLA. Some useful alerts include:
- High read or write latency exceeding thresholds
- Excessive GC pause times affecting node responsiveness
- High number of dropped messages indicating possible overload
- Nodes becoming unreachable or down
- High compaction backlog indicating storage or performance issues
Configure alerts in Prometheus Alertmanager or your monitoring platform to notify your teams promptly.
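As a starting point, the sketch below defines a node-down rule for Prometheus. The up series is generated by Prometheus itself for every scrape target; names for latency or dropped-message metrics depend on your JMX Exporter configuration, so substitute the ones your setup actually exports.
groups:
  - name: cassandra-alerts
    rules:
      - alert: CassandraExporterDown
        # 'up' is recorded by Prometheus for each scrape target
        expr: up{job="cassandra"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Cassandra exporter on {{ $labels.instance }} is unreachable"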
Step 5: Monitor Cassandra Logs
Logs provide detailed information on errors and events. Essential logs to watch include:
- system.log: The main Cassandra server log, containing errors and warnings
- debug.log: More verbose output for troubleshooting
Use centralized log management tools like the EFK stack (Elasticsearch, Fluentd/Fluent Bit, Kibana) or Loki to collect and search logs efficiently.
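Even without a centralized stack, a quick check on a single node often surfaces problems; the path below assumes a package install (tarball installs typically write to a logs/ directory under the Cassandra home):
# Follow the main server log (package-install default path)
tail -f /var/log/cassandra/system.log
# Surface recent warnings and errors, such as dropped mutations or long GC pauses
grep -E "WARN|ERROR" /var/log/cassandra/system.log | tail -n 50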
Troubleshooting Common Monitoring Issues
- JMX port inaccessible: Check firewall and security group rules, and ensure the JMX port is open on each node; remote access may also be disabled by Cassandra's own defaults (see the sketch after this list).
- No metrics showing in Prometheus: Verify JMX exporter config and startup parameters.
- High latency without clear cause: Investigate GC pauses and compaction backlog.
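On the JMX point: if port 7199 answers locally but not remotely, the cause is usually Cassandra's default configuration rather than a firewall. The sketch below shows the relevant toggle in conf/cassandra-env.sh; note that disabling local-only mode turns on JMX authentication by default, so set up credentials before restarting.
# conf/cassandra-env.sh (sketch): allow remote JMX connections
# With LOCAL_JMX=no, Cassandra requires JMX authentication by default;
# configure jmxremote.password (or equivalent) before exposing the port.
LOCAL_JMX=no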
Summary Checklist
- Enable and access Cassandra JMX metrics.
- Use nodetool for quick health checks.
- Deploy Prometheus with the JMX Exporter and Grafana dashboards.
- Set meaningful alert rules for critical Cassandra metrics.
- Collect and analyze Cassandra logs using centralized tools.
- Regularly check compaction, GC, latency, and dropped messages.
For more advanced Cassandra tutorials, check our guide on How to Query Data in Cassandra, which gives practical insights into querying Cassandra efficiently and complements monitoring efforts.
Implementing solid monitoring practices will help you maintain high availability, performance, and stability of your Cassandra clusters in production.
References and Further Reading
- Cassandra Metrics and JMX
- Prometheus JMX Exporter Documentation
- Grafana Documentation
- DataStax OpsCenter
