How to Insert Data in Cassandra: A Step-by-Step Guide

Apache Cassandra is a powerful NoSQL database designed for handling large volumes of data across many commodity servers without any single point of failure. If you are new to Cassandra or want to learn how to insert data into it, this tutorial guides you through the essentials. We cover the prerequisites, how to create a keyspace and a table, the statements for inserting data, how to verify the inserted rows, and how to troubleshoot common issues.

Prerequisites

Apache Cassandra installed and running: You should have Cassandra installed on your system or have access to a Cassandra cluster. If you need help installing Cassandra, check out our step-by-step installation guide.
CQL (Cassandra Query Language) familiarity: Basic knowledge of CQL syntax and commands will help you follow this tutorial.
CQL shell (cqlsh) access: You need access to cqlsh, the command-line interface for Cassandra, to execute CQL commands.
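Once cqlsh is running (by default it connects to 127.0.0.1 on port 9042), a few built-in cqlsh commands give you a quick sanity check before you start. This is a minimal sketch; the exact output depends on your installation.

```cql
-- Run these at the cqlsh> prompt.

-- Versions of cqlsh, Cassandra, the CQL spec and the native protocol.
SHOW VERSION;

-- Cluster name and address that cqlsh is connected to.
SHOW HOST;

-- Keyspaces that already exist in the cluster.
DESCRIBE KEYSPACES;
```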

Step 1: Create a Keyspace

A keyspace is the top-level namespace that defines how data is replicated on nodes within the cluster. To create one, use the CREATE KEYSPACE statement. Here’s an example creating a keyspace named my_keyspace with SimpleStrategy and a replication factor of 1, ideal for development or single-node clusters:

CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
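SimpleStrategy ignores data center layout, so multi-node production clusters usually use NetworkTopologyStrategy with a replication factor per data center. The sketch below assumes a hypothetical keyspace name, my_keyspace_prod, and a data center called datacenter1 (the default name on many single-data-center installations); check the real data center name in the output of nodetool status before adapting it.

```cql
-- Hypothetical production keyspace: three replicas in the data center
-- named 'datacenter1' (an assumption; match it to nodetool status output).
-- IF NOT EXISTS makes the statement a no-op if the keyspace already exists.
CREATE KEYSPACE IF NOT EXISTS my_keyspace_prod
  WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};
```

A replication factor of 3 assumes at least three nodes in that data center; with fewer nodes, Cassandra cannot place all three replicas.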

Step 2: Create a Table

Within a keyspace, you create tables that store your data. The following example switches to my_keyspace and creates a users table whose primary key is user_id; every row you insert later must supply a value for it:

USE my_keyspace;

CREATE TABLE users (
  user_id uuid PRIMARY KEY,
  first_name text,
  last_name text,
  email text
);
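If you prefer not to switch keyspaces with USE, you can qualify the table name with the keyspace instead. The sketch below defines the same users table that way, adds IF NOT EXISTS so the script can be rerun without an error, and then asks cqlsh to print the schema Cassandra stored.

```cql
-- Same table, keyspace-qualified so no USE statement is needed.
-- IF NOT EXISTS turns this into a no-op when the table already exists.
CREATE TABLE IF NOT EXISTS my_keyspace.users (
  user_id uuid PRIMARY KEY,
  first_name text,
  last_name text,
  email text
);

-- Print the stored schema, including default table options (cqlsh command).
DESCRIBE TABLE my_keyspace.users;
```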

Step 3: Insert Data

To insert data, you use the INSERT INTO command. Here’s how to insert a single user into the users table:

INSERT INTO users (user_id, first_name, last_name, email)
VALUES (uuid(), 'John', 'Doe', 'john.doe@example.com');

The uuid() function generates a random (version 4) UUID for the user_id column, so running the statement again creates a new row rather than updating this one.
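Unlike an SQL INSERT, a CQL INSERT does not fail when a row with the same primary key already exists: it overwrites the columns you supply (an upsert). The sketch below uses a fixed, made-up UUID literal so the same key can be written twice, then shows IF NOT EXISTS, which turns the insert into a lightweight transaction, and USING TTL, which makes the inserted values expire. All names and values are illustrative only.

```cql
-- A fixed, made-up UUID. UUID literals are written without quotes.
INSERT INTO users (user_id, first_name, last_name, email)
VALUES (5f3c9a2e-8b1d-4c7a-9e2f-1a2b3c4d5e6f, 'Jane', 'Roe', 'jane.roe@example.com');

-- Same primary key again: no error. Only email is overwritten;
-- first_name and last_name keep their earlier values.
INSERT INTO users (user_id, email)
VALUES (5f3c9a2e-8b1d-4c7a-9e2f-1a2b3c4d5e6f, 'jane@example.org');

-- Lightweight transaction: inserts only if no row with this key exists.
-- The row already exists here, so cqlsh reports [applied] as False.
INSERT INTO users (user_id, first_name)
VALUES (5f3c9a2e-8b1d-4c7a-9e2f-1a2b3c4d5e6f, 'Janet')
IF NOT EXISTS;

-- These values expire 86400 seconds (one day) after the write.
INSERT INTO users (user_id, first_name, last_name, email)
VALUES (uuid(), 'Temp', 'User', 'temp@example.com')
USING TTL 86400;
```

Lightweight transactions run a Paxos round among the replicas, so they are noticeably slower than plain inserts; use them only where you need insert-if-absent semantics.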

Multiple Row Insertion

Cassandra does not support inserting several rows with a single INSERT statement, as some SQL databases do with a multi-row VALUES list. Instead, you execute a separate INSERT statement for each row; to apply a group of them together, you can wrap them in a BATCH.
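The sketch below, using made-up names, shows separate INSERT statements for each row and, as an alternative, two more rows grouped in a logged BATCH. Once a logged batch is accepted, all of its statements are eventually applied; it is a tool for atomicity rather than speed, and large batches that span many partitions add load on the coordinator node.

```cql
-- Option 1: one INSERT statement per row.
INSERT INTO users (user_id, first_name, last_name, email)
VALUES (uuid(), 'Alice', 'Smith', 'alice@example.com');
INSERT INTO users (user_id, first_name, last_name, email)
VALUES (uuid(), 'Bob', 'Jones', 'bob@example.com');

-- Option 2: a logged batch. The whole block is a single statement
-- that ends with APPLY BATCH.
BEGIN BATCH
  INSERT INTO users (user_id, first_name, last_name, email)
  VALUES (uuid(), 'Carol', 'White', 'carol@example.com');
  INSERT INTO users (user_id, first_name, last_name, email)
  VALUES (uuid(), 'Dave', 'Brown', 'dave@example.com');
APPLY BATCH;
```

For loading many rows from a CSV file, cqlsh also offers the COPY ... FROM command.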

Step 4: Verify Inserted Data

To verify data insertion, use the SELECT statement:

SELECT * FROM users;

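This command retrieves all rows from the users table. That is fine for a small test table, but on a large table it scans data on every node in the cluster.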
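In practice you usually read rows by primary key. The sketch below looks up one row by a placeholder UUID (replace it with a user_id from your own SELECT output), then caps an exploratory scan and uses WRITETIME to show when each email value was written.

```cql
-- Read one row by primary key. The UUID is a made-up placeholder.
SELECT * FROM users WHERE user_id = 5f3c9a2e-8b1d-4c7a-9e2f-1a2b3c4d5e6f;

-- Cap an exploratory scan and show each email's write timestamp
-- (microseconds since the Unix epoch).
SELECT user_id, email, WRITETIME(email) FROM users LIMIT 10;
```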

Troubleshooting Common Issues

Invalid query errors: Double-check your table and keyspace names and ensure the columns exist. Also verify that the data types of inserted values match the column definitions.
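UUID generation problems: Use uuid() to generate a random (version 4) UUID, or now() to generate a time-based timeuuid (version 1); a timeuuid column accepts only the latter. When you write a UUID value by hand, do not put it in quotes.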
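Data not appearing after insert: Make sure you are querying the correct keyspace and table. Cassandra has no separate commit step: each INSERT is applied when it succeeds, and a batch is applied once it reaches APPLY BATCH. In a keyspace with a replication factor above 1, a read at consistency level ONE right after a write at ONE may reach a replica that has not received the write yet.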
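Write timeout errors: A write timeout means too few replicas acknowledged the write in time, usually because nodes are overloaded or down. A misconfigured replication strategy, such as a replication factor larger than the number of nodes or a NetworkTopologyStrategy data center name that does not match the cluster, can instead surface as an Unavailable error, depending on the consistency level. Check node state with nodetool status and the keyspace settings with DESCRIBE KEYSPACE.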
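A few cqlsh commands help with the issues listed above. This sketch reuses the my_keyspace and users names from earlier; the UUID is a made-up placeholder.

```cql
-- Check the replication strategy and factor the keyspace was created with.
DESCRIBE KEYSPACE my_keyspace;

-- UUID literals are unquoted. Wrapping the UUID in single quotes would
-- make it a text value, which a uuid column rejects.
INSERT INTO users (user_id, first_name)
VALUES (5f3c9a2e-8b1d-4c7a-9e2f-1a2b3c4d5e6f, 'Jane');

-- Show the consistency level cqlsh is currently using (ONE by default).
CONSISTENCY;

-- Trace a read to see which replicas responded and how long each step took.
TRACING ON;
SELECT * FROM users WHERE user_id = 5f3c9a2e-8b1d-4c7a-9e2f-1a2b3c4d5e6f;
TRACING OFF;
```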

Summary Checklist

Ensure Cassandra cluster or local installation is active.
Create an appropriate keyspace with proper replication.
Define tables with the right primary keys and columns.
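Use INSERT INTO statements to add data with valid UUIDs for keys, keeping in mind that an INSERT with an existing primary key overwrites the supplied columns of that row.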
Verify data insertion with SELECT queries.
Troubleshoot common errors around syntax, UUIDs, and write timeouts.

For a more in-depth look at keyspaces and Cassandra setup, see our guide How to Create Keyspaces in Cassandra, which complements this tutorial.

With these fundamentals, you are equipped to confidently insert data into Cassandra and manage your NoSQL database effectively. Experiment by creating your own schemas and inserting diverse datasets to explore Cassandra’s scalability and performance.