Apache Cassandra Installation and Configuration Guide

Installing Cassandra Locally

This document provides a few easy-to-follow steps that take a first-time user from installation, through running a single-node Cassandra instance, to an overview of configuring a multi-node cluster. Cassandra is meant to run on a cluster of nodes, but it also runs on a single machine, which is a handy way of getting familiar with the software while avoiding the complexities of a larger system.

Step 0: Prerequisites and Connecting to the Community

Cassandra requires the most stable version of Java 7 or 8 you can deploy, preferably the Oracle/Sun JVM. Cassandra also runs on OpenJDK and the IBM JVM. (It will NOT run on JRockit, which is only compatible with Java 6.)
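
The Java requirement above can be checked before going further. The following is a minimal sketch, not part of Cassandra itself: the function names java_major_version and java_version_supported are made up for this example, and it assumes the version string has already been extracted from the JVM.

```shell
# Extract the major Java version from a version string such as
# "1.7.0_80" (legacy scheme, major = 7) or "11.0.2" (newer scheme, major = 11).
java_major_version() {
    version=$1
    case "$version" in
        1.*) version=${version#1.} ;;      # strip the legacy "1." prefix
    esac
    printf '%s\n' "${version%%[._-]*}"     # keep what precedes the first separator
}

# Succeeds only for the Java 7 and 8 releases named in this guide.
java_version_supported() {
    case "$(java_major_version "$1")" in
        7|8) return 0 ;;
        *)   return 1 ;;
    esac
}
```

One way to obtain the version string is java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}', since the JVM prints its version to standard error.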

The best way to ensure you always have up to date information on the project, releases, stability, bugs, and features is to subscribe to the users mailing list (subscription required) and participate in the #cassandra channel on IRC.

Step 1: Download Cassandra

Download links for the latest stable release can always be found on the website.
Users of Debian or Debian-based derivatives can install the latest stable release in package form; see DebianPackaging for details.
Users of RPM-based distributions can get packages from DataStax.
If you are interested in building Cassandra from source, please refer to the How to Build page.
For more details about the various builds, please refer to the Cassandra versions and builds page.

Step 2: Basic Configuration

The Cassandra configuration files can be found in the conf directory of binary and source distributions. If you have installed Cassandra from a deb or rpm package, the configuration files will be located in /etc/cassandra.

Step 2.1: Directories Used by Cassandra

If you’ve installed Cassandra with a deb or rpm package, the directories that Cassandra will use should already be created and have the correct permissions. Otherwise, you will want to check the following config settings in conf/cassandra.yaml: data_file_directories (/var/lib/cassandra/data), commitlog_directory (/var/lib/cassandra/commitlog), and saved_caches_directory (/var/lib/cassandra/saved_caches). Make sure these directories exist and can be written to.
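
The "exist and can be written to" check can be scripted. This is a small sketch under the assumption that you pass in whichever paths your cassandra.yaml actually names; check_cassandra_dirs is a hypothetical helper, not a Cassandra tool.

```shell
# Report each directory that is missing or not writable by the current user.
# Returns non-zero if any directory fails the check.
check_cassandra_dirs() {
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            status=1
        elif [ ! -w "$dir" ]; then
            echo "not writable: $dir"
            status=1
        fi
    done
    return "$status"
}
```

With the default paths listed above, a call might look like: check_cassandra_dirs /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches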

By default, Cassandra writes its logs to /var/log/cassandra/. Make sure this directory exists and is writable, or change the log file path in conf/log4j-server.properties.

Note that Cassandra 2.1+ uses logback as its logger, so for those versions change the logging directory in conf/logback.xml instead.

JVM-level settings such as heap size can be set in conf/cassandra-env.sh.
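
As an illustration, heap settings in conf/cassandra-env.sh are plain shell variable assignments. The values below are placeholders rather than recommendations, and the note about setting both variables together reflects the stock script in the 2.x line; check the comments in your own copy of the file.

```shell
# Fragment for conf/cassandra-env.sh (placeholder sizes, adjust for your machine).
# The stock 2.x script expects these two to be set together or left unset together.
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"
```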

Step 3: Start Cassandra

And now for the moment of truth: start up Cassandra by invoking ‘bin/cassandra -f’ from the command line. The service should start in the foreground and log gratuitously to the console. Assuming you don’t see messages with scary words like “error” or “fatal”, or anything that looks like a Java stack trace, everything should be working.

Press “Control-C” to stop Cassandra.

If you start up Cassandra without the “-f” option, it will run in the background. You can stop the process by killing it, using ‘pkill -f CassandraDaemon’, for example.
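
A gentler alternative to pkill is to have Cassandra record its process ID and stop it from that file. The sketch below assumes the launch script's -p option, which writes the PID to the named file; stop_from_pidfile and the /tmp/cassandra.pid path are hypothetical names chosen for this example.

```shell
# Stop a process whose PID was recorded in a file, then remove the file.
# Returns non-zero if the PID file is missing.
stop_from_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || { echo "no pid file: $pidfile"; return 1; }
    kill "$(cat "$pidfile")" && rm -f "$pidfile"
}
```

For example, start the node with bin/cassandra -p /tmp/cassandra.pid and later stop it with stop_from_pidfile /tmp/cassandra.pid.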

Users of recent Linux distributions and Mac OS X Snow Leopard should be able to start up Cassandra simply by untarring the download and invoking bin/cassandra -f. Since Cassandra 2.1, the tar.gz download has shipped with the log and data directories defaulting to the Cassandra directory. Earlier versions defaulted to /var/log/cassandra and /var/lib/cassandra/, so with those versions it is necessary to either start Cassandra with root privileges or change conf/cassandra.yaml to use directories owned by the current user. Snow Leopard ships with Java 1.6.0 and does not require changing the JAVA_HOME environment variable or adding any directory to your PATH. On Linux, just make sure you have a working Java JDK package installed, such as openjdk-6-jdk on Ubuntu Lucid Lynx. (These Java 6 examples predate the Java 7 or 8 requirement in Step 0; install a JDK that matches your Cassandra version.)
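
For the older tarballs, pointing conf/cassandra.yaml at user-owned directories can be done with a one-line substitution. This is a sketch only: relocate_cassandra_dirs is a hypothetical helper, it assumes the file still uses the /var/lib/cassandra defaults, and it writes a new file rather than editing in place so the original stays available for comparison.

```shell
# Write a copy of a cassandra.yaml in which every /var/lib/cassandra path
# is re-rooted under a directory the current user owns.
# Usage: relocate_cassandra_dirs <input.yaml> <new_base_dir> <output.yaml>
relocate_cassandra_dirs() {
    # "|" is the sed delimiter, so the base directory may contain "/"
    # but must not contain "|" or "&".
    sed "s|/var/lib/cassandra|$2|g" "$1" > "$3"
}
```

The log directory is configured separately (see Step 2.1), so this helper does not touch it.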

Step 4: Using cqlsh

bin/cqlsh is an interactive command line interface for Cassandra. cqlsh allows you to execute CQL (Cassandra Query Language) statements against Cassandra. Using CQL, you can define a schema, insert data, and execute queries. Run the following command to connect to your local Cassandra instance with cqlsh:

$ bin/cqlsh
You should see the following prompt, if successful:

Connected to Test Cluster at localhost:9160.
[cqlsh 2.3.0 | Cassandra 1.2.2 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.
For clarity, we will omit the cqlsh prompt in the following examples.

You can access the online help with the ‘help;’ command. Commands are terminated with a semicolon (‘;’) in cqlsh.

First, create a keyspace (a namespace of tables):

CREATE KEYSPACE mykeyspace
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
Second, switch to the new keyspace:

USE mykeyspace;
Third, create a users table:

CREATE TABLE users (
  user_id int PRIMARY KEY,
  fname text,
  lname text
);
Now you can store data into users:

INSERT INTO users (user_id, fname, lname)
VALUES (1745, 'john', 'smith');
INSERT INTO users (user_id, fname, lname)
VALUES (1744, 'john', 'doe');
INSERT INTO users (user_id, fname, lname)
VALUES (1746, 'john', 'smith');
Now let’s fetch the data you inserted:

SELECT * FROM users;
You should see output reflecting your new rows:

user_id | fname | lname
1745 | john | smith
1744 | john | doe
1746 | john | smith
You can retrieve data about users whose last name is smith by creating an index, then querying the table as follows:

CREATE INDEX ON users (lname);

SELECT * FROM users WHERE lname = 'smith';

user_id | fname | lname
1745 | john | smith
1746 | john | smith
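
The statements in this step can also be run non-interactively. The sketch below collects them into a CQL script and relies on cqlsh's -f option to execute a file; write_users_cql and the users.cql file name are made up for this example.

```shell
# Write the walkthrough's schema and sample rows to a CQL script.
# Usage: write_users_cql <output-file>
write_users_cql() {
    cat > "$1" <<'EOF'
CREATE KEYSPACE mykeyspace
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE mykeyspace;
CREATE TABLE users (
  user_id int PRIMARY KEY,
  fname text,
  lname text
);
INSERT INTO users (user_id, fname, lname) VALUES (1745, 'john', 'smith');
INSERT INTO users (user_id, fname, lname) VALUES (1744, 'john', 'doe');
INSERT INTO users (user_id, fname, lname) VALUES (1746, 'john', 'smith');
SELECT * FROM users;
EOF
}
```

A run might look like: write_users_cql users.cql && bin/cqlsh -f users.cql. Note that the script will fail on a second run because the keyspace already exists.
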
Write Your Application

To connect to Cassandra, you’ll need a database driver for your language of choice. DataStax sponsors development of CQL drivers at https://github.com/datastax. A full list of CQL drivers can be found on the ClientOptions page.

When deciding how to design your schema and lay out your data, it will be helpful to review the resources on the DataModel page.

You may also want to read the full CQL documentation.

Configuring Multinode Clusters

Now you have a single working Cassandra node, which is a Cassandra cluster with only one node. By adding more nodes, you can make it a multi-node cluster.

Setting up a Cassandra cluster is almost as simple as repeating the above procedures for each node in your cluster. There are a few minor exceptions though.

Cassandra nodes exchange information about one another using a mechanism called Gossip, but to get the ball rolling a newly started node needs to know of at least one other node, which is called a Seed. It’s customary to pick a small number of relatively stable nodes to serve as your seeds, but there is no hard-and-fast rule here. Do make sure that each seed also knows of at least one other seed. Remember, the goal is to avoid a chicken-and-egg scenario and provide an avenue for all nodes in the cluster to discover one another.

In addition to seeds, you’ll also need to configure the IP interfaces to listen on for Gossip and CQL (listen_address and rpc_address, respectively). Use a listen_address that will be reachable from the listen_address used on all other nodes, and an rpc_address that will be accessible to clients.
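
To make the per-node settings concrete, the sketch below prints the relevant cassandra.yaml fragment for one node. The helper name print_node_settings and the 10.0.0.x addresses in the usage line are made up for this example, and using the same address for listen_address and rpc_address is a simplifying assumption that will not suit every network.

```shell
# Print the cassandra.yaml settings that differ per node.
# Usage: print_node_settings <node_address> <comma_separated_seed_addresses>
print_node_settings() {
    cat <<EOF
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "$2"
listen_address: $1
rpc_address: $1
EOF
}
```

For a third node joining two seeds, the call could be: print_node_settings 10.0.0.3 "10.0.0.1,10.0.0.2". The output is meant to be merged by hand into that node's conf/cassandra.yaml, replacing the existing entries for the same keys.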

Once everything is configured and the nodes are running, use the bin/nodetool status utility to verify a properly connected cluster. For example:

$ bin/nodetool -host -p 7199 status
Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 30.99 KB 256 32.4% 92b20e08-9ddd-4f55-9173-8516e74d27f5 rack1
UN 31 KB 256 31.5% b9616658-c744-48fb-b64f-83f96b007d93 rack1
UN 30.96 KB 256 36.1% f7a08973-85bd-460f-8176-d6f9df8c23f4 rack1
Advanced cluster management is described in Operations.
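
In the status output, the first column combines status and state, so UN means Up and Normal. A quick health check can count those rows. The following is a sketch that reads nodetool status output on standard input; count_up_normal is a hypothetical helper name.

```shell
# Count nodes reported as Up/Normal ("UN") in `nodetool status` output
# read from standard input. Prints 0 when no such rows are present.
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}
```

For example, bin/nodetool status | count_up_normal should print the number of nodes you expect to be in the cluster once all of them have joined.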

If you don’t yet have access to hardware for a real Cassandra cluster, you can manage local clusters easily with ccm (Cassandra Cluster Manager).


About ashokabhat

I am a C, C++, Java, Adobe Flex, and .NET programmer currently working as a software developer.