
Deploying a Cassandra Cluster with vFabric's Application Director

10.16.2012

While vFabric Application Director supports a variety of products out of the box (mostly vFabric) and a growing number of products on the Cloud Application Management Marketplace (BETA), such as the Puppet integration, it is easy to extend Application Director to support additional applications. Let’s take a look at how to use Application Director with Apache’s open source database, Cassandra. If you are new to Application Director, you might check out this 5-minute explanation. Otherwise, this post will show you how to automate the provisioning and set-up of a Cassandra cluster with Application Director in two main steps: 1) creating the catalog service and 2) defining a blueprint. Then, we will look at an example.

There are a couple of challenges around setting up a Cassandra cluster:

  • You need to orchestrate several virtual machines and servers.
  • On each machine, you need to install the Cassandra services (at least a JRE and the Cassandra distribution).
  • You have several configuration tasks that depend on the IPs of the servers within the cluster.
  • You want to avoid manual tasks because they are error-prone and time consuming.

Of course, there are also fine-tuning challenges, but they are out of scope for this document. The recipes provided here could likewise be enhanced to accept more properties, and so on.

Install Cassandra into the Catalog of Services

Application Director provides a catalog of services. The catalog of services lists the “middleware” that can be deployed onto a virtual machine. A service has a name, version, description, a list of supported OSes, properties, and other related information (see the documentation for more). These services also have defined actions to support the service lifecycle (i.e. Install, Configure, Start), where scripts are executed to carry out each step. Below is a screenshot of the Cassandra service’s actions and associated scripts that I am defining in Application Director.

In our example, Cassandra will be deployed into the catalog of services by using the tar.gz distribution.

Before dealing with the install scripts, we need to understand the Cassandra configuration (within the ./conf/cassandra.yaml file). The following properties are particularly important when setting up Cassandra in Application Director:

initial_token
In a Cassandra cluster, the key/value pairs are spread among all the nodes of the cluster. Entries are hashed (based on their key) using a consistent hashing algorithm that associates each entry with a specific node. To keep it simple, each node in the Cassandra cluster is assigned a token (the initial_token) that determines which entries the node is responsible for. When setting up a new cluster, the token is calculated (using a Python script) and specifies the key range each node (i.e. JVM) is responsible for:

  • The token space is a range between 0 and 2^127.
  • Each node is responsible for a subpart of this integer space and, hence, for a range of keys.

Application Director provides a set of variables to use within scripts. Two of them will help us calculate the token:

  • nodes_ips: the list of the IPs of the virtual machines in the cluster. For instance, if you want to deploy a 5-node cluster, Application Director will fill in this list with 5 IPs.
  • node_index: the index of the current virtual machine within that list. For instance, if the index is 3, the current VM is the third VM in the cluster.

seeds
This property must be a comma-separated list of the cluster IPs. Based on the Application Director variable nodes_ips, this property is generated and written into the configuration file cassandra.yaml.

listen_address & rpc_address
These must be set to the current IP or a specific network interface, depending on your deployment. Application Director provides a property, self:ip, that we can use in the scripts.
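To make this concrete, here is an illustrative excerpt of what these settings might look like in ./conf/cassandra.yaml for one node of a 3-node cluster once the scripts below have run. The IPs are placeholders and the token stands in for whatever the helper script computes, so treat this as a sketch rather than output from a real deployment:

initial_token: <token computed by appdirtoken.py for this node>
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.11,10.0.0.12,10.0.0.13"
listen_address: "10.0.0.11"
rpc_address: "10.0.0.11"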

In the Application Director interface shown below, you define and see all of the above properties for the Cassandra service. These properties will be used in the scripts. We also added a repo_url property, which is the location from which to download the Cassandra distribution (for instance, an internal Nexus repository).

[Screenshot: the Cassandra service properties defined in Application Director]

As mentioned, the service has 3 actions, or steps, in its lifecycle: Install, Configure, and Start. Here are the scripts in more detail.

INSTALL Script

This script downloads the Cassandra distribution from the defined repo_url and unpacks it.

#!/bin/bash
cd /opt

# get the package from the Nexus repository (specified by the $repo_url variable)
wget $repo_url/apache-cassandra-1.1.5-bin.tar.gz

tar -xzvf ./apache-cassandra-1.1.5-bin.tar.gz

rm -f ./apache-cassandra-1.1.5-bin.tar.gz
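As noted earlier, these recipes could be enhanced by accepting more properties. For example, a hypothetical cassandra_version property (not part of the recipe above) would avoid hard-coding the version in every script. A minimal sketch of the same install step under that assumption:

#!/bin/bash
# Sketch only: assumes a cassandra_version property (e.g. 1.1.5) has been defined
# in the service catalog alongside repo_url.
cd /opt
wget $repo_url/apache-cassandra-${cassandra_version}-bin.tar.gz
tar -xzvf ./apache-cassandra-${cassandra_version}-bin.tar.gz
rm -f ./apache-cassandra-${cassandra_version}-bin.tar.gz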

CONFIGURE Script

This script will modify the cassandra.yaml according to the number of servers you asked for and the current IPs.

This script will also calculate the initial_token for the current node.

#!/bin/bash

cd /opt/apache-cassandra-1.1.5
cd bin

# Cassandra requires a hash token per node that is a function of node_index and the number of
# nodes expected in the cluster. This is one way of computing the token: generate a small Python helper.
echo '#!/usr/bin/python' > appdirtoken.py
echo 'import sys' >> appdirtoken.py
echo 'num=int(sys.argv[1])' >> appdirtoken.py
echo 'i=int(sys.argv[2])' >> appdirtoken.py
echo 'print "%d" % (i*(2**127)/num)' >> appdirtoken.py
chmod u+x appdirtoken.py

cd ..

TOKEN=`./bin/appdirtoken.py ${#nodes_ips[@]} $node_index`

# Turn the array of clustered VM IPs into a comma-separated list
SUBSTR="${nodes_ips[0]}"
for (( i = 1; i < ${#nodes_ips[@]}; i++ )); do
  SUBSTR="$SUBSTR,${nodes_ips[$i]}"
done

echo $TOKEN
echo $SUBSTR

# Update the Cassandra config with the minimum parameters provided at deploy time (e.g. node count)
# or by the IaaS (e.g. IPs)
sed -i -e "s/initial_token:.*/initial_token: $TOKEN/g" ./conf/cassandra.yaml
sed -i -e "s/- seeds:.*/- seeds: \"$SUBSTR\"/g" ./conf/cassandra.yaml
sed -i -e "s/listen_address:.*/listen_address: \"$ip\"/g" ./conf/cassandra.yaml
sed -i -e "s/rpc_address:.*/rpc_address: \"$ip\"/g" ./conf/cassandra.yaml

cat ./conf/cassandra.yaml
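As a quick sanity check, you can run the generated helper by hand to see the tokens it would assign. For example, for a 3-node cluster (the index values 1 to 3 follow the node_index description above, so treat this as illustrative):

# run from /opt/apache-cassandra-1.1.5 after the CONFIGURE step has generated the helper
for i in 1 2 3; do ./bin/appdirtoken.py 3 $i; done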

START Script

In our sample, Cassandra is not installed as an OS service, so this script starts the node directly. At the end, the ring status is displayed using the Cassandra nodetool CLI.

#!/bin/bash

echo $PATH

# JAVA_HOME depends on the JDK installed by the previous service in the blueprint
JAVA_HOME=$JAVA_HOME

/opt/apache-cassandra-1.1.5/bin/cassandra 2>&1

sleep 10

tail -100 /var/log/cassandra/system.log

# display cluster details
/opt/apache-cassandra-1.1.5/bin/nodetool -h $ip ring

Define the Blueprint

In the previous section, we added Cassandra into the Application Director catalog of services.  We now need to define the OS, the JRE, the Cassandra service, and the orchestration between all the virtual machines. In Application Director, we drag and drop these elements on a canvas to define the blueprint (i.e. the entire logical application topology) that will provision and deploy our Cassandra cluster.

All the components are wired together by the blueprint, which can be shared and reused across teams. Each team is able to override certain properties (i.e. the ones you let them override) before deploying. For instance, they will be able to define the number of nodes they want in the cluster.

Here is the blueprint defined in Application Director: 

Sample with a 3-node Cluster

Here is a sample with a 3-node cluster. During the deployment, Application Director provides a view (see below) for you to follow the progress of the different steps. The gear icon for each task allows you to look at the log files, view the script, or see the properties. For instance, if you click “View Logs” on the START phase, you will see the Cassandra ring details (as specified in the scripts).

[Screenshot: deployment progress view for the 3-node Cassandra cluster in Application Director]
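If you prefer to check the ring from a shell rather than from the Application Director logs, the same nodetool command used in the START script can be pointed at any node in the cluster (the IP below is just an illustrative placeholder):

/opt/apache-cassandra-1.1.5/bin/nodetool -h 10.0.0.11 ring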

>> For more information on vFabric Application Director, check out the product features, find more resources, or download a trial.

About the Author: Olivier Mallassi is part of the vFabric team. In his career, he has worked as a developer, consultant, and architect for a variety of companies. His specialty and expertise is Java and, more recently, NoSQL, NewSQL, and Big Data. Olivier is also deeply involved in DevOps and Continuous Delivery.

Published at DZone with permission of its author, Stacey Schneider.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)