Build Your Own Cassandra Cluster
All you need
- 2N+1 Linux machines (virtual ones will do), with N >= 1
- a command line
- a sheet of paper
Install the build system
Apache Cassandra needs a Java SDK (JDK) installed. The latest Java version supported by the 4.0 series used here is Java 11. I read that support for newer Java versions is coming soon, but we stick to Java 11, which is supported today.
To have Apache Cassandra working on Java 11, it needs to be compiled with Java 11; otherwise you will have to run it on Java 8.
One way to install Java, possibly from a root account, is the following:
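# /opt/shared/java must exist too: the loader script is written there
mkdir -p /opt/java /opt/shared/java
curl -LO https://download.java.net/openjdk/jdk11/ri/openjdk-11+28_linux-x64_bin.tar.gz
tar vxzf openjdk-11+28_linux-x64_bin.tar.gz
mv jdk-11 /opt/java/
# write the environment loader; the quoted 'EOF' keeps ${JAVA_HOME} and ${PATH} literal
cat > /opt/shared/java/load-jdk-11.sh << 'EOF'
#! /bin/bash
export JAVA_HOME=/opt/java/jdk-11
export PATH=${JAVA_HOME}/bin:${PATH}
EOF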
If you add the script to your ~/.bashrc like this:
source /opt/shared/java/load-jdk-11.sh
every new shell should have the java command from JDK 11 on its PATH. If it doesn't, install Java 11 on your Linux machine in whatever way you are comfortable with.
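To double check the Java setup on each machine, a small helper can parse the first line of `java -version`, which Java prints on stderr. This is only a sketch. The function names `java_major_from_banner` and `require_java_major` are made up for this post, and the parsing assumes the usual `... version "X" ...` banner.

```shell
# extract the major version from the first line of `java -version` output
# e.g. 'openjdk version "11" 2018-09-25' -> 11, 'openjdk version "1.8.0_352"' -> 8
java_major_from_banner() {
  local v
  v=$(printf '%s\n' "$1" | head -n 1 | sed -E 's/^[^"]*"([^"]+)".*$/\1/')
  case "$v" in
    1.*) v=${v#1.}; echo "${v%%.*}" ;;      # legacy 1.x numbering (Java 8 and older)
    *)   v=${v%%.*}; echo "${v%%[-+_]*}" ;; # 9+ numbering, drop suffixes like -ea
  esac
}

# fail unless the java on PATH has the wanted major version
require_java_major() {
  local want="$1" banner got
  banner=$(java -version 2>&1) || { echo "java not found on PATH" >&2; return 1; }
  got=$(java_major_from_banner "$banner")
  if [ "$got" != "$want" ]; then
    echo "found Java ${got}, need Java ${want}" >&2
    return 1
  fi
  echo "Java ${got} found at $(command -v java)"
}
```

For example: `source /opt/shared/java/load-jdk-11.sh && require_java_major 11`.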
Cassandra is built with Apache Ant. I know it sounds old school, but I presume there are good reasons behind that, possibly that the build can be done without a network connection.
On Debian and its derivatives, installing it should be as simple as:
apt install ant -y
and I presume there is something similar on Red Hat and other RPM/yum-based distributions:
yum install ant
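If you work with machines from both families, a hypothetical `ant_install_cmd` helper can pick the right command from the `ID` and `ID_LIKE` fields of `/etc/os-release`. The mapping below only covers the distributions mentioned above. It is an assumption, not an exhaustive list.

```shell
# map the ID / ID_LIKE fields of /etc/os-release to an ant install command
# only the two families mentioned in this post are handled
ant_install_cmd() {
  case " $1 $2 " in
    *" debian "*|*" ubuntu "*) echo "apt install -y ant" ;;
    *" rhel "*|*" centos "*|*" fedora "*) echo "yum install -y ant" ;;
    *) echo "unknown distribution: $1" >&2; return 1 ;;
  esac
}
```

As root: `. /etc/os-release && cmd=$(ant_install_cmd "$ID" "$ID_LIKE") && $cmd`.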
Build Cassandra with Java 11 support enabled
I will keep this short and just share the script that I created for that:
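#! /bin/bash -x
# stop at the first failing step, so a broken build is never packaged
set -e
# scripts do not read ~/.bashrc: load JDK 11 explicitly if the loader exists
test -f /opt/shared/java/load-jdk-11.sh && source /opt/shared/java/load-jdk-11.sh
# clone only on the first run, so the script can be re-run
test -d cassandra || git clone https://github.com/apache/cassandra.git
cd cassandra
stabletag="cassandra-4.0.7"
# put aside local changes (if any) before switching tag
git stash
git checkout ${stabletag}
gitversion=$(git rev-parse --short HEAD)
ant -Duse.jdk11=true
cd -
tar cvfz "cassandra-${gitversion}.tar.gz" cassandra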
The bottom line is: clone the git repo, check out the latest stable tag (at the time of writing, cassandra-4.0.7) and compile it with ant using the switch:
-Duse.jdk11=true
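The latest stable tag will change over time. Here is a sketch of a hypothetical `latest_stable_tag` helper that reads `git tag -l` output, keeps only plain `cassandra-X.Y.Z` tags (dropping alpha, beta and rc ones) and sorts them by version with `sort -V`. It assumes the tag naming stays as in `cassandra-4.0.7`.

```shell
# print the newest plain release tag (cassandra-X.Y.Z) of a series, read from stdin
# alpha/beta/rc tags such as cassandra-4.1-beta1 are filtered out
latest_stable_tag() {
  local series="${1:-4.0}"
  grep -E '^cassandra-[0-9]+\.[0-9]+\.[0-9]+$' \
    | grep -F "cassandra-${series}." \
    | sort -V \
    | tail -n 1
}
```

Inside the clone: `git fetch --tags && git tag -l | latest_stable_tag 4.0`.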
Give the script a try; it should complete in a minute or so.
Once it is done, you should end up with these binaries:
$ ls -l cassandra/bin
total 152
-rwxr-xr-x 1 userx userx 10730 Nov 1 21:47 cassandra
-rw-r--r-- 1 userx userx 6093 Nov 1 21:47 cassandra.in.sh
-rwxr-xr-x 1 userx userx 3060 Nov 1 21:47 cqlsh
-rwxr-xr-x 1 userx userx 95397 Nov 1 21:47 cqlsh.py
-rwxr-xr-x 1 userx userx 1894 Nov 1 21:47 debug-cql
-rwxr-xr-x 1 userx userx 3491 Nov 1 21:47 nodetool
-rwxr-xr-x 1 userx userx 1770 Nov 1 21:47 sstableloader
-rwxr-xr-x 1 userx userx 1778 Nov 1 21:47 sstablescrub
-rwxr-xr-x 1 userx userx 1778 Nov 1 21:47 sstableupgrade
-rwxr-xr-x 1 userx userx 1781 Nov 1 21:47 sstableutil
-rwxr-xr-x 1 userx userx 1778 Nov 1 21:47 sstableverify
-rwxr-xr-x 1 userx userx 1175 Nov 1 21:47 stop-server
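Before copying anything around, a quick sanity check of the build directory can save a trip to each node. The sketch below checks a few of the scripts listed above. It also looks for a jar named `apache-cassandra-*.jar` under `build/`, which is an assumption about where the default ant target puts its output. `check_build` is a made-up name.

```shell
# sanity check a compiled Cassandra tree (default: ./cassandra)
check_build() {
  local dir="${1:-cassandra}" f
  for f in cassandra cqlsh nodetool stop-server; do
    [ -x "$dir/bin/$f" ] || { echo "missing or not executable: $dir/bin/$f" >&2; return 1; }
  done
  # assumption: the default ant target leaves the jar as build/apache-cassandra-*.jar
  compgen -G "$dir/build/apache-cassandra-*.jar" > /dev/null \
    || { echo "no apache-cassandra-*.jar under $dir/build" >&2; return 1; }
}
```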
Installing the nodes
Once Cassandra is compiled, the full directory (or the tar.gz created above) needs to be copied to each of the machines intended to be used, e.g. into /home/userx/ on each machine if you intend to run it as user userx.
I will not spend many words explaining why you should not run it as the root user: just do not do it.
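`bin/cassandra` itself already refuses to start as root unless forced with `-R`. The same kind of guard can sit at the top of the helper scripts in this post. `refuse_root` is a hypothetical name, and it takes the uid as an argument so it is easy to test.

```shell
# refuse to continue when running with uid 0 (root)
# usage: refuse_root "$EUID"   (EUID is set by bash itself)
refuse_root() {
  if [ "${1:-$(id -u)}" -eq 0 ]; then
    echo "do not run Cassandra as root; switch to an unprivileged user such as userx" >&2
    return 1
  fi
}
```

At the top of a script: `refuse_root "$EUID" || exit 1`.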
Take the sheet of paper and note down the IP address of each of the machines that will be part of the cluster. In my example:
192.168.1.2, 192.168.1.3, 192.168.1.4
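You can also sanity check the list on the sheet of paper from the shell. The sketch below uses hypothetical `is_ipv4` and `check_node_list` helpers. It verifies that there are 2N+1 addresses with N >= 1, that each one looks like an IPv4 address and that none is repeated.

```shell
# return 0 if the argument looks like a dotted-quad IPv4 address
is_ipv4() {
  local ip="$1" IFS=. octets o
  [[ $ip =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]] || return 1
  read -r -a octets <<< "$ip"
  for o in "${octets[@]}"; do
    (( 10#$o <= 255 )) || return 1
  done
}

# check that the cluster has 2N+1 (N >= 1) distinct, well-formed addresses
check_node_list() {
  local n=$# ip
  (( n >= 3 && n % 2 == 1 )) || { echo "need 2N+1 nodes with N>=1, got $n" >&2; return 1; }
  for ip in "$@"; do
    is_ipv4 "$ip" || { echo "not an IPv4 address: $ip" >&2; return 1; }
  done
  [ "$(printf '%s\n' "$@" | sort -u | wc -l)" -eq "$n" ] || { echo "duplicate address" >&2; return 1; }
}
```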
All the configuration is done in the file:
./cassandra/conf/cassandra.yaml
on each and every one of the nodes.
The settings needed are the following:
- A name for the cluster in: cluster_name
- The IP address of the machine in: rpc_address and listen_address
- The list of ip:port for all the machines in the cluster, in seeds (nested under seed_provider); with the example above it will look like this:
seeds: "192.168.1.2:7000,192.168.1.3:7000,192.168.1.4:7000"
- Then, since all the nodes start together as a brand-new cluster, append the following at the end of the config file:
auto_bootstrap: false
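You can generate the seeds value from the IP list instead of typing it by hand. This sketch assumes every node uses port 7000, Cassandra's default internode `storage_port`. `make_seeds` and the `SEEDS_PORT` override are hypothetical.

```shell
# build the comma separated ip:port list used by seeds
make_seeds() {
  local port="${SEEDS_PORT:-7000}" out="" ip
  for ip in "$@"; do
    out+="${out:+,}${ip}:${port}"   # add a comma before every entry but the first
  done
  printf '%s\n' "$out"
}
```

For example, `make_seeds 192.168.1.2 192.168.1.3 192.168.1.4` prints `192.168.1.2:7000,192.168.1.3:7000,192.168.1.4:7000`.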
Getting this 100% right the first time on each of your 2N+1 instances is tedious and error-prone, so here is a script to relieve the pain.
Copy it to each of the nodes, together with the *.tar.gz we built above, and name it: ca-node-install
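Copying the two files to every node is a loop over the IP list. This is a minimal sketch. It assumes passwordless SSH as user `userx` and that the files land in that user's home directory. `distribute`, `run`, `REMOTE_USER` and `DRY_RUN` are names made up for this example.

```shell
# print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# copy the tarball and ca-node-install into the home dir of REMOTE_USER on every node
# usage: distribute <tarball> <ip>...   (REMOTE_USER defaults to userx)
distribute() {
  local tgz="$1" user="${REMOTE_USER:-userx}" ip
  shift
  for ip in "$@"; do
    run scp "$tgz" ca-node-install "${user}@${ip}:" || return 1
  done
}
```

For example, `distribute cassandra-*.tar.gz 192.168.1.2 192.168.1.3 192.168.1.4`. Then run `./ca-node-install <node-ip>` on each node.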
#! /bin/bash -x
# usage: ./ca-node-install <node-ip>
set -e
if [ $# != 1 ]; then
echo "Usage: $0 <node-ip>"
exit 1
fi
# NODE_IP must be known before the CFG_* variables below are computed
NODE_IP="$1"
SERVICE_NAME="data-service"
# the cassandra-<gitversion>.tar.gz built earlier is expected next to this script
CURRDIR=$(cd "$(dirname "$0")" && pwd)
CASSANDRA_TGZ=$(ls "$CURRDIR"/cassandra-*.tar.gz | head -n 1)
RAM_USAGE_MB=2048
CASSANDRA_HOME=~/cassandra
CFG_SEEDS="192.168.1.2:7000,192.168.1.3:7000,192.168.1.4:7000"
CFG_LISTEN_ADDRESS=${NODE_IP}
CFG_RPC_ADDRESS=${NODE_IP}
CFG_ENDPOINT_SNITCH="GossipingPropertyFileSnitch"
CFG_AUTHENTICATOR="PasswordAuthenticator"
#VERY BAD: password auth with default credentials, change to certs
CFG_AUTO_BOOTSTRAP="false"
function install_cassandra()
{
test -f "$CASSANDRA_TGZ" || { echo "no cassandra-*.tar.gz found in $CURRDIR"; exit 1; }
# the .orig copy marks a previous run: stop instead of clobbering it
if [ -f $CASSANDRA_HOME/conf/cassandra.yaml.orig ]; then
echo "$CASSANDRA_HOME/conf/cassandra.yaml.orig exists: already installed"
exit 1
fi
test -d $CASSANDRA_HOME || mkdir -p $CASSANDRA_HOME
tar vxzf "$CASSANDRA_TGZ" -C $CASSANDRA_HOME --strip-components=1
# \$JVM_OPTS stays literal so cassandra-env.sh expands it at startup;
# -Xms and -Xmx are kept equal so the initial heap never exceeds the max
echo "JVM_OPTS=\"\$JVM_OPTS -Xms${RAM_USAGE_MB}M -Xmx${RAM_USAGE_MB}M\"" >> $CASSANDRA_HOME/conf/cassandra-env.sh
cp $CASSANDRA_HOME/conf/cassandra.yaml $CASSANDRA_HOME/conf/cassandra.yaml.orig
# drop comments and empty lines
sed -e 's/^\s*#.*//' $CASSANDRA_HOME/conf/cassandra.yaml.orig | grep -E '\S' > $CASSANDRA_HOME/conf/cassandra.yaml.template
sed -e "s/cluster_name:.*/cluster_name: '$SERVICE_NAME'/" \
-e "s/seeds:.*/seeds: \"$CFG_SEEDS\"/" \
-e "s/listen_address:.*/listen_address: $CFG_LISTEN_ADDRESS/" \
-e "s/rpc_address:.*/rpc_address: $CFG_RPC_ADDRESS/" \
-e "s/endpoint_snitch:.*/endpoint_snitch: $CFG_ENDPOINT_SNITCH/" \
-e "s/authenticator:.*/authenticator: $CFG_AUTHENTICATOR/" \
$CASSANDRA_HOME/conf/cassandra.yaml.template > $CASSANDRA_HOME/conf/cassandra.yaml
echo "auto_bootstrap: $CFG_AUTO_BOOTSTRAP" >> $CASSANDRA_HOME/conf/cassandra.yaml
}
install_cassandra
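After running the installer, check that the generated cassandra.yaml really carries the node's own address, the seeds and auto_bootstrap. The sketch below greps for those lines. `verify_node_yaml` is a hypothetical helper. It assumes the top-level keys sit at the start of the line, as in the stock cassandra.yaml.

```shell
# check that the generated cassandra.yaml carries this node's settings
# usage: verify_node_yaml ~/cassandra/conf/cassandra.yaml <node-ip>
verify_node_yaml() {
  local yaml="$1" ip="$2" key
  for key in listen_address rpc_address; do
    grep -Fxq "${key}: ${ip}" "$yaml" || { echo "${key} is not ${ip}" >&2; return 1; }
  done
  grep -Eq 'seeds: "?[0-9]' "$yaml" || { echo "seeds not set" >&2; return 1; }
  grep -Fxq 'auto_bootstrap: false' "$yaml" || { echo "auto_bootstrap: false missing" >&2; return 1; }
}
```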
Please note: this only gets you off the ground. Before you start adding real data to the cluster, you will need to think about proper security with TLS, possibly mTLS.
But even before you start fighting with the CA and certificates, you already have a cluster that you can use straight away in your LAN.
To get the ball rolling, check again, in a shell on each machine that will run the cluster, that Java 11 is available.
Once that check passes, issue the following command on each of the machines:
./cassandra/bin/cassandra
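Rather than guessing when the nodes are talking to each other, you can poll `nodetool status` until every node shows up as `UN` (Up/Normal). This sketch uses hypothetical helper names. It assumes it runs from the home directory where ./cassandra lives and that 3 nodes are expected.

```shell
# count nodes reported as Up/Normal ("UN") in `nodetool status` output read from stdin
count_up_normal() {
  grep -c '^UN ' || true   # grep -c still prints 0 when nothing matches
}

# poll until the expected number of nodes is UN, giving up after about 5 minutes
wait_for_cluster() {
  local want="${1:-3}" tries=60 up=0
  while (( tries-- > 0 )); do
    up=$(./cassandra/bin/nodetool status 2>/dev/null | count_up_normal)
    if [ "$up" -ge "$want" ]; then
      echo "$up nodes are UN"
      return 0
    fi
    sleep 5
  done
  echo "only $up of $want nodes are UN" >&2
  return 1
}
```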
Once the 3 nodes start gossiping with each other, you can connect to one of them with:
./cassandra/bin/cqlsh 192.168.1.2 -u cassandra -p cassandra
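As a first smoke test, cqlsh's `-e` option runs a single statement straight from the shell. The sketch below creates a keyspace replicated on all three nodes. The keyspace name `demo`, the replication factor and the `make_keyspace_cql` helper are illustrative assumptions. SimpleStrategy is used because it does not depend on the snitch's data center names.

```shell
# build a CREATE KEYSPACE statement replicated on <rf> nodes (SimpleStrategy)
make_keyspace_cql() {
  local ks="$1" rf="$2"
  # keep the name to letters, digits and underscores so it is safe to splice in
  [[ $ks =~ ^[A-Za-z0-9_]+$ ]] || { echo "bad keyspace name: $ks" >&2; return 1; }
  [[ $rf =~ ^[1-9][0-9]*$ ]] || { echo "bad replication factor: $rf" >&2; return 1; }
  printf "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {'class': 'SimpleStrategy', 'replication_factor': %d};\n" "$ks" "$rf"
}
```

For example: `./cassandra/bin/cqlsh 192.168.1.2 -u cassandra -p cassandra -e "$(make_keyspace_cql demo 3)"`.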