SS24 – High Availability Lab

In this lab, we’ll bring up a three-node cluster and demonstrate how write and read operations behave when the Replication Factor (RF) is set to 3. We’ll simulate a situation where some nodes are down and see how changing the Consistency Level (CL) affects read and write operations under different failure conditions in a ScyllaDB cluster.

Please ensure that your environment meets the following prerequisites:

  1. Docker for Linux, Mac, or Windows. Please note that running ScyllaDB in Docker is recommended only for evaluating and trying out ScyllaDB. For best performance, a regular OS install is recommended.
  2. 3GB of RAM or greater for Docker.
  3. If you are using Linux, you will need docker-compose.

Note: In addition to the instructions provided here, which let you run the lab on a machine with Docker, this lab is also available in the Instruqt learning environment. That environment provides an interactive virtual machine where you can execute all the commands directly from your browser without needing to configure anything.

Before starting the cluster, make sure the aio-max-nr value is high enough (1048576 or more). 

This parameter sets the maximum number of concurrent asynchronous I/O (AIO) requests the Linux kernel allows. ScyllaDB relies heavily on AIO, so a sufficiently high value helps it perform under heavy I/O workloads.

Check the value: 

cat /proc/sys/fs/aio-max-nr

If the value needs to be changed (run the following as root):

echo "fs.aio-max-nr = 1048576" >> /etc/sysctl.conf
sysctl -p /etc/sysctl.conf

First, we’ll bring up a 3-node ScyllaDB Docker cluster.
Set up a Docker Container with one node, called Node_X:

docker run --name Node_X -d scylladb/scylla:5.2.0 --overprovisioned 1 --smp 1

Create two more nodes, Node_Y and Node_Z, and add them to the cluster of Node_X. The expression "$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' Node_X)" evaluates to the IP address of Node_X:

docker run --name Node_Y -d scylladb/scylla:5.2.0 --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' Node_X)" --overprovisioned 1 --smp 1

docker run --name Node_Z -d scylladb/scylla:5.2.0 --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' Node_X)" --overprovisioned 1 --smp 1

Wait a minute or so and check the node status: 

docker exec -it Node_Z nodetool status  

You’ll see that eventually all the nodes have UN for status: U means the node is up, and N means its state is normal.
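If all three nodes have joined, the output will look roughly like the following (addresses, load, and ownership figures will differ on your machine and are elided here):

```
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load  Tokens  Owns  Host ID  Rack
UN  172.17.0.2  ...   256     ...   ...      rack1
UN  172.17.0.3  ...   256     ...   ...      rack1
UN  172.17.0.4  ...   256     ...   ...      rack1
```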

Once the nodes are up and the cluster is formed, we can use the CQL shell to create a table.

Run a CQL shell: 

docker exec -it Node_Z cqlsh 

Create a keyspace called “mykeyspace”, with a Replication Factor of three: 

CREATE KEYSPACE mykeyspace WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'replication_factor' : 3};

Next, create a table with three columns: user id, first name, and last name, and insert some data: 

use mykeyspace; 
CREATE TABLE users ( user_id int, fname text, lname text, PRIMARY KEY((user_id))); 

Insert two rows into the newly created table:

insert into users(user_id, fname, lname) values (1, 'rick', 'sanchez'); 
insert into users(user_id, fname, lname) values (4, 'rust', 'cohle'); 

Read the table contents:

select * from users;

To summarize: so far, we’ve seen how to create a three-node cluster with RF=3, open a CQL shell, create a table, insert data into it, and read that data. In the next topic, we will see what happens when nodes are down and how the Consistency Level impacts read/write operations.

Read and write at different consistency levels

In the previous topic, we saw how to create a three-node cluster. We opened a CQL Shell to a node in the cluster, created a table with RF=3, wrote data to it, and read that data. 

Now we will see what happens when we try to read and write data to our table, with varying Consistency Levels and when some of the nodes are down.

Let’s make sure all our nodes are still up:

docker exec -it Node_Z nodetool status

Now, set the Consistency Level to QUORUM and perform a write:

docker exec -it Node_Z cqlsh
use mykeyspace; 
CONSISTENCY QUORUM 
insert into users (user_id, fname, lname) values (7, 'eric', 'cartman');

Read the data to see if the insert was successful:

select * from users; 

The read and write operations were successful. What do you think would happen if we did the same thing with Consistency Level ALL? 

CONSISTENCY ALL 
insert into users (user_id, fname, lname) values (8, 'lorne', 'malvo'); 
select * from users;

The operations were successful again. CL = ALL means the coordinator must read from/write to as many replicas as the Replication Factor, 3 in our case. Since all nodes are up, this works.
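The successes and failures in this lab all follow from simple arithmetic: each Consistency Level maps to a number of replica acknowledgements the coordinator must collect. Here is an illustrative sketch in Python (not part of the lab commands); the quorum formula floor(RF/2) + 1 matches how a majority is computed:

```python
# Replica acknowledgements required by each consistency level,
# given the keyspace's replication factor (RF).
def required_replicas(cl: str, rf: int) -> int:
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1  # a majority of the replicas
    if cl == "ALL":
        return rf
    raise ValueError(f"unhandled consistency level: {cl}")

def can_succeed(cl: str, rf: int, live: int) -> bool:
    # The coordinator needs at least this many live replicas.
    return live >= required_replicas(cl, rf)

RF = 3
for live in (3, 2, 1):
    outcomes = {cl: can_succeed(cl, RF, live) for cl in ("ONE", "QUORUM", "ALL")}
    print(f"{live} node(s) up: {outcomes}")
```

With RF=3, ALL needs 3 live replicas, QUORUM needs 2, and ONE needs 1 — which predicts every success and failure in the rest of this lab.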
Next, we’ll take one node down and check read and write operations with Consistency Levels of QUORUM and ALL.

Take down Node_Y and check the status (it might take some time until the node is actually down): 

exit
docker stop Node_Y 
docker exec -it Node_Z nodetool status 

Now, set the Consistency Level to QUORUM and perform a write: 

docker exec -it Node_Z cqlsh 
CONSISTENCY QUORUM 
use mykeyspace;
insert into users (user_id, fname, lname) values (9, 'avon', 'barksdale');  

Read the data to see if the insert was successful: 

select * from users; 

With CL = QUORUM, the read and write were successful. What will happen with CL = ALL? 

CONSISTENCY ALL 
insert into users (user_id, fname, lname) values (10, 'vm', 'varga');  
select * from users; 

Both the read and the write fail with an Unavailable error. CL = ALL requires that we read from/write to all three replicas (based on RF = 3), but only two nodes are up.

What happens if another node is down?

Take down Node_Z and check the status:

exit
docker stop Node_Z 
docker exec -it Node_X nodetool status 

Now, set the Consistency Level to QUORUM and perform a read and a write: 

docker exec -it Node_X cqlsh 
CONSISTENCY QUORUM
use mykeyspace; 
insert into users (user_id, fname, lname) values (11, 'morty', 'smith');  
select * from users; 

With CL = QUORUM, the read and write fail. Since RF = 3, QUORUM requires a majority of the replicas, floor(RF/2) + 1 = 2 nodes, to be up; in our case, just one is. What will happen with CL = ONE?

CONSISTENCY ONE 
insert into users (user_id, fname, lname) values (12, 'marlo', 'stanfield');   
select * from users; 

This time the read and write are successful. CL = ONE requires a response from just one replica. Since one node is up, this works.

To summarize, we saw what happens when some of the nodes in our three-node cluster with RF=3 are down, under different Consistency Levels. CL = ALL provides higher consistency, while CL = ONE provides higher availability. In the next lesson, we will take a closer look at the ScyllaDB architecture.
