14 min to complete
At Division 3, our mutant data centers are facing more and more cyber attacks by evil mutants, and sometimes we experience downtime and cannot track our IoT sensors. We must now plan for disaster scenarios so that we know for sure that we can survive an attack. In this lesson, we will go through a node failure scenario and learn about consistency levels.
Environment Setup
If you completed the previous lesson, you can skip ahead to the section “Simulating the Attack”, since your Mutant Monitoring setup is already running and the Mutant Catalog keyspace is populated with data.
Otherwise, follow this procedure to set up a ScyllaDB cluster. Once the cluster is up, we’ll create the catalog keyspace and populate it with data.
The first task is to create the keyspace for the catalog.
docker exec -it scylla-node1 cqlsh
CREATE KEYSPACE catalog WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy','DC1' : 3};
Now that the keyspace is created, it is time to create the table.
use catalog;
CREATE TABLE mutant_data (
first_name text,
last_name text,
address text,
picture_location text,
PRIMARY KEY((first_name, last_name)));
Now let’s add a few mutants to the catalog with the following statements:
insert into mutant_data ("first_name","last_name","address","picture_location") VALUES ('Bob','Loblaw','1313 Mockingbird Lane', 'http://www.facebook.com/bobloblaw');
insert into mutant_data ("first_name","last_name","address","picture_location") VALUES ('Bob','Zemuda','1202 Coffman Lane', 'http://www.facebook.com/bzemuda');
insert into mutant_data ("first_name","last_name","address","picture_location") VALUES ('Jim','Jeffries','1211 Hollywood Lane', 'http://www.facebook.com/jeffries');
Simulating the Attack
Let’s use the nodetool command to examine the status of the nodes in our cluster. If you are still in cqlsh, exit it first:
exit
docker exec -it scylla-node1 nodetool status
We can see that all three nodes are up and running because their status is UN (Up/Normal). Now let’s simulate a failure of node 3 by pausing its container:
docker pause scylla-node3
Wait about 30 seconds, then use nodetool to check the status again. Node 3 should now be reported as DN:
docker exec -it scylla-node1 nodetool status
Node 3 is now in the DN (Down/Normal) state, so it can no longer serve requests. The data is still safe because we created the catalog keyspace with a Replication Factor of three. In a three-node cluster with a Replication Factor of three, every node holds a replica of every row, so the two remaining nodes still have a full copy of the data.
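Instead of rerunning nodetool by hand until the node shows up as DN, the check can be scripted. The sketch below is a minimal example under stated assumptions: count_nodes and wait_for_down_node are hypothetical helper names, the container name scylla-node1 comes from this lesson's setup, and the parsing assumes the usual nodetool status layout, where each node line starts with its two-letter state code (UN, DN, and so on). The 60-second timeout and 5-second polling interval are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the node lines in `nodetool status` output (read
# from stdin) whose two-letter state code matches $1, e.g. UN or DN.
# Assumes each node line starts with the state code followed by whitespace.
count_nodes() {
  local state="$1"
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps that harmless.
  grep -cE "^${state}[[:space:]]" || true
}

# Hypothetical helper (defined here, not called): poll node 1 until at least
# one node is reported DN, or give up after a timeout in seconds.
wait_for_down_node() {
  local timeout="${1:-60}" waited=0
  while (( waited < timeout )); do
    # No -it: an interactive terminal is not needed when the output is piped.
    if (( $(docker exec scylla-node1 nodetool status | count_nodes DN) >= 1 )); then
      return 0
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
  return 1
}
```

For example, piping docker exec scylla-node1 nodetool status into count_nodes UN should print 2 while node 3 is paused.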
We should still be able to run queries on ScyllaDB:
docker exec -it scylla-node1 cqlsh
select * from catalog.mutant_data;
All of the mutant data is still there and accessible, even though only two of the three replicas (RF=3) are available.
Consistency Levels
The data is still accessible because the Consistency Level is still being met. The Consistency Level (CL) determines how many replicas must acknowledge a read or write operation before it is considered successful. With QUORUM, a request succeeds when a majority of the replicas respond. QUORUM is calculated with the formula (n/2 + 1), rounded down, where n is the Replication Factor. With a Replication Factor of 3, QUORUM therefore requires two replicas, and the two nodes that are still up are enough. Note that cqlsh starts each session at a Consistency Level of ONE; running CONSISTENCY without an argument shows the current level.
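To make the arithmetic concrete, here is a small sketch of the QUORUM formula using integer (rounded-down) division, together with the number of replicas that can be unavailable while QUORUM requests still succeed. The function names quorum and tolerated_failures are illustrative only and are not part of any ScyllaDB tool.

```shell
#!/usr/bin/env bash
# QUORUM = (RF / 2) + 1, where bash integer division already rounds down.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Replicas that can be down while QUORUM requests still succeed: RF - QUORUM.
tolerated_failures() {
  echo $(( $1 - $(quorum "$1") ))
}

for rf in 1 2 3 4 5; do
  echo "RF=$rf quorum=$(quorum "$rf") tolerated_failures=$(tolerated_failures "$rf")"
done
```

For RF=3 it prints quorum=2 and tolerated_failures=1, which is why losing node 3 does not block QUORUM requests. It also shows that RF=4 tolerates no more failures than RF=3 at QUORUM.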
Let’s test how consistency levels work with the cqlsh client. In this exercise, we will write data to the cluster and read it back using a Consistency Level of ONE, which means a read or write request succeeds as soon as one replica responds.
CONSISTENCY ONE;
insert into catalog.mutant_data ("first_name","last_name","address","picture_location") VALUES ('Steve','Jobs','1 Apple Road', 'http://www.facebook.com/jobs') ;
select * from catalog.mutant_data;
Both queries were successful!
Now let’s test a Consistency Level of ALL. This means that every replica must respond to the read or write request; otherwise, the request fails.
CONSISTENCY ALL;
insert into catalog.mutant_data ("first_name","last_name","address","picture_location") VALUES ('Steve','Wozniak','2 Apple Road', 'http://www.facebook.com/woz') ;
select * from catalog.mutant_data;
Both queries failed with a “NoHostAvailable” error: the Replication Factor is 3, so ALL needs three replicas to respond, but only two of the three nodes are online.
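The three experiments can be summarized with a simplified availability model: a request can be served only if the number of live replicas is at least the number the Consistency Level requires. This is a sketch, not ScyllaDB's implementation; required_acks and can_serve are hypothetical names, and the model ignores timeouts and other consistency levels such as LOCAL_QUORUM.

```shell
#!/usr/bin/env bash
# Replicas that must acknowledge a request for the levels used in this lesson.
required_acks() {
  local cl="$1" rf="$2"
  case "$cl" in
    ONE)    echo 1 ;;
    QUORUM) echo $(( rf / 2 + 1 )) ;;
    ALL)    echo "$rf" ;;
    *)      echo "unsupported consistency level: $cl" >&2; return 1 ;;
  esac
}

# Exit status 0 when the live replicas can satisfy the consistency level.
can_serve() {
  local cl="$1" rf="$2" live="$3" needed
  needed=$(required_acks "$cl" "$rf") || return 2
  (( live >= needed ))
}

# The situation in this lesson: RF=3 with node 3 paused, so 2 replicas are live.
for cl in ONE QUORUM ALL; do
  if can_serve "$cl" 3 2; then
    echo "$cl: succeeds (needs $(required_acks "$cl" 3), 2 alive)"
  else
    echo "$cl: fails (needs $(required_acks "$cl" 3), 2 alive)"
  fi
done
```

With RF=3 and two live replicas, the loop reports that ONE and QUORUM succeed and ALL fails, which matches what we saw in cqlsh.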
Conclusion
We now know more about how ScyllaDB behaves when a node fails and should be better prepared for a real attack from the mutants. With a Replication Factor of 3 and a Consistency Level of QUORUM in a three-node ScyllaDB cluster, we can lose one of the three nodes and still read and write our data. Please be safe out there as we continue to track the mutants and evolve our Monitoring System.
The ScyllaDB documentation has more information about replacing a dead node in a cluster.