Multi-datacenter Consistency Levels

21 min to complete

In a previous lesson, we learned what a multi-datacenter configuration is and expanded the initial Mutant Monitoring ScyllaDB cluster from one to two datacenters. By having multiple datacenters, Division 3 was able to protect its data from a complete physical site failure. Now we must begin to learn more about how to work with our data in such a setup. Let’s start by learning more about the consistency level options available when using ScyllaDB with multiple datacenters.

The Consistency Level (CL) determines how many replicas in a cluster must acknowledge a read or write operation before it is considered successful. ONE is the default Consistency Level that cqlsh uses, meaning that only one replica node needs to acknowledge a read or write request for it to be honored. Depending on the use case, you may want to use a stronger consistency level to ensure data integrity, such as QUORUM in a single datacenter. With QUORUM, a majority of the replica nodes must respond for a read or write request to be honored. In a multi-datacenter ScyllaDB setup, additional consistency levels are available.
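
As a quick aside, you can check and change the consistency level for the current session directly in cqlsh. Running the CONSISTENCY command with no argument prints the level currently in effect (ONE by default), and passing a level changes it for that session:

consistency;
consistency QUORUM;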

The typical consistency levels to use when working with multiple datacenters are LOCAL_QUORUM and EACH_QUORUM. The consistency level can be specified in the cqlsh utility or through the programming language driver used to interact with the ScyllaDB cluster. When LOCAL_QUORUM is used, a quorum of replicas in the local datacenter must respond to read or write requests.

In many cases, the application is located in the same location (a region, in AWS terms) as one of the ScyllaDB datacenters (DCs). In such a case, LOCAL_QUORUM provides low latency while still maintaining a strong consistency level within that datacenter.
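
To make this concrete, the quorum size in a datacenter is a simple majority of its replicas: quorum = floor(RF / 2) + 1. With the replication factor of 3 per datacenter that we will use later in this lesson, the numbers work out as follows:

quorum = floor(3 / 2) + 1 = 2
LOCAL_QUORUM: 2 replicas in the local datacenter must acknowledge the request
EACH_QUORUM:  2 replicas in DC1 AND 2 replicas in DC2 must acknowledge the write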

For EACH_QUORUM, a quorum of replicas in ALL of the datacenters must be available to respond to write requests. Let’s explore this more by spinning up our Mutant Monitoring System.

Setting up the ScyllaDB Cluster and Environment

For this lesson, we need a multi-DC cluster with some tables and data. If you just completed the previous lesson, you can skip ahead to the section “Exploring Consistency Levels” below. Otherwise:

Follow this procedure to remove previous clusters and set up a new ScyllaDB cluster.

The second datacenter will be referred to as DC2 throughout this lesson.

Change to the mms directory (if it’s not your working dir already):

cd scylla-code-samples/mms

To bring up the second datacenter, run the docker-compose utility and reference the docker-compose-dc2.yml file:

docker-compose -f docker-compose-dc2.yml up -d
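
Optionally, list the running containers to confirm that the new DC2 nodes have started (container names may vary slightly depending on your environment):

docker ps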

After about 60 seconds, you should be able to see DC1 and DC2 when running the “nodetool status” command:

docker exec -it scylla-node1 nodetool status

Once the cluster is up, we’ll create the keyspaces and populate them with data.

The first task is to create the keyspace for the catalog.

docker exec -it scylla-node1 cqlsh
CREATE KEYSPACE catalog WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy','DC1' : 3, 'DC2':3};
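
If you want to double-check the replication settings, you can optionally describe the keyspace; the output should show NetworkTopologyStrategy with a replication factor of 3 in each datacenter:

DESCRIBE KEYSPACE catalog;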

Now that the keyspace is created, it is time to create the table.

use catalog;

CREATE TABLE mutant_data (
   first_name text,
   last_name text,
   address text,
   picture_location text,
   PRIMARY KEY((first_name, last_name)));

Now let’s add a few mutants to the catalog with the following statements:

insert into mutant_data ("first_name","last_name","address","picture_location") VALUES ('Bob','Loblaw','1313 Mockingbird Lane', 'http://www.facebook.com/bobloblaw');
insert into mutant_data ("first_name","last_name","address","picture_location") VALUES ('Bob','Zemuda','1202 Coffman Lane', 'http://www.facebook.com/bzemuda');
insert into mutant_data ("first_name","last_name","address","picture_location") VALUES ('Jim','Jeffries','1211 Hollywood Lane', 'http://www.facebook.com/jeffries');
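
Optionally, run a quick select to verify that the three mutants were added:

select * from mutant_data;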

Next, create the Tracking keyspace:

CREATE KEYSPACE tracking WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy','DC1' : 3, 'DC2':3};

Now we can create the tracking table:

use tracking;
CREATE TABLE tracking_data (
       first_name text,
       last_name text,
       timestamp timestamp,
       location varchar,
       speed double,
       heat double,
       telepathy_powers int,
       primary key((first_name, last_name), timestamp))
       WITH CLUSTERING ORDER BY (timestamp DESC)
       AND default_time_to_live = 72000 
       AND gc_grace_seconds = 0
       AND COMPACTION = {'class': 'TimeWindowCompactionStrategy',
                                  'compaction_window_unit' : 'HOURS',
                                  'compaction_window_size' : 1}; 

Now that the table is created, let’s insert data into the tracking_data table:

INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 08:05+0000','New York',1.0,3.0,17) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 09:05+0000','New York',2.0,4.0,27) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 10:05+0000','New York',3.0,5.0,37) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 10:22+0000','New York',4.0,12.0,47) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 11:05+0000','New York',4.0,9.0,87) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 12:05+0000','New York',4.0,24.0,57) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 08:05+0000','Cincinnati',2.0,6.0,5) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 09:05+0000','Cincinnati',4.0,1.0,10) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 10:05+0000','Cincinnati',6.0,1.0,15) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 10:22+0000','Cincinnati',8.0,3.0,6) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 11:05+0000','Cincinnati',10.0,2.0,3) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 12:05+0000','Cincinnati',12.0,10.0,60) ;

Exploring Consistency Levels

Let’s begin by diving into EACH_QUORUM. This consistency level requires a quorum of replicas in ALL of the datacenters to be available to respond to write requests. To get started, we will use cqlsh by connecting to the running container with the following command:

docker exec -it scylla-node1 cqlsh

Now, we can run cql commands. To get started, let’s use the tracking keyspace:

use tracking;
select * from tracking.tracking_data;

To change the consistency level to EACH_QUORUM, run the following command:

consistency EACH_QUORUM;

In a different Terminal window, let’s take down the second datacenter and then see what happens when trying to access the keyspace:

docker-compose -f docker-compose-dc2.yml pause

It can take around 60 seconds for the rest of the cluster to mark the DC2 nodes as down. Check that DC2 is down:

docker exec -it scylla-node1 nodetool status

We can now return to the cqlsh window and query the keyspace:

select * from tracking.tracking_data;

The read query fails because EACH_QUORUM is only supported for write operations, not reads. Let’s see if data can be added to the table:

INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Alex','Jones','2018-05-11 08:05+0000','Dallas',1.0,300.0,17) ;

We get the “NoHostAvailable:” error. This time the statement is a write, so EACH_QUORUM does apply, but the operation still fails because a quorum of replicas cannot be reached in every datacenter while DC2 is down.

Let’s see what happens when LOCAL_QUORUM is set. With this consistency level, only a quorum of replicas in the local datacenter (DC1, the one our cqlsh session is connected to) must respond to read or write requests.

consistency LOCAL_QUORUM;

INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Alex ','Jones','2018-05-11 08:05+0000','Dallas',1.0,300.0,17); 
select * from tracking.tracking_data;

The write operation completed successfully even though the second datacenter was down. When the second datacenter is brought back up, nodetool repair should be run to ensure data consistency across both sites. Here is the procedure (execute from the non-cqlsh window):

docker-compose -f docker-compose-dc2.yml unpause

docker exec -it scylla-node4 nodetool repair
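
Once the repair finishes, you can optionally switch the cqlsh session back to EACH_QUORUM and retry the insert that failed earlier; with both datacenters up again, a quorum is available in each DC and the write should now succeed:

consistency EACH_QUORUM;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Alex','Jones','2018-05-11 08:05+0000','Dallas',1.0,300.0,17) ;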

Conclusion

In this lesson, we explored how LOCAL_QUORUM and EACH_QUORUM work, with examples using the Mutant Monitoring System’s ScyllaDB cluster. EACH_QUORUM provides the strongest cross-datacenter consistency, but at the cost of higher latency and lower availability: writes fail if any datacenter cannot reach a quorum. For better performance with eventual consistency across datacenters, LOCAL_QUORUM is recommended. In one of the next lessons, we will explore how to use ScyllaDB Monitoring to monitor the Mutant Monitoring System. Stay safe out there.
