Backup and Restore

In a previous lesson, we set up the Scylla Monitoring stack for the Mutant Monitoring System so we can monitor the performance, latency, and status of nodes in the Scylla cluster. Due to recent violent events in the mutant community, Division 3 implemented a new policy requiring Scylla Administrators to learn how to back up and restore the mutant data in the cluster. In this lesson, we will learn how to back up and restore data from the Mutant Monitoring System.

Backup and restore should only be used in extreme cases, such as data corruption or when an entire cluster is wiped out. In other cases, Scylla’s high availability and data replication allow us to recover data by other means, for example, by using repair.

Setting up the Scylla Cluster

We will use a single DC cluster, which we will build from scratch. Follow this procedure to remove previous clusters and set up a new Scylla cluster.
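
If you are following along with the Docker-based setup used in the previous Mutant Monitoring lessons, tearing the old cluster down and bringing up a fresh three-node cluster looks roughly like the sketch below; the directory name and compose files are assumptions based on that setup:

cd scylla-code-samples/mms
docker-compose kill
docker-compose rm -f
docker-compose up -d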

Next, we create the mutant catalog. Connect to one of the nodes with cqlsh:

docker exec -it mms_scylla-node1_1 cqlsh

Start with the keyspace:

CREATE KEYSPACE catalog WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy','DC1' : 3};
USE catalog;

Create the table:

CREATE TABLE mutant_data (
    first_name text,
    last_name text,
    address text,
    picture_location text,
    PRIMARY KEY((first_name, last_name)));

And add some data:

INSERT INTO mutant_data (first_name, last_name, address, picture_location) VALUES ('Bob', 'Loblaw', '1313 Mockingbird Lane', 'http://www.facebook.com/bobloblaw');
INSERT INTO mutant_data (first_name, last_name, address, picture_location) VALUES ('Bob', 'Zemuda', '1202 Coffman Lane', 'http://www.facebook.com/bzemuda');
INSERT INTO mutant_data (first_name, last_name, address, picture_location) VALUES ('Jim', 'Jeffries', '1211 Hollywood Lane', 'http://www.facebook.com/jeffries');
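
To confirm that the rows were written, you can run a quick query before moving on:

SELECT * FROM mutant_data;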

Backing Up the Data

We will create a full backup. Keep in mind that it’s also possible to do an incremental backup.
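
As a side note, incremental backups are enabled per node in scylla.yaml and are off by default; a minimal sketch of the relevant setting is shown below, though we will not use incremental backups in this lesson:

# /etc/scylla/scylla.yaml
# When enabled, Scylla hard-links each flushed SSTable into a backups/
# directory next to the table data, so changes made since the last full
# snapshot can be copied off the node.
incremental_backups: true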

First, we need to back up our schema. We will do this on each of the three nodes in the cluster.

exit
docker exec -it mms_scylla-node1_1 bash

cqlsh -e "DESC SCHEMA" > schema.cql

Repeat the above two commands for the other two nodes.
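
As a convenience, the schema dump can also be taken on all three nodes in one go from the host; this is just a sketch that assumes the container names used throughout this lesson:

for node in mms_scylla-node1_1 mms_scylla-node2_1 mms_scylla-node3_1; do
  docker exec $node bash -c "cqlsh -e 'DESC SCHEMA' > /schema.cql"
done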

With the schema backed up on all three nodes, we can create a snapshot of the catalog keyspace used for the Mutant Monitoring System.

The snapshots will be created on each node. Snapshots are taken using nodetool snapshot. The command first flushes the MemTables from memory to SSTables on disk, then creates a hard link for each SSTable in each keyspace. Over time, SSTables are compacted, but the hard links keep a copy of each file, so snapshots take up an increasing amount of disk space. Do the following on each of the three nodes:

nodetool snapshot catalog

The snapshot is created under the Scylla data directory, /var/lib/scylla/data, and has the following structure: keyspace_name/table_name-UUID/snapshots/snapshot_name.
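
If you want to confirm that a snapshot exists on a node, you can list it with nodetool or look at the directory layout directly; the UUID and the snapshot name (a timestamp) will differ on your system:

nodetool listsnapshots
ls /var/lib/scylla/data/catalog/*/snapshots/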

Once we have the backup ready on each node, we can move forward and simulate data loss. Keep in mind that in a real-world scenario, the backups should be stored at a remote location and not on the node itself.
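
For example, a minimal sketch of pulling node1's backup out of the container and onto the host (run from the host; the destination path is just an illustration, and in production you would ship the files on to remote storage):

mkdir -p ./backups
docker cp mms_scylla-node1_1:/var/lib/scylla/data/catalog ./backups/node1-catalog
docker cp mms_scylla-node1_1:/schema.cql ./backups/node1-schema.cql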

Simulating Data Loss

All of the data is backed up on each node. Division 3 must now prepare to handle cyber attacks from the mutants and other external organizations. In this scenario, we will simulate such an attack by using cqlsh to drop the catalog keyspace. While connected to one of the nodes:

cqlsh
drop keyspace catalog;

To verify that the keyspace is gone, run the following command:

describe keyspaces;

Great, we only have the default keyspaces now. We can now learn how to restore the data.

Restoring the Data

To restore the data, we first need to re-create the schema from the backup we created earlier. This only has to be done on one node. While connected to the node, exit cqlsh:

exit

Recreate the schema:

cqlsh -e "SOURCE '/schema.cql'"

Now we can restore the actual data. Do the following on each of the three nodes:

Run the nodetool drain command to ensure the data is flushed to the SSTables:

nodetool drain

Next, we need to shut down the node:

supervisorctl stop scylla

Delete all the files in the commitlog. This prevents newer writes still sitting in the commitlog from being replayed at startup and overriding the restored data:

rm -rf /var/lib/scylla/commitlog/*

We can now restore the catalog keyspace. We first need to find the original snapshot folder, which is inside the oldest table directory in /var/lib/scylla/data/catalog:

cd /var/lib/scylla/data/catalog/
ls -al

The original data directory with the snapshot is mutant_data-cea71d208cf711e98063000000000000. You can see that it is the older one. The current data directory created after importing the schema is mutant_data-1bf558c08cf911e98063000000000000.

Copy the contents from the snapshot directory to the new data directory:

cp -rf mutant_data-cea71d208cf711e98063000000000000/snapshots/1560333440865/* mutant_data-1bf558c08cf911e98063000000000000/
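
Optionally, check that the SSTable files from the snapshot are now in the live table directory before restarting; the directory name here is the one from this example run and will differ on your system:

ls -l mutant_data-1bf558c08cf911e98063000000000000/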

When this is complete, we can start Scylla with the following command:

supervisorctl start scylla

Repeat for mms_scylla-node2_1 and mms_scylla-node3_1. It will take a few minutes for the nodes to form a cluster.
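
For reference, the per-node restore steps are summarized below; the placeholders in angle brackets stand for each node's own table directories and snapshot name:

# Run inside each remaining node, e.g. docker exec -it mms_scylla-node2_1 bash
nodetool drain
supervisorctl stop scylla
rm -rf /var/lib/scylla/commitlog/*
cd /var/lib/scylla/data/catalog/
cp -rf <old-table-dir>/snapshots/<snapshot-name>/* <new-table-dir>/
supervisorctl start scylla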

Make sure the cluster is up:

exit
docker exec -it mms_scylla-node1_1 nodetool status

After the nodes are online, we can run a repair to sync the restored data across the cluster and then verify that our data was restored properly. First, run the repair:

docker exec -it mms_scylla-node1_1 nodetool repair

The entire process may take a few minutes. When the repair is complete, we can run a query to confirm that the data is restored to the keyspace:

docker exec -it mms_scylla-node1_1 cqlsh -e 'select * from catalog.mutant_data;'

Conclusion

In conclusion, we followed the directive from Division 3 that every Scylla Administrator learn how to back up and restore a cluster. The process began with backing up the schema, followed by creating a snapshot of the data. We then simulated an attack by deleting the keyspace, and restored and repaired the cluster. Stay safe out there and back up as much as you can!
