15 min to complete
Previously we learned that Mutants have emerged from the shadows and are wreaking havoc on the earth! Our mission is to help the Government keep the Mutants under control by building a Mutant Monitoring System (MMS). The MMS consists of a database that will contain a Mutant Catalog and Monitoring system. In the previous lesson, we created the Mutant Catalog that includes the name, address, and picture of each mutant. Now we will build the Monitoring system to track the mutants — a time-series database of events that collects unique metrics for each mutant.
Setting up a Cluster
Before we proceed, make sure you have a three-node cluster set up as explained here.
Building the Monitoring System
The next step is to create the tracking system keyspace and table that will allow us to keep track of the following mutant metrics:
- Name (first and last)
- Timestamp
- Location
- Speed
- Heat
- Telepathy powers
First, let’s create the tracking keyspace. If you are not already in cqlsh, connect first:
docker exec -it scylla-node1 cqlsh
CREATE KEYSPACE tracking WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
For this keyspace, we will use the NetworkTopologyStrategy with a replication factor of three. Since our cluster has three nodes, each node will hold a replica of the data.
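To verify the keyspace settings, we can describe it in cqlsh (an optional sanity check, not part of the original steps):

```sql
-- Show the keyspace definition, including its replication settings
DESCRIBE KEYSPACE tracking;
```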
Now we can create the tracking_data table:
USE tracking;

CREATE TABLE tracking_data (
    first_name text,
    last_name text,
    timestamp timestamp,
    location varchar,
    speed double,
    heat double,
    telepathy_powers int,
    PRIMARY KEY ((first_name, last_name), timestamp))
WITH CLUSTERING ORDER BY (timestamp DESC)
    AND COMPACTION = {'class': 'TimeWindowCompactionStrategy',
                      'base_time_seconds': 3600,
                      'max_sstable_age_days': 1};
The primary key in ScyllaDB usually consists of two parts: the partition key and the clustering columns. The partition key determines which node stores a particular row. The clustering key is the second part of the primary key, and its purpose is to store the data in sorted order. For this table, we use the timestamp column as the clustering key so we can query a mutant’s data on a day-to-day basis.
A composite partition key is simply a partition key made up of two or more columns. In the table created above, our composite partition key is (first_name, last_name). Since first_name and last_name together form the partition key of the tracking_data table, all of the rows within each partition are ordered by the clustering key, timestamp.
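This key layout determines which queries the table supports: an efficient query must specify the full partition key and may then restrict the clustering key, while filtering on a non-key column requires ALLOW FILTERING. A sketch (the timestamp below is illustrative):

```sql
-- Efficient: the full partition key plus a range on the clustering key
SELECT * FROM tracking.tracking_data
WHERE first_name = 'Jim' AND last_name = 'Jeffries'
  AND timestamp >= '2017-11-11 00:00+0000';

-- Rejected unless ALLOW FILTERING is added:
-- location is neither a partition key nor a clustering column
-- SELECT * FROM tracking.tracking_data WHERE location = 'New York';
```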
ScyllaDB stores data in SSTables on disk. When compaction runs, it merges multiple SSTables into a single new one, so it is important to choose the right compaction strategy for each use case. For the MMS, we chose the TimeWindowCompactionStrategy because it was designed for time-series data, where the time of a data point serves as the clustering key. A time-series use case typically has the following features:
- Clustering key and write time are correlated.
- Data is added in time order. Only a few out-of-order writes, typically rearranged by just a few seconds.
- Data is only deleted through TTL (Time To Live) or by deleting an entire partition.
- Data is written at a nearly constant rate.
- A query on a time series is usually a range query on a given partition—the most common query is of the form “values from the last hour/day/week.”
Time window compaction makes TTL (expiration times on data) handling more efficient. ScyllaDB keeps track of each SSTable’s most recent expiration time, and when it notices that the oldest SSTable’s most recent expiration time has passed, it can drop the entire SSTable without checking the expiration times of the individual cells it contains. This is especially useful when all data is inserted with the same TTL.
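As a sketch of how that would look here, a tracking record can be written with a uniform TTL via USING TTL (the 24-hour value below is an illustrative assumption, not part of the lesson):

```sql
-- Hypothetical: this row expires automatically after 86400 seconds (24 hours)
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers")
VALUES ('Jim','Jeffries','2017-11-12 08:05+0000','New York',1.5,3.5,20)
USING TTL 86400;
```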
The compaction option “base_time_seconds” sets the size of the first time window, which tells ScyllaDB how much of the most recently written data should be compacted together. All data older than the base_time_seconds value is grouped together with other data of the same age. This helps expensive compaction operations run less frequently, since we will be recording large amounts of data on the mutants.
The compaction option “max_sstable_age_days” stops ScyllaDB from compacting SSTables whose data is older than this number of days. Setting this value low reduces the number of times the same value is rewritten to disk and prevents compaction of the largest SSTables.
Now that the table is created, let’s insert data into the tracking_data table for Jim Jeffries:
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 08:05+0000','New York',1.0,3.0,17) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 09:05+0000','New York',2.0,4.0,27) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 10:05+0000','New York',3.0,5.0,37) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 10:22+0000','New York',4.0,12.0,47) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 11:05+0000','New York',4.0,9.0,87) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 12:05+0000','New York',4.0,24.0,57) ;
Let’s also add data for Bob Loblaw:
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 08:05+0000','Cincinnati',2.0,6.0,5) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 09:05+0000','Cincinnati',4.0,1.0,10) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 10:05+0000','Cincinnati',6.0,1.0,15) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 10:22+0000','Cincinnati',8.0,3.0,6) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 11:05+0000','Cincinnati',10.0,2.0,3) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 12:05+0000','Cincinnati',12.0,10.0,60) ;
With data in the table, we can now run some queries.
Query all the data that we inserted:
SELECT * FROM tracking.tracking_data;
Query a mutant’s information between different times on the same day:
SELECT * FROM tracking.tracking_data WHERE first_name = 'Jim' AND last_name = 'Jeffries' AND timestamp >= '2017-11-11 09:05+0000' AND timestamp <= '2017-11-11 10:05+0000';
Query a mutant’s latest record:
SELECT * FROM tracking.tracking_data WHERE first_name = 'Jim' AND last_name = 'Jeffries' LIMIT 1;
To find the mutant’s highest recorded speed:
SELECT MAX(speed) FROM tracking.tracking_data WHERE first_name = 'Jim' AND last_name = 'Jeffries';
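Other built-in aggregates (COUNT, AVG, MIN, SUM) follow the same pattern. For example, to count Jim’s tracked events and compute his average heat reading over the data above:

```sql
-- Aggregate over a single partition: number of events and mean heat
SELECT COUNT(*), AVG(heat) FROM tracking.tracking_data
WHERE first_name = 'Jim' AND last_name = 'Jeffries';
```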
Summary
Now we have the basic database infrastructure needed with initial data for the Mutant Monitoring System. Our next mission is to prepare ourselves for disaster scenarios by learning how to recover from node failures. Be safe out there and continue to analyze the data for malicious activity.