Building the Tracking System

15 min to complete

Previously, we learned that Mutants have emerged from the shadows and are wreaking havoc on the earth! Our mission is to help the Government keep the Mutants under control by building a Mutant Monitoring System (MMS). The MMS consists of a database containing a Mutant Catalog and a Monitoring system. In the previous lesson, we created the Mutant Catalog, which includes the name, address, and picture of each mutant. Now, we will build the Monitoring system to track the mutants: a time-series database of events that collects unique metrics for each mutant.

Setting up a Cluster

Before we proceed, make sure you have a cluster set up as explained here.
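Assuming the three-node Docker setup from the previous lessons (the container name scylla-node1 is carried over from that setup), a quick way to confirm that all nodes are up before continuing is:

docker exec -it scylla-node1 nodetool status

Each node should be listed with a status of UN (Up/Normal).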

Building the Monitoring System

The next step is to create the tracking system keyspace and table that will allow us to keep track of the following mutant metrics:

  • Name
  • Timestamp
  • Location
  • Speed
  • Heat
  • Telepathy powers

First, let’s create the tracking keyspace. If you are not already in cqlsh, connect first:

docker exec -it scylla-node1 cqlsh
CREATE KEYSPACE tracking WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3};

For this keyspace, we will use a replication factor of three with the NetworkTopologyStrategy. With the replication factor set to three, each of the three nodes in our cluster will hold a replica of the data.
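If you want to sanity-check the keyspace definition, DESCRIBE prints it back (the exact output formatting may vary by version):

DESCRIBE KEYSPACE tracking;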

Now we can create the tracking_data table:

USE tracking;
CREATE TABLE tracking_data (
       first_name text,
       last_name text,
       timestamp timestamp,
       location varchar,
       speed double,
       heat double,
       telepathy_powers int,
       PRIMARY KEY ((first_name, last_name), timestamp))
       WITH CLUSTERING ORDER BY (timestamp DESC)
       AND default_time_to_live = 72000 
       AND gc_grace_seconds = 0
       AND COMPACTION = {'class': 'TimeWindowCompactionStrategy',
                                  'compaction_window_unit' : 'HOURS',
                                  'compaction_window_size' : 1}; 

The primary key in ScyllaDB usually consists of two parts: a partition key and clustering columns. The partition key determines which nodes store a particular row. The clustering key is the second part of the primary key, and its purpose is to keep the rows within a partition in sorted order. For this table, we use the timestamp column as the clustering key so we can search a mutant’s data over specific time ranges.

A composite partition key is simply a partition key made up of two or more columns. In the table created above, the composite partition key is (first_name, last_name). Since first_name and last_name together form the partition key of the tracking_data table, all of the rows within each partition are ordered by the clustering key, timestamp.
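As a sketch of what this means for reads (using data we will insert later in this lesson): filtering on only part of the composite partition key is rejected, while supplying both columns lets you add a range condition on the clustering key:

-- Rejected: only part of the composite partition key is specified,
-- so ScyllaDB cannot locate a single partition
SELECT * FROM tracking.tracking_data WHERE first_name='Jim';

-- Accepted: the full partition key plus an optional range on the clustering key
SELECT * FROM tracking.tracking_data WHERE first_name='Jim' AND last_name='Jeffries' AND timestamp>='2017-11-11 00:00+0000';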

ScyllaDB stores data on disk in SSTables. When compaction runs, it merges multiple SSTables into a single new one. It is important to choose the right compaction strategy for the use case at hand. For the MMS, we chose the TimeWindowCompactionStrategy (TWCS) because it was designed for time-series data, where the time of a data point serves as the clustering key. In a time-series use case, we see some common features:

  1. The Clustering Key and write time are correlated.
  2. Data is added in time order, with only a few out-of-order writes, typically rearranged by just a few seconds.
  3. Data is only deleted through TTL (Time To Live) or by deleting an entire partition.
  4. Data is written at a nearly constant rate.
  5. A query on a time series is usually a range query on a given partition—the most common query is of the form “values from the last hour/day/week.”
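Once the table exists, you can verify the options we set by querying the standard system_schema.tables view (a sketch; the columns available can vary slightly between versions):

SELECT compaction, default_time_to_live, gc_grace_seconds FROM system_schema.tables WHERE keyspace_name='tracking' AND table_name='tracking_data';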

Time window compaction makes TTL (expiration times on data) more efficient. ScyllaDB keeps track of each SSTable’s most recent expiration time. When it notices that the oldest SSTable’s most recent expiration time has passed, it can drop the entire SSTable without checking the expiration times of the individual cells it contains. This is especially useful when all data is inserted with the same TTL. In this example, a default_time_to_live of 72000 seconds (20 hours) is used.
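To see per-cell expiration in action, here is a minimal sketch; the Test/Subject row is hypothetical and only for illustration. USING TTL overrides the table’s default for a single write, and the TTL() function reports the remaining seconds:

-- Hypothetical row written with an explicit 600-second TTL,
-- overriding the table's default_time_to_live
INSERT INTO tracking.tracking_data (first_name, last_name, timestamp, location, speed, heat, telepathy_powers) VALUES ('Test','Subject','2017-11-11 13:05+0000','Nowhere',0.0,0.0,0) USING TTL 600;

-- TTL() returns the remaining time to live, in seconds, for a cell
SELECT TTL(heat) FROM tracking.tracking_data WHERE first_name='Test' AND last_name='Subject';

-- Clean up the test partition so it does not mix with the lesson's data
DELETE FROM tracking.tracking_data WHERE first_name='Test' AND last_name='Subject';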

In the above example, all SSTables created within one hour are compacted together (using the Size-Tiered Compaction Strategy within each window), creating one SSTable per hour. The rows turn into tombstones when they reach the defined TTL (20 hours here). It makes sense to set gc_grace_seconds to zero when using TWCS (unless you are performing explicit deletions, which is normally discouraged). You can read more about this and about tombstone_gc in this blog post.

Now that the table is created, let’s insert data into the tracking_data table for Jim Jeffries:

INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 08:05+0000','New York',1.0,3.0,17) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 09:05+0000','New York',2.0,4.0,27) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 10:05+0000','New York',3.0,5.0,37) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 10:22+0000','New York',4.0,12.0,47) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 11:05+0000','New York',4.0,9.0,87) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 12:05+0000','New York',4.0,24.0,57) ;

Let’s also add data for Bob Loblaw:

INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 08:05+0000','Cincinnati',2.0,6.0,5) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 09:05+0000','Cincinnati',4.0,1.0,10) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 10:05+0000','Cincinnati',6.0,1.0,15) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 10:22+0000','Cincinnati',8.0,3.0,6) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 11:05+0000','Cincinnati',10.0,2.0,3) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-11 12:05+0000','Cincinnati',12.0,10.0,60) ;

With data in the table, we can now query it:

Query all the data that we inserted:

SELECT * FROM tracking.tracking_data;

This returns all twelve rows inserted above.

Query a mutant’s information between different times on the same day (the range bounds are inclusive):

SELECT * FROM tracking.tracking_data WHERE first_name='Jim' AND last_name='Jeffries' AND timestamp>='2017-11-11 09:05+0000' AND timestamp<='2017-11-11 10:05+0000';

This returns the two readings recorded at 09:05 and 10:05.

Query a mutant’s latest record. Because the table’s clustering order is timestamp DESC, rows come back newest-first, so LIMIT 1 returns the most recent one:

SELECT * FROM tracking.tracking_data WHERE first_name='Jim' AND last_name='Jeffries' LIMIT 1;

This returns Jim’s most recent reading, the row recorded at 12:05.

To find the mutant’s highest recorded speed:

SELECT MAX(speed) FROM tracking.tracking_data WHERE first_name='Jim' and last_name='Jeffries';
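Other built-in CQL aggregates (COUNT, MIN, SUM, AVG) work the same way on a single partition. For example, to summarize Jim’s readings:

SELECT COUNT(*), MIN(speed), AVG(speed) FROM tracking.tracking_data WHERE first_name='Jim' AND last_name='Jeffries';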

Summary

We now have the basic database infrastructure and initial data for the Mutant Monitoring System. Our next mission is to prepare for disaster scenarios by learning how to recover from node failures. Be safe out there, and continue to analyze the data for malicious activity.

 
