Materialized Views

12 min to complete


The Mutant Monitoring System had been receiving a lot of data and Division 3 wanted to find better ways to sort and store data so it can be quickly analyzed with applications. Luckily, having found an exciting feature in ScyllaDB called Materialized Views, they provided us with directives to learn how to use it to help our application developers prevent further acts of terror. In this lesson, we will learn what Materialized Views are and how to use them with the Mutant Monitoring System.

This lesson provides a high-level overview of Materialized Views. To go more in-depth check out the Materialized Views, Secondary Indexes, and Filtering lesson.

What are Materialized Views?

Materialized Views automate the tedious and inefficient work that must be done when an application maintains several tables with the same data that’s organized differently. In the past, every time the application wanted to write data, it needed to write to both tables and reads were done directly (and efficiently) from the desired table. However, ensuring any level of consistency between the data in the two or more views required complex and slow application logic.

ScyllaDB’s Materialized Views feature moves this complexity out of the application and into the servers. This approach makes it much easier for applications to have multiple views of their data. The application just declares the additional views, new view tables are created, and on every update to the original table (known as the “base table”), the additional view tables are automatically updated as well.

More information is in this blog post and in this tech talk from ScyllaDB Summit.

To get started with Materialized Views, we will first need to bring up the ScyllaDB Cluster.

Starting the ScyllaDB Cluster

If you completed other lessons in this course and already have a three-node ScyllaDB Cluster up and running you can skip to the “Create Keyspace and Add Data” section.

Otherwise, follow this procedure to remove previous clusters and set up a new cluster.

Create Keyspace and Add Data

We’ll create the tracking keyspace and add some data to it.

docker exec -it scylla-node1 cqlsh
CREATE KEYSPACE tracking WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy','DC1' : 3};
use tracking;
CREATE TABLE tracking_data (
       first_name text,
       last_name text,
       timestamp timestamp,
       location varchar,
       speed double,
       heat double,
       telepathy_powers int,
       primary key((first_name, last_name), timestamp))
       WITH CLUSTERING ORDER BY (timestamp DESC)
       AND COMPACTION = {'class': 'TimeWindowCompactionStrategy',
                                  'compaction_window_unit' : 'HOURS',
                                  'compaction_window_size' : 1}; 

Let’s add some data to the table to make sure we get some meaningful results in our queries later on in the lesson:

INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 08:05+0000','New York',1.0,3.0,17) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 09:05+0000','New York',2.0,4.0,27) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 10:05+0000','New York',3.0,5.0,37) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 10:22+0000','Cincinnati',4.0,12.0,47) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 11:05+0000','Cincinnati',4.0,9.0,87) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Jim','Jeffries','2017-11-11 12:05+0000','Cincinnati',4.0,24.0,57) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Alex','Jones','2018-11-11 08:05+0000','New York',1.0,3.0,17) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Alex','Jones','2018-11-11 09:05+0000','New York',2.0,4.0,27) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Alex','Jones','2018-11-11 10:05+0000','New York',3.0,5.0,37) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Alex','Jones','2017-11-11 10:22+0000','Cincinnati',4.0,12.0,47) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Alex','Jones','2017-11-11 11:05+0000','Cincinnati',4.0,9.0,87) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Alex','Jones','2017-11-11 12:05+0000','Cincinnati',4.0,24.0,57) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2018-11-11 10:22+0000','Cincinnati',4.0,12.0,47) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-11-10 11:05+0000','Cincinnati',4.0,9.0,87) ;
INSERT INTO tracking.tracking_data ("first_name","last_name","timestamp","location","speed","heat","telepathy_powers") VALUES ('Bob','Loblaw','2017-12-11 12:05+0000','Cincinnati',4.0,24.0,57) ;

Creating the Materialized Views

If you are not yet connected to the cql shell, connect and switch to the tracking keyspace:

docker exec -it scylla-node1 cqlsh
use tracking;

Now it is time to create our first Materialized View. According to our original table schema for tracking, tracking_data, our primary keys are first_name, last_name, and timestamp. When creating a Materialized View, we will need to reference all of the primary keys that were used when creating the tracking_data table:


Let’s assume that Division 3 wants to retrieve only the mutant’s timestamp at a certain location. The following Materialized View will show only the timestamp and location for each mutant without the other data columns:

CREATE MATERIALIZED VIEW get_locations AS
SELECT location FROM tracking.tracking_data
WHERE location IS NOT NULL AND first_name IS NOT NULL AND last_name IS NOT NULL AND timestamp IS NOT NULL
PRIMARY KEY((location), first_name, last_name, timestamp);

Now, we will be able to see the data from the Materialized View with the following command:

select * from get_locations;

Keep in mind that the above results can be different if you already have other data in the table.

To narrow the results down by city, we can add a location to the select query. Let’s see who is in Cincinnati:

select * from get_locations WHERE location='Cincinnati';

Conclusion

In this lesson, we went over Materialized Views and showed a few examples of how to create views from the Mutant Monitoring System data. With Materialized Views, developers will be able to create intensive applications that can store and sort through data faster and more efficiently with less coding because those tasks can be handled by the ScyllaDB cluster rather than in the applications themselves.

Please be safe out there and continue to monitor the mutants!

fa-angle-up