In this lab, you’ll create a base table and different Materialized Views for that base table. You’ll execute updates to the base table and see what happens to the view. Additionally, you’ll learn how to query the MV.
Note: In addition to the instructions provided here, which allow you to run the lab on a machine with Docker, you can find this lab in the Insruqt learning environment here. This environment provides an interactive virtual machine where you can execute all the commands directly from your browser without the need to configure anything.
Before starting the cluster, make sure the aio-max-nr value is high enough (1048576 or more).
This parameter determines the maximum number of allowable Asynchronous non-blocking I/O (AIO) concurrent requests by the Linux Kernel, and it helps ScyllaDB perform in a heavy I/O workload environment.
Check the value:
cat /proc/sys/fs/aio-max-nr
If it needs to be changed:
echo "fs.aio-max-nr = 1048576" >> /etc/sysctl.conf
sysctl -p /etc/sysctl.conf
Start by creating a Docker container with ScyllaDB:
docker run --name scylla-mv -d scylladb/scylla:5.2.0 --smp 2 --memory 4G
Open a terminal to the container:
docker exec -ti scylla-mv /bin/bash
Run the CQL Shell. You might have to wait for a few seconds until the node is ready:
cqlsh
Next, create a keyspace “architecture”:
CREATE KEYSPACE architecture
WITH replication =
{'class': 'NetworkTopologyStrategy', 'replication_factor': '1'}
AND durable_writes = true;
Let’s look at a table of buildings: the key is the building’s unique name, and additional columns are each building’s city, the year it was built, and height in meters:
USE architecture;
CREATE TABLE buildings (
name text,
city text,
built_year smallint,
height_meters smallint,
PRIMARY KEY (name)
);
Now let’s insert some data:
INSERT INTO buildings (name, city, built_year, height_meters)
VALUES ('Burj Khalifa', 'Dubai', 2010, 828);
INSERT INTO buildings (name, city, built_year, height_meters)
VALUES ('Shanghai World Financial Center', 'Shanghai', 2008, 487);
INSERT INTO buildings (name, city, built_year, height_meters)
VALUES ('Taipei 101', 'Taipei', 2004, 449);
INSERT INTO buildings (name, city, built_year, height_meters)
VALUES ('Sears Tower', 'Chicago', 1974, 442);
INSERT INTO buildings (name, city, built_year, height_meters)
VALUES ('World Trade Center', 'New York City', 1972, 417);
INSERT INTO buildings (name, city, built_year, height_meters)
VALUES ('Empire State Building', 'New York City', 1931, 381);
INSERT INTO buildings (name, city, built_year, height_meters)
VALUES ('Chrysler Building', 'New York City', 1930, 283);
The table now looks like this:
SELECT * FROM buildings;
Notice that in this example that the data (city, built) or just (built) could also have served as a key because it uniquely determines a building in this data set. However, ScyllaDB has no way to guarantee that this remains as more data is added. For example, one might add another building built-in 1930 to the list (we’ll do this below). The only key which is guaranteed to be unique and remain unique as additional data is added is the original table’s key, name in our case.
The above table allows us to query buildings by name.
What happens if we want to find buildings by the city?
From this “base table,” let’s ask ScyllaDB to automatically maintain a second table, which is a Materialized View for finding buildings by the city. The new table will have the city as the partition key. The city cannot be the entire key for each record (a building) because we can have multiple buildings in the same city. Therefore we will have (city, name) as the primary key of the new Materialized View table: the city is the partition key, and the name is a clustering key:
CREATE MATERIALIZED VIEW building_by_city AS
SELECT * FROM buildings
WHERE city IS NOT NULL
PRIMARY KEY(city, name);
As a rule in a Materialized View, all the components of the original primary key of the table MUST also appear in the Materialized View’s key. This is why we added the name into the view’s key in this example.
The “WHERE city IS NOT NULL“ filter in the snippet above ensures that if a building includes a null value for the city, it will not be added to the view table because doing so would be illegal (a key component either partition or clustering key cannot be null). Adding this filter is mandatory. If you don’t, you will not be able to create the Materialized View:
CREATE MATERIALIZED VIEW building_by_city AS
SELECT * FROM buildings PRIMARY KEY(city, name);
As expected, the new Materialized View looks like this:
SELECT * FROM building_by_city;
This view contains all the columns from the base table because of the “SELECT * ” command we used to define the view.
What if we’re only interested in some of the columns from the original table? We can create a second view with only some of the columns:
CREATE MATERIALIZED VIEW building_by_city2 AS
SELECT height_meters FROM buildings
WHERE city IS NOT NULL
PRIMARY KEY(city, name);
We can also look at the second view, which selected only some of the columns in the original table:
SELECT * FROM building_by_city2;
Unsurprisingly, the new view contains the key columns (city and name), but the only additional column selected is height_meters. The base table’s built column is not copied to the new view because it was not selected.
Although internally each Materialized View is a separate table, a user is not allowed to modify a view directly:
DELETE FROM building_by_city WHERE city='Taipei';
To summarize, a MV is a view of a “base table”. Under the hood, it’s created as a separate table that can not be modified directly. In a Materialized View, all the components of the original primary key of the table MUST also appear in the MV’s key.
It’s possible to create a view with only some of the columns from the base table as well as use different sorting orders.