3 min to complete
So what happens if we have a query that requires data from more than one table?
In the relational database world, we would use a Join statement. We would combine rows from two different tables, using a related field, usually, ID, to get the information.
However, Joins are not supported in Scylla (or Cassandra, for that matter). What do we do then?
One option would be to do a client-side Join, and this is not recommended as it is very slow and requires iterating through all the information.
A better option is to denormalize.
Note: Another option is to use Materialized Views. Materialized Views essentially build a new table with the indexed column as a partition key and a user-chosen subset of columns as values. Instead of manually creating additional tables or “views” with the same data, this feature enables the Scylla server to do it automatically, safely, and efficiently. In other words, the application updates the base table, and the additional materialized-view tables gets updated automatically as needed.
If you remember, at the start of the lesson, we said that whereas in relational databases, data modeling is all about the data entities and the relationship between these entities, in Scylla data modeling, it’s about the queries and the application. We think about that first and base the data model around that.
Denormalization means creating a new table with duplicate data to answer specific queries, which we would perform with Join in the relational world. You can think about this, as performing Join on write. When writing the data, we duplicate it. Since writes are fast, we don’t pay a hefty price for this. Then, when we want to perform the read, we have the data right there waiting for us.
Let’s look at an example. Say we have two data entities: pets and vaccinations.
In a relational data model, we might have the following:
Now, in a relational database, if we want to get the vaccinations for a specific pet name, we would perform a Join:
SELECT vaccination FROM pets JOIN vaccinations ON pets.id = vaccinations.pet_id WHERE name = 'Rocky'
That would result in:
In Scylla, to answer the same query (get vaccinations for a specific pet by name), we would denormalize and create an additional table: vaccinations_by_pet.
CREATE TABLE vaccinations_by_pet ( vaccination_id uuid, pet_id uuid, pet_name text, vaccination text, PRIMARY KEY (pet_name, vaccination_id) );
In vaccinations_by_pet, the partition key is the pet_name, and the clustering key is the vaccination_id.
Since the partition key is the pet_name, we can easily query the data and get all the vaccinations for that pet. By creating this additional table, we get predictable, efficient query performance when performing the read.