Materialized Views (MV) are a global index. When a new MV is declared, a new table is created and distributed to the different nodes using the standard table distribution mechanisms. It’s scalable, just like normal tables. It is populated by a query running against the base table. It’s not possible to directly update a MV; it’s updated when the base table is updated.
Each Materialized View is a set of rows that correspond to rows present in the underlying, or base, table specified in the materialized view’s SELECT statement.
Reads from a Materialized View are just as fast as regular reads from a table and just as scalable. But as expected, updates to a table with Materialized Views are slower than regular updates since these updates need to update both the original table and the Materialized View and ensure the consistency of both updates. However, doing those in the application without server help would have been even slower.
Some common use cases for MV are Indexing with denormalization, different sort orders, and filtering (pre-computed queries). This session also goes over examples, the CQL syntax to use, and limitations.
Transcript
Materalized views, secondary index
filtering, filtering with indexes and “what should I use” so going to the
definition a view is a table containing a copy of the results of some
query performed on a base table the main takeaway is a materialized view is a
table it’s not you know something that it’s attached to your table it’s a
different table so these are the use cases so basically you know
denormalization because everybody that used Cassandra
used ScyllaDB sometimes you know you have to denormalize you have to create two
tables that have some overlay, overlapping data so materialized views and
secondary indexes are above all a convenience you know especially for
developers so but again try not to forget that they are convenient but you
should know what you’re doing and that’s the reason we are having this session so
you know what are the shortcomings and or the good use cases for that so
different sorting order and precomputed queries so to apply
restrictions to the base table values so here’s a base table example
so you have a base table the name of the buildings is the partition key the
cities the clustering key and all the others are regular keys you just had Tzach’s
session I’m assuming everybody knows what I’m talking about when we say
partition key clustering key and regular columns right so on the first query that
you see there select from building where building is that name it’s going to work
if you try to query by country which is the third column which is a regular
column it’s going to error out it’s going to tell you if you want to do this
use allow filtering so you have to add the key words allow filtering to the
query but then we are basically telling you that performance will be
unpredictable because it’s not indexed right so sometimes it would be a full
table scan ScyllaDB is going to read all your records and then apply whatever
rule you put there like for example country
but then we can create a materialized view so and by the way we
are going to go into the syntax a little later but that’s the idea there so first
you have a base table and then we created a materialized view called
building by country and we made the country the partition key for that
table for that materialized view so then what you can do instead of querying your
base table for the country you do that on the materialized view so it’s pretty
simple it’s pretty intuitive there wasn’t a single instance where I
explained it and people said oh I cannot understand it’s pretty straightforward
right this is the syntax what I
did is for every example that we are putting here especially for syntax I’m
putting a link to the documentation because it doesn’t make sense to put all
the different commands and options it’s a lot of stuff this is the basics this
is for a subset of the data sometimes people do that so let’s say you have one
city that you want to put on a materialized view I don’t know the reason I’m not the
developer and the applications not mine usually it’s the other way around it’s
you guys that tell me why you’re doing things on a certain way but it’s a valid
valid use case so here we are making the height a clustering key so now we can
apply restriction or changes of heights and use aggregates or different ordering
and by the way one thing that I did here is you see that the examples are not
consistent so before I was talking about building by country now I’m talking
about building by city is just that I don’t want you guys to get like into the
bad habit of just you know following the example so I’m trying to play a little
bit with them this is the example that. Tzach discussed before for the pet
clinic and you can see that the base table has pet_chip_id and
the partition key and time has the clustering key and then we create a
materialized view and this is going to be in a different order because now we
are making heart rate clustering key and it’s before time so
when you create that materialized view the ordering
will be different even though you kind of have the same data right
okay some limitations we have on materialized
views so there must be a one-to-one correspondence between view rows and
base rows if you are to omit the base table primary key then we
have no way of knowing how they are related right so if you if you’re
creating a materialized view you necessarily have to include the
partition key from the base table and you can use one of the regular columns
as part of your clustering key or your partition key