Global indexes used to be the only kind that ScyllaDB supported, so it may be known to you already under the name “secondary indexes”. They’re easy to use and conceptually simple – we just pick a column to be indexed, create an index, and voila, it works – we can now query by the name column.
Behind the scenes, on index creation, ScyllaDB creates a materialized view which has the indexed column as its partition key (so we can query by that column), and also primary key columns from the base table.
Querying an index looks simple from the user’s perspective, but there’s some magic involved underneath. After the user requests rows with a condition on the indexed column, the materialized view is queried, so we can extract base primary keys that have the name column value as requested. Then, after receiving all the base primary keys, we can query the base table using these keys. The result is, as we expect, all rows that fulfill given conditions.
To sum up, querying – there are two subqueries involved – first we fetch base primary keys from the underlying materialized view, and then we use them to query the base table directly. The result is exactly what we expect – base rows that fulfill our restrictions
Transcript
Global secondary indexes, these have been in. ScyllaDB
already for a while so some of you may already know it, simply under the
name secondary indexes it used to be the only kind that ScyllaDB supported. They’re
easy to use and conceptually simple when you have a table and want to enable
querying by a different column which is not the primary key we can simply
create an index on that column and we’re able to query by that
what happens underneath is that the smart. ScyllaDB monster will create a
materialized view when we create an index and this materialized view will
first of all store the indexed column as its partition key and thus we will be
able to query by it and also hold the base table primary key so we can
later use it to fetch the row that we’re interested in. Querying a table with a
global index looks simple from user’s perspective but it’s actually pretty
complex on the ScyllaDB side.. We’ll have an occasion to look at some
graphs that show how it looks like, so the first step is when a client sends a
request to one of the coordinator nodes, to a ScyllaDB node and then what
we need to do first is contact materialized view in order to get base
table primary keys so that’s one hop and then after we extract base primary keys
that match our much our query restriction then we can query the base
table itself and return all the appropriate rows to the client. Here’s a
text mode for people who don’t like colorful images, we’ve simply first queried
the materialized view as I told you before and then having the base keys we
were able to fetch all the base rows that we’re interested in and the result is
pretty much what you expected.. For completeness let’s look at how
the write path looks like for a table with indexes. At first
as before a client issues write request insertion into the database, then the
coordinator node sends the the data into the node who is going to
store this particular partition and at the same time as the replicas received
the data to store it also generates a view update which is sent in order to
make sure that the materialized view that we use for indexing is kept in sync
with the base table.