The following rules of thumb can help you decide what to use and when:
1. If you cannot afford any storage overhead, filtering is the only option
2. For highly selective queries (those returning a small fraction of the rows), indexing may be more beneficial than filtering
3. If most of the queries specify a single partition key:
* a local index is faster than a global index, especially with a token-aware load balancing policy
* filtering a single partition may be faster than an index, depending on the partition size
4. If most of the queries are multi-partition, local indexes won’t help:
* a global index may be beneficial, especially if query selectivity is expected to be high
* filtering may be better if the query selectivity is low (i.e., the result consists of most of the rows)
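The rules above can be read as a small decision procedure. The following is only a sketch of that reading, not an official tool: the `Workload` fields, the function name `suggest_access_path`, and the 0.5 cutoff are all assumptions made for illustration, since the rules give no numeric thresholds.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a query workload (all fields are assumptions)."""
    storage_overhead_allowed: bool  # may extra disk space be spent on an index?
    single_partition: bool          # do most queries specify a single partition key?
    fraction_returned: float        # share of the rows a typical query returns, 0..1
    small_partitions: bool = False  # are the partitions small?


# Illustrative cutoff only: the rules of thumb do not name a number.
LOW_SELECTIVITY_CUTOFF = 0.5


def suggest_access_path(w: Workload) -> str:
    """Return 'filtering', 'local index' or 'global index' following the rules above."""
    if not w.storage_overhead_allowed:
        return "filtering"  # rule 1: an index always costs storage
    returns_most_rows = w.fraction_returned >= LOW_SELECTIVITY_CUTOFF
    if w.single_partition:
        # rule 3: prefer a local index, unless the partition is small
        # or the query returns most of its rows anyway
        if w.small_partitions or returns_most_rows:
            return "filtering"
        return "local index"
    # rule 4: multi-partition queries cannot use a local index
    return "filtering" if returns_most_rows else "global index"
```

For example, `suggest_access_path(Workload(True, False, 0.01))` suggests a global index, while the same workload with `fraction_returned=0.6` suggests filtering. Real decisions should be backed by measurements on the actual data set.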
Filtering is often just as fast as an index for reads, or even much faster, especially when the query returns the majority of the rows. For example, if the query is going to return 60% of all rows of a table anyway, or if it reads a single small partition, filtering can be orders of magnitude faster than an index, because it simply reads all the data sequentially instead of performing the series of random reads that an index requires.
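A toy cost model makes the sequential-versus-random trade-off concrete. This is a deliberately simplified sketch: the per-row costs (1 unit for a sequential read, 20 units for a random read) are made-up assumptions, not measured values, and the function names are hypothetical.

```python
def filtering_cost(total_rows: int, seq_read_cost: float = 1.0) -> float:
    """Filtering scans every row sequentially, whatever the result size."""
    return total_rows * seq_read_cost


def index_cost(total_rows: int, fraction_returned: float,
               random_read_cost: float = 20.0) -> float:
    """An index lookup pays one random read per matching row (assumed)."""
    return total_rows * fraction_returned * random_read_cost


def cheaper_path(total_rows: int, fraction_returned: float) -> str:
    """Name the cheaper option under the assumed costs above."""
    if filtering_cost(total_rows) <= index_cost(total_rows, fraction_returned):
        return "filtering"
    return "index"


def break_even_fraction(seq_read_cost: float = 1.0,
                        random_read_cost: float = 20.0) -> float:
    """Fraction of rows returned at which both options cost the same."""
    return seq_read_cost / random_read_cost


if __name__ == "__main__":
    for fraction in (0.001, 0.6):
        print(fraction, cheaper_path(1_000_000, fraction))
```

Under these assumed costs, the index only wins when a query returns less than 5% of the rows; the real break-even point depends on hardware, caching, and data layout.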
Transcript
Keep in mind that materialized views, indexes, and filtering are not silver bullets. I talked about this in the beginning, but let me say it again: the first design principle in NoSQL is to start from your queries and then design your data model.

The reason I’m trying to emphasize this is, well, let me give you a horror story from the field. We have a customer in Brazil. Someone from a vendor or something got access to the database, and they were trying to create some reports. So what some brilliant guy there did was create a materialized view in which the country code was the partition key, and they had like 40 different countries. You can imagine what it did to the partitions on that materialized view, right? And then it was impacting production. That guy shouldn’t even have had access to the database, but it was horrible. We suffered for like three days until, thanks to our large partition detector and the slow query detector, we were able to pinpoint the IP of the machine, and someone paid a visit to the guy. I don’t think it was pleasant.

But keep in mind: OK, this is a convenience, I want to use a global index, a local secondary index, materialized views. Keep in mind that it’s the same design principles. If you don’t have enough cardinality in your partition key, you have a problem. If your query is not selective enough, you’re going to have a problem. So keep in mind: do I have enough cardinality and selectivity? How big are my partitions going to be? How many rows? And how could I design my data model to be more efficient for my queries?

So these are some principles based on what I just mentioned. Always keep in mind cardinality and selectivity. I think this is going to be clearer when you see Sarna’s presentation about performance. He will show you that sometimes it’s okay to use ALLOW FILTERING and you don’t need a secondary index, depending on how you’re querying. But I don’t want to give more spoilers on his presentation.
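The cardinality point from the transcript can be checked with simple arithmetic before creating a view or index. The sketch below is illustrative only: the total row count and the user-id cardinality are hypothetical numbers, and only the figure of roughly 40 country codes comes from the story above.

```python
def avg_rows_per_partition(total_rows: int, distinct_partition_keys: int) -> float:
    """Average partition size if rows were spread evenly across the keys."""
    if distinct_partition_keys <= 0:
        raise ValueError("distinct_partition_keys must be positive")
    return total_rows / distinct_partition_keys


if __name__ == "__main__":
    total_rows = 400_000_000  # hypothetical table size

    # Partitioning by country code: about 40 distinct values (from the story).
    print(avg_rows_per_partition(total_rows, 40))

    # Partitioning by a high-cardinality key, e.g. 4 million user ids (hypothetical).
    print(avg_rows_per_partition(total_rows, 4_000_000))
```

With only 40 keys, each partition would hold millions of rows on average, and real data is rarely spread evenly, so the largest partitions would be bigger still.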