Filtering: What is it and why is it useful? Sometimes we want to be able to query by different columns, but we’re not interested in creating secondary indexes. Filtering is one more way of allowing such queries. The mechanism is really simple – the coordinator will fetch *all* of the results specified by the key restrictions, and then filter out rows that do not match the rest of the restrictions. But, there’s a caveat. Filtering can be very performance-heavy, it can even result in fetching all rows from the table and then filter out just a few rows. Because of that, queries that involve filtering must be explicitly allowed to do so, by appending ALLOW FILTERING keyword to each such query.
Filtering may put a strain on the performance of the whole cluster by reading the whole table.
The session also covers combining filtering with indexes and includes examples.
Transcript
okay that’s an error that I mentioned before, everybody who ever
tried querying for something ordered in a primary key saw this error at least
once, right?
so filtering so query restrictions that may need filtering non key fields just
as a previous example parts of primary key there are not prefixes partition
keys with something other than with equality relations so when you’re trying
to you know do ranges and stuff and clustering keys before range restriction
and then by other conditions our implementation it’s called
coordinator side filtering it retrieves all has if restrictions do not exist and
then the results are filtered and the mismatch rows are dropped what what do
I mean by that so let’s say it didn’t apply any restrictions on primary key
for example what is is it going to do it’s going to read the entire table but
if you apply the restriction to primary keys let’s say you’re using an in clause
with you know ten different partition keys it’s going to fetch those ones and
then it’s going to apply the other conditions so if you’re restricting by
the partition key in some other condition it’s going to do that but
first it’s going to fetch everything if you didn’t put a good restriction
okay so in this example we have restriction by clustering key and by two
other columns right so first it’s going to get all the rows using c2 and then it’s
going to filter the rows that match the. V 1 and V 2 equality so let me just go
here and this is where I think it gets interesting because and this is kind of
the hard part for me and that’s the reason I put some exercises about that
to you but here’s how we do let’s say you have one base table you have two
materialized views that you created you have two global indexes and two local
indexes all on the same base table okay when are you going to query the
materialized view anyone when you query the materialized view if you if you query the
base table you’re never going to the materialized view so that’s the first
rule the second rule is if if you have a partition key it’s going to try first
the local secondary indexes and out of multiple local secondary index it’s
going to pick the first one in your schema so let’s say you for you
created two local secondary index you know local secondary index a local
secondary index B if you can have values for both of them it’s always going to A