5 min to complete
What happens if our main query for the pet’s heart rate is by pet and time? We could then define the table as follows (we already created this table):
CREATE TABLE heartrate_v2 ( pet_chip_id uuid, time timestamp, heart_rate int, PRIMARY KEY (pet_chip_id, time) );
In this case, the partition key is the pet_chip_id, and the clustering key is the time. Each pet_chip_id (the partition key) would be a partition, that is, existing on a single node. Within that partition, the data would be stored according to the clustering key, which is “time.”
After adding some data:
Since a Partition Key and a Clustering Key are defined, each partition would possibly have more than one row. A row would be defined by the Partition Key and the Clustering Key. A single partition may have many rows, which would be sorted according to the Clustering Key:
Now it would be very efficient to query what was the heart rate of a specific pet at a given time (or time interval). Execute the query:
SELECT * from heartrate_v2 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND time >='2019-03-04 07:01:00' AND time <='2019-03-04 07:02:00';
So what happens here:
The partition key (pet_chip_id) tells us exactly in which node our data is stored. So we’d know where in the cluster we can find our data. Now, In that partition, since the data is ordered according to the time, we could very quickly scan to the given time interval (the clustering key) and return the records.
Let’s see what happens if we don’t define a partition key. Recall the previously created table heartrate_v1:
CREATE TABLE heartrate_v1 ( pet_chip_id uuid, time timestamp, heart_rate int, PRIMARY KEY (pet_chip_id) );
This table only has the pet_chip_id as the partition key and no clustering key.
Execute the same query, on table heartrate_v1:
SELECT * from heartrate_v1 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND time >='2019-03-04 07:01:00' AND time <='2019-03-04 07:02:00';
This query fails with the message:
InvalidRequest: Error from server: code=2200 [Invalid query] message=”Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING”
As we saw before, since the clustering key is not defined, performing this query would be inefficient and possibly very slow.
We would still know on which partition the data exists, because we have the partition key, but we would have to do a linear scan of the entire data on that partition to find the specific row we are looking for.