Join us at ScyllaDB University Live, instructor-led, online, hands-on, training sessions | September 24
Register now

Primary Key, Partition Key, Clustering Key – Part Two

6 min to complete

Primary Key, Partition Key, Clustering Key – Part Two

 

As we just saw, both the Partition Key and the Clustering Key can include more than one column, for example, if our query is:

SELECT * from heartrate_v3 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND time ='2019-03-04 07:01:00' AND pet_name = 'Duke';

We could define the table as follows:

CREATE TABLE heartrate_v3 (
   pet_chip_id uuid,
   time timestamp,
   heart_rate int,
   pet_name text,
   PRIMARY KEY ((pet_chip_id, time), pet_name)
);

Create the above table.

And insert some data:

INSERT INTO heartrate_v3(pet_chip_id, time, heart_rate, pet_name) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:10', 90, 'Duke'); 

In this case, the partition key includes two columns: pet_chip_id and time and the clustering key is pet_name. Keep in mind that every query must include all columns defined in the partition key.

Now try to execute this query:

SELECT * from heartrate_v3 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23;

It fails, as the entire partition key is not given.

Now try this query:

SELECT * from heartrate_v3 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND time ='2019-03-04 07:01:10' AND pet_name = 'Duke'; 

It succeeds as the complete partition key is given.

Similarly, if we want each partition to be based on the pet_chip_id but to be able to query according to pet_name and heart_rate:

SELECT * from heartrate_v4 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND pet_name = 'Duke' AND heart_rate = 100;

it is possible to define (do this):

CREATE TABLE heartrate_v4 (
   pet_chip_id uuid,
   time timestamp,
   heart_rate int,
   pet_name text,
   PRIMARY KEY (pet_chip_id, pet_name, heart_rate)
);
  • If there is more than one column in the Clustering Key (pet_name and heart_rate in the example above), the order of these columns defines the clustering order. For a given partition, all the rows are physically ordered inside ScyllaDB by the clustering order. This order determines what select query you can efficiently run on this partition.
  • In this example, the ordering is first by pet_name and then by heart_rate.
  • In addition to the Partition Key columns, a query may include the Clustering Key. If it does include the Clustering Key columns, the columns must be provided according to the order in which they were defined. That means that a clustering column cannot be “skipped.” In the above example, it’s not possible to query by heart_rate only, without pet_name. However, it is possible to query by pet_name only, without heart_rate. 

Try this query:


SELECT * from heartrate_v4 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND heart_rate = 90;

It fails, as pet_name comes before heart_rate in the clustering key. If the query uses heart_rate it must also use pet_name.

Insert some data into the table:

INSERT INTO heartrate_v4(pet_chip_id, time, heart_rate, pet_name) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:10', 90, 'Duke'); 

And now execute:

SELECT * from heartrate_v4 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND pet_name = 'Duke' AND heart_rate = 90;

This succeeds this time, as both clustering columns are used in the query.

By default, sorting is based on the natural (ASC) order of the clustering columns. What happens if we want to reverse the order? What if our query is to find the heart rate by pet_chip_id and time, but we want to look at the ten most recent records? The query would be:

SELECT * from heartrate_v5 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 LIMIT 10;

And we’d define the table to sort the data in reverse order. To do so:

CREATE TABLE heartrate_v5 (
   pet_chip_id uuid,
   time timestamp,
   heart_rate int,
   PRIMARY KEY (pet_chip_id, time)
   )WITH CLUSTERING ORDER BY (time DESC);

More info about this is here.

What happens if we try to create a Table without a partition key?

CREATE TABLE heartrate_v6 (
   pet_chip_id uuid,
   time timestamp,
   heart_rate int
);

We get an error:

InvalidRequest: Error from server: code=2200 [Invalid query] message=”No PRIMARY KEY specified (exactly one required)”

Let’s create the table we used as an example above:

CREATE TABLE heartrate_v7 (
   pet_chip_id uuid,
   time timestamp,
   heart_rate int,
   pet_name text,
   PRIMARY KEY ((pet_chip_id, time), pet_name, heart_rate)
);

Now, let’s insert some data into the new table:

INSERT INTO heartrate_v7(pet_chip_id, time, heart_rate, pet_name) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:00', 100,'Duke' );
INSERT INTO heartrate_v7(pet_chip_id, time, heart_rate, pet_name) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:05', 103,'Duke' );

In the following quiz, you will have a chance to run some queries on this table.

fa-angle-up