Kafka and Scylla Lesson Now Available!
Learn More >

Primary Key, Partition Key, Clustering Key – Part One

4 min to complete

Primary Key, Partition Key, Clustering Key – Part One

A Primary Key is defined within a table. It is one or more columns used to identify a row. All tables must include a definition for a Primary Key. For example, the table we previously saw:

CREATE TABLE heartrate_v1 (
   pet_chip_id uuid,
   time timestamp,
   heart_rate int,
   PRIMARY KEY (pet_chip_id)
);

In the example above the primary key is a single column – the pet_chip_id. If a Primary Key is made up of a single column, it is called a Simple Primary Key. 

For the above table perform the query:

SELECT * from heartrate_v1 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23;

What happens if we want to query our data by pet_chip_id but also by time? That is if our query is:

SELECT * from heartrate_v1 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND time >='2019-03-04 07:01:00' AND time <='2019-03-04 07:02:00';

In that case, the above query won’t work. We can define the Primary Key to include more than one column, in which case it is called a Composite (or Compound) key. Create the following table:

CREATE TABLE heartrate_v2 (
   pet_chip_id uuid,
   time timestamp,
   heart_rate int,
   PRIMARY KEY (pet_chip_id, time)
);

And insert some data:

INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:05', 100); 
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:10', 90); 
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:50', 96); 
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-04-04 07:01:50', 99); 

In such a case the first part of the Primary Key is called the Partition Key (pet_chip_id in the above example) and the second part is called the Clustering Key (time).

A Primary Key is composed of 2 parts:

  • The Partition Key is responsible for data distribution across the nodes. It determines which node will store a given row. It can be one or more columns.

  • The Clustering Key is responsible for sorting the rows within the partition. It can be zero or more columns.

Now execute the query we previously saw, according to time:

SELECT * from heartrate_v2 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND time >='2019-03-04 07:01:00' AND time <='2019-03-04 07:02:00';

Additionally, we previously had an issue with heartrate_v1, where a pet could only have one heart rate value recorded regardless of the time. Now when we defined the time to be a part of the primary key, each primary key, that is a combination of pet_chip_id and time, can have a heart rate value.
Read the data for the same pet:

SELECT * from heartrate_v2 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 ;

We can see that as opposed to the previous example, this time the value wasn’t overwritten.

fa-angle-up