Primary Key, Partition Key, Clustering Key – Part One

4 min to complete

Primary Key, Partition Key, Clustering Key – Part One

A Primary Key is defined within a table. It is one or more columns used to identify a row. All tables must include a definition for a Primary Key. For example, the table we previously saw:

CREATE TABLE heartrate_v1 (
   pet_chip_id uuid,
   time timestamp,
   heart_rate int,
   PRIMARY KEY (pet_chip_id)
);

In the example above the primary key is a single column – the pet_chip_id. If a Primary Key is made up of a single column, it is called a Simple Primary Key. 

For the above table perform the query:

SELECT * from heartrate_v1 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23;

What happens if we want to query our data by pet_chip_id but also by time? That is if our query is:

SELECT * from heartrate_v1 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND time >='2019-03-04 07:01:00' AND time <='2019-03-04 07:02:00';

In that case, the above query won’t work. We can define the Primary Key to include more than one column, in which case it is called a Composite (or Compound) key. Create the following table:

CREATE TABLE heartrate_v2 (
   pet_chip_id uuid,
   time timestamp,
   heart_rate int,
   PRIMARY KEY (pet_chip_id, time)
);

And insert some data:

INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:05', 100); 
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:10', 90); 
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:50', 96); 
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-04-04 07:01:50', 99); 

In such a case the first part of the Primary Key is called the Partition Key (pet_chip_id in the above example) and the second part is called the Clustering Key (time).

  • A Primary Key is composed of 2 parts:
  • The Partition Key is responsible for data distribution across the nodes. It determines which node will store a given row. It can be one or more columns.

  • The Clustering Key is responsible for sorting the rows within the partition. It can be zero or more columns.

Now execute the query we previously saw, according to time:

SELECT * from heartrate_v2 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND time >='2019-03-04 07:01:00' AND time <='2019-03-04 07:02:00';
To report this post you need to login first.