Primary Key, Partition Key, Clustering Key – Part One

4 min to complete

S201: Data Modeling and Application Development Basic Data Modeling Primary Key, Partition Key, Clustering Key – Part One

A Primary Key is defined within a table. It is one or more columns used to identify a row. All tables must include a definition for a Primary Key. For example, the table we previously saw:

CREATE TABLE heartrate_v1 (
   pet_chip_id uuid,
   time timestamp,
   heart_rate int,
   PRIMARY KEY (pet_chip_id)
);

In the example above the primary key is a single column – the pet_chip_id. If a Primary Key is made up of a single column, it is called a Simple Primary Key.

For the above table perform the query:

SELECT * from heartrate_v1 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23;

What happens if we want to query our data by pet_chip_id but also by time? That is if our query is:

SELECT * from heartrate_v1 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND time >='2019-03-04 07:01:00' AND time <='2019-03-04 07:02:00';

In that case, the above query won’t work. We can define the Primary Key to include more than one column, in which case it is called a Composite (or Compound) key. Create the following table:

CREATE TABLE heartrate_v2 (
   pet_chip_id uuid,
   time timestamp,
   heart_rate int,
   PRIMARY KEY (pet_chip_id, time)
);

And insert some data:

INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:05', 100); 
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:10', 90); 
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:50', 96); 
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-04-04 07:01:50', 99);

In such a case the first part of the Primary Key is called the Partition Key (pet_chip_id in the above example) and the second part is called the Clustering Key (time).

A Primary Key is composed of 2 parts:

The Partition Key is responsible for data distribution across the nodes. It determines which node will store a given row. It can be one or more columns.

The Clustering Key is responsible for sorting the rows within the partition. It can be zero or more columns.

Now execute the query we previously saw, according to time:

SELECT * from heartrate_v2 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND time >='2019-03-04 07:01:00' AND time <='2019-03-04 07:02:00';

Additionally, we previously had an issue with heartrate_v1, where a pet could only have one heart rate value recorded regardless of the time. Now when we defined the time to be a part of the primary key, each primary key, that is a combination of pet_chip_id and time, can have a heart rate value.
Read the data for the same pet:

SELECT * from heartrate_v2 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 ;

We can see that as opposed to the previous example, this time the value wasn’t overwritten.

Quizzes

Primary Key, Partition Key, Clustering Key Quiz One

Previous Topic

Back to Lesson

Next Quiz

Primary Key, Partition Key, Clustering Key – Part One

4 min to complete

About

Resources

Documentation

Keep in Touch