4 min to complete
A Primary Key is defined within a table. It is one or more columns used to identify a row. All tables must include a definition for a Primary Key. For example, the table we previously saw:
CREATE TABLE heartrate_v1 (
pet_chip_id uuid,
time timestamp,
heart_rate int,
PRIMARY KEY (pet_chip_id)
);
In the example above the primary key is a single column – the pet_chip_id. If a Primary Key is made up of a single column, it is called a Simple Primary Key.
For the above table perform the query:
SELECT * from heartrate_v1 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23;
What happens if we want to query our data by pet_chip_id but also by time? That is if our query is:
SELECT * from heartrate_v1 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND time >='2019-03-04 07:01:00' AND time <='2019-03-04 07:02:00';
In that case, the above query won’t work. We can define the Primary Key to include more than one column, in which case it is called a Composite (or Compound) key. Create the following table:
CREATE TABLE heartrate_v2 (
pet_chip_id uuid,
time timestamp,
heart_rate int,
PRIMARY KEY (pet_chip_id, time)
);
And insert some data:
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:05', 100);
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:10', 90);
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:50', 96);
INSERT INTO heartrate_v2(pet_chip_id, time, heart_rate) VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-04-04 07:01:50', 99);
In such a case the first part of the Primary Key is called the Partition Key (pet_chip_id in the above example) and the second part is called the Clustering Key (time).
A Primary Key is composed of 2 parts:
- The Partition Key is responsible for data distribution across the nodes. It determines which node will store a given row. It can be one or more columns.
- The Clustering Key is responsible for sorting the rows within the partition. It can be zero or more columns.
Now execute the query we previously saw, according to time:
SELECT * from heartrate_v2 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 AND time >='2019-03-04 07:01:00' AND time <='2019-03-04 07:02:00';
Additionally, we previously had an issue with heartrate_v1, where a pet could only have one heart rate value recorded regardless of the time. Now when we defined the time to be a part of the primary key, each primary key, that is a combination of pet_chip_id and time, can have a heart rate value.
Read the data for the same pet:
SELECT * from heartrate_v2 WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23 ;
We can see that as opposed to the previous example, this time the value wasn’t overwritten.