A walkthrough of the Basic Data Modeling Lab – part 1. You can find and run the lab in the Basic Data Modeling Hands-on Lab topic.
Let me
start with the lab.
In this lab, I’m going to be using
the “Killercoda” platform like I did in the previous talk.
For those who were not there, Killercoda allows me to
run a virtual machine directly from my browser.
And it’s useful for labs because there are no prerequisites,
it doesn’t matter which operating system or platform
I’m using, and it’s very straightforward to run.
So I’m going to be running the basic data modeling lab.
I would like you to
see what I’m doing.
And after
the event today, you’re going to get a link to this lab
and you’re going to be able to run it and play around
with it on your own.
Great.
So before I started talking, I ran three commands
just to save us some time.
First, I ran the “docker run” command
with the name of the node and the version of Scylla.
So I’m running a cluster with three different nodes.
And once I clicked that command, ScyllaDB
was downloaded and the node was started.
And then I ran two more
docker commands to start
two other nodes, ScyllaY and ScyllaZ.
And I provided the IP of the first node
just so the nodes can find each other and connect.
So I did that.
When you do it on your own,
it might take a few seconds until the cluster starts.
And then you can run the “nodetool status” command. To
run a command in this platform, I just click on it,
so it’s quite easy.
And this command shows me,
as it sounds, the status of my cluster.
So I can see that in this cluster I have three nodes,
I can see the IP
addresses of the nodes and I can see that the nodes
have a status of UN, which stands for “Up” and “Normal”, meaning the node is up
and running.
A node could also be joining, or down, or in one of several
other states, but in this case,
my cluster is up and running, it’s ready to serve requests.
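The two-letter code in the status output can be read one letter at a time: the first letter is the node's status and the second is its state. The following is a minimal Python sketch of that decoding, added for illustration. The function name `decode_node_status` is hypothetical and is not part of nodetool or of the lab.

```python
# First letter of the code: is the node reachable?
STATUS = {"U": "Up", "D": "Down"}
# Second letter of the code: what is the node doing in the ring?
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def decode_node_status(code: str) -> str:
    """Turn a two-letter code such as 'UN' into a readable label."""
    if len(code) != 2 or code[0] not in STATUS or code[1] not in STATE:
        raise ValueError(f"unrecognized status code: {code!r}")
    return f"{STATUS[code[0]]}/{STATE[code[1]]}"


print(decode_node_status("UN"))  # a healthy node that is serving requests
print(decode_node_status("UJ"))  # a node that is still joining the cluster
```

A cluster is ready to serve requests when every node decodes to Up/Normal, as in the lab.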
And what I’m going to do is
I’m going to connect to one of the nodes.
I could connect to any one of the three nodes.
They can all serve requests, but in this example I chose
to connect to ScyllaU, and I do that using the CQL shell.
So if you remember, the CQL shell allows me to
interact with my cluster and perform
some queries.
Now I’m connected to my node with a CQL shell
and I’m going to create a keyspace, call it “key_example”.
And the important thing to notice here is the replication
factor of three.
Okay, so again, replication factor of three means that
each piece of data that we have is replicated
three times to three different nodes.
Okay.
And using this keyspace, I’m going to create a table, call it
“heartrate_v1”, which is the one we saw before in the slides.
It has three columns: pet_chip_id, time
and heart_rate, and the primary key
is defined as the pet_chip_id.
Okay.
In this case, if you
remember, the primary key
has to have at least one column.
It can have more.
And in this case, it has one column,
which is the partition key.
So the primary key can have two parts: the partition key
and the clustering key.
The partition key is mandatory.
The clustering key is not.
So we need to define at least one column
for the partition key.
And that’s how Scylla knows which node is responsible
for which row.
Great.
So now we have this table defined
and we can insert some data. So I’m going to
insert one row, and you can see it’s very straightforward:
I write the column names and the values,
and then I can simply read my data.
As you would expect, I select the data
and I provide the partition key, the pet_chip_id.
And because I provide the partition key,
what happens under the hood is that ScyllaDB applies
a consistent hash function to the value of the partition
key, pet_chip_id in our case, and it knows which
node is responsible, or rather which nodes,
because the data is replicated three times.
It knows which nodes are responsible for this data, so it can
know where that data is, in a very efficient way,
and find it.
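To make the idea of hashing the partition key to find the responsible nodes more concrete, here is a minimal Python sketch of a hash ring with a replication factor of three. It is a simplified illustration under stated assumptions, not ScyllaDB's implementation: MD5 stands in for the real partitioner's hash function, each node gets a single position on the ring, and the node names and the key value are made up.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is only a stand-in hash)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy hash ring: one token per node, replicas chosen by walking clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sort nodes by their token so the list represents the ring order.
        self.tokens = sorted((token(node), node) for node in nodes)

    def replicas(self, partition_key: str):
        """Return the nodes responsible for the given partition key."""
        positions = [t for t, _ in self.tokens]
        # First node whose token is greater than the key's token, wrapping around.
        start = bisect.bisect_right(positions, token(partition_key)) % len(self.tokens)
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


# Hypothetical node names and key value, chosen only for this example.
ring = Ring(["node-1", "node-2", "node-3"], replication_factor=3)
owners = ring.replicas("pet-chip-123")
print(owners)
```

With three nodes and a replication factor of three, every node holds a copy of every partition. The same key always hashes to the same position, which is why a read that supplies the partition key can go straight to the right nodes.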
Great.
So let’s say
we want to add another value for the same pet.
So I’m going to insert another row into my table
and this time I provide the same partition key,
the same pet_chip_id as before, but I provide
a different heart rate value for this pet.
Now, doing that,
if I read the data,
what do you think would happen now?
Would this query be successful and would this
additional heart_rate value be added to our table?
If you can quickly write in the chat window
what you expect will happen.
So again, what I did here was I created the table;
it has a partition key of pet_chip_id; I wrote one row
and then I tried
to write another row with the same pet_chip_id.
Will the query be successful?
Would it fail?
Would the row be added to the pet_chip_id?
What would happen?
If you can
quickly write that in the chat window.
So I see some people think it’s going to fail.
Others think it would have two rows.
Michael thinks it’s going to be
an upsert, successful fetch.
Okay.
So, different opinions. Someone wrote “replace”.
So some of you are right.
If we read the data right now,
we can see that it actually was an upsert.
So writing the new row actually overwrote the previous data.
So it was an update.
It overwrote the first value.
And this happened because of the way the table was defined.
The primary key was just the partition key,
and if we write
with the same partition key,
we’re simply going to overwrite that data.
And by the way, this happens in ScyllaDB,
and it’s the same in Apache Cassandra
and in some other databases: inserts are actually upserts.
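The overwrite can be modeled with a few lines of Python. This is a toy in-memory sketch, not how ScyllaDB stores data: rows live in a dictionary keyed by the primary key values, so a second insert with the same key lands on the same entry. The `Table` class and the sample chip ID, times and heart rates are hypothetical and chosen only for illustration.

```python
class Table:
    """Toy table: rows are stored in a dict keyed by the primary key values."""

    def __init__(self, primary_key):
        if not primary_key:
            raise ValueError("a primary key needs at least one column")
        self.primary_key = tuple(primary_key)
        self.rows = {}

    def insert(self, **row):
        key = tuple(row[column] for column in self.primary_key)
        # Upsert: create the row if the key is new, otherwise overwrite
        # the columns that were provided.
        self.rows.setdefault(key, {}).update(row)

    def select(self, **key_values):
        key = tuple(key_values[column] for column in self.primary_key)
        row = self.rows.get(key)
        return [dict(row)] if row is not None else []


# The primary key is only the partition key, as in heartrate_v1.
heartrate_v1 = Table(primary_key=["pet_chip_id"])
heartrate_v1.insert(pet_chip_id=123, time="2019-03-04 07:01:00", heart_rate=100)
heartrate_v1.insert(pet_chip_id=123, time="2019-03-04 07:02:00", heart_rate=70)

result = heartrate_v1.select(pet_chip_id=123)
print(result)
```

Both inserts succeed, yet the select returns a single row holding the values from the second insert, because nothing in the key distinguishes one reading from the next.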
So, some of you wrote that, and you are correct.
But, how can we overcome this issue?
Say we have an application
like we defined here and we want to have
multiple values for each pet, right?
As I said in the example,
we have a sensor on our dog and it writes the heart rate
every second.
So we need to have multiple values.
It doesn’t make sense
in this application to overwrite the values.