An explanation of some basic Architecture, including Node, Node Ring also called a Cluster, the Primary Key and Partition Key.
Okay, so I talked about different,
I showed you the CQL shell and I’m talking about nodes,
and the cluster
and the keyspace and the table, but I didn’t really
say what these things are, so let’s cover that.
In this part,. I’m going to talk about the
basic architecture of Scylla.
So I keep saying “node”.
What’s a node?
A node is the ScyllaDB software
running on a machine on a server.
Typically in a production cluster,
a cluster is a collection of nodes
and in the production cluster we have
at least three nodes and it can go all the way
up to hundreds of nodes running in the same cluster.
So it’s a distributed database that has multiple nodes,
and each node is responsible
for a part of the database content.
Data is replicated
typically to at least three nodes,
but that’s up to the user and the use case.
And each node holds a part of the content of the database.
In ScyllaDB all the nodes are created equal,
so we don’t have the concept of leader,
follower nodes or master slave or any other name
you want to call it, all nodes are equal.
All nodes can serve requests,
and that goes together with high availability,
meaning that there’s no single point of failure,
even if one of the nodes goes down
or more than one node goes down for whatever reason,
say there is a hardware failure
or an earthquake or a fire or whatever,
the system will still be up and running and it will be able
to serve requests. Great.
So that was a node.. What’s a cluster?
A cluster is a collection of nodes.
We can think about it as a “node ring” as you see here.
And here we have five nodes consisting of a single cluster
so they work together and they are a single cluster.
The communication between
the nodes in the cluster is done peer to peer,
so they can all communicate with one another.
And again, there is no single point of failure.
Okay, this is important, the partition key.
How does Scylla know
which node is responsible for the data?
And I’ll say this again, what you see here is also
relevant for Apache Cassandra.
It works at this level pretty much the same.
So we have here a row with four columns.
They are: ID, name, address and phone,
and in this specific table, the ID was defined
as the partition key.
Now choosing the partition key is important
in terms of the data
model of your application and I’ll go into more detail
about that in the next talk, ScyllaDB
core concepts, which is in the essentials track
right after this one.
So what happens?
when we try to query, say we try to write this row
to the database, what ScyllaDB does is
it executes a consistent hash function also called the
partitioner, on the partition key value ID, in this case.
And according to the hash, it knows
which nodes are responsible for this specific piece of data.
Okay. So
it depends on how replication is set up.
It could be more than one node,
but whenever we read or write data,
the system performs the hash function on the partition key
and it knows which nodes are responsible.