This topic provides an overview of NoSQL and ScyllaDB, including the CAP theorem, where ScyllaDB is in relation to the theorem, the classification of NoSQL databases by the data model, and some key ScyllaDB attributes and design decisions.
We’re at the intro, and I’m
going to start by talking about “NoSQL” in general.
So what we see here is the CAP theorem.
The CAP theorem states that distributed databases
or NoSQL databases
have three
characteristics or attributes that we would like to strive
for, and they are availability,
consistency and partition tolerance.
However, the CAP theorem states that
there is a trade-off between these three attributes,
and we have to choose
two out of those three, so we can’t have it all.
Now, a few words about each one and what it means.
So availability means that we will
still be able to serve requests and remain highly available
even if there is some kind of
failure, say, hardware failure, or for any other reason
some of our nodes in the distributed
database become unavailable.
Consistency: there are different ways
to define consistency, but one that I like, and that is
simple, is that
for a read request, we would always get the latest write.
So our data is consistent,
and partition tolerance means that
because we are talking about the distributed database,
even if there is a network failure or for whatever reason
some of our nodes become unavailable,
the system would still work
and would still be serving requests.
So I placed on this slide
some databases, these are just examples.
There are hundreds of databases.
So you can see that, for example,
Apache HBase and MongoDB prefer
consistency and partition tolerance while ScyllaDB
as well as Apache Cassandra and Amazon DynamoDB
give a preference
for availability and partition tolerance over consistency.
Now that doesn’t mean that they’re not consistent
and I’m going to talk about that
in just a minute.
Okay, another way to look at NoSQL
databases is according to their data model,
so there are a few different data models that are used.
This is one way to classify them according to the complexity,
starting from the least complex.
So a key-value store
means that we have simply a key and a value.
It’s very simple.
Some examples are Redis and Aerospike.
Of course, there are others.
Next, a bit more complex would be a document store.
And here we have
our data model that’s more structured.
Think about JSON or XML or some other structured document.
And some examples here are MongoDB and Couchbase.
Next, we have
a more complex data model, which is a wide-column store.
And you can actually think
about the data model as a table with rows and columns.
And here we have ScyllaDB together with Apache Cassandra
and some others.
And finally,
the most complex would be a graph data model
where we have our data modeled
as edges and vertices.
So that’s a graph database.
And some examples are JanusGraph and Neo4j.
And by the way, JanusGraph can work with ScyllaDB
as the underlying data storage.
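As a rough illustration of the four data models just described, the sketch below represents each one with plain Python structures. It does not use any real database client, and all names and values in it (such as `user:42` and `neighbors`) are made up for the example.

```python
# Illustrative only: four data models sketched with plain Python structures.
# Every key, name, and value here is hypothetical.

# Key-value store: simply a key and a value.
key_value = {"user:42": "Ada Lovelace"}

# Document store: a more structured document (think JSON).
document = {
    "id": 42,
    "name": "Ada Lovelace",
    "emails": ["ada@example.com"],
}

# Wide-column store: think of a table with rows and columns.
# Each row is looked up by its key and holds named columns.
wide_column = {
    42: {"name": "Ada Lovelace", "city": "London"},
    7: {"name": "Charles Babbage", "city": "London"},
}

# Graph: data modeled as vertices and edges.
vertices = {42: "Ada Lovelace", 7: "Charles Babbage"}
edges = [(42, "knows", 7)]


def neighbors(vertex_id):
    """Return the vertices reachable from vertex_id over one outgoing edge."""
    return [dst for src, _label, dst in edges if src == vertex_id]
```

The structures get richer from top to bottom, which mirrors the ordering by complexity used in the classification above.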
Great.
So I talked about NoSQL in general.
Let’s talk about ScyllaDB specifically,
and some of the attributes of ScyllaDB.
So, high availability. I mentioned ScyllaDB is built
to remain highly available
even if there are some failures or disasters.
This is done by replicating data.
So for each piece of data that we store, we have multiple
copies and I’ll show you how that works in just a bit.
And that’s also something that can be configured
by the user according to the specific use case.
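To make the idea of keeping multiple copies concrete, here is a minimal sketch of how a configurable number of replicas keeps data readable when a node fails. It is a toy placement scheme, not ScyllaDB's actual algorithm; the function names `replicas_for` and `can_read`, the node names, and the hashing choice are all assumptions made for illustration.

```python
import hashlib


def replicas_for(key, nodes, replication_factor):
    """Pick `replication_factor` distinct nodes to hold copies of `key`.

    Toy scheme: hash the key to a starting node, then take the next
    nodes in list order. Real systems use more elaborate placement.
    """
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


def can_read(key, nodes, replication_factor, down):
    """Return True if at least one replica of `key` is on a node that is up."""
    return any(
        node not in down
        for node in replicas_for(key, nodes, replication_factor)
    )


nodes = ["node1", "node2", "node3", "node4", "node5"]
copies = replicas_for("user:42", nodes, 3)

# With three copies, losing one replica node still leaves the key readable.
print(can_read("user:42", nodes, 3, down={copies[0]}))

# With a single copy, losing that node makes the key unreadable.
only_copy = replicas_for("user:42", nodes, 1)
print(can_read("user:42", nodes, 1, down=set(only_copy)))
```

The number of copies is the knob the user would tune for the use case: more copies tolerate more failures at the cost of more storage.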
Highly scalable. So, ScyllaDB
is easy to scale by adding more nodes
to a cluster. Again, this is a distributed database.
So it works on multiple nodes and the system scales
by adding more nodes.
Let’s take an example: say we’re
an e-commerce website and Black Friday is coming up.
We know we’re going to have more traffic
so we can simply add more nodes before Black Friday.
We can do that without any downtime
as the system is running.
Add more nodes, be able to serve more requests
on Black Friday, and then maybe after Black Friday we can scale
back down again, removing nodes without any downtime.
Performance, so
there was a question
in the lounge before we started about ScyllaDB’s sweet spot.
So one of the things that comes to mind
is applications that require high performance.
We’re talking about very low
latencies below 10 milliseconds and very high throughput.
So if your application requires high performance,
then ScyllaDB is probably a good
candidate.
Low maintenance, so ScyllaDB is built with autonomous
capabilities, auto-tuning, and it typically also requires
fewer nodes to run than other databases,
which also makes it easier to maintain.
And last but not least, very importantly, ScyllaDB
is a drop-in replacement
for Apache Cassandra and for Amazon DynamoDB.
Meaning that if you have your application
running with either of these two databases, you can
pretty much keep your application as it is
and switch your endpoint configuration
for your application to run with ScyllaDB instead.
And by doing that, you get quite a lot of benefits
that you see here, better performance, easier to maintain.
And when talking about DynamoDB, you also avoid vendor lock-in.
So ScyllaDB can run on different clouds, on-prem,
and so on.