The ScyllaDB Operator allows you to use Kubernetes as a management layer for ScyllaDB.
The lesson starts with a quick intro to ScyllaDB and to Kubernetes.
A stateless application is one that depends on no persistent storage. The only thing your cluster is responsible for is the code, and other static content, being hosted on it.
A stateful application has persistent storage and several other parameters it is supposed to look after in the cluster.
The Operator: why do we need it? First, I should say that this is not a Kubernetes training
session or a ScyllaDB training session. If you need to learn more about Kubernetes and ScyllaDB, I
recommend you use all the resources: the documentation and ScyllaDB University. This session is
basically about bridging Kubernetes and ScyllaDB concepts, and that’s pretty much what
the Operator does, right? Initially, Kubernetes was used for stateless applications. For
simplification purposes, let’s say that a stateless application is an application
that doesn’t depend on persistent storage, and a stateful application is one that has persistent
storage. Of course, there’s some discussion around the term “stateless”, because a
stateless application can, in theory, have some sort of state.
But let’s keep it simple: stateless means no persistence, stateful means persistence.
ScyllaDB is a database, and databases are by nature stateful. Also,
ScyllaDB has made some architectural decisions: we used C++ to implement our product;
we are shard-per-core and asynchronous; we have a unified cache, which basically means that we implemented our
own memory management, bypassing the Linux memory management, or the
OS memory management, and implementing that in user space; and of course we don’t have a JVM, right?
We also have schedulers: an I/O scheduler, a CPU scheduler, etc. And we are autonomous,
so we strive to give you the best performance out of the box without any tuning being necessary.
What we are looking for with all those architectural
decisions is to give you the best performance in a database, a NoSQL database specifically, right? So
we are trying to be close to the hardware so we can squeeze every resource out of CPU, memory,
interrupts, etc. And because we’re a database, we also have a lot of states to maintain.
So in a ScyllaDB cluster, your nodes could be up and normal, or joining, or leaving the cluster, etc. They
don’t simply have states; each member of the cluster also has to communicate those
states to the rest of the cluster, so there’s a lot of gossip between all the ScyllaDB nodes. Lastly,
we have maintenance operations that happen more or less automatically, like compactions,
and we have support operations: you’re adding nodes, or you’re decommissioning nodes, or
maybe you’re running repairs to fight entropy and make sure your data is consistent. Usually
those operations depend on a human operator: someone has to go to the nodes and run commands,
etc., or at least create an automation for that, right?
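To make the idea of nodes holding states and gossiping them to each other a bit more concrete, here is a minimal Python sketch. It is a toy model, not ScyllaDB's gossip protocol: the `NodeStatus` values are only the three states named above, and the `version` counter and the `merge_views` helper are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class NodeStatus(Enum):
    # Illustrative subset: only the states mentioned in the talk.
    UP_NORMAL = "up/normal"
    JOINING = "joining"
    LEAVING = "leaving"


@dataclass(frozen=True)
class Observation:
    status: NodeStatus
    version: int  # hypothetical counter: a higher value means a newer observation


def merge_views(mine, theirs):
    """Merge two cluster views, keeping the newest observation for each node."""
    merged = dict(mine)
    for node, obs in theirs.items():
        if node not in merged or obs.version > merged[node].version:
            merged[node] = obs
    return merged


# Two nodes exchange what they currently believe about the cluster.
view_a = {
    "node-0": Observation(NodeStatus.UP_NORMAL, 3),
    "node-1": Observation(NodeStatus.JOINING, 1),
}
view_b = {
    "node-1": Observation(NodeStatus.UP_NORMAL, 2),
    "node-2": Observation(NodeStatus.LEAVING, 5),
}

merged = merge_views(view_a, view_b)
for name in sorted(merged):
    print(name, merged[name].status.value)
```

After the exchange, both sides would agree that `node-1` has finished joining, because the observation with the higher version wins. The point is only that every member has to both track and propagate state, which is work a generic orchestrator knows nothing about.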
So I mentioned that you should be familiar with Kubernetes.
The simplest thing that can exist on a Kubernetes cluster is a pod,
right? It’s basically one container or a collection of containers, with or without a persistent volume
claim, which is basically persistence, or a disk. On Kubernetes we have this concept
of StatefulSets because, as I mentioned, Kubernetes was at first mainly used for stateless applications, but
at some point people decided they wanted to run stateful applications in Kubernetes too.
The characteristics of StatefulSets are basically pod uniqueness,
pod ordering, and network and storage identity. That ensures that
one pod will be created at a time, and that the pods will be unique, will be ordered, and
will have their own identity, right? That is sufficient for a lot of stateful applications,
but because of the reasons I mentioned on the previous slide, it’s not enough for ScyllaDB.
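The uniqueness, ordering, and identity guarantees can be illustrated with a short Python sketch. It only models the Kubernetes naming convention, where a pod is called `<set name>-<ordinal>` and its claim is called `<claim template name>-<pod name>`; the function name and the `scylla` and `data` values are hypothetical and chosen for illustration.

```python
def statefulset_identities(set_name, claim_template, replicas):
    """Return (pod_name, pvc_name) pairs in the order the pods are created."""
    identities = []
    for ordinal in range(replicas):  # pods come up one at a time: 0, 1, 2, ...
        pod_name = f"{set_name}-{ordinal}"         # stable, unique network identity
        pvc_name = f"{claim_template}-{pod_name}"  # stable storage identity
        identities.append((pod_name, pvc_name))
    return identities


# Hypothetical names, for illustration only.
for pod, pvc in statefulset_identities("scylla", "data", 3):
    print(pod, "->", pvc)
```

Each pod keeps the same name and the same claim across restarts, which is what lets a stateful application find its own data again.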
So ScyllaDB has this concept of a ‘seed node’, which is basically the contact point for new
nodes that are joining the cluster. We are also multi-zone and multi-region, which
in ScyllaDB terms translates to racks and datacenters. We have these operations like
scaling down and scaling up. And of course, in a StatefulSet, after all the pods are there, if
you lose the persistence, so the PVC, which is the persistent volume claim, the StatefulSet is not going to
detect that you lost that PVC, or, if it does detect it, the PVC is going to
be replaced by an empty one. For a database that’s problematic, because you need to run
some operations to recover that data, right? And of course, a StatefulSet is not extensible
for our purposes, because there are things that ScyllaDB needs that other stateful applications
don’t need, so it wouldn’t make sense to implement those functionalities in a StatefulSet just for ScyllaDB.
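The lost-PVC problem can be sketched as a small planning function in Python. This is a toy illustration of the gap described above, not the ScyllaDB Operator's code: `plan_actions` and both action names are hypothetical, and the sketch only shows that a database-aware controller needs a recovery step in addition to providing a fresh volume.

```python
def plan_actions(desired_members, observed_pvcs):
    """Return the ordered actions needed for members whose storage is missing."""
    actions = []
    for member in desired_members:
        if member not in observed_pvcs:
            # A generic controller would stop after this step and leave the
            # member running on an empty volume.
            actions.append(("provision_empty_volume", member))
            # Hypothetical database-specific step: bring the data back.
            actions.append(("run_recovery_operations", member))
    return actions


# Hypothetical cluster: three members, one of which has lost its claim.
members = ["scylla-0", "scylla-1", "scylla-2"]
existing_pvcs = {"scylla-0", "scylla-2"}

for action, member in plan_actions(members, existing_pvcs):
    print(action, member)
```

The second action is the part that depends on knowing what the application is, which is why it does not belong in a general-purpose building block.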