In this lesson you’ll learn about ScyllaDB cluster management and administration.
It covers the ScyllaDB Manager: an overview, configuration, deployment, health check, adding a cluster, and more.
So before we get to the Manager itself: since this is the advanced track, this is probably something you already know quite well, but we need to cover the rationale for having all of this in place. We need to do repairs, and repairs are what the Manager does; that is its main function right now.

The reason we need repairs is pretty simple. The CAP theorem says that out of consistency, availability, and partition tolerance you can pick any two. A typical SQL database gives you good consistency and availability but no partition tolerance, unless you add sharding plugins or play with things like Galera, and then you have your own set of problems there. Databases like ScyllaDB, Cassandra, and others in the same category don't hold on to consistency too tightly, in order to get good partition tolerance and availability. That doesn't mean we have no consistency at all; it means consistency is eventual rather than immediate, and it's tunable, so you can pick exactly what kind of consistency you expect. That's the consistency level (CL) parameter you issue with your queries: per query you can decide whether you want high consistency, good-enough consistency, or essentially no consistency at all if you just use CL=ONE.

What happens during a repair is that the data is compared across all the replicas (and you need replicas if you want a multi-master cluster). Whenever there is an inconsistency, the newest record is pushed to all the other nodes and overwrites whatever is there, so the most recent record always wins. Every write in ScyllaDB carries a timestamp. You can view it, but not with a regular query; a plain SELECT will not show it. Still, every row has one, and the row with the highest timestamp is the one that gets replicated during repair.
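To make those two points concrete (per-query consistency and the otherwise hidden timestamps), here is a minimal sketch of a cqlsh session; my_ks.my_table and its columns are hypothetical:

    -- inside cqlsh: CONSISTENCY is a cqlsh shell command, the rest is plain CQL
    CONSISTENCY QUORUM;
    SELECT id, value FROM my_ks.my_table WHERE id = 1;   -- needs a quorum of replicas to answer
    CONSISTENCY ONE;
    SELECT id, value, WRITETIME(value) FROM my_ks.my_table WHERE id = 1;  -- any single replica may answer; WRITETIME() exposes the write timestamp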
So how do we actually reach consistency, and how do these repairs happen? At write time, if we cannot write to all the replicas at once, there is hinted handoff: the other nodes hold on to the writes that were supposed to go to the node that wasn't able to answer in time, and they stream them over whenever that node becomes available again. It's a good mechanism, but it's limited in how much it can hold. At read time there is read repair: whenever we read and we want good consistency, say we read at a consistency level of three, we might get two answers with one version of a row while the third answer, from another replica, is older. We return the answer from the two nodes that have the newer version, and we also fix the copy on the third node. And then there is the maintenance action, a full repair, which is exactly what the Manager does. It can be done at the whole-cluster level or at smaller scopes (we'll cover that), but it is part of routine maintenance, just like backups, for databases like ScyllaDB, Cassandra, and others that relax consistency.
So, the ScyllaDB Manager. It basically has two components: the Manager itself, a daemon that exposes a REST API, and a tool to talk to that daemon. You can of course issue your own REST API calls, but there is a command called sctool that talks to the Manager from the CLI. It's centrally managed, and it's usually installed together with the monitoring on the same node, simply because the monitoring machine already has access to all the ScyllaDB nodes and because nobody wants to dedicate an entire server just to run the Manager; but that's up to you. It was created specifically for ScyllaDB, so it will probably not work for anything else; the point is that it's shard aware, meaning ScyllaDB's shard-per-core architecture is taken into account by the Manager. It can be made highly available, and it is stateless: it uses a ScyllaDB database as its backend, which it either installs locally or connects to as an existing cluster. Obviously it's not a good idea to use the same database it manages, so you either install another cluster if you want the Manager itself to be highly available, or you just keep the local, single-replica setup, which is quite common; it's not a production database, after all. You can use any ScyllaDB cluster, but like I said, you don't want to create a chicken-and-egg situation where the Manager is managing the very database it uses as its own backend.
Right now, the current version of the Manager uses SSH to reach the API, because most ScyllaDB setups do not expose the REST API: it listens on a localhost port, so you can run nodetool locally, but typically you cannot run it against another host because it simply cannot reach the API there. So the Manager does not use nodetool directly; it uses the same REST endpoints that nodetool talks to, but it SSHes into the host and talks to the API locally. Of course, you could expose the API externally. That's not best practice security-wise, because the API itself is not secured, but it is possible, and then the Manager can talk to it directly. Like I said, by default it just connects over SSH, so the connection is secure, and it deals with establishing those connections, distributing the keys, and all of that; there are scripts shipped with the Manager to do it. The database used as the Manager's backend is the current latest ScyllaDB Enterprise.
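To give a rough idea of what the Manager is automating here, this is the manual equivalent; the address and user below are hypothetical, and the assumption is the common default where the REST API binds to 127.0.0.1 on port 10000:

    # pointing nodetool at a remote node's API from your workstation fails, because
    # the API only answers on localhost on that node; so you SSH in and run it
    # locally instead, which is effectively what the Manager does for you
    ssh centos@10.0.0.1 'nodetool status'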
One thing I forgot to mention here: remember that there is a dashboard for the ScyllaDB Manager, so the monitoring is integrated with the Manager. If you have the Manager configured as well, you can go into that dashboard and see the progress of the task that is currently running and everything else it is doing.
One more thing: as you can see, the upcoming Manager 2.0 will be dropping the SSH part, because it has proven a bit too complex to set up, especially in secure environments where the SSH daemon is locked down tight and you cannot log in without, for example, entering a Kerberos password every time. We took that into account when planning the next version; this was one of the main pain points for a lot of customers, who ran into these problems because of their specific internal SSH configurations. So the next version will simply have an agent running on each ScyllaDB node, and the Manager will talk to that agent instead of trying to establish SSH. It will still be a secure, encrypted connection, but you will need to have the agent running on every node, just like we already run node_exporter there, so that's pretty standard.
As for deploying the Manager itself: once you install the RPMs, you just run the Scylla Manager setup script, a tiny little script that asks you a couple of questions. Then you run the SSH setup script and give it a user that currently has access to the nodes, the path to that user's key, the name of the manager user it should create for SSH, and the key to push for that user, plus a single IP of one node. It will SSH into that node with the given user, in this case centos, discover the cluster using nodetool, pull out all the data, reach out to all the other nodes, establish connectivity everywhere, create the manager user, add the key to authorized_keys, and so on. At the end it will just tell you the list of nodes it connected to.
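In shell terms, the flow just described looks roughly like this. Treat it as a sketch: the package and script names below are the Manager 1.x ones as I recall them, and their flags vary between releases, so check --help or the docs for your version:

    # install the manager packages (yum shown here; apt on Debian/Ubuntu)
    sudo yum install scylla-manager
    # interactive setup of the manager daemon and its local ScyllaDB backend
    sudo scyllamgr_setup
    # the SSH setup script takes the existing user, its key, the manager user to
    # create, the key to push, and a single node IP -- see its help for the flags
    sudo scyllamgr_ssh_setup --help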
And this is what happens when I want to add a cluster. sctool is the command I already mentioned: cluster add, give it one host (no need to give it a list of hosts, it will discover the rest) and give it a name. The name does not have to be the cluster name inside ScyllaDB; it is just a label inside the Manager's database. The Manager will assign the cluster a UUID anyway, and you will probably use that UUID more often than the name. And that's it. If you are using SSH and you did not expose the API, you also need to specify the SSH parameters. In the future SSH will become redundant in this particular step, but right now we still use it. If you run the status command after that, you get a nice little table printed to your screen with the status of the cluster.
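A minimal sketch of those two commands; the host, cluster name, and key path are hypothetical, and the SSH flags only apply to the SSH-based versions described above:

    # register the cluster: one reachable node is enough, the rest is discovered
    sctool cluster add --host 10.0.0.1 --name prod-cluster \
        --ssh-user scylla-manager --ssh-identity-file /path/to/scylla_manager.pem
    # then check what the manager sees
    sctool status -c prod-cluster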
This is what the monitoring looks like for the Manager: Prometheus pulls data from the Manager as well, and the dashboard shows, as you can see, how many nodes there are and the progress of the repair currently running. It also shows the CQL health check, where the Manager keeps pinging the CQL port and making sure it is available on all the nodes; that's the next thing I'm going to talk about, soon enough.
These are the types of deployment we have available. We have binaries for all the standard common distributions: there are RPMs, and there are debs for Debian and Ubuntu. We also have it on Docker, on Docker Hub, so you can install it directly from Docker; that means you will need to install ScyllaDB and connect the Manager to it, so if you run ScyllaDB in Docker and the Manager in Docker, you just link the two containers together and they will work. This presentation can be found on the University website once you log in to ScyllaDB University, and if you follow the gist linked there, it will guide you through setting up a full ScyllaDB cluster, the monitoring that works with it, and the Manager to manage that cluster. You will end up with three ScyllaDB nodes, three containers for monitoring, and two more, one ScyllaDB and one Manager, running on your machine, which pretty much represents the entire stack.
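If you go the Docker route, the rough shape is the two linked containers just mentioned. This is only a sketch: the image names are the ones on Docker Hub, and the exact wiring (the Manager still needs a ScyllaDB instance for its backend) is what the gist above walks you through:

    # run a ScyllaDB container, then a Manager container linked to it
    docker run --name scylla -d scylladb/scylla
    docker run --name scylla-manager --link scylla -d scylladb/scylla-manager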
The next version, like I already said, will have node-side agents, so we will not have to deal with SSH anymore. The Manager will also do backups, using rclone (not rsync), the tool there was a big presentation about a few months ago in Poland. And if you have requests for additional features, feel free to open them; we're always happy to get more. But basically, right now the main reason for the Manager is repairs, and what we really care about are those maintenance repairs that need to happen all the time.
First of all, like I said, every row has a timestamp and the newest one wins. Another thing to know about ScyllaDB, and I'm pretty sure it has been mentioned before since this is the advanced track, is that we don't delete data directly: when we issue a delete on a row, we replace the data with a tombstone, and then garbage collection, a periodic process, walks over those tombstones and drops the data.

One of the reasons we really need those maintenance repairs is the following problem; there are more, but this is one of the more obvious and common ones. When we issue a delete, it gets propagated to all the relevant replicas. Suppose one of the nodes was not available during that propagation. It doesn't have to be down; maybe there was a network outage, maybe it was overloaded at the time and did not respond in time. Whatever the reason, it wasn't available. So out of three replicas we now have two with a tombstone and a third where nothing happened. Later on, garbage collection runs and the tombstone is removed; we don't have it anymore. Then the node that wasn't available comes back online. When we read, most of the nodes no longer have that row at all, so there is no newer record with a newer timestamp saying it was deleted, but the third node still has it, so it responds, and its timestamp is newer than nothing. We end up with that record resurrected and propagated back to all the other nodes by repair.

That means we need to keep the repair period shorter than the garbage-collection grace period to avoid this situation: as long as tombstones are kept around longer than the gap between repairs, this cannot happen. It's essentially a race condition. This is why the default grace period is ten days (it's a per-table setting called gc_grace_seconds, which you can tune) and why we do repairs on a weekly basis by default, leaving a three-day margin between the two.
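For reference, gc_grace_seconds is adjusted per table in CQL; my_ks.my_table is a hypothetical table, and 864000 seconds is the ten-day default:

    -- run in cqlsh: keep tombstones around for ten days before GC may drop them
    ALTER TABLE my_ks.my_table WITH gc_grace_seconds = 864000;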
so we don’t really need the manager to do repairs because we’ve had repair as
long before we issued the manager otherwise we wouldn’t be able to operate
right so we could just do nodetool repair and the problem with nodetool repair is you
need
to run it on each host specifically it’s an ad hoc operation it’s not monitored
you just issue it and pray for the best
there are no records for it there is no there is nothing you just issue it and
you wait for it to somehow run through
that’s it it is not shard aware and there are no retries if it failed you just
start over okay at the manager level when you do a repair you can do it
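That manual path looks like this, run node by node; my_keyspace is hypothetical, and -pr restricts the run to the node's primary token ranges so the work isn't duplicated across nodes:

    # log in to every node in turn, kick off a repair there, and track it yourself
    nodetool repair -pr my_keyspace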
At the Manager level, when you do a repair, you can do it globally per cluster, and you can also drill down to specific parts of the cluster. It can be scheduled; of course, with nodetool you could also set up cron jobs and that sort of thing, but then you need to manage that somehow, whereas here you do it through the Manager, in one central location. You can stop and start a repair, and you can pause it: if a repair is very big and has been running for a long time and you expect serious load on the cluster right now, you can pause it until the load goes away and then continue. (If there were lots of inserts in the meantime it's better to rerun it, but pausing is possible.) The history is kept, so you know exactly how long past repairs took, what succeeded, what didn't, and so on. It's shard aware, since it's built specifically for ScyllaDB and knows its architecture, and you can tune the rules for retries; I'm not going to go into that here, but it has a lot of bells and whistles. These are the basic kinds of jobs, but it can get pretty complex, and there is even more you can do with it.
For a simple repair you just specify the cluster; that's what you do instead of going to every node with nodetool. You can do a recurrent repair, so you can schedule it, and there are several ways to schedule it. You can do a per-region repair by specifying only a DC, and you can repair only a specific keyspace, or even just a single table. That's useful if you know, say, that one keyspace gets hit on weekends and another only on Wednesdays, so you schedule their repairs for other days. It also supports wildcards, so you can run an sctool repair with a star for everything except us-east-1, for example, as in my example at the bottom here. This needs to be in quotes, by the way, otherwise the shell will not pass it through correctly, but the Manager supports it. The same syntax also applies to keyspaces and tables; the wildcards work with anything. You can also repair only specific hosts, or even specific token ranges, but you need to be careful with these options: you don't want to get into those zombie-data resurrection problems we just discussed, so you need to make sure your entire database still gets repaired often enough to avoid them.
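A few sketches of what that looks like with sctool; the cluster, DC, and keyspace names are hypothetical, and the exact flag spellings for filters and scheduling differ a bit between Manager versions, so check sctool repair --help on yours:

    # repair the whole cluster once
    sctool repair -c prod-cluster
    # repair everything except the us-east-1 DC (note the quotes around the glob)
    sctool repair -c prod-cluster --dc '*,!us-east-1'
    # repair a single keyspace; glob patterns work for keyspaces and tables too
    sctool repair -c prod-cluster -K 'my_keyspace.*'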
The other thing the Manager does is the CQL health check. I already mentioned it before, but basically it's another layer of health monitoring we apply through the Manager: on every node it checks whether the CQL port is available, open, and responding, and it even keeps a graph of the response time, which is another useful indicator of your current load. Because it's built into the Manager, it's already aware of the cluster topology, so you don't need to define it the way you would in the monitoring stack, and it's pretty lightweight. The connection doesn't behave like, for example, cassandra-stress, which connects to one node, discovers the entire cluster, and starts working against all of it; here it only connects to that single host, without loading anything else and without querying the system tables. It's also safe: it does not actually log in over CQL, it only probes it, and it reports both availability and the latency of the response. And that was pretty much it.
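Conceptually it's similar to a probe you could run by hand, except the Manager also times the response and graphs it; the address below is hypothetical and 9042 is the default CQL port:

    # check that the CQL port answers on a node, without authenticating
    nc -zv 10.0.0.1 9042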