An overview of ScyllaDB Manager: what it is and how it can be used to improve cluster management. ScyllaDB Manager is a centralized tool for cluster administration and for automating recurrent tasks. It can be used to schedule tasks such as repairs and backups.
Note that since the release of Manager 2.0, ScyllaDB Manager does not use SSH to communicate directly with the nodes; instead, it uses the Manager Agent. Learn more about it here.
Let me talk about the ScyllaDB Manager then, because a lot of questions are popping up. I don't know what yours were, but I'll just quickly explain what the ScyllaDB Manager is. I'm not going to go into all of these slides, and this helps with the questions about repair too.

So the ScyllaDB Manager is a tool to manage ScyllaDB clusters. In the same tool you can manage multiple ScyllaDB clusters. Some people decide not to do that and have one Manager per ScyllaDB cluster, maybe because you have different teams in your organization. It doesn't matter, but you can manage a lot of ScyllaDB clusters in a single ScyllaDB Manager.

If you have at most five nodes, ScyllaDB Manager is free: you can download it and use it for free. If you have more than five nodes, then it's an enterprise-only subscription. By enterprise-only I mean that it does work with ScyllaDB Open Source, if you choose to use it for whatever reason, but it's not free for you to use.

So essentially, the ScyllaDB Manager, what it
does is this. When you kick off a repair on ScyllaDB, you need to go into each individual node, first of all, and trigger the repair from that node. A cluster-wide repair means you cycle through the entire cluster, and the repair is sent as one block that says: repair everything that's in this node. What the ScyllaDB Manager does is, first of all, break it into small pieces. It chooses which of those small pieces to repair based on ScyllaDB's shard-aware architecture. So it will say: this piece belongs to this shard and doesn't cross shards, so repair this piece; and when this piece has ended, repair another piece that also belongs to this shard, to keep utilization equal. If you don't take this into account you can still repair, but your utilization is not going to be equal among shards, so it's more efficient to run the repairs this way.

It's resumable. If there is any issue with a node, or, as I said, you want to stop the repairs for whatever reason, you can stop and pick up from that point, because we track it in a database. So essentially what happens is that you have a ScyllaDB database that is tracking which ranges have been repaired already, which ranges are still there to repair, and which shards those ranges belong to, and the Manager handles the process of firing those small repairs. You can control whether your repairs go faster or slower by how many ranges you repair at the same time. You can resume them, you can stop them, and they're going to be running as you asked, in the background, all the time. And once you do that, you essentially add a
cluster to the Manager. As I said, the Manager operates with the granularity of a cluster: you can add one, two, three. Then you have tasks inside that. The repair tasks come by default; you can delete tasks, you can create new tasks, and you can change the frequency at which a task happens. But once you register the repair task, it will happen automatically every, let's say, five days, seven days, however often you want.

ScyllaDB Monitoring does cover the Manager as well, a little bit. In our monitoring you can see if you have failed segments, and you can see the progress of the repair. So if you're using the ScyllaDB Manager, you can also track in our monitoring solution what the progress of your repairs is.

Again, as I said, it is available for free for every ScyllaDB user as long as you have at most five nodes. If you have more than five nodes, then it's part of our enterprise offering. Give it a try; it's available for CentOS, and Ubuntu as well. If you give it a try today, people in the other room will be happy to help you.

This is really just a summary table of
what I just said. With nodetool repair you can do repairs, but they're node-local: you need to go to each node, and maybe you're going to use cron, and maybe you're going to start having issues where one repair hasn't finished yet on one node and the other has already started, and it becomes hard to distribute. The Manager repair is global: it looks at the cluster as a whole, divides it into small ranges, and goes through them.

nodetool repair is an operation that you somehow have to go and do yourself; the Manager's repairs are recurrent, as Tomer said. nodetool repair is fire-and-forget: you fire it, it's a big operation, and if it fails you have to do it all again. With the Manager repair, because it's based on tokens, you might fail this range, but the whole operation is preserved.

No records are kept if you do nodetool repair. For the Manager repair, you can actually go later and see how the repair was done; you keep history for that, and it's all kept in ScyllaDB. nodetool repair is not aware of our architecture, so it doesn't achieve equal utilization among shards, while the repair from the Manager is shard-aware. With nodetool repair, the retry is that you go and retry; there is a retry policy for the Manager repair.

So overall, as I said, it's a better repair tool, on top of everything else.
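
To make the repair model described in the talk a bit more concrete, here is a minimal Python sketch of the idea: cut the token ring into small ranges, tie each range to one shard, track the status of every range, retry only the range that failed, and resume from the tracked state after a stop. It is an illustration under stated assumptions, not how ScyllaDB Manager is implemented. The names `RangeTask`, `split_ring`, and `run_repair` are hypothetical, the `i % shard_count` mapping is a stand-in for ScyllaDB's real token-to-shard mapping, and a plain Python list stands in for the ScyllaDB table that tracks progress.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class RangeTask:
    """One small piece of the token ring, owned by a single shard."""
    start: int
    end: int
    shard: int
    status: str = "pending"  # "pending", "done", or "failed"
    attempts: int = 0


def split_ring(ring_size, pieces, shard_count):
    """Cut [0, ring_size) into `pieces` ranges and tie each to one shard.

    ASSUMPTION: `i % shard_count` is a toy mapping used only for the sketch.
    """
    step = ring_size // pieces
    tasks = []
    for i in range(pieces):
        start = i * step
        end = ring_size if i == pieces - 1 else start + step
        tasks.append(RangeTask(start, end, shard=i % shard_count))
    return tasks


def run_repair(tasks, repair_range, max_attempts=3, stop_after=None):
    """Repair pending ranges, taking one range per shard per round.

    `tasks` is the tracked state, so calling this again resumes where the
    previous call stopped. `repair_range(task)` is a caller-supplied
    function that returns True on success. Returns the number of ranges
    repaired by this call.
    """
    queues = defaultdict(deque)
    for task in tasks:
        if task.status == "pending":  # skip ranges that are done or failed
            queues[task.shard].append(task)

    repaired = 0
    while any(queues.values()):
        # Round-robin over shards so the work stays evenly spread.
        for shard in sorted(queues):
            if not queues[shard]:
                continue
            if stop_after is not None and repaired >= stop_after:
                return repaired  # stopped; remaining ranges stay pending
            task = queues[shard].popleft()
            task.attempts += 1
            if repair_range(task):
                task.status = "done"
                repaired += 1
            elif task.attempts < max_attempts:
                queues[shard].append(task)  # retry only this range later
            else:
                task.status = "failed"
    return repaired
```

A caller would build the task list once with `split_ring`, call `run_repair` with a `stop_after` limit to simulate stopping a repair, and call it again later on the same list to resume. Ranges already marked `done` are skipped, which is the property that makes the operation resumable instead of all-or-nothing.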