This covers an overview of ScyllaDB Manager, a comparison of using ScyllaDB Manager for repair as opposed to using Nodetool, and the concepts of repair intensity and speed.
So, let’s continue with cluster repair using ScyllaDB Manager. First, a short overview of ScyllaDB Manager: it automates database operations. Using ScyllaDB Manager, you can schedule tasks such as backups and repairs, check cluster status, and more. Some tasks are created automatically when you register a ScyllaDB cluster in ScyllaDB Manager.
ScyllaDB Manager can manage multiple ScyllaDB clusters and run cluster-wide tasks in a controlled and predictable way. It consists of three components: the server, a daemon that exposes a REST API; sctool, a command-line interface for interacting with the server; and the agent, a daemon installed on each ScyllaDB node. The server communicates with the agents over HTTPS, and it persists its data to a ScyllaDB cluster, which can run locally or on an external cluster. In the diagram before you, you can see ScyllaDB Manager with a remote backend datastore managing multiple ScyllaDB clusters. Each node has two connections with the ScyllaDB Manager server: the REST API connection, used for ScyllaDB Manager and ScyllaDB Manager Agent activities such as backups, and the CQL connection, used for the ScyllaDB CQL health check.
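To make the sctool side of this architecture concrete, here is a hedged sketch of registering a cluster with the server and checking its status; the host address, cluster name, and token are placeholders, and exact flags may differ between Manager versions:

```shell
# Register a cluster with the Manager server; the agent auth token is a placeholder.
sctool cluster add --host 192.168.100.11 --name prod-cluster --auth-token "token"

# Check cluster status, which exercises both connections:
# REST API to the agents and the CQL health check.
sctool status -c prod-cluster
```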
Here you can see a comparison between nodetool repair and ScyllaDB Manager. Nodetool repair is a local operation, whereas Manager repair is global, and the Manager makes sure a repair does not run more than once. The ScyllaDB Manager server can be deployed in a highly available mode, whereas nodetool repair is a fire-and-forget operation. Manager repair splits the token range into small ranges, each limited to one shard, and runs them sequentially. Once a repair is stopped, it can be resumed from the last range, and if a range repair fails, ScyllaDB Manager retries it at the end of the repair operation. You cannot simply run nodetool repair and get the same results as you do with the Manager; the part about splitting and optimizing the token ranges is the most important part, and it’s something nodetool cannot do.
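To illustrate the difference in scope, here is a sketch of the two invocations side by side; the cluster name is a placeholder:

```shell
# Manager repair: a single command schedules a cluster-wide task; the server
# splits the token ranges, runs them sequentially, and retries failed ranges.
sctool repair -c prod-cluster

# nodetool repair, by contrast, is run per node (here repairing only the
# node's primary ranges) and must be coordinated across nodes by the operator.
nodetool repair -pr
```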
Let’s look at repair job granularity. Multi-DC repair is supported: you may specify the data center or data centers you want to repair using glob patterns. Those are like regular expressions, but much simpler. The --dc flag accepts a comma-separated list of patterns; it’s compatible with ordinary enumeration and also supports exclusion with an exclamation mark prefix. It’s similar for keyspaces and tables: the keyspace pattern is separated from the table pattern by a dot. This gives a lot of flexibility in selecting what should or should not be repaired; it’s up to you. The expressions are evaluated at runtime, so when a keyspace or table is added, it will be picked up automatically. Tasks can also be stopped and updated, to be resumed later on.
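The glob patterns described above can be combined on one command line. A sketch, with the cluster, data center, keyspace, and table names all placeholders:

```shell
# Repair every data center whose name starts with "dc", but exclude dc2.
sctool repair -c prod-cluster --dc 'dc*,!dc2'

# Keyspace and table patterns are separated by a dot: repair everything in
# keyspace1 except big_table, plus one specific table from keyspace2.
sctool repair -c prod-cluster --keyspace 'keyspace1.*,!keyspace1.big_table,keyspace2.users'
```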
Now, repair intensity. Repair intensity is a very important feature in ScyllaDB Manager. The intensity specifies how many token ranges per shard can be repaired on a ScyllaDB node at any given time. For a ScyllaDB cluster that does not support row-level repair, which is ScyllaDB Enterprise 2019 and earlier, intensity can also be a decimal between 0 and 1; in that case, it specifies the percentage of shards that can be repaired in parallel on a repair master node. The default intensity is 1; you can change that using the sctool repair --intensity flag. ScyllaDB Manager 2.2 adds support for an intensity value of 0. In that case, the number of token ranges is calculated based on node memory and adjusted to the maximal number of ranges ScyllaDB can repair in parallel. If you want the repair to run faster, try using intensity 0.
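For example, setting the intensity when scheduling a repair might look like this; the cluster name is a placeholder and the values are illustrative:

```shell
# Intensity 2: up to two token ranges per shard repaired at a time on a node.
sctool repair -c prod-cluster --intensity 2

# Intensity 0 (Manager 2.2 and later): let the Manager calculate the maximum
# number of ranges the nodes can repair in parallel, based on node memory.
sctool repair -c prod-cluster --intensity 0
```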
Now, repair speed is controlled by two parameters: the --parallel and --intensity flags. These parameters can be set when you are scheduling a new repair with sctool repair, when you’re updating a repair specification with sctool repair update, or when you’re updating a running repair task using sctool repair control. By default, ScyllaDB Manager runs repairs with full parallelism, so the way to make repairs faster is by increasing the intensity. Note that the less loaded the cluster is, the more it makes sense to increase the intensity; if you increase intensity on a loaded cluster, it may not give a speed benefit, since the cluster has no spare resources to process more repairs. In our experiment on a 50% loaded cluster, increasing the intensity from 1 to 2 gave about a 10 to 20 percent boost, and increasing it further had little impact. If the cluster is idle, try setting intensity to 0 for maximum intensity. If the cluster is running under substantial load, try setting the intensity to 2, and then increase it by 1 if needed.
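Putting that tuning advice together, here is a hedged sketch of adjusting a repair at each of the three points mentioned; the cluster name and task ID are placeholders, and exact flag behavior may vary between Manager versions:

```shell
# Adjust the currently running repair task without restarting it.
sctool repair control -c prod-cluster --intensity 2

# Update a stored repair task definition so future runs use the new settings;
# --parallel 0 requests full parallelism.
sctool repair update -c prod-cluster repair/<task-id> --intensity 2 --parallel 0
```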