A deep dive into using ScyllaDB Manager to run backups. Covers the ScyllaDB Manager Agent, the benefits of using ScyllaDB Manager for backups, how the backup process works, and some examples.
Let’s move on and talk about cluster backup using ScyllaDB Manager.
But first, let’s talk briefly about the Manager Agent, which is an important part
of the backup functionality. The Manager Agent is a daemon that needs to be installed and started
on each ScyllaDB node. The ScyllaDB Manager server communicates with the nodes in
the managed ScyllaDB clusters via the agents; the communication is encrypted using HTTPS
and protected by an auth token. The agent serves as a reverse proxy to the ScyllaDB REST API
and provides additional features specific to ScyllaDB Manager, such as backup.
Using sctool, you can back up and restore your managed ScyllaDB clusters under ScyllaDB Manager. Backup tasks are
scheduled in the same manner as repair tasks; you can start, stop, and track backup operations on demand.
ScyllaDB Manager can back up to Amazon S3 and to S3-compatible API storage providers such as Ceph,
MinIO, and Commvault, and we recently added support for Google Cloud Storage as a backup location.
The backup can work with MinIO and other AWS S3-compatible providers such as Alibaba,
Ceph, DigitalOcean, IBM COS, and so on. To configure S3 with a third-party provider,
in addition to the credentials, you need to specify the provider parameter
with one of the above options; if the service is self-hosted, you also need to specify
the endpoint with its base URL address. All of this is done in the agent configuration file.
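For illustration, a minimal sketch of what the s3 section of /etc/scylla-manager-agent/scylla-manager-agent.yaml could look like for a self-hosted third-party provider; the provider name, endpoint URL, and credentials below are placeholder assumptions, not values from the talk:

    s3:
      access_key_id: <your-access-key-id>
      secret_access_key: <your-secret-access-key>
      provider: Minio                                  # third-party provider, e.g. Minio, Ceph, Alibaba
      endpoint: https://minio.example.internal:9000    # base URL of the self-hosted service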
Now, there is also an option to set CPU pinning for the ScyllaDB Manager Agent. In the agent configuration
file you can set the cpu setting, which dictates the CPU to run the ScyllaDB Manager Agent on.
By default the agent reads its configuration from /etc/scylla-manager-agent/scylla-manager-agent.yaml
and tries to find a core that is not used by ScyllaDB; if that’s not possible, you can
specify the core on which to run the ScyllaDB Manager Agent. The default would be core zero.
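As a sketch, pinning the agent explicitly in scylla-manager-agent.yaml would look like this (the core number here is an arbitrary illustrative choice):

    # Pin the ScyllaDB Manager Agent to a specific core
    # instead of letting it look for a core Scylla does not use
    cpu: 0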
ScyllaDB Manager automates the backup process and allows you to configure
how and when backups occur. The advantages of using ScyllaDB Manager for backups are the following.
You decide whether you want to back up a single table or an entire cluster; the choice is up to you.
Deduplication: the Manager prevents multiple uploads of the same SSTable, which allows
you to save on network and S3 costs. Data retention: it purges backup data automatically
once it falls outside the retention policy. Throttling capabilities: you control how
fast you upload, and you can pause and resume the backup upload and it will continue where it left off. You can set
a configurable upload destination per data center, which again provides savings on network costs.
ScyllaDB Manager has a retry mechanism in case of errors. And finally, visibility:
everything is managed from one place, progress can be read using the CLI, the REST API, or Prometheus
metrics, and you can dig into the details and see the progress of individual tables and nodes.
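As a sketch of the CLI side of that visibility (assuming ScyllaDB Manager 2.x sctool syntax, with a placeholder cluster name and task ID):

    sctool task list -c my-cluster                            # list scheduled tasks and their status
    sctool task progress -c my-cluster backup/<task-id>       # per-node and per-table backup progress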
The backup procedure consists of multiple steps executed sequentially, and it runs in parallel
on the nodes unless you limit it with the snapshot-parallel or upload-parallel flags.
The steps are as follows. First of all, there is a snapshot: we take a snapshot of the data on each of the nodes
according to the backup configuration settings. The second phase is optional: the schema. We upload
the schema CQL to the backup storage destination; this requires that you added the cluster in ScyllaDB
Manager with the username and password flags, so that the ScyllaDB Manager Agent can connect to CQL
and copy the schema. Then we upload the snapshots to the backup storage destination,
and we upload the manifest file that contains the metadata about the backup. The last step is purging:
if the retention threshold has been reached, we remove the oldest backup from the storage location.
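For illustration, a sketch of the two pieces mentioned above, with placeholder host, credentials, bucket, and limits as assumptions: adding the cluster with CQL credentials so the schema can be backed up, and limiting per-node parallelism on the backup task:

    sctool cluster add --host 10.0.0.1 --auth-token <agent-auth-token> \
      --username scylla_user --password scylla_pass --name my-cluster
    sctool backup -c my-cluster -L 's3:my-backup-bucket' \
      --snapshot-parallel '2' --upload-parallel '2'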
There is some prep and setup work to be done before you can create a backup task.
First, you need to create a storage location for the backup. Currently ScyllaDB Manager supports Amazon
S3 buckets and Google Cloud Storage buckets, and you can use a bucket that you already created.
We recommend using a bucket in the same region where your nodes are, to minimize cross-region
data transfer costs. In a multi-DC deployment you should create a bucket per data center, each
located in the data center’s region. Then choose how you want to configure access to your buckets. You can
use an IAM role for S3, or automatic service account authorization for Google Cloud Storage; these are
the recommended options. Another option is to add your credentials to the agent configuration file.
This method is less secure, as you will be distributing this secret to
each node, and in case you need to change the key you will have to replace it on each node.
Once you’ve set up the storage location and the access control to the bucket, you need to validate
that the Manager has access to the backup location, as you can see in the commands on the slides.
If you get no response, it means that all is good; if there is a problem with accessing that location,
you will see an error. There is also an option to use the debug flag for further investigation.
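As a sketch, the validation run on each node could look like this (the bucket name is a placeholder assumption):

    scylla-manager-agent check-location --location s3:my-backup-bucket
    scylla-manager-agent check-location --location s3:my-backup-bucket --debug   # verbose output for troubleshooting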
After you’ve done the prep work, you can now create a scheduled backup. Let’s see which flags
we have. ‘-c’ is the cluster name you used when you registered the cluster in ScyllaDB Manager.
The ‘-L’ flag points to the backup storage location, in either the ‘s3:’ plus your S3 bucket
name format, or your data center name, then ‘s3:’ and your S3 bucket name, if you want to
specify a location for a specific data center; for example, as we mentioned earlier, when you
have a multi-data-center deployment, we recommend having a separate bucket for each data center.
The flag ‘-s’ is the time you want the backup to begin, and ‘-i’ is the time interval you
want to use between consecutive backups. The command series that
you see in the slide will schedule a backup on December 9th, 2019 at 3:15:06 UTC, and
will repeat the backup every day keeping the last seven days, every week keeping the previous week,
and every month keeping the previous month. All the data will be stored in S3 under
‘my backup bucket’. Each command returns a task ID; this ID can be used to query the status of the backup task,
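Reconstructed as a sketch (the cluster name, bucket name, exact timestamp, and retention values below are placeholders inferred from the description, not copied from the slide):

    sctool backup -c my-cluster -L 's3:my-backup-bucket' -s '2019-12-09T03:15:06Z' -i 1d  --retention 7
    sctool backup -c my-cluster -L 's3:my-backup-bucket' -s '2019-12-09T03:15:06Z' -i 7d  --retention 1
    sctool backup -c my-cluster -L 's3:my-backup-bucket' -s '2019-12-09T03:15:06Z' -i 30d --retention 1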
to defer the task to another time, or to cancel it completely. If you want to run a backup only once,
leave out the interval argument, ‘-i’. In case you want the backup to start immediately
but you still want to schedule it to repeat at the determined interval, leave out the ‘-s’ start
flag but do set the interval flag, ‘-i’, to the time after which you want the backup to reoccur.
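As a sketch of those two variants (placeholder names and values again):

    sctool backup -c my-cluster -L 's3:my-backup-bucket' -s '2019-12-09T03:15:06Z'   # run once at the given time
    sctool backup -c my-cluster -L 's3:my-backup-bucket' -i 1d                       # start now, repeat daily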
You can back up one specific data center using the ‘--dc’ flag,
or you can specify more than one data center, using glob patterns to match multiple DCs or to exclude
them. If your data centers are located in different regions, you can also specify different locations;
creating the buckets in the same regions as your data centers will save some bandwidth costs.
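A sketch of per-DC filtering and per-DC locations, with made-up data center and bucket names:

    sctool backup -c my-cluster --dc 'dc1' -L 'dc1:s3:backups-us-east-1'
    sctool backup -c my-cluster --dc 'dc*,!dc3' \
      -L 'dc1:s3:backups-us-east-1,dc2:s3:backups-eu-west-1'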
We recommend using the ‘--dry-run’ parameter prior to scheduling a backup; it’s a very useful way to
verify whether all the necessary prerequisites are fulfilled. The dry run verifies whether the nodes
are able to access the backup location provided; if it’s not accessible, an
error message will be displayed and the backup is not scheduled. The dry run gives
you the chance to resolve all configuration or access issues before executing the actual
backup. If the dry run completes successfully, a summary of the backup will be displayed.
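As a final sketch, a dry run of the same task (placeholder names as before):

    sctool backup -c my-cluster -L 's3:my-backup-bucket' --dry-run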