What are some basic settings that have to be configured before getting started with a migration? The lesson also covers basic tools such as the Scylla Manager backup and restore.
How to make the data fit. This comes back to the point that you need to think about how many times you transfer a replica. In the ideal case, when you want to do a clone of your cluster, what matters is how your data is spread out. If you match the topology and the replication factor of the source and target clusters, which sometimes can be done, then you can squeeze the copying of the data from one side to the other down to the duration of copying the data that lives on a single node, because you can run the copies from all the other nodes in parallel. The topology matches, so you can just load the data like that, and this way the migration will be the fastest. It might not always be achievable, but if you get close to the same topology, it will definitely help.
It’s harder when, for example, you are moving from a cluster that has nine nodes and a replication factor of two to a cluster that has three nodes and a replication factor of three.
At the same time, those multiples and those simplified scenarios also help. For example, in a cluster with three nodes and a replication factor of three, each node contains a full replica of the data, so you know exactly which nodes from the source cluster you can put together and dump onto a single target node, which makes it easy. Those are small things that can simplify the whole migration for you, as in the sketch below.
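For illustration, a minimal sketch of that matching-topology case. The host names, keyspace, and paths here are hypothetical, and this only covers the raw transfer; the copied SSTables still need to be loaded on the target, for example with nodetool refresh, which is covered below.

    # With RF equal to the node count on both sides, every node holds a full
    # replica, so source node i maps 1:1 to target node i and all the copies
    # can run in parallel: wall-clock time is roughly one node's worth of data.
    for i in 1 2 3; do
      ssh target-node-$i \
        "rsync -a source-node-$i:/var/lib/scylla/data/my_keyspace/ \
               /var/lib/scylla/data/my_keyspace/" &
    done
    wait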
What we shouldn’t forget, however, and I think this was already mentioned in the chat, is that you should definitely be paying attention to your tombstones, because the dual-writes that Felipe explained also apply to dual-deletes. So if you delete something on the left side and you delete it on the right side, those deletes have to stay.
But if your migration takes longer than the repair period (and during the migration you shouldn’t run repairs), you need to consider that there will be tombstones and they might expire, and you don’t want them to expire. So please keep this in mind: increase the gc_grace_seconds value so that it covers the duration of the migration, then run your repair once the migration is done, and you can live happily ever after.
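As a minimal sketch (the keyspace, table, and the 30-day value are hypothetical; 864000 seconds, ten days, is the usual default):

    # Raise gc_grace_seconds so tombstones outlive the whole migration window.
    cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 2592000;"

    # Once the migration is done and a full repair has run, restore the default.
    cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 864000;"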
Also, the most common mistake, or the problem that I’ve seen in clusters, is not estimating the impact of the methods. Some methods are less intrusive, some are more intrusive, and if your source cluster has certain latency requirements, you should make sure that you don’t kill the latencies on the source cluster.
Another important thing that people often forget is that the target cluster should never have automated mechanisms that multiply the number of writes for you, like indexes or materialized views created before you switch. You should create those MVs or indexes only after all the data is migrated, and then let them build when the resources are actually free for the index or materialized view creation, rather than before, when the nodes are loaded and fully busy with the migration itself.
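As a sketch, with a hypothetical schema, the index and view creation would only run once the data load has finished:

    # Create secondary indexes and materialized views after the migration,
    # so the backfill runs on an otherwise idle target cluster.
    cqlsh -e "CREATE INDEX IF NOT EXISTS ON my_keyspace.users (email);"
    cqlsh -e "
      CREATE MATERIALIZED VIEW IF NOT EXISTS my_keyspace.users_by_email AS
        SELECT * FROM my_keyspace.users
        WHERE email IS NOT NULL AND user_id IS NOT NULL
        PRIMARY KEY (email, user_id);"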
Let’s talk about the tools; I will just go over them quickly. The first, and important, tool is nodetool refresh. If you align your data properly, this can be the fastest way to get your data over.
It also comes with disadvantages. One is that you need full access to the source cluster. The second is that when the topology doesn’t match, say nine nodes in the source and six nodes in the target, it’s hard to figure out which SSTables belong on which node. And if you have to use load and stream, it might be slower: even though we tried to make it very fast, and it’s actually faster than copying everything everywhere, which is the default approach with nodetool refresh, it will still take some resources and be slower than having the data aligned. So if the data fits, refresh can be fast; if the data doesn’t fit, load and stream will be fast.
And if you are completely out of options, just copy everything everywhere and let the cluster clean it up. That’s the poor man’s version of nodetool refresh.
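Roughly, the workflow looks like this. The keyspace, table, and UUID directory are hypothetical, and the --load-and-stream flag assumes a Scylla version that supports it:

    # Drop the copied SSTables into the table's upload directory on the target.
    scp source-node:/var/lib/scylla/data/ks/mytable-<uuid>/*.db \
        /var/lib/scylla/data/ks/mytable-<uuid>/upload/

    # Topologies match: a plain refresh picks the files up locally.
    nodetool refresh ks mytable

    # Topologies differ: load and stream sends each partition to its owners.
    nodetool refresh ks mytable --load-and-stream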
Scylla has also added a backup and restore option to Scylla Manager, and this is actually very different between the two versions of Scylla Manager. Before Scylla Manager 3.1, it didn’t support load and stream, so it could only restore based on the topology: the target cluster had to be the same. With Scylla Manager 3.1, restore leverages load and stream and will go from any topology to any topology, so just use that; it makes life very simple and very easy.
It’s also very convenient because before Scylla Manager 3.1, if you wanted to restore just specific tables, you actually had to go and manually copy the SSTables from the appropriate table. That’s not the case anymore.
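As a rough sketch with sctool (the cluster names, bucket, and snapshot tag are hypothetical):

    # Back up the source cluster with Scylla Manager.
    sctool backup -c source-cluster --location s3:my-backup-bucket

    # With Scylla Manager 3.1+, restore table data into a target of any
    # topology; load and stream routes the rows to the right owners.
    sctool restore -c target-cluster --location s3:my-backup-bucket \
        --snapshot-tag sm_20240101000000UTC --restore-tables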