Tips to make the migration faster and more straightforward. Also covers how to validate the migration was successful, using the ScyllaDB Monitoring stack.
Let’s talk about tuning. The first rule of thumb is that you should never, ever run the migration on the database nodes themselves. I’ve seen it happen so many times: people come to us complaining “why is this migration so slow?”, and it turns out they started it on either the source or the target database node. The migration itself needs resources, so if you don’t have a decent enough machine for it, it will be slow. And if you run the migration on the source or target database, it will steal resources from the database itself, and it will be slow. So keep this in mind and never run the migration on anything that is not dedicated to the migration itself.
Another tip is to make the infrastructure layout 1:1. I mentioned that already, so this is just a reminder: you shouldn’t move the data around too many times, and you shouldn’t convert the data if you can avoid it.
Another tuning option is interesting but can backfire: if you have enough disk space on the target, you might want to disable compactions there. This little trick will give you very fast writes, but at the same time your disk space usage will grow very high; at a certain point the data may not fit and you will simply overload your disks, and then of course it will be bad. But if, for example, your estimate leaves you with 50%, 70%, or 80% more disk space than the data needs, then you can probably get away with it, especially if you use, for example, ICS (Incremental Compaction Strategy) later on, which won’t need as much space to get the compactions done.
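As a rough sketch, assuming placeholder keyspace, table, and contact-point names, disabling automatic compaction on the target table (and re-enabling it afterwards) can be done with an ALTER TABLE statement; there is also a per-node nodetool disableautocompaction command if you prefer to do it node by node:

```python
# Sketch only: disable/re-enable automatic compaction on the target table
# during the bulk load. Keyspace, table name and contact points are placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(["target-node-1"])   # assumption: a target cluster contact point
session = cluster.connect()

# Turn off automatic compaction for the table being migrated.
# The compaction class should match whatever strategy the table already uses.
session.execute("""
    ALTER TABLE my_keyspace.my_table
    WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'}
""")

# ... run the migration ...

# Re-enable compaction once the bulk load is done.
session.execute("""
    ALTER TABLE my_keyspace.my_table
    WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 'true'}
""")
```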
Another tip to make this fast: you need to size your Spark cluster properly. My rule of thumb is to size the Spark cluster to match the smaller of the source and target clusters, giving it the same amount of CPUs and the same amount of memory. I don’t really care about the disks, because Spark doesn’t really need disks here; you can use some cheap disks, or no data disks at all, just the OS disk. The tricky part is that if you break your database into many small pieces with Spark, the driver will need a lot of memory to keep track of all those pieces, so you need to give the driver some extra memory too. Otherwise you’ll hit problems; I’ve seen too many out-of-memory issues caused by the driver not having enough memory.
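As an illustration of that sizing, here is a hedged sketch for a hypothetical source cluster of 3 nodes with 8 vCPUs and 32 GB of RAM each; in practice these values are usually passed to spark-submit (--executor-cores, --executor-memory, --driver-memory, ...), and the exact numbers are assumptions:

```python
# Sketch only: example Spark sizing for a hypothetical 3 x 8 vCPU x 32 GB
# source cluster. These properties are normally set on the spark-submit
# command line; a SparkConf is shown here just to make the numbers explicit.
from pyspark import SparkConf

conf = (
    SparkConf()
    .setAppName("scylla-migration")
    # Match the smaller of the source/target clusters: 3 workers,
    # 8 cores and roughly 32 GB per worker (leave headroom for the OS).
    .set("spark.executor.instances", "3")
    .set("spark.executor.cores", "8")
    .set("spark.executor.memory", "28g")
    # The driver keeps track of every task/token range, so give it extra
    # memory when the job is split into many small pieces.
    .set("spark.driver.memory", "8g")
)
```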
Let’s talk about how we can monitor and validate the migration. I’m not sure if you’re familiar with Scylla Monitoring, but it is very good, and running a migration without it is like flying blind. So if you don’t know about it, check it out and learn about it; there are some tips that can help you. Always have a look at the per-shard view, because your smallest computing unit is a CPU (a shard). If a shard is bottlenecked, or only a few CPUs are doing work, then there is a problem with your workload: either you didn’t shuffle it enough, or you don’t have enough parallelism (your Spark tasks are too big), or you can simply see that compaction is slowing down your ingestion. You can act based on the monitoring, because you know what is going on.
I prepared some slides here and I’ll go over them quickly. I like the Detailed dashboard: there you can very easily see the load (the top-left panel) and the distribution of queries on the coordinators (the middle part), and then how they are split across the replicas. That lets you very easily spot what John was explaining about partition keys and clustering keys, and whether they are properly distributed. The next thing I look at is the cache, because reads served from the cache are of course faster than going to disk; that’s an obvious one. And the last thing is to look at the latencies, again with the view of the distribution between coordinators and replicas. Here you can very easily spot hot partitions, or shards that work too hard while other shards are slacking. And if there are shards that are not doing anything, then your migration method is definitely not using all the CPUs and the distribution of the data effectively.
In the Scylla Migrator there is also a way to monitor the tasks, so you can see how many tasks are done and how many still need to be done. And I mentioned the “savepointing”: every five minutes or so, I think, the migrator dumps out savepoints. If you remember the token ranges Felipe was showing, it is simply saving all the token ranges it has already processed. Then you know you only need to process the remaining token ranges, and you can focus on those without repeating the ones that are already migrated.
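To make the idea concrete, here is a minimal sketch of savepoint-based resumption; this is not the migrator’s actual code, just an illustration where token ranges are modeled as (start, end) tuples and the file path and flush interval are placeholders:

```python
# Sketch only: illustrates the savepoint idea, not the Scylla Migrator's
# implementation. Completed token ranges are flushed to disk periodically;
# on restart, already-finished ranges are skipped.
import json, time

SAVEPOINT_FILE = "savepoint.json"   # hypothetical path
FLUSH_INTERVAL = 300                # seconds, e.g. every 5 minutes

def load_savepoint():
    try:
        with open(SAVEPOINT_FILE) as f:
            return {tuple(r) for r in json.load(f)}
    except FileNotFoundError:
        return set()

def run_migration(all_token_ranges, migrate_range):
    # all_token_ranges: iterable of (start_token, end_token) tuples
    done = load_savepoint()
    last_flush = time.monotonic()
    for token_range in all_token_ranges:
        if token_range in done:
            continue                      # already migrated, skip it
        migrate_range(token_range)        # copy this slice of the data
        done.add(token_range)
        if time.monotonic() - last_flush >= FLUSH_INTERVAL:
            with open(SAVEPOINT_FILE, "w") as f:
                json.dump(sorted(done), f)
            last_flush = time.monotonic()
    # final flush so a completed run leaves a full savepoint behind
    with open(SAVEPOINT_FILE, "w") as f:
        json.dump(sorted(done), f)
```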
This is not a Kibana dashboard, by the way; this is Spark’s own dashboard, the Spark jobs view.
Okay, we are getting to the end. Once you’re done with the migration, usually the first question is: how do I validate my data? There are a few methods, and it depends on how much you value your data. You can do a full scan: lots of people just compare source and target row by row, and that is perfectly fine. We can help you with that if you’re using Spark, because the Scylla Migrator has a validator built in. You can tune it and set it up; you can even adjust for timestamps, so if there is a small clock skew between the clusters you can account for it in the validator and it won’t give you false positives. At the same time, this is a full scan of both clusters, so you need to read both of them, and it will create some load on both the source and the target cluster.
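For teams not using the migrator’s built-in validator, a row-by-row comparison can also be scripted directly. The sketch below is only an illustration: it assumes a table with a single primary-key column `id` and a `value` column, placeholder contact points, and an arbitrary tolerance for clock skew on the write timestamps:

```python
# Sketch only: naive full-scan comparison of source vs. target.
# Table/column names, contact points and the timestamp tolerance are
# placeholders; a real run should page through the table and parallelize.
from cassandra.cluster import Cluster

TS_TOLERANCE_US = 5_000_000   # tolerate up to 5 s of clock skew (microseconds)

source = Cluster(["source-node-1"]).connect("my_keyspace")
target = Cluster(["target-node-1"]).connect("my_keyspace")

mismatches = 0
for row in source.execute("SELECT id, value, WRITETIME(value) AS ts FROM my_table"):
    other = target.execute(
        "SELECT value, WRITETIME(value) AS ts FROM my_table WHERE id = %s", (row.id,)
    ).one()
    if other is None or other.value != row.value:
        mismatches += 1            # missing row or different content
    elif abs(other.ts - row.ts) > TS_TOLERANCE_US:
        mismatches += 1            # timestamps differ by more than the allowed skew

print(f"{mismatches} mismatching rows found")
```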
What I’ve also seen people do is validate from their client. They were already doing dual-writes, so they decided to read from both databases as well; if you’re already dual-writing, dual-reads aren’t that big of a hassle. You can even add a sampling threshold: say that only 10% of your queries will also go and read from the target database and compare the result with the same read from the source database. This way you won’t stress the clusters as much as with a full scan, but you will still check on your sample. And if you don’t see any issues or problems in that sample, you can be safe to a certain degree and say that, with a certain confidence, you are okay or not okay.
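A minimal sketch of that sampled dual-read, assuming a hypothetical pair of functions that run the same query against the source and the target, and an assumed 10% sampling rate:

```python
# Sketch only: sampled dual-read validation inside the application client.
# read_source/read_target are hypothetical functions that run the same query
# against the source and target clusters; SAMPLE_RATE is an assumption.
import logging
import random

SAMPLE_RATE = 0.10   # compare roughly 10% of reads against the target

def read_with_validation(key, read_source, read_target):
    result = read_source(key)            # the application still serves from the source
    if random.random() < SAMPLE_RATE:
        shadow = read_target(key)        # extra read against the target cluster
        if shadow != result:
            logging.warning("dual-read mismatch for key %r", key)
    return result
```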
At the same time, another thing I’ve seen people do, and that you can also do, is write your own separate verification client. If the database or table logic is predictable and you can validate it independently, you can just validate the target database without consulting the source database at all. I’ve seen this happen as well.
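As an illustration of that idea, here is a sketch with a purely hypothetical schema, a table whose `id` values are expected to form a contiguous sequence, so the target can be checked on its own without ever touching the source:

```python
# Sketch only: standalone verification of the target when the data is
# predictable. The hypothetical table is expected to contain exactly the
# ids 1..EXPECTED_ROWS, so the source never needs to be consulted.
from cassandra.cluster import Cluster

EXPECTED_ROWS = 1_000_000            # assumption about the dataset

session = Cluster(["target-node-1"]).connect("my_keyspace")

seen = set()
for row in session.execute("SELECT id FROM my_table"):
    seen.add(row.id)

missing = [i for i in range(1, EXPECTED_ROWS + 1) if i not in seen]
print(f"{len(seen)} rows found, {len(missing)} expected ids missing")
```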