More practical migration strategies, including using a shadow cluster, and more on TTL and Counter tables.
We finally got to
start discussing more complex and complicated scenarios here.
And the first one I would like to highlight is how
a messaging application company had a challenge to migrate over
1 trillion rows from Cassandra to ScyllaDB, not sure
if you guys have ever heard about that story.
Now the good news is
that Cassandra and ScyllaDB are API compatible,
so there is very little needed for any schema
or application changes, in order for you to migrate.
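To make that concrete, here is a minimal sketch (using the Python driver, with hypothetical host, keyspace, and table names) of what "very little change" often looks like in practice: the same CQL schema and queries, with only the contact points swapped.

```python
from cassandra.cluster import Cluster

# Hypothetical contact points; the CQL schema and the queries stay the same,
# only the cluster the driver connects to changes.
# source = Cluster(["cassandra-node1", "cassandra-node2"])
target = Cluster(["scylla-node1", "scylla-node2"])

session = target.connect("messaging")

# The exact same prepared statement works against either cluster.
insert = session.prepare(
    "INSERT INTO messages (channel_id, message_id, body) VALUES (?, ?, ?)"
)
session.execute(insert, ("general", 42, "hello"))
```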
On top of that
given the criticality of their data
and consistency requirements,
it was mandatory that the migration
would be completed in an online way with zero user impact
and no data loss.
In order to do that they created what we call
a “shadow cluster”, which is a cluster that receives
both reads and writes from the application.
This is really critical to assess the performance impact
of the new platform before you actually switch.
A shadow
cluster allows you to test other aspects of your migration,
such as whether your application will work seamlessly,
whether the new solution is going to be stable
and reliable for a long period of time, and other things.
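As an illustration only (not the exact setup used in this migration), a shadow-cluster client might mirror every write, and optionally every read, to the new cluster while still serving responses from the source:

```python
from cassandra.cluster import Cluster

# Hypothetical contact points for the source and shadow (target) clusters.
source = Cluster(["cassandra-node1"]).connect("messaging")
shadow = Cluster(["scylla-node1"]).connect("messaging")

WRITE = "INSERT INTO messages (channel_id, message_id, body) VALUES (?, ?, ?)"
READ = "SELECT body FROM messages WHERE channel_id = ? AND message_id = ?"

src_write, shd_write = source.prepare(WRITE), shadow.prepare(WRITE)
src_read, shd_read = source.prepare(READ), shadow.prepare(READ)

def write(channel_id, message_id, body):
    # The source remains the system of record; the shadow write is
    # fire-and-forget so it cannot hurt user-facing latency.
    source.execute(src_write, (channel_id, message_id, body))
    shadow.execute_async(shd_write, (channel_id, message_id, body))

def read(channel_id, message_id):
    # Mirror the read to the shadow cluster to measure its behavior,
    # but always return the source's answer to the user.
    shadow.execute_async(shd_read, (channel_id, message_id))
    return source.execute(src_read, (channel_id, message_id)).one()
```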
Of course there are good things, but there are also drawbacks,
and the most important one is that it’s fairly expensive
because you will typically
double your infrastructure costs for some time,
plus the fact that it’s not a commonly used strategy
for migrations, which may take a considerable amount of time.
Not to mention all the additional nuances
you will introduce in order to ensure everything runs smoothly.
Alright.
Now, one of the most important lessons we’ve learned
from this one is how important it is to ensure the source
system’s stability during the actual migration, okay?
Most people typically
just want to migrate their data as fast as possible,
but when you are working on a real time type of system
where your latencies
are critical for your end user satisfaction, it may happen
that migrating as fast as possible is not a good idea.
The solution here
is that we want to migrate the data as fast as possible,
but up to a point where our existing source system
and latencies are still acceptable.
And, how many operations per second should I run to migrate?
At which level of concurrency?
There is no easy answer to that, which is why you have to test.
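A minimal way to put a ceiling on migration pressure, assuming a hypothetical `copy_row` helper and a `scan_source_table` generator, is a simple operations-per-second throttle that you tune while watching the source cluster's latencies:

```python
import time

def throttled(rows, max_ops_per_sec):
    """Yield rows while capping throughput, so the source cluster's
    latencies stay within an acceptable budget."""
    min_interval = 1.0 / max_ops_per_sec
    last = time.monotonic()
    for row in rows:
        yield row
        elapsed = time.monotonic() - last
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)
        last = time.monotonic()

# Hypothetical usage: start low, watch p99 latencies, then raise the cap.
# for row in throttled(scan_source_table(), max_ops_per_sec=5000):
#     copy_row(row)
```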
All right.
Another challenge that we have with this kind of migration
comes down to when you are using complex data
types such as UDTs and collections.
One of the problems with those data types is that,
although each partition can store a large number of items,
the CQL protocol doesn’t allow us to query the time to live
or the write timestamps
of every individual item inside a collection.
And even if it were allowed,
iterating through the individual items
of a collection
could potentially take too long, thus
causing the migration to take indefinitely longer.
However, there is one trick that we can do to bypass this
limitation, which is the fact that the protocol allows one
to manipulate the timestamp and TTL of every single write.
In other words, to prevent the data
from being overwritten, we hard-coded the write timestamp of every
collection and UDT to a point in time in the past,
before the actual migration got fired.
As a result, newer writes coming from the main application
would never get overwritten, as their timestamps
would always be higher than the migration job’s.
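A simplified sketch of that trick (not the actual migrator code, and with hypothetical table and column names) looks like this: every migrated row is written with an explicit timestamp fixed before the migration started, so any live write from the application, which carries a newer timestamp, always wins.

```python
from cassandra.cluster import Cluster

# Hypothetical target cluster and keyspace.
session = Cluster(["scylla-node1"]).connect("messaging")

# A fixed point in time *before* the migration started, expressed in
# microseconds since the epoch (the unit CQL timestamps use).
MIGRATION_TIMESTAMP = 1_700_000_000_000_000  # illustrative value

insert = session.prepare(
    "INSERT INTO user_profiles (user_id, emails, settings) "
    "VALUES (?, ?, ?) USING TIMESTAMP ?"
)

def migrate_row(user_id, emails, settings):
    # Because this write's timestamp is older than anything the live
    # application writes, it can never overwrite fresher data on the target.
    session.execute(insert, (user_id, emails, settings, MIGRATION_TIMESTAMP))
```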
Lubos, who is going to talk right
after me,
is the one who actually implemented
the entire logic in the Scylla migrator.
Okay, so, one
trillion messages.
That’s a lot of data,
so how do we actually migrate
and ensure that we can resume the job
if something goes wrong, such as our migration job fails?
Right.
We definitely cannot start from scratch.
Then what you have to understand here is that,
as data in Cassandra and ScyllaDB is distributed
in a token ring fashion, all we have to do is scan
through all the token ranges that exist in the ring.
And after every successful scan
we can record the progress somewhere to avoid
having to scan that specific token range again.
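Here is a hedged sketch of that resume logic: split the full Murmur3 token range into chunks, scan each chunk with a token-restricted query, and record finished ranges in a hypothetical checkpoint file (`copy_row` is an assumed helper that writes to the target).

```python
import json
import os

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1

def token_ranges(num_splits):
    """Split the full Murmur3 token ring into contiguous ranges."""
    step = (MAX_TOKEN - MIN_TOKEN) // num_splits
    for i in range(num_splits):
        lo = MIN_TOKEN + i * step
        hi = MAX_TOKEN if i == num_splits - 1 else lo + step
        yield (lo, hi)

def migrate(session, copy_row, checkpoint_path="progress.json"):
    # Load previously completed ranges so a failed job can resume.
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = {tuple(r) for r in json.load(f)}
    scan = session.prepare(
        "SELECT * FROM messages "
        "WHERE token(channel_id) > ? AND token(channel_id) <= ?"
    )
    for rng in token_ranges(4096):
        if rng in done:
            continue  # this range was migrated before the failure; skip it
        for row in session.execute(scan, rng):
            copy_row(row)  # hypothetical helper writing to the target
        done.add(rng)
        with open(checkpoint_path, "w") as f:
            json.dump(sorted(done), f)  # record progress after each range
```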
Then, as we have understood from the previous slide,
if you are dealing with complex data types,
you can simply hard-code your target write timestamps to a time prior
to when the migration actually starts, plus some grace period.
And finally,
you don’t have to migrate your data in a serial way,
scanning only one token range at a time.
Cassandra and ScyllaDB make it very easy for you
to scan several chunks of data concurrently and parallelize
your work.
But as we have covered earlier on, be very careful
with your source system latencies.
All right.
In the picture, on the lower left hand side,
we can visualize an example of a token ring.
And on the right side, we can see how to actually
increase your migration parallelism using Apache Spark
as an example, and you would simply add
more workers to achieve a higher concurrency.
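As a rough illustration (assuming the spark-cassandra-connector is available on the classpath, with hypothetical host, keyspace, and table names), reading the source table into a Spark DataFrame lets you scale the copy simply by adding more workers or executors:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("table-copy")
         # Hypothetical source cluster; executors/workers can be scaled
         # up or down to control migration concurrency.
         .config("spark.cassandra.connection.host", "cassandra-node1")
         .getOrCreate())

df = (spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="messaging", table="messages")
      .load())

(df.write
 .format("org.apache.spark.sql.cassandra")
 # Illustrative: pointing the write at the target cluster; the exact
 # multi-cluster configuration depends on the connector version.
 .option("spark.cassandra.connection.host", "scylla-node1")
 .options(keyspace="messaging", table="messages")
 .mode("append")
 .save())
```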
All right, so now we are going to see a very tricky one,
counter tables.
And really, guys, there is nothing good
that I can think of about migrating counter tables.
So if you guys have counter tables to migrate
I don’t have any tips for you other than good luck.
And the reason is that counters are extremely complex
to migrate
and part of the reason is due to how this data type works
under the hood.
So here’s a lesson learned from the field: know your data types
and how your database processes them under the hood.
If you are working with a counter table migration,
then you have two choices: one, consider
introducing a small offline window for your application,
which effectively means
an outage of a few minutes; or two, split your writes
to a different source of truth
and aggregate them over the length of your migration.
All right, so what’s special about counters?
Well, they are a data type that’s known as a “conflict-free
replicated data type”, which means that concurrent
updates eventually converge to a stable value.
In other words,
counters only support increment and decrement operations.
And as these mathematical operations
always converge to the same value, you cannot
initialize a counter to a specific value.
You also cannot do a dual-write because it would be impossible
to know how much
to increment or decrement after your data got migrated.
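To see why, here is what counter semantics look like in practice (illustrative table and column names, assuming an existing `session` connected to the cluster): you can only ever add to or subtract from a counter, never set or insert an absolute value, which is exactly what a plain data copy or a dual-write would need.

```python
# Counters only understand relative updates:
session.execute(
    "UPDATE message_stats SET views = views + 1 WHERE channel_id = 'general'"
)
session.execute(
    "UPDATE message_stats SET views = views - 1 WHERE channel_id = 'general'"
)

# Neither of these is valid CQL for a counter column, which is why you
# cannot simply "copy" a counter's current value into the target cluster:
#   INSERT INTO message_stats (channel_id, views) VALUES ('general', 1234)
#   UPDATE message_stats SET views = 1234 WHERE channel_id = 'general'
```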
Lastly, depending on the Cassandra version
your counter tables were created in, you simply cannot
migrate them in an offline way at all,
because you introduce the risk
of corrupting your destination cluster.
And so as you guys can see, it’s pretty scary, right?
Now, it’s not that migrating counter tables is impossible,
and there are a few routes you could take.
At last year’s Scylla Summit, “happn” explained
how they migrated 68 billion records to ScyllaDB.
Their talk further expands on why counter tables are evil,
so I definitely recommend you watch
Robert’s talk if you are interested
in learning more about their approach.
But for our actual migration, we decided to take
another route, one involving a short downtime.
And here is how it works.
First, the client writes and reads
only to the source cluster.
Next, really quickly, we stopped the application traffic.
And as the traffic stopped, we took a snapshot of our tables.
We then
reinstated the traffic, but as we did so,
we configured our client,
or application client, to now write data to what
we call a delta table on the target ScyllaDB cluster.
With the client operations now back online, we then transferred
the previously taken snapshots to the destination cluster
and restored them to a final table.
Note here that we have two different tables: the delta one,
which is receiving updates from the client application
in real time,
and the final table,
which is receiving the static values
which we have previously snapshotted.
All right.
After the snapshot got fully restored,
we updated the clients to stop ingesting data
to the delta table
and instead start ingesting data to the final table.
As a result, the
delta table will no longer receive updates,
and our updates will be done to the final table only.
So what happens at this point is that
our delta table should have all the counter
updates which happened during the entire migration process,
from when we moved the snapshots until we restored them.
So of course the final step for us is to feed
the final table with the values from the delta table,
and we essentially accomplished this by sideloading it.
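A hedged sketch of that sideload step (hypothetical table and column names, assuming `session` is connected to the target ScyllaDB cluster): read every accumulated delta and apply it as an increment on the final counter table, which is the only kind of write counters accept.

```python
# `counter_delta` accumulated increments during the migration window,
# while `counter_final` holds the restored snapshot values.
apply_delta = session.prepare(
    "UPDATE counter_final SET views = views + ? WHERE channel_id = ?"
)

for row in session.execute("SELECT channel_id, views FROM counter_delta"):
    # Each delta is applied as an increment on top of the snapshot value,
    # because counters cannot be set to an absolute number.
    session.execute(apply_delta, (row.views, row.channel_id))
```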
Now, is this method perfect
and not prone to failures?
Definitely not.
If you make a mistake, you need to start the process
from step zero. But in our case,
and with planning, it worked really nicely,
and it’s an alternative to the method proposed by happn.
Okay.
And of course, after we sideloaded everything,
we simply stopped writing to the source cluster
and migration is done.
So this is the
actual migration diagram
explaining the same previous flow, but in a different way.
And thanks to Lubos because he’s the one who came up with this
migration strategy.