Covers specific topics related to migration from DynamoDB to ScyllaDB. Should dual-write be used? What about RCUs? How about migrating large tables?
Let’s now talk about the DynamoDB
to ScyllaDB Cloud migration, how it actually works
and what you need to be aware of.
So here, similarly
to the Apache Cassandra migration,
we also have the concern of dual-writing.
Should we do it or not?
And some other concerns actually boil down to costs,
such as, for example: how many RCUs (read capacity
units) will the migrator consume while it is
scanning my source table?
And we all know that given that DynamoDB
is a pay-per-operation type of service model,
if we do not control the RCUs, things can get fairly expensive.
And how do we migrate large tables?
These are some of the questions that we typically get.
All right, so let’s start with dual-writing.
One of the things that
I want you to keep in mind is that the DynamoDB
protocol doesn’t have a timestamp-based concept of “last-write-wins”
as we have in the Cassandra protocol.
Therefore, the last write arriving at the database
will always be the one persisted, no matter how stale the data it carries is.
In that case, the answer to Leandro’s earlier question,
if we were doing a DynamoDB
migration, would be exactly what would happen over here.
Okay.
So the migrator would overwrite
the newer data written by the application,
as opposed to the correct data actually being stored.
And of course this is not what we want.
So we should not dual-write when migrating from DynamoDB,
and instead we need to rely on the usual
migration steps as we have seen during the migration overview.
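Just to make that concrete, here is a minimal boto3 sketch (the table name and items are made up): because PutItem replaces an item unconditionally, a stale copy arriving from the migrator after a fresher application write simply wins.

```python
import boto3

# Hypothetical table name; both writers talk to the same DynamoDB table.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("users")

# 1. The application updates the item while the migration is still running.
table.put_item(Item={"user_id": "42", "email": "new@example.com"})

# 2. The migrator, replaying an older copy of the same item, writes it again.
#    PutItem carries no write timestamp, so this unconditionally replaces the
#    newer value -- the stale copy is what ends up persisted.
table.put_item(Item={"user_id": "42", "email": "old@example.com"})
```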
So a DynamoDB to ScyllaDB Cloud migration
therefore works as follows; and guys, pay attention
to this flow and you’re going to see that it is very
much like what we have seen in the overview section.
So we first start with our application writing
and reading directly from DynamoDB, and then
we enable DynamoDB Streams
on our table. Notice that
if you are using the ScyllaDB Migrator this step is optional,
as the migrator itself can enable it for you
automatically if you want.
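If you do want to enable Streams yourself rather than letting the migrator do it, a minimal boto3 sketch could look like this (the table name and stream view type are assumptions):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Turn on DynamoDB Streams for the (hypothetical) "users" table.
# The view type is an assumption here; pick whatever your change-replay
# tooling expects (NEW_AND_OLD_IMAGES is the most complete option).
dynamodb.update_table(
    TableName="users",
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
```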
Then after we enable DynamoDB
Streams, we can initiate
the actual migration job using the migrator.
And what the migrator
is going to do
is it’s essentially going to do a table scan
and, for every record it reads, it will write it
to your destination ScyllaDB Cloud cluster.
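Conceptually, that scan-and-copy phase boils down to something like the following boto3 sketch; the table name and the ScyllaDB Cloud Alternator endpoint are placeholders, and the real migrator of course parallelizes and throttles this work rather than looping item by item:

```python
import boto3

# Source: the real DynamoDB table. Target: ScyllaDB's DynamoDB-compatible API
# (Alternator); the endpoint URL and table name are placeholders.
source = boto3.client("dynamodb", region_name="us-east-1")
target = boto3.client(
    "dynamodb",
    endpoint_url="http://your-scylla-cloud-node:8000",
    region_name="us-east-1",
)

# Walk the whole source table page by page and re-write every item into the
# destination. The real migrator spreads this across scan segments and
# throttles it according to its readThroughput setting.
paginator = source.get_paginator("scan")
for page in paginator.paginate(TableName="users"):
    for item in page["Items"]:
        target.put_item(TableName="users", Item=item)
```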
All right.
And after the migration finishes,
and if you configure the migrator to do so,
it will consume the data from DynamoDB Streams
and then sync those changes against ScyllaDB Cloud.
At this point, you must wait for those changes to get in sync.
And then finally
we are going
to switch the client application to
read and write directly from ScyllaDB.
All right.
So that’s essentially the flow.
It’s pretty simple, and it is
also the same flow you have seen
in the overview section, but tailored for DynamoDB.
Jay is asking: do you support the DynamoDB Global Tables?
Jay, very good question.
So we do not support the API for global tables specifically,
but a global table is essentially a table
which is spread over multiple regions.
Right?
That, we do support. So
if you have a multi-region ScyllaDB cluster
and you create a table using the DynamoDB API,
by default this table is going to be
created automatically as global.
So the replication factor, from a Cassandra perspective,
is going to be configured as three for each
region that your ScyllaDB deployment spans.
If you do not want your table to be global,
then you can manually log in via the CQL protocol and simply
alter the replication factor of the keyspace.
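As a rough illustration of that, here is a sketch using the Python driver over CQL; the contact point, credentials, datacenter name and the alternator_users keyspace name are all assumptions for a hypothetical "users" table:

```python
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Connect over the CQL protocol; contact point, credentials, keyspace name
# and datacenter name are all placeholders.
cluster = Cluster(
    ["your-scylla-cloud-node"],
    auth_provider=PlainTextAuthProvider("scylla", "your-password"),
)
session = cluster.connect()

# Keep replicas in a single datacenter/region only, instead of the default
# replication across every region the cluster spans.
session.execute(
    "ALTER KEYSPACE alternator_users "
    "WITH replication = {'class': 'NetworkTopologyStrategy', 'AWS_US_EAST_1': 3}"
)
```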
Okay, so
the answer is
yes, global tables can be made to work on ScyllaDB.
But no, the API for global tables is not yet supported.
All right, let’s move on.
So the migrator has specific
settings to control the concurrency
while it scans the source DynamoDB table,
and we control that concurrency
by using the “readThroughput” parameter.
Tuning that parameter, or setting it to zero,
allows you to either speed up or slow down
your migration task, at the expense of,
you know, either more or less RCUs being consumed
from your source DynamoDB table.
Some remarks that are really important for
you guys to keep in mind here (and this is not really
DynamoDB specific, but it is particularly
important for DynamoDB): always remember that
the larger
your source table is, the more data there is to migrate
and, of course, the longer it will take for the migrator
to actually scan through your data and write it out to
the destination cluster.
For DynamoDB specifically,
this is important because, first, AWS by default only guarantees
that the data in DynamoDB
Streams will be available for 24 hours for consumption.
So if your table is potentially terabytes
in size, it might be that
you lose updates in DynamoDB Streams.
And the larger your dataset is, of course, the more RCUs
you will consume in order to get your data transferred.
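As a back-of-the-envelope example (the table size and throttling rate are made up), you can estimate both the scan cost in RCUs and whether the copy phase will fit inside that 24-hour window:

```python
# Back-of-the-envelope estimate: an eventually consistent read is billed at
# 0.5 RCU per 4 KB of data, so a full scan of the table costs roughly:
table_size_gb = 500                            # hypothetical source table size
table_size_kb = table_size_gb * 1024 * 1024

total_rcus = (table_size_kb / 4) * 0.5         # ~65.5 million RCUs for 500 GB

# If the scan is throttled to, say, 2,000 RCUs per second, the copy phase
# alone takes total_rcus / 2000 seconds -- roughly 9 hours here. Halve the
# rate and you double the duration, which is when the 24-hour DynamoDB
# Streams retention window starts to become a real concern.
read_rate_rcu_per_sec = 2000
scan_hours = total_rcus / read_rate_rcu_per_sec / 3600
print(f"~{total_rcus:,.0f} RCUs, ~{scan_hours:.1f} hours at {read_rate_rcu_per_sec} RCU/s")
```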
So, as a result of this, should the migration
become cost-prohibitive for you, or should the dataset
you need to move off DynamoDB be too large
to transfer within 24 hours,
then you may consider alternative methods
such as,
you know,
doing a point-in-time backup of your table (exporting it to S3, for example)
and then restoring it from that backup
directly into your ScyllaDB cluster. And then you can
send your
DynamoDB Streams changes to a Kafka topic, or whatever you prefer,
and consume from it after you are done
with the initial forklift of your data.
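Here is a rough boto3 sketch of kicking off that initial bulk export; the table name, table ARN and S3 bucket are placeholders:

```python
import boto3
from datetime import datetime, timezone

dynamodb = boto3.client("dynamodb")

# Point-in-time recovery must be enabled on the table (ideally well in
# advance) before an export is possible.
dynamodb.update_continuous_backups(
    TableName="users",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Export a consistent snapshot of the table to S3. The exported files can then
# be loaded into ScyllaDB out of band, while DynamoDB Streams (or a Kafka
# pipeline fed from them) captures the changes that happen in the meantime.
dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:us-east-1:123456789012:table/users",
    ExportTime=datetime.now(timezone.utc),
    S3Bucket="my-migration-exports",
    ExportFormat="DYNAMODB_JSON",
)
```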