Spark - ScyllaDB Migrator - ScyllaDB University

5 Min to complete

S110: The Mutant Monitoring System (MMS) and Integrations > Using Spark with ScyllaDB > Spark – ScyllaDB Migrator

The ScyllaDB Migrator project: what it is and how it works.
The Migrator copies all rows of a table from a source (ScyllaDB, Cassandra, DynamoDB, or Parquet) to a target database. It can also validate the transferred data.

Transcript

I promised to talk about the ScyllaDB Migrator, so let’s have a slide on that as well.
We had a task where there were various sources, and we wanted a pluggable system that could take the data from basically any source database that is supported by Spark. Besides the connector for ScyllaDB, Spark can have a connector for anything else as well: it can have a connector for PostgreSQL, a DynamoDB connector, and I think it has a Parquet connector by default. All these connectors can very easily be plugged into the ScyllaDB Migrator. At the moment the ScyllaDB Migrator supports only the four sources that are mentioned there: ScyllaDB, Cassandra, DynamoDB, and Parquet. It basically copies all rows from a single table in the source database to the target database.

It can even preserve the timestamps and TTLs, so I think this is very good, and you can implement your migration strategy with it in a very good way. You can set up dual writes, so that you write to the source and the target database at the same time, and then start the migration with the timestamps and TTLs preserved. You can then be sure that when the Migrator writes the data, it will not overwrite any new data from the new writes, simply because the original timestamps and TTLs are preserved. No surprises will happen, and you will know that all your data is correctly migrated.
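
To make the point about dual writes concrete, here is a minimal Python sketch of last-write-wins reconciliation. It is not the Migrator's code: `Cell`, `apply_write`, `migrate`, the key `mutant-1`, and the timestamps are all hypothetical, and tie-breaking on equal timestamps is left out. It only illustrates why a migrated row that keeps its original write timestamp cannot replace a newer dual-written value, while a row rewritten with the current time can.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    write_ts: int  # write timestamp, like the value CQL's WRITETIME() returns


def apply_write(table, key, cell):
    """Last-write-wins: the cell with the higher write timestamp survives.

    Tie-breaking on equal timestamps is ignored in this sketch.
    """
    current = table.get(key)
    if current is None or cell.write_ts > current.write_ts:
        table[key] = cell


def migrate(snapshot, target, preserve_timestamps, now):
    """Copy a snapshot of the source into the target (hypothetical helper)."""
    for key, cell in snapshot.items():
        ts = cell.write_ts if preserve_timestamps else now
        apply_write(target, key, Cell(cell.value, ts))


# The migration reads the source while the row still holds the old value.
snapshot = {"mutant-1": Cell("old address", write_ts=100)}

# Meanwhile the application dual-writes a newer value (t=200) to the target.
target_preserved = {}
target_rewritten = {}
for table in (target_preserved, target_rewritten):
    apply_write(table, "mutant-1", Cell("new address", write_ts=200))

# The migrated copy arrives later, at wall-clock time t=300.
migrate(snapshot, target_preserved, preserve_timestamps=True, now=300)
migrate(snapshot, target_rewritten, preserve_timestamps=False, now=300)

print(target_preserved["mutant-1"].value)  # new address
print(target_rewritten["mutant-1"].value)  # old address
```

With timestamps preserved, the migrated cell arrives with timestamp 100 and loses to the dual-written cell at 200. Without preservation it is stamped 300 and the stale value wins.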
Once the ScyllaDB Migrator has migrated your data, you can validate it as well. It is like two applications in one: a special validation application is part of the project, and you can just run it after you have done the migration, and it will do a comparison. You can even apply some correction to the timestamps so that you don’t get false positives. In the end you get a number of rows, or actually a list of the rows that are different, and you can see whether a divergence is okay or whether it is something you need to look at, fixing the rows that were not transferred as you wanted.
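
The comparison step can be sketched in a few lines of Python. This is an illustration under assumptions, not the validator's actual logic: `find_differences`, the `writetime_tolerance` parameter, and the row layout (key mapped to a value and a write timestamp) are made up for the example. It shows how a tolerance on timestamps removes false positives while real divergences are still listed.

```python
def find_differences(source_rows, target_rows, writetime_tolerance=0):
    """Return a list of (key, reason) pairs for rows that do not match.

    Both arguments map key -> (value, write_ts). All names are hypothetical.
    """
    differences = []
    for key, (src_value, src_ts) in source_rows.items():
        if key not in target_rows:
            differences.append((key, "missing in target"))
            continue
        tgt_value, tgt_ts = target_rows[key]
        if src_value != tgt_value:
            differences.append((key, "value differs"))
        elif abs(src_ts - tgt_ts) > writetime_tolerance:
            # Same value, but the timestamps drifted beyond the tolerance.
            differences.append((key, "write timestamp differs"))
    return differences


source_rows = {"a": ("x", 1000), "b": ("y", 2000), "c": ("z", 3000)}
target_rows = {"a": ("x", 1003), "b": ("changed", 2000)}

strict = find_differences(source_rows, target_rows)
tolerant = find_differences(source_rows, target_rows, writetime_tolerance=5)

print(strict)    # row "a" is flagged only because of a 3-unit timestamp drift
print(tolerant)  # row "a" passes; "b" and "c" remain as real differences
```

The resulting list is what you would then review row by row to decide whether a difference is acceptable or needs fixing.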
The repository hosts the code. Unfortunately it needs to be built; we don’t offer it as a download anywhere. I hope that we will do releases, we plan to, so later on you might even get a download link. So far we are happy to help you: if you are not able to build it, just talk to us and we will either give you a build or show you how to build it. For the people who know Travis, the Migrator repository is using Travis, so you can look at how Travis builds it, and this basically gives you a blueprint or a recipe for building it yourself.

As I mentioned, the Migrator itself is a proof of concept for a full scan, so you will be able to see how the source data frame is created, how we set all the options that I mentioned, such as the preservation of the timestamps, and how this data frame is then taken and written to the target that you want to use. The general use case is that people are migrating, for example, from Cassandra to ScyllaDB or from DynamoDB to ScyllaDB. It is a generic thing, and you can very easily see how those endpoints are created, how they are interconnected, and how the data flows between them.
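
The pluggable shape described above, where a source endpoint is chosen by type and its rows flow into a target, can be sketched without Spark. Everything in this Python example is hypothetical (the registry, the source type names, the config keys) and plain dictionaries stand in for data frames and database tables. It only shows why adding a new source does not change the copy loop.

```python
# Registry of source readers keyed by a type name (hypothetical design).
SOURCE_READERS = {}


def register_source(name):
    """Decorator that plugs a reader function into the registry."""
    def decorator(fn):
        SOURCE_READERS[name] = fn
        return fn
    return decorator


@register_source("in-memory-table")
def read_in_memory(config):
    # A real reader would scan a database table through a Spark connector.
    yield from config["rows"]


@register_source("csv-lines")
def read_csv_lines(config):
    for line in config["lines"]:
        key, value = line.split(",", 1)
        yield key, value


def full_scan_copy(source_config, target):
    """Read every row from the configured source and write it to the target."""
    reader = SOURCE_READERS[source_config["type"]]
    copied = 0
    for key, value in reader(source_config):
        target[key] = value
        copied += 1
    return copied


target = {}
n1 = full_scan_copy(
    {"type": "in-memory-table", "rows": [("k1", "v1"), ("k2", "v2")]}, target
)
n2 = full_scan_copy({"type": "csv-lines", "lines": ["k3,v3"]}, target)
print(n1, n2, target)
```

Supporting another source in this sketch means registering one more reader function; `full_scan_copy` stays the same.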
