Two sample applications: Sample table enrichment (ETL) with ScyllaDB and ScyllaDB Migrator (to migrate from various DBs to ScyllaDB)
I have prepared a bonus for you. I don't know if we'll be able to get to the demo, but I'm pretty sure all of you will be able to follow the README steps in the application and have a look yourself. I think it won't take more than an hour: when I went back and repeated the demo myself, it took somewhere between 40 minutes and an hour, and that included installing everything. And most of the components used by the application in the repository you will likely have installed on your stations already. Most developers have Docker, downloading and setting up Java is also common, and the rest really takes just a few minutes.

So there is an application, and I deliberately built it with Spark 3.0 so you can have a look at how this version is different. With Spark 3.0 they changed the approach a little: instead of working with DataFrames, or even the lower-level data sources, the recommended abstraction is now the Dataset. Datasets should shield you even further from all the partitioning talk I mentioned: by default they should give you the appropriate mapping and appropriate access to all the token ranges in ScyllaDB directly. That's why I wanted to show you this, and that's why I used the latest Spark 3.0 and the latest ScyllaDB 4.4 there.
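As a rough sketch of what that setup involves, you can point a Spark 3.0 shell at a ScyllaDB node through the Spark Cassandra Connector. The package version and host below are illustrative placeholders, not necessarily the values the demo repository uses, so check its README for the exact invocation:

```shell
# Launch spark-shell with the Spark Cassandra Connector. The connector works
# against ScyllaDB as well, since ScyllaDB speaks the CQL protocol.
# The connector version and the host are placeholders.
spark-shell \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0 \
  --conf spark.cassandra.connection.host=127.0.0.1
```

Inside the shell, tables are then read with `spark.read.format("org.apache.spark.sql.cassandra")`, and the connector takes care of mapping Spark partitions onto ScyllaDB token ranges for you.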
The application itself is very simple: it finds a table in ScyllaDB, reads it into Spark, and then, for every single row, validates whether the column holding the number of letters in the first and last name is populated. If it detects that it isn't, it computes the count, updates the row back in the ScyllaDB database, and in that way enriches the table itself. This is what you can try out: as I said, if you have Docker and Java installed, the example itself will take you 10 to 15 minutes to set up, not more.
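The per-row check the job performs can be sketched as a pure function. This is only a sketch of the logic described above; the column name `name_letters` and the field names `first_name` and `last_name` are my own placeholders, not necessarily the ones used in the repository:

```python
from typing import Optional

def letter_count(first_name: str, last_name: str) -> int:
    """Count the alphabetic characters across the first and last name."""
    return sum(ch.isalpha() for ch in first_name + last_name)

def enrich_row(row: dict) -> Optional[dict]:
    """Return an updated copy of the row if the letter-count column is empty.

    Returns None when the column is already populated, meaning there is
    nothing to write back to ScyllaDB for this row. The Spark job applies
    this kind of check to every row it reads from the table.
    """
    if row.get("name_letters") is not None:
        return None  # already enriched; skip the write-back
    count = letter_count(row["first_name"], row["last_name"])
    return {**row, "name_letters": count}
```

In the actual job this logic runs inside Spark over the Dataset read from ScyllaDB, and only the updated rows are written back.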
And then the most important application is the one I mentioned already: the ScyllaDB Migrator. The repository is there, and I'm planning to release, I hope by tomorrow, a repository with sample scripts that set up the ScyllaDB Migrator in the same way as the first application. So watch the Slack spaces, or better, watch the ScyllaDB users mailing list: I'll send a mail there with an example repository for the ScyllaDB Migrator demo, so everybody can run it in a very similar manner to the enrichment demo. I would expect that if you have Docker and Java installed, it will take just about 10 minutes to set up and play with the Migrator.
I know the Migrator is not that straightforward to set up, but it's also not that hard. If you just keep in mind a few steps, and those will be documented in the repository, you can go and play with it as well, and it will be very easy for you to use it for your own migration, or just to explore how Spark accesses the datasets from ScyllaDB or from other source databases.
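For orientation, a Migrator run is driven by a YAML configuration that describes the source and target clusters. The fragment below is only an illustrative sketch: the hosts, keyspace, and table names are placeholders, and the exact set of keys is defined by the example config file in the scylla-migrator repository, so take that file as the authoritative reference:

```yaml
# Illustrative Migrator configuration sketch; all values are placeholders.
source:
  type: cassandra
  host: cassandra-server
  port: 9042
  keyspace: demo
  table: users
target:
  type: scylla
  host: scylla-server
  port: 9042
  keyspace: demo
  table: users
```

The Migrator itself is then submitted as a Spark job that reads this configuration, scans the source table, and writes the rows into ScyllaDB.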