
Using Spark with ScyllaDB

Whether you run on-premises hardware or cloud-based infrastructure, ScyllaDB offers performance, scalability, and durability for your data.

Granted, ScyllaDB stores data in a wide-column, table-like format that is efficient for transactional workloads, and in many cases we see it used for OLTP (online transaction processing) workloads.

But what about analytics workloads?

By using Spark together with ScyllaDB, users can run analytics workloads on the data stored in the transactional system.

This lesson will cover:

  • An overview of ScyllaDB and Spark, and how they can work together for analytics workloads
  • ScyllaDB token architecture: data distribution, hashing, and nodes
  • Spark intro: the driver program, RDDs, and data distribution
  • Considerations for writing and reading data using Spark and ScyllaDB
  • What happens when writing data, and which configuration variables can be tuned?
  • How is data read from ScyllaDB using Spark?
  • Should Spark be co-located with ScyllaDB?
  • What are some best practices and considerations for configuring Spark to work with ScyllaDB?
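
The token architecture and read path listed above can be sketched in plain Python. This is a simplified model, not ScyllaDB's or the connector's actual code. ScyllaDB's default Murmur3 partitioner maps each partition key to a signed 64-bit token, and each node owns several tokens (vnodes) on the ring. Here `token_for` uses MD5 as a stand-in hash, so its tokens will not match a real cluster. `TokenRing`, `token_for`, and `split_token_range` are hypothetical names. `split_token_range` only loosely mimics how a Spark connector divides the ring into Spark partitions; the Spark Cassandra Connector typically groups token ranges using size estimates and a target split size rather than equal-width slices.

```python
import bisect
import hashlib
import struct

# Token space used by the Murmur3 partitioner: signed 64-bit integers.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token.

    Assumption: MD5 is a stand-in hash, NOT Murmur3, so the values are
    illustrative only.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = struct.unpack(">q", digest[:8])[0]  # first 8 bytes as signed int64
    # Keep every token inside some (start, end] range of the sketch below.
    return MAX_TOKEN if token == MIN_TOKEN else token


class TokenRing:
    """Hypothetical ring: each node owns several tokens (vnodes). A key
    belongs to the node owning the first token >= the key's token,
    wrapping around to the lowest token at the end of the ring."""

    def __init__(self, nodes, vnodes_per_node=8):
        entries = []
        for node in nodes:
            for i in range(vnodes_per_node):
                entries.append((token_for(f"{node}#{i}"), node))
        entries.sort()
        self.tokens = [t for t, _ in entries]
        self.owners = [n for _, n in entries]

    def owner(self, partition_key: str) -> str:
        idx = bisect.bisect_left(self.tokens, token_for(partition_key))
        return self.owners[idx % len(self.owners)]  # wrap past the last token


def split_token_range(num_splits: int):
    """Split the whole token space into contiguous (start, end] ranges,
    one per hypothetical Spark partition, each scanned independently."""
    width = (MAX_TOKEN - MIN_TOKEN) // num_splits
    bounds = [MIN_TOKEN + i * width for i in range(num_splits)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c"])
    for key in ["user:1", "user:2", "user:3"]:
        print(key, token_for(key), ring.owner(key))
    for start, end in split_token_range(4):
        print(f"Spark partition scans tokens ({start}, {end}]")
```

Because ranges are contiguous and non-overlapping, every row is read by exactly one Spark task, and a task whose range maps to a single node can, in principle, read locally when Spark is co-located with ScyllaDB.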

You can read more in the four-part blog series "Hooking up Spark and ScyllaDB". This blog post covers a real-world use case, and the documentation demonstrates a simple ScyllaDB-Spark integration example.