Kafka and ScyllaDB

This lesson introduces Apache Kafka and covers some basic concepts. Apache Kafka is an open-source distributed event streaming platform. It allows you to do the following (a minimal produce/consume sketch follows the list):

  • Ingest data from a multitude of systems, such as databases, your services, or other software applications
  • Store that data for later reads
  • Process and transform the incoming streams in real time
  • Consume the stored data stream
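To make the produce/consume flow concrete, here is a minimal sketch using the kafka-python client. The broker address and the “readings” topic name are assumptions for illustration, not part of the lesson:

```python
# A minimal produce/consume round trip, assuming a Kafka broker on
# localhost:9092 and the kafka-python client (pip install kafka-python).
# The topic name "readings" is hypothetical.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("readings", key=b"sensor-1", value=b'{"temperature": 21.5}')
producer.flush()  # block until the broker acknowledges the message

consumer = KafkaConsumer(
    "readings",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
    consumer_timeout_ms=5000,      # stop iterating when no new messages arrive
)
for record in consumer:
    print(record.key, record.value, record.offset)
```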

Some common use cases for Kafka are:

  • Message broker (similar to RabbitMQ and others)
  • “Glue” between different services in your system
  • Replication of data between databases/services
  • Real-time analysis of data (e.g., for fraud detection)

The ScyllaDB Sink Connector is a Kafka Connect connector that reads messages from a Kafka topic and inserts them into ScyllaDB. It supports multiple data formats (Avro, JSON), can scale across many Kafka Connect nodes, provides at-least-once semantics, and periodically saves its current offset in Kafka.
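As a rough sketch of what deploying the sink connector looks like, you can register it through the standard Kafka Connect REST API. The connector class and property names below are assumptions to verify against the sink connector's documentation for the version you install; hosts, topic, and keyspace names are hypothetical:

```python
# Registering the ScyllaDB Sink Connector via the Kafka Connect REST API,
# assumed to listen on localhost:8083. Verify connector.class and the
# scylladb.* property names against your installed connector version.
import requests

sink_config = {
    "name": "scylladb-sink",  # hypothetical connector instance name
    "config": {
        "connector.class": "io.connect.scylladb.ScyllaDbSinkConnector",
        "tasks.max": "2",                  # scale across Connect workers
        "topics": "readings",              # Kafka topic(s) to drain
        "scylladb.contact.points": "scylla-node1",
        "scylladb.keyspace": "demo",
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=sink_config)
resp.raise_for_status()
print(resp.json())
```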

The lesson also provides a brief overview of CDC. To learn more about CDC, check out this lesson.
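CDC is enabled per table in ScyllaDB. As a minimal sketch, assuming a node reachable at 127.0.0.1 and a hypothetical demo.readings table, you can turn it on with the Python driver:

```python
# Enabling CDC on an existing ScyllaDB table, assuming a node at 127.0.0.1
# and a hypothetical demo.readings table.
# Requires: pip install cassandra-driver
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# With cdc enabled, ScyllaDB maintains a companion log table
# (demo.readings_scylla_cdc_log) that records every change to the base table.
session.execute("ALTER TABLE demo.readings WITH cdc = {'enabled': true}")

cluster.shutdown()
```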

The ScyllaDB CDC Source Connector is a Kafka Connect connector that reads messages from a ScyllaDB table (with ScyllaDB CDC enabled) and writes them to a Kafka topic. It works seamlessly with the standard Kafka converters (JSON, Avro), can scale horizontally across many Kafka Connect nodes, and provides at-least-once semantics.
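Registering the source connector follows the same Kafka Connect REST pattern. The property names below follow the scylla-cdc-source-connector project's README at the time of writing; treat them as assumptions and check them against the version you download. Host, cluster, and table names are hypothetical:

```python
# Registering the ScyllaDB CDC Source Connector via the Kafka Connect
# REST API (assumed on localhost:8083). Verify connector.class and the
# scylla.* property names against your installed connector version.
import requests

source_config = {
    "name": "scylladb-cdc-source",  # hypothetical connector instance name
    "config": {
        "connector.class": "com.scylladb.cdc.debezium.connector.ScyllaConnector",
        "tasks.max": "2",                           # scale across Connect workers
        "scylla.cluster.ip.addresses": "scylla-node1:9042",
        "scylla.name": "demo-cluster",              # logical name, prefixes topic names
        "scylla.table.names": "demo.readings",      # CDC-enabled table(s) to stream
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=source_config)
resp.raise_for_status()
print(resp.json())
```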

The lesson includes demos and a hands-on lab. You’ll learn how to quickly start Kafka, use the ScyllaDB Sink Connector, view changes on a table with CDC enabled, and download, install, configure, and use the ScyllaDB CDC Source Connector.

Also, check out the documentation, the blog post Introducing the Kafka ScyllaDB Connector, the ScyllaDB Sink Connector GitHub project, and the ScyllaDB CDC Source Connector GitHub project.
