This lesson goes over an intro to Kafka and covers some basic concepts.
Apache Kafka is an open-source distributed event streaming system. It allows you to:
Ingest data from a multitude of different systems, such as databases, your services, or other software applications
Store that data for future reads
Process and transform the incoming streams in real-time
Consume the stored data stream
Some common use cases for Kafka are:
Message broker (similar to RabbitMQ and others)
“Glue” between different services in your system
Replication of data between databases/services
Real-time analysis of data (e.g. for fraud detection)
Transcript
Let’s start with Kafka, the main technology we will cover today.
Apache Kafka is an open-source distributed event streaming system.
That’s a mouthful, so let’s unpack these words. An event streaming system
allows you to ingest data from different sources, for example from databases, your microservices,
client applications, and so on. After this data is ingested into the
event streaming system, it is stored for some time. You can configure a time-to-live (retention)
policy, for example so that all messages are deleted after one day or maybe one week.
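To make the retention idea concrete, here is a minimal Python sketch of time-based retention on an in-memory log. It is not how Kafka implements retention internally. The names `Record`, `apply_retention`, and `ONE_DAY_MS` are assumptions made up for this illustration. In Kafka itself the retention window is set through configuration, for example the topic-level `retention.ms` setting.

```python
from dataclasses import dataclass

ONE_DAY_MS = 24 * 60 * 60 * 1000  # retention window used in this sketch


@dataclass
class Record:
    timestamp_ms: int  # when the record was appended
    value: bytes


def apply_retention(log, now_ms, retention_ms=ONE_DAY_MS):
    """Return only the records that are still inside the retention window."""
    cutoff = now_ms - retention_ms
    return [r for r in log if r.timestamp_ms >= cutoff]


now = 10 * ONE_DAY_MS
log = [
    Record(now - 2 * ONE_DAY_MS, b"old purchase"),  # two days old
    Record(now - 60_000, b"recent purchase"),       # one minute old
]

kept = apply_retention(log, now)
print([r.value for r in kept])
```

With a one-day window only the recent record survives. Passing `retention_ms=7 * ONE_DAY_MS` would keep both.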
As the data comes in, you can process and transform it in real time
and get some powerful insights while doing it. You can also consume the stream using Connectors,
client libraries, and other applications.
So what are the common use cases of Kafka? First of all, you can think of Kafka as a
standard message broker, similar to RabbitMQ and other such systems. You can use Kafka as glue
between different services in your system. That way you can scale those services independently
and have a buffer of messages that are transferred between them.
You can replicate data between two databases with Kafka, that is, transfer data from
one database to another, maybe from a relational database to a NoSQL database. Kafka
is also used for real-time analysis of data; one common use case is fraud detection.
There are three main distributions of Kafka. There’s the open-source Apache Kafka,
which is developed and distributed by the Apache Software Foundation.
You also have Confluent Platform, which is based on the open-source Apache Kafka distribution but
adds many enterprise features, including some that will be useful for us in today’s presentation, for
example an easy-to-use web-based Control Center. You can also use Confluent Platform in the cloud:
they have an as-a-service offering that is
very easy to use. The third one, which is a newcomer to the Kafka space, is Redpanda.
They offer a Kafka API-compatible solution, and the fun fact is that they use the same framework
as ScyllaDB, so they aim to improve on the performance of open-source Kafka.
So let’s take a 50,000-foot view of Kafka. At the center
you have Kafka. On the left you have producers, that is, systems that insert data into Kafka.
You use Source Connectors to do this job, transferring data from databases and services
to Kafka. You can do some processing inside Kafka, for example with solutions like ksqlDB.
Next, you use Sink Connectors to get data
from Kafka to other systems, which we will call consumers.
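The flow described here (source system, Source Connector, Kafka, Sink Connector, target system) can be sketched with a toy in-memory model. This is only an illustration of the data path, not the Kafka Connect API. `InMemoryBroker`, `source_connector`, and `sink_connector` are hypothetical names invented for this sketch.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for Kafka: an append-only list of messages per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def produce(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic, offset=0):
        # Reading does not remove messages; a consumer tracks its own offset.
        return self.topics[topic][offset:]


def source_connector(broker, topic, rows):
    """Hypothetical source connector: copies rows from a source system into a topic."""
    for row in rows:
        broker.produce(topic, row)


def sink_connector(broker, topic, target):
    """Hypothetical sink connector: copies messages from a topic into a target system."""
    target.extend(broker.consume(topic))


broker = InMemoryBroker()
source_db = [{"id": 1, "item": "book"}, {"id": 2, "item": "pen"}]
target_db = []

source_connector(broker, "purchases", source_db)
sink_connector(broker, "purchases", target_db)
print(target_db)
```

Because the broker sits in the middle, the source and the target never talk to each other directly, which is what lets the two sides be scaled or replaced independently.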
In this presentation, we will of course focus on our Connectors and on
transferring data from ScyllaDB and to ScyllaDB, using Kafka as an intermediary.
If you look at a Kafka cluster, similar to ScyllaDB, you have many Kafka nodes.
Those are called brokers. Inside the Kafka cluster you have many topics, which
allow you to partition your data in a logical sense. For example, you could have a topic
of purchases and a topic of registrations: when a user registers, there’s
a message sent to the registration topic. Within a topic you have partitions.
You can think of partitions as shards in ScyllaDB: they allow you to scale the system horizontally.
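One way to picture how messages spread over the partitions of a topic is to hash the message key and take the result modulo the number of partitions. The sketch below does that with CRC32 from the Python standard library. CRC32 is a stand-in chosen for this example and is not the hash Kafka clients use. `choose_partition` and the sample events are hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index in the range [0, num_partitions)."""
    # CRC32 is a stand-in hash for this sketch, not Kafka's own partitioner hash.
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3
partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [
    (b"user-1", "registered"),
    (b"user-2", "registered"),
    (b"user-1", "purchased"),
]
for key, value in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))

# Messages with the same key always land in the same partition.
assert choose_partition(b"user-1", NUM_PARTITIONS) == choose_partition(b"user-1", NUM_PARTITIONS)
print(partitions)
```

Since each partition can live on a different broker, adding partitions is what lets a topic scale horizontally, much like shards do in ScyllaDB.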
If you look at a single Kafka message, it consists of a key, a value, and a header. The
key and value are just standard blobs of bytes, and you can have anything in them. Of course,
in many deployments you will have something like JSON, which is easy to use and
easy to consume using standard libraries. Regarding the header, I will not cover
it in this session, and our Connectors currently don’t support or modify headers.
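As a rough illustration of that message shape, the sketch below models a message as a key and a value that are both raw bytes, with JSON used as the value encoding. The `Message` class and the two helper functions are assumptions for this example and do not come from any Kafka client library.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Message:
    key: bytes
    value: bytes
    headers: list = field(default_factory=list)  # left empty in this sketch


def make_json_message(key: str, payload: dict) -> Message:
    """Encode the key as UTF-8 and the payload as JSON bytes."""
    return Message(key=key.encode("utf-8"), value=json.dumps(payload).encode("utf-8"))


def read_json_value(message: Message) -> dict:
    """Decode a JSON value back into a Python dictionary."""
    return json.loads(message.value.decode("utf-8"))


msg = make_json_message("user-42", {"event": "registration", "plan": "free"})
print(msg.key)
print(read_json_value(msg))
```

Kafka only sees bytes on both sides, so the producer and the consumer have to agree on the encoding. JSON is a common choice because standard libraries can read it everywhere.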