To summarize, CDC in ScyllaDB is easy to integrate and consume: it uses plain CQL tables, it is robust because the log is replicated in the same way as normal data, it has a reasonable overhead, and it does not overflow if the consumer fails to act, because the log data has a TTL. The lesson also includes a comparison with Cassandra, DynamoDB, and MongoDB.
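As a rough illustration of the "plain CQL tables" and TTL points, here is a minimal Python sketch that only builds the CQL text involved; it does not talk to a cluster. The keyspace `ks`, the table `orders`, and the helper function names are hypothetical. The `cdc = {...}` option names and the `_scylla_cdc_log` suffix are the ones ScyllaDB documents, but check the documentation for your version before relying on them.

```python
def cdc_log_table(keyspace: str, table: str) -> str:
    """Name of the CDC log table that sits next to a base table.

    ScyllaDB documents the log as an ordinary table named
    <base_table>_scylla_cdc_log in the same keyspace.
    """
    return f"{keyspace}.{table}_scylla_cdc_log"


def enable_cdc_statement(
    keyspace: str,
    table: str,
    ttl_seconds: int = 86400,  # assumed default log lifetime: 24 hours
    preimage: bool = False,
    postimage: bool = False,
) -> str:
    """Build an ALTER TABLE statement that turns CDC on for one table."""

    def flag(value: bool) -> str:
        return "true" if value else "false"

    options = (
        "{"
        "'enabled': true, "
        f"'preimage': {flag(preimage)}, "
        f"'postimage': {flag(postimage)}, "
        f"'ttl': {ttl_seconds}"
        "}"
    )
    return f"ALTER TABLE {keyspace}.{table} WITH cdc = {options};"


def read_log_statement(keyspace: str, table: str, limit: int = 10) -> str:
    """Reading the log is a normal SELECT against a normal table."""
    return f"SELECT * FROM {cdc_log_table(keyspace, table)} LIMIT {limit};"


if __name__ == "__main__":
    # Hypothetical keyspace and table names, for illustration only.
    print(enable_cdc_statement("ks", "orders", ttl_seconds=3600, preimage=True))
    print(read_log_statement("ks", "orders"))
```

The generated statements can be pasted into `cqlsh` or sent through any CQL driver. A shorter TTL bounds how much log data accumulates when nothing is consuming it.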
A recommended next step is to run the Apache Kafka and ScyllaDB CDC lab after this lesson.
Transcript
The takeaway of the whole thing:
CDC in ScyllaDB, it's going to be very easy to integrate, it's going to be very easy to consume. Everything is a plain CQL table, plain old CQL, nothing magic, nothing special, just performance.
It's going to be robust: everything is replicated the same way as normal data is. It's going to share all the same properties as your normal cluster. If your cluster works, CDC is going to work.
It comes with a reasonable overhead. We coalesce the log writes into the same replica ranges, which means that yes, you're doing two writes, but they're gonna share the same request, so it doesn't add very much.
And the data is TTL'd, it has a limited lifetime, which means, again, the CDC log is not gonna overwhelm your system. You're probably not gonna get any node crashes because you enabled CDC; you're just gonna get some more data that you may or may not consume.
A short comparison with some happy competitors: green is good, red is perhaps less good.
Where can you put the consumer? Well, in Cassandra you have to do it on the node. Luckily, for the other three, yes, we can put the consumer separate from the ScyllaDB cluster, or from the database. It can be somewhere else, it's just over the wire.
In ScyllaDB the data is deduplicated: you don't have to worry about trying to reconcile different data streams from different replicas, it's already done for you.
Do you get deltas? Yes, you get deltas, you know the changes.
Do you get a pre-image? Optionally, yes, you do: you know the state before the delta, and you know the delta.
Do you get a post-image? Optionally, again, yes, we can give you that.
What happens if you don't consume? The data is lost, but it's no catastrophe: if you didn't consume it, you probably didn't want it.
You have to take responsibility for your data, kids.
And ordering: yes, you do get that.
Everything, again, is just CQL, so you're going to be able to consume it the way you want. It's ordered by timestamp, again, the perceived timestamp of the client. It's a client view of the whole thing, so basically it's the database's description of how your application perceived the state of the data as it modified it.
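
To make the pre-image, delta, post-image, and ordering points concrete, here is a small Python sketch that replays simulated log rows. It is a simplification under stated assumptions, not the real log schema. The dictionary keys are hypothetical stand-ins for the `cdc$stream_id`, `cdc$time`, `cdc$batch_seq_no`, and `cdc$operation` columns. `time` is a plain integer instead of a timeuuid. The operation codes (0 pre-image, 1 update, 2 insert, 9 post-image) follow the ScyllaDB documentation as best I recall and should be verified for your version.

```python
from itertools import groupby

# Assumed subset of cdc$operation codes; verify against the ScyllaDB docs.
OPERATIONS = {0: "pre-image", 1: "update", 2: "insert", 9: "post-image"}

# Simulated log rows, deliberately out of order. "v" is a hypothetical
# base-table column; "time" stands in for the write timestamp in cdc$time.
LOG_ROWS = [
    {"stream_id": "s1", "time": 2000, "batch_seq_no": 2, "operation": 9, "v": 20},
    {"stream_id": "s1", "time": 1000, "batch_seq_no": 0, "operation": 2, "v": 10},
    {"stream_id": "s1", "time": 2000, "batch_seq_no": 0, "operation": 0, "v": 10},
    {"stream_id": "s1", "time": 2000, "batch_seq_no": 1, "operation": 1, "v": 20},
]


def replay(rows):
    """Group log rows into changes, ordered by the client's write timestamp.

    Simplification: each change is assumed to carry at most one delta row.
    """
    ordered = sorted(
        rows, key=lambda r: (r["time"], r["stream_id"], r["batch_seq_no"])
    )
    changes = []
    # Rows that share a timestamp and stream belong to the same write.
    for (time, _stream), group in groupby(
        ordered, key=lambda r: (r["time"], r["stream_id"])
    ):
        change = {"time": time, "before": None, "delta": None, "after": None}
        for row in group:
            kind = OPERATIONS[row["operation"]]
            if kind == "pre-image":
                change["before"] = row["v"]
            elif kind == "post-image":
                change["after"] = row["v"]
            else:
                change["delta"] = (kind, row["v"])
        changes.append(change)
    return changes


if __name__ == "__main__":
    for change in replay(LOG_ROWS):
        print(change)
```

The first change has only a delta because no pre-image or post-image rows were recorded for it. The second shows the state before the write, the write itself, and the state after it.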