This lesson gives an overview of ScyllaDB drivers, and the benefits of using them.
The two main benefits are shard awareness and better support for paging. The next lessons will cover drivers for specific languages, including hands-on examples of how to use them.
Before I dig deeper into the details, I would like to explain why we need our own drivers. As you probably know, we work with many clients and see many deployments, and along the way we noticed that cluster performance can be improved through driver enhancements. On principle, we want to contribute back to the wider Cassandra community, so we are working towards getting our enhancements accepted into the upstream Apache projects. However, that is never easy and usually not fast, so to bring value to ScyllaDB users today, we decided to fork the drivers and make them available to everyone right away, while we keep working on upstreaming the changes.

At the moment there are two new features: better support for paging and shard awareness. Let's dig into the details of each, starting with paging: what we can do with it and how we can improve it.
The idea is actually very simple: we make sure that, for each request, the driver fetches all the pages from the same coordinator. This way we reuse the hard work the coordinator has already done and the paging state it has cached; by doing less work, we get higher throughput and lower latencies.
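As a quick illustration, here is a minimal sketch of paging with the Python driver (the ScyllaDB fork is published as scylla-driver but keeps the cassandra package name). The contact point, keyspace, and table names are placeholders; with the forked driver, fetching all pages of a statement from the same coordinator happens inside the driver, so no extra application code is needed.

```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])          # placeholder contact point
session = cluster.connect("my_keyspace")  # placeholder keyspace

# fetch_size sets the page size; the driver fetches the next page
# transparently as the result set is iterated. With the ScyllaDB fork,
# every page of this statement goes to the same coordinator, reusing
# its cached paging state.
query = SimpleStatement("SELECT * FROM my_table", fetch_size=1000)
for row in session.execute(query):
    print(row)
```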
Shard awareness, on the other hand, is the next level of request routing. Current drivers can already route a request to the right node because they are token aware, but as you probably know, ScyllaDB further divides the data on each node into shards, and each shard belongs to one CPU. A client connection to the node can end up on any of those CPUs, so routing the request to the right one reduces inter-shard communication and improves latency.

Here is basically how it works: a request ends up on some CPU, which is responsible for handling it, and that CPU either owns the requested data or it does not. If not, it has to pull the data from the other shard, and only then can it push the data down to the network. When we are lucky and the request lands exactly where the data is, this inter-shard communication is not needed at all.
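Below is a minimal sketch of a shard-aware setup with the Python fork. The shard routing itself happens inside the driver and needs no extra API, but it builds on token awareness, so the load balancing policy and the prepared statement shown here matter. The keyspace, table, and contact point are again placeholders.

```python
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy

# Token awareness picks the replica node that owns the partition; the
# ScyllaDB fork extends this by picking the connection bound to the
# owning shard (CPU) on that node.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy()))
cluster = Cluster(["127.0.0.1"],          # placeholder contact point
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()

# Prepared statements carry partition-key metadata, so the driver can
# compute the token and route each execution to the right node and shard.
insert = session.prepare(
    "INSERT INTO my_keyspace.my_table (pk, value) VALUES (?, ?)")
session.execute(insert, (42, "hello"))
```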
And here are some results: you can see that with this improvement, for the right workload, latencies can be up to four times better. Here are the same results, just broken down by shard.