This lesson covers topics you need to be aware of when scaling to Multi-DC with ScyllaDB.
It covers: the importance of Consistency Level, which policies to use in our project, and some examples with the Rust and Golang drivers.
Introduction
Welcome back to the How to Write Better Apps section!
Today we’re going to talk about things you need to be aware of when scaling to Multi-DC with ScyllaDB.
In our agenda we’re going to talk about:
the importance of Consistency Level;
which policies to use in our project;
and some examples with the Rust and Golang drivers.
… so let’s take a look at our current setup.
Current Setup
So far, let’s say we’re working with a single Datacenter based in Brazil running 3 nodes.
As you can check in the graphs, this Datacenter has been running with a Consistency Level of Quorum so far. It’s pretty solid for one region, and this is the result achieved by following the previous lessons.
However, now we want to add one more Datacenter in a different region. Do you think that our app is ready for that?
Policies Configuration
The answer is: no. First, we will need to review one of our implemented policies. We added the new Datacenter in Europe and didn’t change anything on the client side. While running our app we see that no requests are being sent to the other Datacenter, and the reason is that the client isn’t aware that we added a new Datacenter, because of the current policy.
So now we need to implement a new policy that says: we still want the Round Robin behavior, but we also need requests to be able to reach the other datacenters.
More than that, we need to tell our client that we now have more than one Datacenter.
And for that, you will need the policy called “DCAwareRoundRobin” the moment you add a new Datacenter.
You also need to tell your application which Datacenter will be the preferred one.
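In the ScyllaDB Rust driver, for example, the DC-aware behavior is part of the default load balancing policy, so the main thing to configure is the preferred datacenter. Here is a minimal sketch assuming the scylla crate; the datacenter name “south-america-1” is just a placeholder, and exact builder methods can differ between driver versions:

```rust
use scylla::load_balancing::DefaultPolicy;
use scylla::transport::ExecutionProfile;

// Build a DC-aware, token-aware policy that prefers our local datacenter.
// "south-america-1" is a placeholder name for the Brazil datacenter.
let policy = DefaultPolicy::builder()
    .prefer_datacenter("south-america-1".to_string())
    .token_aware(true)        // keep token awareness on top of DC awareness
    .permit_dc_failover(true) // allow requests to fall back to the remote DC if needed
    .build();

// Attach the policy to an execution profile that the session will use.
let profile = ExecutionProfile::builder()
    .load_balancing_policy(policy)
    .build();
```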
Now, everything seems to be working. Right? And the answer is: no! Something is still missing!
Consistency Level Configuration
Imagine that now you’re sending Read and Write requests to both datacenters with Consistency of Quorum.
The Write will eventually be replicated, which is not a problem. The problem is that you’re now waiting for responses from 4 nodes: assuming a replication factor of 3 in each datacenter, a Quorum of the 6 replicas means 4 acknowledgements, so 3 can come from your local datacenter but at least one more has to come from the new one.
This will increase Read latency, and we don’t want that. I mean, we don’t need to wait for an answer from a different region, right? But at the same time we want a Consistency Level of Quorum, so what should we do?
The solution is to set the Consistency Level to LocalQuorum the moment you switch to multiple Datacenters. LocalQuorum tells your driver to require a quorum of replicas only within your local datacenter, not across all of them.
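As a quick sketch with the Rust driver, the consistency can be set per statement; the query string and keyspace below are placeholders:

```rust
use scylla::query::Query;
use scylla::statement::Consistency;

// Placeholder query: what matters here is the consistency setting.
let mut read_user = Query::new("SELECT name FROM my_ks.users WHERE id = ?");
read_user.set_consistency(Consistency::LocalQuorum);

// It is then executed through the session as usual, for example:
// session.query(read_user, (user_id,)).await?;
```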
Multi DC: Overview and Tips
Now let’s do a quick wrap-up on the lesson.
If you want to use Quorum as the default for your whole project, you can set LocalQuorum instead. On a single Datacenter it behaves the same, and if someday you move to Multi-DC you won’t run into this small issue.
Besides the CQL Dashboard, it’s really useful to take a look at the Detailed Dashboard to check how Multi-DC is working, paying attention mostly to the Reads per instance panel.
Also, at the bottom of the CQL Dashboard, you can find metrics for queries that are running in Multi-DC without LocalQuorum or LocalOne Consistency.
And the last thing: the drivers have different ways to set this up. The Rust Driver already comes with LocalQuorum as the default consistency, so no worries on that front. To enable the Multi-DC policy in Rust you just need to select which datacenter is your preferred one and you will be all good!
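Putting it all together, a session setup in Rust could look roughly like this; the contact point and datacenter name are placeholders, and this is a sketch rather than the only way to wire it up:

```rust
use scylla::load_balancing::DefaultPolicy;
use scylla::statement::Consistency;
use scylla::transport::ExecutionProfile;
use scylla::{Session, SessionBuilder};

async fn build_session() -> Result<Session, Box<dyn std::error::Error>> {
    // DC-aware policy preferring the local (placeholder-named) datacenter.
    let policy = DefaultPolicy::builder()
        .prefer_datacenter("south-america-1".to_string())
        .token_aware(true)
        .build();

    // One execution profile carrying both the routing policy and the
    // LocalQuorum default consistency for every statement.
    let profile = ExecutionProfile::builder()
        .load_balancing_policy(policy)
        .consistency(Consistency::LocalQuorum)
        .build();

    // Placeholder contact point in the local datacenter.
    let session = SessionBuilder::new()
        .known_node("10.0.0.1:9042")
        .default_execution_profile_handle(profile.into_handle())
        .build()
        .await?;

    Ok(session)
}
```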
Now let’s solve the problem in the real world.
Solving in the real world
I have two data centers with three nodes each, and I’m running the application using only a Consistency Level of Quorum. The app is aware that we have two data centers, and as you can see, the average write time is around 3 seconds, which is not what we expect.
If we jump into the CQL dashboard and look closely, we can see that all the queries are running at quorum, which is the problem. When we tail the logs, we can see that the queries are running but are very slow because they are trying to communicate with the other data center.
If we stop here and run step five, where we configure the application to be aware of multiple data centers and set the consistency level to local quorum by default, we will see a significant difference in read and overall metrics.
After waiting for a few seconds and refreshing the dashboard, we can see that the read requests are now back to 2 milliseconds, and the request rate has increased to 1.4k requests per second.
Now, when we look at the CQL dashboard again, we can see that the Quorum queries are now running at LocalQuorum, so we don’t have to worry about them. We can also see in the reads per instance panel that reads are being served from both data centers, allowing us to handle more requests per second.
Make sure to set up your driver properly to work with multiple data centers. See you in the next lesson!