This lesson goes over some of the considerations when working with multiple data centers, including consistency, replication factor, and performance. It also includes a discussion of tracing and top partitions.
Until now we dealt with a single DC, and when we ran step 5 with solution 4 everything was great. So now we test what happens when we move to multi-DC, and the answer is that not everything works when you switch over. I have DC1, which is in the U.S., and DC2, which is in Europe; I want to serve my customers in the U.S. off the data in the U.S., and I want to serve the customers in Europe off the cluster in Europe. But when I ran the solution from step 5 on the multi-DC setup, this is what I saw: the client is sending requests, inserts and reads, to both DCs. That means the client is not DC aware. The client is aware of the whole cluster; once you connect the client to a ScyllaDB cluster, it will connect to all the nodes. But the client is not using its location information to know which nodes it should interact with, and that costs performance: a request sent from the U.S. to Europe will have additional latency.
In addition to that, there are additional metrics, and this one should actually be a gauge: when we're doing reads, we expect all the reads to be served from the local DC; we are not expecting the reads to be served from two different DCs. What we can see here is that, to serve a request, the coordinator sitting in DC1 is sending some requests to replicas in DC2, and that has to do with my consistency level. Schematically, I have my client sending a request to DC1, but the coordinator node, which is a replica, is not able to serve that request off nodes in its own DC alone and has to send requests to DC2. In our example I'm doing a quorum read with six replicas, and a quorum of six means four nodes (6/2 + 1 = 4), so at least one of them has to be in DC2.
Step 6 fixes both of these. We change the quorum reads into local-quorum reads, and we change the default round-robin policy, which is very easy to do, to the DC-aware round-robin policy, in which the client provides its location information. Once we run it, all the inserts from the single client running in DC1 are sent to nodes in DC1, and since we changed from QUORUM to LOCAL_QUORUM, once a coordinator in DC1 receives a request, it can process the request off replicas in DC1 only.
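As a minimal sketch of what step 6 can look like in client code, assuming the Python driver; the contact point and keyspace name are hypothetical:

```python
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy
from cassandra import ConsistencyLevel

# Tell the driver where this client lives ("DC1" is its local DC)
# and use LOCAL_QUORUM so the coordinator only waits for local replicas.
profile = ExecutionProfile(
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc="DC1"),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)

cluster = Cluster(
    ["10.0.0.1"],  # hypothetical contact point in DC1
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
)
session = cluster.connect("catalog")  # hypothetical keyspace
```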
In addition to the gauges that I have talked about, there are two other gauges that exist but are not used, or not displayed, in this sample. The first one is unpaged CQL reads, and that gauge should be at 100%: 100% of your requests should be paged. This one is a bit different. Until now I've shown examples of things you need to change in your client application when you write it; this is something you can mess up. The default for the drivers, Java, GoCQL, Python, and others, is to use paging, but at times users think that they can do better than the server and they will try to do paging on their application side. You shouldn't do it: you can play with the page size, but you should not switch off paging, and the session about the work we've done over the last year will make it clearer why it's much, much better to leave paging on.
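As a hedged sketch with the Python driver, reusing the session from above: keep the driver's transparent paging on and only tune the page size through fetch_size (the table name and callback are hypothetical):

```python
from cassandra.query import SimpleStatement

# Paging stays on; we only adjust how many rows come back per page.
stmt = SimpleStatement(
    "SELECT * FROM catalog.events",  # hypothetical table
    fetch_size=1000,
)

# The driver transparently fetches the next page as we iterate,
# so there is no need for manual, application-side pagination.
for row in session.execute(stmt):
    handle(row)  # hypothetical application callback
```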
In addition to that, there is a reversed CQL reads gauge, and it is a warning gauge. Reversed CQL queries have to do with queries that have an ORDER BY clause in which the order is different from the base table's; say the base table is in descending order and the query uses ascending order. The reason this gauge exists is that it's harder for ScyllaDB to process those requests; it's not as optimized.
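To make that concrete, a hypothetical sketch (keyspace and table invented for illustration): the table's clustering order is descending, so the second query below is a reversed read that would move this gauge:

```python
# Hypothetical table whose clustering order is descending on ts.
session.execute("""
    CREATE TABLE IF NOT EXISTS catalog.readings (
        sensor_id int,
        ts        timestamp,
        value     double,
        PRIMARY KEY (sensor_id, ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

# Same direction as the base table: a normal read.
session.execute(
    "SELECT * FROM catalog.readings WHERE sensor_id = 1 ORDER BY ts DESC")

# Opposite direction to the base table: a reversed read.
session.execute(
    "SELECT * FROM catalog.readings WHERE sensor_id = 1 ORDER BY ts ASC")
```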
When you see it, there is a good example: we had one customer that just started with a table with a specific ordering and ended up doing all the queries in the opposite direction. If you had used this gauge during development, you would have detected it; changing it now is much harder, because we would need to reinsert all the data. You cannot change the ordering of data that is already inside the database.

Okay, aside from monitoring: until now I have dealt with a
lot of monitoring aspects but we do provide you
a lot of tools as well. Avi is going to cover the tools. We allow for CQL tracing: there is probabilistic tracing, slow query tracing, and client-side tracing, which let you check what is happening with your specific queries, for different reasons.
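A minimal sketch of the client-side variant, assuming the Python driver and the hypothetical session from above (probabilistic tracing is instead enabled server-side, for example with nodetool settraceprobability):

```python
from cassandra.query import SimpleStatement

# Request a server-side trace for this single query.
stmt = SimpleStatement("SELECT * FROM catalog.readings WHERE sensor_id = 1")
result = session.execute(stmt, trace=True)

# Inspect where the time went for this specific query.
trace = result.get_query_trace()
print(trace.coordinator, trace.duration)
for event in trace.events:
    print(event.source_elapsed, event.description)
```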
Over the last year we have added large partition logging; before that, there was only a warning. It basically means that you can find out whether you have large partitions and which ones exist, and we're working to add even more.
So 3.1 is going to include nodetool toppartitions, which basically means that if you have a lot of queries going to the same single partition, you can find out which partition it is.
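For reference, a hedged usage sketch assuming the Cassandra-style arguments (hypothetical keyspace and table, with a sampling duration in milliseconds; check your version's help output for the exact syntax):

```
nodetool toppartitions catalog readings 5000
```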