In the case of replica imbalance, the common issues are hot partitions, large partitions, and issues related to a specific node. Node-specific issues include CPU, I/O, and OS-related problems. This lesson explains these issues and how to debug them.
We've arrived at the replica imbalance case, and there are three items we need to check, or three possibilities: it may be a hot partition, a large partition, or a single-node issue.

The hot partition check. At this point we haven't identified which node is suspected of having the hot partition; usually it will be three replicas, if we're talking about a replication factor of three. Here we can use nodetool toppartitions and nodetool cfhistograms.
nodetool toppartitions is something that we added. It has a drawback: you need to know which keyspace and table you're looking at, so it's not that easy. The way I solve this when I need to find out what's going on is with nodetool cfhistograms, which lets you get statistics about keyspaces and tables and how many accesses they get. I take two snapshots, compute the difference between them, and find out which table is accessed the most. Then I run toppartitions on it and see whether I'm lucky or not.
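As a rough illustration of that workflow (a sketch, not from the lesson itself: it uses nodetool cfstats, whose per-table read and write counters can be snapshotted and diffed in the same way; my_keyspace and my_table are placeholders, and the exact toppartitions arguments depend on your version, so check nodetool help toppartitions):

    # take two snapshots of the per-table counters and diff them to see
    # which table's numbers moved the most
    nodetool cfstats > snap1.txt
    sleep 60
    nodetool cfstats > snap2.txt
    diff snap1.txt snap2.txt | grep -i "count"

    # then sample the hottest partitions of the suspected table for 10 seconds
    nodetool toppartitions my_keyspace my_table 10000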
Large partitions, rows, and cells. Large partitions, rows, and cells are computed at compaction time, and we added three tables to collect the information about them. This is available in master, or in the next major version.
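Assuming your version exposes these under the names system.large_partitions, system.large_rows, and system.large_cells (names and columns can differ between versions), a quick look from the shell might be:

    # recently recorded large partitions, rows, and cells
    cqlsh -e "SELECT * FROM system.large_partitions LIMIT 10;"
    cqlsh -e "SELECT * FROM system.large_rows LIMIT 10;"
    cqlsh -e "SELECT * FROM system.large_cells LIMIT 10;"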
The other thing you can do is search the ScyllaDB logs for this. The good news is that this gives you a background process you can look at to see whether you have accumulated large partitions.
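A sketch of such a check, assuming ScyllaDB runs under the scylla-server systemd unit; the exact warning text differs between versions, so the pattern below is only a starting point:

    # look for large partition warnings written during compaction
    journalctl -u scylla-server --since "24 hours ago" | grep -i "large partition"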
The bad news is that it may be too late by then. In systems that are heavily tuned toward small partitions and low latencies, if you see a large partition it's probably already too late: you didn't take it into account when you built your application, your data model, and so forth, and now you need to figure out how to handle it.
The single node check. This is the most complex check, because it can lead to multiple items. For me, the single node check starts with CPU, OS, and I/O checks, which I'll talk about below. Then I check the logs: I look for errors, stalls, large allocations, and bad_alloc. I wish they weren't as common as they are, but they do happen.
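A quick way to scan for those, again assuming the logs go to the scylla-server journal and that the exact message wording differs between versions:

    # errors, reactor stalls, oversized/large allocations, allocation failures
    journalctl -u scylla-server --since "24 hours ago" \
      | grep -iE "error|stall|bad_alloc|large alloc|oversized alloc"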
Last, we check the logs to see whether the OS has killed something for any reason. Someone mentioned the JMX process before; it is common to see the OOM killer, which is part of the OS, kill the ScyllaDB JMX process because the OS didn't have enough memory. It won't kill ScyllaDB itself, but if it finds that the JMX process grew too much, or was allocated too much memory, it will kill it. You may also have disk errors, and then you need to replace the disk, or replace the instance if you're on AWS.
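A minimal sketch of that check against the kernel log (message formats differ between kernel versions):

    # did the OOM killer terminate anything, for example the scylla-jmx Java process?
    dmesg -T | grep -iE "out of memory|killed process"
    # device or I/O errors that may point to a failing disk
    dmesg -T | grep -iE "i/o error|medium error|blk_update_request"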
The CPU check. This is a view from ScyllaDB 3.1, and the interesting part here is high time spent in task quota violations. So what is a task quota violation? We talked about the different tasks that we have in ScyllaDB: we have reads and writes, or statements as we call them, we have compactions, we have memtable flushes, we have streaming and repair, and so forth. ScyllaDB has a CPU scheduler, and every task quota we allow a different task family to execute. If one family doesn't have enough work, we'll take another task family and run it. But if a task has enough work, runs for a whole quota, and doesn't context switch out, it doesn't give up the CPU and keeps running; that's a task quota violation, and it may translate into latencies. The task quota is very small, so usually you won't even see it, but if your violation is, for example, two milliseconds, then you will see it in your latencies.
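If you want to see what quota a node is actually running with, one rough check is to look at the scylla command line (this assumes the Seastar flag name --task-quota-ms; if it is not set explicitly, a default of roughly half a millisecond applies):

    # print the scylla process arguments and look for an explicit task quota setting
    tr '\0' '\n' < /proc/$(pgrep -x scylla | head -1)/cmdline | grep -A1 -i "task-quota"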
If we're talking specifically about the statement group, and the statement group is the one violating (the statement group is, again, the read and write requests), that may mean you have a large partition. If I have a huge partition that I need to merge, read information from, or use the cache with and update the schema and so forth, then it will hit me here: I'll see a task quota violation in the statement group.
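One way to sanity-check that suspicion, assuming your nodetool version includes partition size percentiles in its histogram output (my_keyspace and my_table are placeholders):

    # a very large p99/max partition size supports the large partition theory
    nodetool cfhistograms my_keyspace my_table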
The single node check at the OS level.
I check the disk: am I running out of disk space? Here we can see whether we have a disk that is running out of space, or whether we have an imbalanced state. An imbalanced state would, again, come from a background process: if I'm running a backup, for example, and it's doing a lot of I/O to the disk, reading from it and uploading it to S3 or wherever, then I'm eating up disk I/O from ScyllaDB, so ScyllaDB won't behave the same. It's sending requests to the disk and not receiving the responses at the same pace. That's one example. The same applies on the I/O side in general: compaction bandwidth and streaming.
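A minimal sketch of that OS-level disk check (the mount point is an assumption; ScyllaDB data usually lives under /var/lib/scylla):

    # is the data mount running out of space?
    df -h /var/lib/scylla
    # is something else saturating the disk? watch utilization and queue sizes
    iostat -x 5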
So we talked about the hot partition, large partition, and single node checks, and unfortunately that's all the templated analysis I can provide you with if you don't fall into any of these categories. There are other cases, and they are too big to cover at this point, so we are working to make this better, both for you and for us. nodetool toppartitions and the large partition tracking didn't exist before; they came into existence because we needed to analyze these problems and we needed to make that analysis accessible to you, so you can analyze them on your own. We'll add additional tools as time goes by.