What is rack awareness?
Also covers multiple data centers, replication factor, consistency levels in multi-DC setups, and the ScyllaDB setup tool.
Configuration: how should I configure my ScyllaDB boxes to get the most out of them?
Usually this is something that shouldn’t be of a lot of concern. You just run the
ScyllaDB setup, and the ScyllaDB setup will configure everything for you. We also, as
I said, invest a lot into making sure that ScyllaDB is essentially as close to
zeroconf as possible, but there are things that we still see people getting
wrong every now and then. ScyllaDB setup is an interactive program, so it will
sometimes ask you questions and you’re going to say yes or no. So I
want to expand a little bit on which questions are really important that you
say yes to, or where you should at least understand what the impact of saying no is. And
it also goes deeper than that: when I’m talking about configuration I’m also
talking about how to configure your racks, how to configure your
topology, etc., so essentially how you get the ScyllaDB cluster ready
to go.

Let’s talk about racks first. I have my nodes, and they live in the
same data center, but I want to take a look at how I configure them, how I
place them in different racks. General advice: have as many racks as you have
replicas. So if you’re running with a replication factor of three, which is the
most common setup that we see, run it in three racks, simple as that. If you’re
on the cloud, we’re talking here about availability zones. Can you deviate from
that? Sure, but I’m gonna show you what some of the side
effects of doing that are. Why do we give you this advice? Because ScyllaDB tries to place
a full copy of the data into each rack. So if you configure your cluster to be
multi-rack and you have three replicas in three racks, I will guarantee you
that you have a full copy in each rack. If you lose the rack, you lost the
copy, but just a copy. This is what happens: if we’re doing QUORUM
reads and writes, you’re not even gonna notice if you lost the full rack. The
full rack exploded, like it was a nuclear war or something like that, and you lost
an entire rack, and you don’t notice. Well, you might notice the nuclear fallout in that
case, but you’re not gonna notice that the database is having any issues. Your
quorums are maintained, you haven’t really lost any data, there
is no situation here in which there was data that was there and no longer
is. It’s just no problem, so this is how it should be. However, if you have three
replicas in two racks, can I do it? Sure, yeah, you can do it, but what happens is
I’m gonna put my first replica in one of the racks, I’m gonna put my second
replica in the other rack, and then the third replica, since there isn’t a third
rack, I just have to put in one of them. And then what happens? If I’m lucky
and I lose the rack on top, I lost one replica and I’m cool with that: my quorums
are maintained, nothing happens, I didn’t lose any data. But if this morning I was
walking down the street and I happened to see a black cat and it was Friday the
13th, or if I just exist (because let’s face it and let’s be honest, this is the more
likely scenario, usually being lucky is the exception, not the norm), what’s
going to happen is that you’re going to lose the rack in which you had two
copies. And I can actually guarantee you that this is what’s going to happen,
because the situation in practice is not as though there is this one rack that
is going to be hosting the two copies. Part of your keys are going to have their
two copies in this rack and part in the other one, so you’re going to lose something.
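
To make the two-rack argument concrete, here is a minimal Python sketch. It assumes a toy round-robin placement (the function names, rack labels, and the number of key ranges are all made up for illustration, and this is not ScyllaDB’s actual placement algorithm), and it simply counts how many key ranges can no longer reach a quorum after one rack is lost.

```python
def place_replicas(racks, rf, start):
    """Toy placement: walk the racks round-robin from `start`, one replica per step.

    Illustrative assumption only; NOT ScyllaDB's real placement logic.
    """
    return [racks[(start + i) % len(racks)] for i in range(rf)]


def quorum_survives(replica_racks, lost_rack):
    """True if a majority of the replicas live outside the lost rack."""
    quorum = len(replica_racks) // 2 + 1
    alive = sum(1 for rack in replica_racks if rack != lost_rack)
    return alive >= quorum


def keys_losing_quorum(racks, rf, lost_rack, num_ranges=6):
    """Return (ranges that lost quorum, total ranges) after losing `lost_rack`."""
    lost = 0
    for rng in range(num_ranges):
        replicas = place_replicas(racks, rf, start=rng)
        if not quorum_survives(replicas, lost_rack):
            lost += 1
    return lost, num_ranges


if __name__ == "__main__":
    # RF=3 over three racks: one replica per rack, so any single rack can go.
    print(keys_losing_quorum(["r1", "r2", "r3"], 3, "r1"))  # (0, 6)
    # RF=3 over two racks: whichever rack goes, some ranges had two copies there.
    print(keys_losing_quorum(["r1", "r2"], 3, "r1"))  # (3, 6)
    print(keys_losing_quorum(["r1", "r2"], 3, "r2"))  # (3, 6)
```

In this toy model the three-rack layout never loses a quorum, while the two-rack layout loses it for half of the key ranges no matter which rack fails, which is the "you don’t get to be lucky" point.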
This is guaranteed to happen, and you just might have lost your job,
depending exactly on your consistency guarantees. I mean, this is
one of the things that we’ve heard recently from a customer:
“If I lose this cluster, I lose my job”, and they had everything in a single
rack. And I said, “I hope you don’t like your job too much then, because that just
might happen.” Obviously we worked with them on how to turn this into a
three-rack situation. I happen to like the guy very much, I wouldn’t like to see
him fired, and it can happen if this is mission-critical. There
are some clusters that are not mission-critical: you can lose the
data, it’s fine, and for cost reasons you might want to put it in a single rack.
There is really not much reason to do it in two racks in that case. Even a
single rack is a valid use case for those situations in which you’re fine
in case Google or AWS or Microsoft just loses an entire rack. Otherwise, just keep
those things matching.

To actually make that work you have to choose network
topology, that is, NetworkTopologyStrategy as your replication strategy. Moving
from SimpleStrategy to network topology is a major pain, so
my recommendation is always, always, always use network
topology, even if you’re in a single rack, because
it will make your life easier later when you make those changes. So use network
topology all the time. If you’re not using
network topology you can have as many racks as you want, but that tells ScyllaDB “I don’t
care about placement”. Network topology is how you tell ScyllaDB “I care about
placement”, and then your copies are gonna be placed in the right racks. And then,
obviously, when expanding the cluster, if you are on a single rack you can add
just one node, but if you are on three racks you want to add three more nodes,
because you want to keep those racks balanced. Otherwise, again, remember
we’re gonna put a full copy in each rack, so if you have imbalance there
you’re gonna end up with imbalance in the sizing of your nodes.

And do run,
as I said, do run the setup tool. But if I had to pick one thing (I didn’t have to,
I could have picked a lot of things, but just to highlight it and emphasize it),
if I had to pick one thing that people keep getting wrong
and that I could tell an audience to just pay attention to, which happily I can
because I’m here on stage, it is: configure your network. What does that do?
I’ll explain what exactly it does and it will become clear why this is
important. When ScyllaDB setup asks you whether you want to configure your network
interface, what it does is pin the interrupts: it decides which CPUs
the interrupts are going to be delivered to. We do the pinning of these
interrupts according to your hardware, so there are two major ways in
which we do it. If you don’t do this, what ends up happening is that the operating
system will unfairly deliver interrupts to a couple of CPUs. It doesn’t have to
be one, maybe there are three, four, five CPUs out of your 30 that will get
most of the interrupts. Those CPUs are running ScyllaDB as well, again because you
said don’t tune it, you said don’t do anything with it.
So those CPUs, which are hosting a part of your data, are gonna start to
get slower to serve your queries, because they are busy serving the network
interrupts for the whole machine. When you say yes to this question, when you
say yes, I do want to tune my network interface, what we are going to do is
look at your hardware and make a decision to use one
of those two modes. In one of them I will isolate one or two CPUs (it
also depends on your hardware, I might isolate one, I might isolate two CPUs) just
to handle your interrupts, if you do have enough hardware to spare. A
lot of other things happen as well, and I might decide to put all of that there too. In
the first case, of course, ScyllaDB will not be running on that CPU, so the
interrupts are all going there but ScyllaDB is not running there. In the other
scenario I will still run ScyllaDB on all of the CPUs, but I will make sure that
all of the CPUs are getting interrupts, and that they are all getting
interrupts fairly. So yes, they all pay a price, but they all pay around the same
price. We decide that based on the number of network queues that you have. It is very, very,
very common for people to say no to this question because it doesn’t sound
important. We have actually been trying to convey its importance in the message
more and more, but if you don’t do it you end up in this scenario over here: one of
your CPUs is gonna pay for that.

Another one, which we actually want to
put in the setup tool soon (this is a problem we’ve just discovered),
is your clock source. Until we do, you might want to check that out too.
What is your clock source? Your clock source has nothing to do with NTP
or timekeeping of wall clocks. It just means: what physical device do I use to
produce information about what the current time is, essentially to produce
timestamps? ScyllaDB is a very timestamp-heavy application. You can have a variety of
choices of which device to use. The fastest one is just your processor’s TSC
(it used to be essentially a cycle counter, I’m
not going there), and this is the fastest one available. If you’re running
on AWS, for instance on the older i3 machines, you might be
using the Xen clock source, from the Xen hypervisor. Just for you to
have an idea: if we measure how long it takes to get a timestamp from
each of those clock sources, taking it from the TSC takes around 25
nanoseconds, and taking it from the
Xen clock source is gonna take around
100 nanoseconds. What that means is that every time you go
there and you have to take a timestamp, which we do all the time, you’re gonna
get it four times slower. We’re going to integrate that into the setup
utility soon, but for the time being you might want to check that out.
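
As one way to check that out by hand, the following Python sketch reads the active and available clock sources from the standard Linux sysfs location. The helper names (`read_clocksource`, `describe`) are made up for this example and are not part of any ScyllaDB tool; the sketch only reports what the kernel exposes and changes nothing.

```python
from pathlib import Path

# Standard Linux sysfs directory describing the system clock source.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current clock source, list of available clock sources)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def describe(current, available):
    """Summarize whether the fast TSC clock source is in use (hypothetical helper)."""
    if current == "tsc":
        return "clock source is tsc (the fast one)"
    if "tsc" in available:
        return f"clock source is {current}; tsc is available here"
    return f"clock source is {current}; tsc is not offered on this machine"


if __name__ == "__main__":
    try:
        print(describe(*read_clocksource()))
    except OSError as exc:
        # Non-Linux systems or restricted containers may not expose these files.
        print(f"could not read clock source: {exc}")
```

If this reports something like `xen` while `tsc` is listed as available, that is the situation described above, where every timestamp costs roughly four times as much. Whether switching the clock source is appropriate depends on your platform, so treat the output as a prompt to investigate rather than as a recommendation.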