The lesson starts by describing a healthy system state in terms of memory. It then gives examples of common memory issues such as large allocations and bad_allocs, and what can be done to diagnose and solve these issues.
Memory management, large allocations, and bad_allocs
So this is a map of how ScyllaDB looks at memory, and we'll talk about a single shard. That is the total memory a single shard has, and it's broken up into LSA, which means that ScyllaDB manages that memory internally. The LSA region is in turn broken up into memtables and cache, and it's split between them: if I'm accumulating a lot of memtables the cache will be smaller, and vice versa. It also means that if you have a large read workload the cache may grow as well. Then there are the standard allocations. Standard allocations are not supposed to be very large; usually they are for queues, managing the software and so forth, over the lifetime of ScyllaDB. But again, if we have a queue building up, it will eat into the memory available for memtables and cache. The first thing that will take the hit is the cache, because it can be relinquished; memtables cannot simply be dropped, they need to be flushed to disk first, so the cache is the first item that will be hit.
So usually most of the memory is in LSA, in cache or memtables. With 8 gigabytes of memory per shard, it's usual to see 7 gigabytes allocated to memtables and cache, and 1 gigabyte, or even less, for other things. When LSA memory drops, it means that we have to evict items, and that translates to higher latencies: we need to read more from disk to answer your queries, because you don't have the cache anymore.
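To make the proportions concrete, here is a rough back-of-the-envelope sketch in Python; the node size, shard count, and the 7-of-8 LSA share are illustrative assumptions based on the figures above, not exact ScyllaDB internals.

```python
# Rough illustration of how per-shard memory might break down.
# node_memory_gb and shards are hypothetical example values.

node_memory_gb = 64          # memory reserved for ScyllaDB on the node (assumed)
shards = 8                   # number of shards, roughly one per CPU core (assumed)

per_shard_gb = node_memory_gb / shards   # each shard owns its own slice
lsa_gb = per_shard_gb * 7 / 8            # memtables + cache (LSA), ~7 of 8 GB
other_gb = per_shard_gb - lsa_gb         # standard allocations: queues and so forth

print(f"per shard: {per_shard_gb:.1f} GB, "
      f"LSA (memtables + cache): {lsa_gb:.1f} GB, "
      f"other: {other_gb:.1f} GB")
```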
Large allocations: ScyllaDB manages the memory, or tries to manage the memory, in a good manner, which means that it breaks up large blobs into smaller chunks; it does a lot of things. But large contiguous allocations are bad, and they're bad because they take a lot of time and they take a lot of space, and to satisfy a large contiguous allocation we may need to free a lot of space as well, until we get that large buffer we can write to. So large allocations are bad, and ScyllaDB writes this information to the logs; again, it's a backtrace, and it includes the size of the allocation.
So what can you do with those? Again, report them; one thing's for sure, we'll take a look. And it can be, again, your application, so we can provide you information on what to change on your application side. Large blobs are, again, not great. If you disable paging, and you could have seen this in the CPU optimization lesson, and we see large allocations and find they are around the area of building a result set, then we know it's related: if you disabled paging, that's the reason, and we can provide you this information. We can tell you to, again, upgrade, change your application, or tell us this is important to you, and again we'll take it into account when we schedule items.
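Since disabling paging forces the server to build the whole result set in memory at once, keeping paging enabled is the usual application-side fix. A minimal sketch with the Python driver might look like the following; the cluster address, keyspace, table, and fetch size are assumptions for illustration:

```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Hypothetical cluster address, keyspace and table, for illustration only.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# fetch_size keeps paging enabled: the server returns the result in pages of
# ~1000 rows instead of building one large contiguous result set.
query = SimpleStatement("SELECT * FROM my_table", fetch_size=1000)

rows_seen = 0
for row in session.execute(query):   # the driver fetches further pages transparently
    rows_seen += 1                    # stand-in for real per-row application logic
print(f"processed {rows_seen} rows, one page at a time")
```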
Bad allocs: so large allocations are bad, but there is always something worse than something bad, and bad allocs are worse. That means ScyllaDB wasn't able to allocate the memory, okay, and it throws a bad_alloc. Now, bad allocs can happen, and ScyllaDB is able to work through some bad allocs, so they can happen; they're not great, but again, they can happen. It can be a temporary situation where, under stress, we couldn't allocate and there was a bad_alloc; your client or your application may have gotten an error back for a request that it sent, and it will retry, and everything will be perfect.
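As an illustration of that transient case, a client-side retry around a single request could look roughly like the sketch below; the exception handling is deliberately broad and the function is a placeholder, since the exact error surfaced to the application depends on the driver and the failure:

```python
import time

def execute_with_retry(session, statement, attempts=3):
    """Retry a request a few times; a transient server-side bad_alloc
    usually reaches the client as a request error that succeeds on retry."""
    for attempt in range(1, attempts + 1):
        try:
            return session.execute(statement)
        except Exception:                # broad on purpose in this sketch;
            if attempt == attempts:       # real code would catch specific driver errors
                raise
            time.sleep(0.1 * attempt)    # simple backoff before retrying
```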
But in some cases it's not transient, and in some cases we cannot really keep on working if that's the case. So again, what you can do is always report the issue, and we'll try to figure out why it happened. It's not perfect, I know, but in many cases we can tell you: try not to build such huge batches, break up your batches so they are smaller; that will reduce the pressure on memory, the allocations will be smaller, and you won't reach that point. That's one example. So again, you may be instructed to upgrade, to change your application, or to tell us it's important to you because there's no other way to work around it.
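To illustrate the batch advice, here is a minimal sketch with the Python driver that splits a large set of writes into several small unlogged batches; the table, insert statement, and chunk size are assumptions for illustration, not recommended values:

```python
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType

# Hypothetical cluster, keyspace and table, for illustration only.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")
insert = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

CHUNK = 50  # assumed batch size; small batches mean smaller allocations per request

def write_in_chunks(rows):
    """Send rows in several small batches instead of one huge one."""
    for start in range(0, len(rows), CHUNK):
        batch = BatchStatement(batch_type=BatchType.UNLOGGED)
        for row_id, payload in rows[start:start + CHUNK]:
            batch.add(insert, (row_id, payload))
        session.execute(batch)  # each request now carries only CHUNK statements
```

Grouping rows by partition key before batching reduces coordinator-side work further, but even simple chunking like this keeps each individual request, and therefore each allocation, small.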