This lesson covers the Incremental Compaction Strategy (ICS). It explains how ICS works, discusses performance considerations, and compares it to other compaction strategies.
Incremental compaction strategy is a new compaction strategy that we have had working for quite a while. We're still in the last days of fixing bugs. It's impressive how software has bugs; we should just get the government to pass a law saying bugs are not allowed. There is one scenario in which the guarantee is not really kept, and we want to address that. It should be coming in a matter of months at most, and it's going to be Enterprise only.

It used to be called hybrid compaction strategy; we talked about it under that name at the ScyllaDB Summit a couple of years back. We changed the name because "hybrid" doesn't really tell you much about the strategy, but it comes from being a hybrid between size-tiered and leveled compaction strategy.

How does it work? We don't write a single SSTable anymore. Whenever a memtable is flushed, or a compaction produces its result, instead of writing an SSTable we write something called an SSTable run. An SSTable run generalizes the concept leveled compaction already has: a set of SSTables with disjoint token ranges. So with incremental compaction strategy, even when I flush a memtable, instead of writing one big SSTable I write one big SSTable run, made up of many small SSTables that are physically the same size. Logically, the run is a single entity.

On top of runs, I keep doing size-tiered compaction: the logic of incremental compaction strategy is exactly the same as size-tiered. By the way, that means it won't help in update-heavy cases where leveled compaction could keep everything in L0 and L1, because here you still have to carry the data all the way to the last tier. The only difference from size-tiered is that my input is now partitioned, so I can compact it piecemeal, and I no longer need the 50% free space.

The 50% space amplification of size-tiered comes from the fact that I have to compact my entire SSTable set into a new SSTable set, and I cannot release the input space until that is done. With incremental compaction strategy, compaction walks the SSTables in token order. Because the SSTables in a run are disjoint, I know, for example, that this SSTable covers tokens A to F. Once I've compacted everything from A to F and already closed the output SSTables holding it, I can discard those input SSTables early.
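The toy model below sketches both ideas, runs of disjoint fixed-size SSTables and early release of compacted inputs, under simplifying assumptions: tokens are integers, every partition counts as one unit of disk, and `Fragment`, `make_run`, and `compact_runs` are hypothetical names, not ScyllaDB code. It assumes the deletion rule described above: an input SSTable can be discarded once a closed output SSTable covers its last token.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass
class Fragment:
    """One small SSTable inside a run: sorted partition tokens (toy model)."""
    keys: list[int]

    @property
    def first(self) -> int:
        return self.keys[0]

    @property
    def last(self) -> int:
        return self.keys[-1]


def make_run(keys, fragment_size: int) -> list[Fragment]:
    """Split tokens into an SSTable run: disjoint, ordered, fixed-size fragments."""
    ordered = sorted(set(keys))
    return [Fragment(ordered[i:i + fragment_size])
            for i in range(0, len(ordered), fragment_size)]


def is_disjoint_run(run: list[Fragment]) -> bool:
    """A valid run has non-overlapping fragments in ascending token order."""
    return all(a.last < b.first for a, b in zip(run, run[1:]))


def run_keys(run: list[Fragment]):
    """Walk a run in token order, fragment by fragment."""
    for fragment in run:
        yield from fragment.keys


def compact_runs(runs: list[list[Fragment]], fragment_size: int,
                 incremental: bool = True) -> tuple[list[Fragment], int]:
    """Merge runs into one new run and report peak disk usage (in keys).

    incremental=True: delete an input fragment as soon as a sealed output
    fragment covers its last token. incremental=False: keep every input
    until the whole compaction finishes, as plain size-tiered does.
    """
    live_inputs = [fragment for run in runs for fragment in run]
    disk = sum(len(f.keys) for f in live_inputs)
    peak = disk
    output: list[Fragment] = []
    current: list[int] = []

    def seal() -> None:
        nonlocal disk, live_inputs
        output.append(Fragment(current.copy()))
        current.clear()
        if incremental:
            sealed_up_to = output[-1].last
            finished = [f for f in live_inputs if f.last <= sealed_up_to]
            live_inputs = [f for f in live_inputs if f.last > sealed_up_to]
            disk -= sum(len(f.keys) for f in finished)

    previous = None
    for key in heapq.merge(*(run_keys(run) for run in runs)):
        if key == previous:
            continue  # same partition in several inputs: keep one merged copy
        previous = key
        current.append(key)
        disk += 1  # the output fragment being written also occupies disk
        peak = max(peak, disk)
        if len(current) == fragment_size:
            seal()
    if current:
        seal()
    return output, peak


run_a = make_run(range(0, 200, 2), fragment_size=10)  # even tokens
run_b = make_run(range(1, 200, 2), fragment_size=10)  # odd tokens
_, peak_stcs = compact_runs([run_a, run_b], 10, incremental=False)
merged, peak_ics = compact_runs([run_a, run_b], 10, incremental=True)
print(peak_stcs, peak_ics)  # 400 220
```

With two 100-key runs, the all-at-once merge needs twice the data size on disk at its peak, while the incremental version only ever holds about two fragments' worth of extra space (220 versus 200 keys). That bounded overhead is what removes the 50% free-space requirement in this model.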
So you essentially solve the space amplification problem of size-tiered: logically you still have one big SSTable, but physically it is partitioned into small, disjoint SSTables. The space amplification is actually even better than with leveled compaction strategy. In leveled you always have that factor of 10; in incremental compaction strategy you don't. The space you might need is essentially 160, the size of an SSTable, times however many compactions you have running in parallel. With size-tiered you can run compactions in parallel for the same table if they are in different tiers, but in the worst case, as we discussed, that's four. Leveled compaction strategy will not compact multiple levels in parallel, but a single compaction involves 10 SSTables. So the free disk space you need is actually smaller than with leveled compaction strategy, because the constant is better by a factor of two and something (roughly 10 SSTables versus 4).

The write amplification is not as bad as leveled; it's the same as size-tiered, because when I move an SSTable to a different tier I don't have to rewrite that whole level. At the same time, because the SSTables are disjoint, I can get rid of my input SSTables faster.

The read amplification is like size-tiered: sometimes it will be good, sometimes it won't. For append-only workloads it will be great; that's essentially the best case. For update-heavy workloads it depends. If you're lucky with leveled compaction strategy and can keep everything in the lowest levels, then maybe leveled is better, but I wouldn't depend on luck. Keep in mind that every time you go up a level you're paying a factor of 10, and here you don't have that factor of 10.

This chart compares size-tiered and incremental compaction strategy on the same workload as example number one, looking at space amplification. We could have included leveled, but it's not the same: leveled is different and its spikes are different. Those are details. The point is that incremental is much better: you never get the spike.

The goal of incremental compaction strategy is to give you the benefits of size-tiered compaction strategy without the cost of 50% disk space. So it's usually a good fit if you say: "I want to use size-tiered. The other compaction strategies all have drawbacks, and the only thing keeping me on another compaction strategy is that size-tiered needs more disk space. I want to go all the way to 80 or 90 percent." Then incremental compaction strategy is for you.
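The free-space comparison above reduces to a little arithmetic. This sketch plugs in the lesson's own numbers: an SSTable size of 160 (the lesson gives no unit), about 10 SSTables per leveled compaction, and a worst case of 4 parallel compactions for size-tiered and incremental. `compaction_headroom` is a hypothetical helper, and the model ignores everything else that competes for disk, so treat it as a back-of-envelope estimate rather than a sizing rule.

```python
def compaction_headroom(strategy: str, data_size: int, sstable_size: int = 160,
                        lcs_sstables_per_compaction: int = 10,
                        parallel_compactions: int = 4) -> int:
    """Worst-case free space a strategy needs for compaction (toy model).

    Results use whatever unit you pass in; the defaults are the lesson's
    figures (SSTable size 160, 10 SSTables per LCS compaction, 4 parallel
    size-tiered/incremental compactions).
    """
    if strategy == "STCS":
        # The whole SSTable set is rewritten before any input is released.
        return data_size
    if strategy == "LCS":
        # One leveled compaction works on about 10 SSTables at once.
        return lcs_sstables_per_compaction * sstable_size
    if strategy == "ICS":
        # About one SSTable size of extra space per parallel compaction.
        return parallel_compactions * sstable_size
    raise ValueError(f"unknown strategy: {strategy}")


data = 500_000  # hypothetical data set size, same unit as sstable_size
for name in ("STCS", "LCS", "ICS"):
    print(name, compaction_headroom(name, data))
# STCS 500000
# LCS 1600
# ICS 640
print(compaction_headroom("LCS", data) / compaction_headroom("ICS", data))  # 2.5
```

The 2.5 ratio is the "factor of two and something" from the lesson. Only the size-tiered headroom grows with the data size, which is why it needs half the disk free while the other two need a roughly constant amount.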