This lesson covers: Size Tiered Compaction Strategy (STCS), Leveled Compaction Strategy (LCS), how they work and when to use them.
Let’s talk about legacy compaction strategies before introducing incremental compaction. One of the most common compaction strategies is called Size-Tiered Compaction Strategy, or STCS for short. The basic idea of STCS is that it organizes SSTables into tiers based on their size, so each tier has exponentially larger SSTables than the previous tier. In this example, the SSTables in tier 2 are four times as large as those in tier 1, and those in tier 3 are sixteen times as large as those in tier 1.

Now, when we compact several SSTables from some tier, we create a single SSTable that is the compacted union of all the input SSTables. If there were no deletions or duplicated data, the size of the union is just the sum of the sizes of the input SSTables, and this output SSTable becomes part of the next tier. But it could also be much smaller, for example if we deleted most of the data, and then it could be written into a lower tier. So data does not necessarily advance through the tiers; it depends on the workload.

As for the attributes of STCS, the worst thing about it is that it requires temporary space to compact the data, so the disk must be able to hold at least twice the data size. As I explained before, we can’t delete the input SSTables until we have written the output SSTable, and while doing so the data may be duplicated in both the input SSTables and the output SSTable. This is part of the space amplification. So the main factors for space amplification are, first, the temporary space problem that I just described, and second, the accumulation of updates and deletes across the different tiers. If an early version of a cell resides in the highest tier and we have a new version in a low tier, with STCS we can’t compact them together until they arrive at the same tier.
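To make the tiering and the temporary space cost concrete, here is a minimal sketch in Python. It is a toy model, not how any real database implements STCS: SSTables are plain dicts, size is counted in cells, the growth factor of 4 comes from the example above, and the names `tier_of`, `compact`, and `peak_space` are made up for illustration. It also assumes deleted keys can be dropped right away, which real systems do not do.

```python
def tier_of(size, base=1, factor=4):
    """Return the tier (starting at 1) for an SSTable of the given size.

    Assumption: tier 1 holds sizes below base * factor, and each later
    tier covers sizes `factor` times larger than the one before it.
    """
    tier, upper_bound = 1, base * factor
    while size >= upper_bound:
        tier += 1
        upper_bound *= factor
    return tier


def compact(sstables):
    """Merge SSTables into one, keeping only the newest version of each key.

    Each SSTable is a dict: key -> (timestamp, value); a value of None marks
    a deletion. Simplification: deleted keys are dropped immediately.
    """
    newest = {}
    for sstable in sstables:
        for key, (timestamp, value) in sstable.items():
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    return {k: v for k, v in newest.items() if v[1] is not None}


def peak_space(inputs, output):
    """Cells on disk at the moment the output is written but the inputs
    have not been deleted yet."""
    return sum(len(s) for s in inputs) + len(output)


# Four tier-2 SSTables (4 cells each) with no overlap and no deletions:
# the output is the sum of the inputs and moves up one tier.
inputs = [{(i, j): (1, "v") for j in range(4)} for i in range(4)]
output = compact(inputs)
print(len(output), tier_of(len(output)), peak_space(inputs, output))

# Mostly deletions: the output shrinks and lands in a lower tier.
old = {k: (1, "v") for k in range(4)}
deletes = {k: (2, None) for k in range(3)}
shrunk = compact([old, deletes])
print(len(shrunk), tier_of(len(shrunk)))
```

In the first case the peak is 32 cells for 16 cells of live data, which is the "twice the data size" cost. In the second case the output holds a single cell, so it belongs in tier 1 rather than the next tier up.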
Another legacy compaction strategy is LCS, which is Leveled Compaction Strategy, coming from LevelDB. In LCS, compaction is triggered when level i grows and has more than 10^i SSTables. Then it picks this extra SSTable from one of the levels and compacts it together with the SSTables in the next level, creating a new run of SSTables. The set of SSTables that it selects is based on the keys overlapping with the input SSTable, so it uses the bloom filter to locate the SSTables it compacts together with this single SSTable. Note that the output is a run of SSTables. It’s not a single large SSTable as with STCS; it consists of a run of SSTables. This is important because we adopted this trait of LCS into ICS.
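The LCS step described here can be sketched roughly as follows. This is a simplified model under several assumptions: an SSTable is a sorted list of integer keys, overlap is decided by comparing first and last keys (the sketch does not model bloom filters), and the output run is cut into SSTables of at most `max_keys` keys. All function names are hypothetical.

```python
def level_is_over_capacity(level_index, level):
    """Level i triggers compaction once it holds more than 10**i SSTables."""
    return len(level) > 10 ** level_index


def overlaps(a, b):
    """True if the key ranges [first, last] of two sorted SSTables intersect."""
    return a[0] <= b[-1] and b[0] <= a[-1]


def compact_into_next_level(picked, next_level, max_keys=4):
    """Merge `picked` with the overlapping SSTables of the next level.

    Returns the new content of the next level and the number of SSTables
    that had to be rewritten alongside `picked`.
    """
    overlapping = [t for t in next_level if overlaps(picked, t)]
    untouched = [t for t in next_level if not overlaps(picked, t)]
    merged = sorted(set(picked).union(*overlapping))
    # The output is a run of small, non-overlapping SSTables,
    # not one large SSTable as in STCS.
    run = [merged[i:i + max_keys] for i in range(0, len(merged), max_keys)]
    new_level = sorted(untouched + run, key=lambda t: t[0])
    return new_level, len(overlapping)


next_level = [[1, 2, 3, 4], [10, 11, 12, 13], [20, 21, 22, 23]]
picked = [3, 11, 12, 14]
new_level, rewritten = compact_into_next_level(picked, next_level)
print(new_level, rewritten)
```

Here the picked SSTable overlaps two of the three SSTables in the next level, so those two are rewritten together with it, while the third is left untouched. The result is again a set of non-overlapping SSTables.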
So, while LCS limits space amplification, it results in higher write amplification, because we have to keep rewriting the data multiple times across the levels. That’s the downside of using LCS.
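A rough back-of-the-envelope calculation shows where the extra writes come from. The model below is an assumption for illustration only: every SSTable has the same size, and each promotion overlaps about `fanout` SSTables in the next level, with `fanout = 10` taken from the 10^i level sizes. Real workloads will differ, and the function names are hypothetical.

```python
def promotion_cost(picked_size, overlapping_sizes):
    """Bytes written when one SSTable is merged with the SSTables it
    overlaps in the next level, assuming no data is dropped."""
    return picked_size + sum(overlapping_sizes)


def write_amplification(user_bytes, compaction_bytes):
    """Total bytes written to disk divided by the bytes the user wrote."""
    return (user_bytes + compaction_bytes) / user_bytes


sstable_size = 1   # one unit of user data, flushed once
fanout = 10        # assumed number of overlapping SSTables per promotion
levels = 3         # assumed number of promotions for this data

compaction_bytes = sum(
    promotion_cost(sstable_size, [sstable_size] * fanout)
    for _ in range(levels)
)
print(compaction_bytes, write_amplification(sstable_size, compaction_bytes))
```

Under these assumptions, one unit of user data causes 33 units of compaction writes across three promotions, which is the kind of repeated rewriting the lesson refers to.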