Advanced Data Management by Lena Wiese
Author:Lena Wiese
Language: eng
Format: epub
Publisher: De Gruyter
Published: 2015-02-27T16:00:00+00:00
Fig. 8.6. Compaction on disk
8.2.4 Compaction
After some time, several memtable flushes will have occurred, and hence many data files will be stored on disk. These data files will most probably contain some outdated records: records whose time-to-live value has passed, records for which more recent versions exist and for which the maximum number of stored versions is exceeded, or records which are masked by a tombstone. Outdated records not only occupy disk space unnecessarily, they also slow down reads because they have to be loaded and compared with other records in the combine process of data retrieval (see Section 8.2.1). This is why a process called compaction was devised to remove any unwanted records and merge a set of data files into a new one. As sketched in Figure 8.6, a set of data files is chosen for compaction, their records are merged, and the result is written to a new, larger data file (at a new location on disk); finally, the small input data files can be deleted. More specifically, a minor compaction merges only a small subset of all data files, whereas a major compaction merges all data files into a single new one.
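The merge step described above can be illustrated with a minimal Python sketch. It is not taken from any particular record store: the `Record` type, the functions `merge_files` and `compact`, and the representation of a data file as an in-memory list are all assumptions made for illustration. Each input file is assumed to be sorted by key, with the newest version of a key first. The sketch only merges; removing outdated records is left out here.

```python
import heapq
from typing import NamedTuple, Optional


class Record(NamedTuple):
    key: str
    timestamp: int
    value: str


def merge_files(files: list[list[Record]]) -> list[Record]:
    """Merge several key-sorted data files into one key-sorted data file."""
    # heapq.merge performs a k-way merge; it requires every input to be
    # sorted by the same key function (here: key, then newest first).
    return list(heapq.merge(*files, key=lambda r: (r.key, -r.timestamp)))


def compact(files: list[list[Record]],
            subset: Optional[list[int]] = None) -> list[list[Record]]:
    """Return the data files after compaction (hypothetical helper).

    subset=None merges all files (major compaction); otherwise only the
    files at the given positions are merged (minor compaction).
    """
    chosen = set(range(len(files))) if subset is None else set(subset)
    new_file = merge_files([files[i] for i in sorted(chosen)])
    # The input files are dropped; untouched files stay as they are.
    untouched = [f for i, f in enumerate(files) if i not in chosen]
    return untouched + [new_file]


files = [
    [Record("a", 1, "x"), Record("c", 1, "z")],
    [Record("a", 2, "y"), Record("b", 2, "w")],
    [Record("d", 3, "v")],
]
after_minor = compact(files, subset=[0, 1])  # two files remain
after_major = compact(files)                 # one file remains
```

In a real system the input files would be streamed from disk rather than held in lists, which is why a lazy k-way merge is a natural fit for this step.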
Several things have to be considered during compaction:
– The records of all key-value pairs in the input data files have to be sorted by their keys in the new data file; hence the records have to be reordered and the index has to be restructured.
– At the same time, time-to-live values have to be interpreted so that expired records can simply be ignored.
– If one of the data files contains a tombstone, all data that are masked by the tombstone and have been written prior to the tombstone can be ignored. Note that records that are masked by the tombstone but have been written after the insertion of the tombstone (because they are contained in a more recent data file as identified by the data file sequence number) are handled differently: these records are merged into the new data file but will still be masked by the tombstone if it is a minor compaction. Tombstones themselves can only be deleted during major compaction; this means that more recent records for a key become visible only after a major compaction, because until then they are masked by the tombstone. This somewhat incoherent behavior is usually chosen to simplify the compaction process and the interpretation of tombstones during a read process. Other deletion semantics can be enforced, but this would make data retrieval as well as minor compactions more involved.
– In some extensible record stores, versioning settings are also enforced during compaction: only a specified maximum number of versions is kept for each key. For example, if the maximum number of versions to be stored is set to three, for each key the records with the three most recent timestamps are copied to the compacted data file while all records with older timestamps are ignored.
– Last but not least, changing column family settings can be
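The time-to-live, tombstone and versioning rules from the list above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the algorithm of a specific system: `VersionedRecord`, `compact_records` and all field names are hypothetical, the input is a flat list of the records taken from the chosen data files, and a record counts as written prior to a tombstone when its data file sequence number is not larger than that of the tombstone.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass(frozen=True)
class VersionedRecord:
    key: str
    timestamp: int                    # write time of this version
    value: Optional[str]              # None for a tombstone
    seq: int                          # sequence number of the containing data file
    expires_at: Optional[int] = None  # absolute expiry time; None means no time-to-live
    tombstone: bool = False


def compact_records(records, now, max_versions, major):
    """Apply TTL, tombstone and versioning rules (illustrative sketch)."""
    result = []
    # Sort by key, newest version first, so each key forms one group.
    ordered = sorted(records, key=lambda r: (r.key, -r.timestamp))
    for _, group in groupby(ordered, key=lambda r: r.key):
        group = list(group)
        tombstones = [r for r in group if r.tombstone]
        newest_tomb = max(tombstones, key=lambda t: t.seq, default=None)
        kept = [
            r for r in group
            if not r.tombstone
            # Expired records are simply ignored.
            and (r.expires_at is None or r.expires_at > now)
            # Assumption: records from data files up to the tombstone's
            # file count as written prior to it and are dropped.
            and (newest_tomb is None or r.seq > newest_tomb.seq)
        ]
        # Only the most recent max_versions versions are copied.
        kept = kept[:max_versions]
        # Tombstones survive a minor compaction and continue to mask the
        # key; they are removed only by a major compaction.
        if newest_tomb is not None and not major:
            result.append(newest_tomb)
        result.extend(kept)
    return result


records = [
    VersionedRecord("a", 1, "v1", seq=1),
    VersionedRecord("a", 2, None, seq=2, tombstone=True),
    VersionedRecord("a", 3, "v3", seq=3),
    VersionedRecord("b", 1, "b1", seq=1),
    VersionedRecord("b", 2, "b2", seq=2),
    VersionedRecord("b", 3, "b3", seq=3),
    VersionedRecord("c", 1, "c1", seq=1, expires_at=5),
]
minor = compact_records(records, now=10, max_versions=2, major=False)
major = compact_records(records, now=10, max_versions=2, major=True)
```

In the minor case the tombstone for key `a` is written to the output together with the later record, so a reader would still see the key as deleted; in the major case the tombstone is gone and the later record becomes visible.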
