What is Acheron?

Acheron is a web-based system that demonstrates the performance implications of out-of-place deletes in Log-structured merge (LSM) trees through visualizing the life cycle of tombstones. LSM trees are frequently employed as core data structures in modern NoSQL storage engines because they offer high ingestion rate and low latency for query processing. In LSM trees, incoming entries are batched in an in-memory write buffer, and once the buffer is full, it is flushed to storage as an immutable sorted run. When the number of accumulated similar-sized runs exceeds a pre-defined threshold, they are merge-sorted to form a larger immutable run.


New Privacy Regulations

Performance


As LSM trees employ the out-of-place paradigm , obsolete data cannot be removed instantly and may be retained in LSM trees for an arbitrarily long time. Obsolete data may on the one hand, violate data privacy regulations (e.g., the right to be forgotten in EU’s GDPR, right to delete in California’s CCPA and CPRA), and on the other hand, it hurts performance. In Acheron, users can observe that the state of the art does not provide guarantees on when obsolete entries can be physically removed, and also observe that FADE can achieve timely persistent deletes without full tree compaction. Users can further customize the workload, LSM tuning knobs, and disk parameters to investigate the impact of different factors on tombstones and the performance.

Deletes in LSM

Every delete operation is implemented by inserting a tombstone, as shown in the following Figure. Within the memory buffer, a tombstone eagerly deletes any older matching entries, and it is maintained to invalidate further matching entries in the tree. When the memory buffer is full, all the entries (including tombstones) in the buffer are flushed to disk as a file with a tagged time stamp. As more data is ingested, this file will be involved in compactions, during which older entries with the same key will be physically discarded.



FADE : FAst DEletes

To enable timely persistent deletes, in prior work, we introduced FADE, a new family of delete-aware compaction policies. FADE (short for FAst DEletes) piggybacks the task of timely delete persistence to the LSM-tree’s compaction routine while retaining the LSM tree’s benefit of amortized merging cost and predictable performance. Acheron taps into the key design of FADE and presents an interactive framework highlighting how one can navigate the performance-privacy trade-off between the delete persistence and a target performance.



Note that, this compaction trigger is completely orthogonal to the saturation-based trigger, and is driven by the age of the oldest tombstone contained in a file. This way, FADE ensures timely delete persistence, and by removing the invalidated entries timely and persistently from the tree, FADE also improves the overall performance of the storage engine in presence of deletes. The figure above shows step-wise how FADE guarantees timely delete persistence by setting earlier deadlines for future compaction jobs.