A Data Mesh from Scratch in Rust — Part 4— SSTable

Published in

Towards Dev

3 min read · Oct 19, 2022

If you would like to read the first three parts, look here. In this part, we will talk about the SSTable implementation. The idea is that the MemTable stores the latest data in memory until it reaches a certain size and then writes it to the SSTable, which lives on disk. You can look at the code here.

Let’s start by defining our SSTable.

Deceptively innocent! It took a few iterations for the SSTable structure to evolve to this. The epoch field is used as a versioning tool and is generated from SystemTime as the number of microseconds since UNIX_EPOCH. The files are then named rdeebee-<epoch>.table. Every SSTable corresponds to an epoch and is associated with a file, whose location is stored in the filepath field.
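As a rough illustration of the epoch and file naming described above, here is a minimal sketch. The helper names current_epoch and table_path are hypothetical and are not taken from the project's code; only the microseconds-since-UNIX_EPOCH rule and the rdeebee-<epoch>.table pattern come from this article.

```rust
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Microseconds elapsed since UNIX_EPOCH, used as the SSTable version.
/// (Hypothetical helper name.)
fn current_epoch() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before UNIX_EPOCH")
        .as_micros()
}

/// Builds `<dir>/rdeebee-<epoch>.table`. (Hypothetical helper name.)
fn table_path(dir: &Path, epoch: u128) -> PathBuf {
    dir.join(format!("rdeebee-{}.table", epoch))
}
```

Because the epoch only grows with wall-clock time, a larger number in the file name means a newer table, which is what the merge step relies on later.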

We use the SSTable in two ways:

Writing a MemTable to disk.
Getting the data from an older epoch for merging or searching.

So we can create an SSTable from a MemTable or from a file. The details are simple enough, and you can look up the from_memtable() and from_file() methods of SSTable. The from_memtable() method consumes a MemTable and writes its events to disk by iterating over it (see the MemTable article for details) using a BufWriter, which is held in the writer field.
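To make the write path concrete, here is a small sketch of what consuming a MemTable and streaming it through a BufWriter could look like. It makes several assumptions that are not from the project: the MemTable is stood in for by a BTreeMap from event id to payload, each event is written as one tab-separated line, and the function name write_memtable is hypothetical. The real from_memtable() writes to a file; the sketch is generic over any Write sink so it can be tried without touching disk.

```rust
use std::collections::BTreeMap;
use std::io::{self, BufWriter, Write};

/// Stand-in for the real MemTable: event id -> payload, kept sorted by id.
type MemTable = BTreeMap<u64, String>;

/// Consumes the MemTable and writes one `id<TAB>payload` line per event,
/// in ascending id order. Returns the underlying sink once flushed.
fn write_memtable<W: Write>(memtable: MemTable, sink: W) -> io::Result<W> {
    let mut writer = BufWriter::new(sink);
    // Iterating a BTreeMap by value yields entries sorted by key.
    for (id, payload) in memtable {
        writeln!(writer, "{}\t{}", id, payload)?;
    }
    writer.flush()?;
    writer.into_inner().map_err(|e| e.into_error())
}
```

Passing a File as the sink gives the on-disk behaviour; passing a Vec<u8> is handy for experiments.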

When we create an SSTable from an existing file, we only intend to read it, so a writer is not required. It is also easier to iterate over the SSTable directly than to rebuild a MemTable from it (remember that our MemTable iterator generates a sorted stream of events, so the events on disk are already in order). Hence, the memtable and writer fields are optional.
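Putting the fields mentioned so far together, the structure might look roughly like the sketch below. The field names epoch, filepath, memtable and writer come from this article; the field types, the BTreeMap stand-in for MemTable, and the for_reading and is_read_only helpers are assumptions made for illustration.

```rust
use std::collections::BTreeMap;
use std::fs::File;
use std::io::BufWriter;
use std::path::PathBuf;

/// Stand-in for the real MemTable type.
type MemTable = BTreeMap<u64, String>;

struct SSTable {
    epoch: u128,
    filepath: PathBuf,
    /// Present only when the table is being built from a MemTable.
    memtable: Option<MemTable>,
    /// Present only when the table is being written to disk.
    writer: Option<BufWriter<File>>,
}

impl SSTable {
    /// Read-only handle for a table that already exists on disk.
    /// (Hypothetical constructor; the article's equivalent is from_file().)
    fn for_reading(filepath: PathBuf, epoch: u128) -> Self {
        SSTable { epoch, filepath, memtable: None, writer: None }
    }

    /// True when no writer is attached, i.e. the table was opened for reading.
    fn is_read_only(&self) -> bool {
        self.writer.is_none()
    }
}
```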

Typically, we derive the epoch of an SSTable from the file name itself. One could also open the file with OpenOptions and read the creation time from its metadata, but I decided against it to keep things simple and consistent.
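Recovering the epoch from the file name can be sketched in a few lines. The function name epoch_from_path is hypothetical; the only thing taken from the article is the rdeebee-<epoch>.table naming pattern.

```rust
use std::path::Path;

/// Extracts `<epoch>` from a file name shaped like `rdeebee-<epoch>.table`.
/// Returns None if the name does not match or the epoch is not a number.
fn epoch_from_path(path: &Path) -> Option<u128> {
    path.file_name()?
        .to_str()?
        .strip_prefix("rdeebee-")?
        .strip_suffix(".table")?
        .parse()
        .ok()
}
```

Returning an Option keeps stray files in the data directory from being mistaken for tables.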

The next step is to create the iterator over the SSTable. We do it in a manner similar to what we did for MemTable.

This time, though, we also implement the fn iter(&self) method, which returns an SSTableIterator without consuming the SSTable. We do this to make it easier to write the merge method, which merges two SSTables into a new one and writes it to a new file (with a new epoch). It is otherwise similar to the merge step in merge sort.
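A non-consuming iterator of this kind could be sketched as follows. The names SSTable, SSTableIterator and iter(&self) come from the article; everything else is assumed for illustration, including the tab-separated line format, the (id, payload) item type, and the choice to return an io::Result from iter (the project's own signature may differ).

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Lines};
use std::path::PathBuf;

struct SSTable {
    filepath: PathBuf,
}

/// Yields `(id, payload)` pairs in the order they are stored on disk.
struct SSTableIterator {
    lines: Lines<BufReader<File>>,
}

impl SSTable {
    /// Borrows the table; each call opens its own reader on the file,
    /// so the SSTable itself can be iterated more than once.
    fn iter(&self) -> io::Result<SSTableIterator> {
        let file = File::open(&self.filepath)?;
        Ok(SSTableIterator { lines: BufReader::new(file).lines() })
    }
}

impl Iterator for SSTableIterator {
    type Item = (u64, String);

    fn next(&mut self) -> Option<Self::Item> {
        // This sketch simply stops at the first read error or malformed line.
        let line = self.lines.next()?.ok()?;
        let (id, payload) = line.split_once('\t')?;
        Some((id.parse().ok()?, payload.to_string()))
    }
}
```

Since iter only borrows the SSTable, a merge routine can hold iterators over two tables while still reading their epochs and file paths.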

What’s going on in merge is simple enough:

Get a vector for the events and another for deleted event ids (Action::DELETE).
Get the epoch of the two SSTables.
Create two non-consuming iterators.
Iterate over both SSTables simultaneously to decide whether each event is added to the event vector, added to the deleted vector, or ignored because a newer version exists (decided by comparing epochs).
Finally, write the new SSTable to a new file and return the file path.
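The core of these steps can be sketched as a two-pointer merge over two sorted streams. This is an illustration under stated assumptions, not the project's merge method: the Event struct, the Action::Write variant and the function signature are hypothetical (the article only names Action::DELETE), events are assumed to be sorted by id, and the final step of writing the result to a new file is left out.

```rust
use std::cmp::Ordering;

#[derive(Clone, Debug, PartialEq)]
enum Action { Write, Delete }

#[derive(Clone, Debug, PartialEq)]
struct Event { id: u64, action: Action, payload: String }

/// Merges two id-sorted event streams. When both contain the same id, the
/// event from the table with the larger epoch wins and the other is dropped.
/// Returns (live events, ids of deleted events).
fn merge<A, B>(a: A, epoch_a: u128, b: B, epoch_b: u128) -> (Vec<Event>, Vec<u64>)
where
    A: Iterator<Item = Event>,
    B: Iterator<Item = Event>,
{
    let (mut events, mut deleted) = (Vec::new(), Vec::new());
    let (mut a, mut b) = (a.peekable(), b.peekable());
    loop {
        // Compare the heads of both streams without consuming them.
        let order = match (a.peek(), b.peek()) {
            (None, None) => break,
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (Some(x), Some(y)) => x.id.cmp(&y.id),
        };
        let winner = match order {
            Ordering::Less => a.next(),
            Ordering::Greater => b.next(),
            Ordering::Equal => {
                // Same id in both tables: advance both, keep the newer one.
                let (x, y) = (a.next(), b.next());
                if epoch_a >= epoch_b { x } else { y }
            }
        };
        if let Some(event) = winner {
            match event.action {
                Action::Delete => deleted.push(event.id),
                Action::Write => events.push(event),
            }
        }
    }
    (events, deleted)
}
```

Because both inputs are already sorted, the merge makes a single pass and never needs to hold either table fully in memory beyond the output vectors.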

And that’s it! Next time in this series, we will talk about the write-ahead log.


Written by Ratnadeep Bhattacharya, PhD
