A Data Mesh from Scratch in Rust — Part 4— SSTable
If you would like to read the first three parts, look here. In this part, we will talk about the SSTable implementation. The idea is that the MemTable stores the latest data in memory until it reaches a certain size and then writes it to the SSTable, which is on disk. You can look at the code here.
Let’s start by defining our SSTable

Deceptively innocent! It took a few iterations for the SSTable structure to evolve to this. The epoch field is used as a versioning tool and is generated from SystemTime, with epoch = microseconds since UNIX_EPOCH. The files are then named rdeebee-<epoch>.table. Every SSTable corresponds to an epoch and is associated with a file, filepath.
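To make the naming scheme concrete, here is a minimal sketch of how such an epoch and file name could be produced. The helper names new_epoch and table_path are hypothetical and not taken from the repository; only the microsecond epoch and the rdeebee-<epoch>.table pattern come from the text above.

```rust
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Microseconds elapsed since UNIX_EPOCH, used as the table's version.
/// (Hypothetical helper name.)
fn new_epoch() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before UNIX_EPOCH")
        .as_micros()
}

/// Builds `<dir>/rdeebee-<epoch>.table`. (Hypothetical helper name.)
fn table_path(dir: &Path, epoch: u128) -> PathBuf {
    dir.join(format!("rdeebee-{}.table", epoch))
}
```

Because the epoch is a plain integer embedded in the name, listing a directory is enough to tell which table is newer.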
We use the SSTable in two ways:
- Writing a MemTable to disk.
- Getting the data from an older epoch for merging or searching.
So we can create an SSTable from a MemTable or from a file. The details are simple enough, and you can look up the from_memtable() and from_file() methods for SSTable. The from_memtable() method consumes a MemTable and writes its events by iterating over it (see the MemTable article for details) using a BufWriter, which is the writer field.
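As a rough illustration of that write path, the sketch below drains a sorted in-memory map through a BufWriter. The BTreeMap stand-in for MemTable, the function name write_memtable, and the tab-separated line format are all assumptions made for this example; the real on-disk encoding may well differ.

```rust
use std::collections::BTreeMap;
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical stand-in for the MemTable: a sorted map from event id to payload.
type MemTable = BTreeMap<u64, String>;

/// Consumes the memtable and writes one `id<TAB>payload` line per event.
/// A BTreeMap iterates in key order, so the file ends up sorted by id.
fn write_memtable(memtable: MemTable, path: &Path) -> io::Result<()> {
    let mut writer = BufWriter::new(File::create(path)?);
    for (id, payload) in memtable {
        writeln!(writer, "{}\t{}", id, payload)?;
    }
    // Flush explicitly so write errors surface here instead of being lost on drop.
    writer.flush()
}
```

Taking the map by value mirrors the consuming behaviour described above: once the events are on disk, the in-memory copy is gone.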
Now, when we create an SSTable from an existing file, we only need to read it, so a writer is not required. Since our MemTable iterator generates a sorted stream of events, the file is already sorted, and it is easier to iterate over the SSTable directly than to rebuild a MemTable. Hence, the memtable and writer fields are optional.
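One way to picture the optional fields is the sketch below. The field names follow the article, but the field types, the empty MemTable placeholder, and the constructor name for_reading are assumptions made for illustration, not the repository's actual definitions.

```rust
use std::fs::File;
use std::io::BufWriter;
use std::path::PathBuf;

/// Hypothetical placeholder; the real MemTable is covered in the earlier article.
struct MemTable;

struct SSTable {
    epoch: u128,
    filepath: PathBuf,
    /// Only needed while events are being written to disk.
    writer: Option<BufWriter<File>>,
    /// Only present when the table was built from an in-memory MemTable.
    memtable: Option<MemTable>,
}

impl SSTable {
    /// Read-only handle onto an existing file: no writer, no memtable.
    /// (Hypothetical constructor name.)
    fn for_reading(epoch: u128, filepath: PathBuf) -> Self {
        SSTable { epoch, filepath, writer: None, memtable: None }
    }
}
```

Using Option here lets one type cover both the write path and the read path, at the cost of checking for None where a writer or memtable is expected.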
Typically, we derive the epoch of an SSTable from the file name itself. One could also read the creation time from the file's metadata (for example, via Metadata::created() on a file opened with OpenOptions). I decided against it to keep things simple and consistent.
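Recovering the epoch from a file name can be sketched in a few lines. The function name epoch_from_path is hypothetical; the only thing taken from the article is the rdeebee-<epoch>.table pattern.

```rust
use std::path::Path;

/// Extracts the epoch from a file named `rdeebee-<epoch>.table`.
/// Returns None if the name does not match that pattern.
fn epoch_from_path(path: &Path) -> Option<u128> {
    path.file_name()?
        .to_str()?
        .strip_prefix("rdeebee-")?
        .strip_suffix(".table")?
        .parse()
        .ok()
}
```

Unlike file metadata, the name survives copies and restores unchanged, which is one reason the file-name approach stays consistent.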
The next step is to create the iterator over the SSTable. We do it in a manner similar to what we did for MemTable.

This time, though, we also implement the fn iter(&self) method, which returns an SSTableIterator without consuming the SSTable. We do this to make it easier to write the merge method, which merges two SSTables into a new one and writes it to a new file (with a new epoch). It is otherwise similar to the merge step in merge sort.
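A non-consuming iterator of this kind might look like the following sketch. It assumes a line-oriented file and yields raw lines rather than parsed events; wrapping the return value in io::Result and reducing SSTable to a single field are also simplifications made for this example.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::PathBuf;

/// Reduced to the one field this sketch needs.
struct SSTable {
    filepath: PathBuf,
}

struct SSTableIterator {
    lines: io::Lines<BufReader<File>>,
}

impl SSTable {
    /// Borrows the table and opens a fresh reader on its file,
    /// so the table itself stays usable afterwards.
    fn iter(&self) -> io::Result<SSTableIterator> {
        let file = File::open(&self.filepath)?;
        Ok(SSTableIterator { lines: BufReader::new(file).lines() })
    }
}

impl Iterator for SSTableIterator {
    /// One line of the file (assumed here to hold one event), or a read error.
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        self.lines.next()
    }
}
```

Since iter only borrows the table, the same SSTable can be iterated more than once, which is what a merge over two tables relies on.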

What’s going on here is simple enough.
- Get a vector for the events and another for deleted event ids (Action::DELETE).
- Get the epoch of the two SSTables.
- Create two non-consuming iterators.
- Iterate simultaneously over both SSTables to decide whether each event is added to the event vector, added to the deleted vector, or ignored because there is a newer version (by comparing epochs).
- Finally, write the new SSTable to a new file and return the file path.
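The core of these steps can be sketched over in-memory slices. The Event layout, the Action::Put variant, and the function signature are assumptions made for this example (Action::Delete plays the role of the article's Action::DELETE), and the final step of writing the result to a new file is left out.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, PartialEq)]
enum Action {
    Put,
    Delete,
}

#[derive(Debug, Clone, PartialEq)]
struct Event {
    id: u64,
    action: Action,
    payload: String,
}

/// Merges two id-sorted event lists. When both contain the same id,
/// the event from the table with the larger epoch wins and the other is ignored.
/// Returns (live events, ids of deleted events), both sorted by id.
fn merge(epoch_a: u128, a: &[Event], epoch_b: u128, b: &[Event]) -> (Vec<Event>, Vec<u64>) {
    let mut xs = a.iter().peekable();
    let mut ys = b.iter().peekable();
    let mut events = Vec::new();
    let mut deleted = Vec::new();

    loop {
        // Copy out the next ids so no borrow is held while advancing.
        let ids = (xs.peek().map(|e| e.id), ys.peek().map(|e| e.id));
        let winner = match ids {
            (None, None) => break,
            (Some(_), None) => xs.next(),
            (None, Some(_)) => ys.next(),
            (Some(x), Some(y)) => match x.cmp(&y) {
                Ordering::Less => xs.next(),
                Ordering::Greater => ys.next(),
                Ordering::Equal => {
                    // Same id in both tables: advance both, keep the newer one.
                    let (from_a, from_b) = (xs.next(), ys.next());
                    if epoch_a >= epoch_b { from_a } else { from_b }
                }
            },
        };
        if let Some(event) = winner {
            match event.action {
                Action::Delete => deleted.push(event.id),
                Action::Put => events.push(event.clone()),
            }
        }
    }
    (events, deleted)
}
```

Because both inputs are already sorted by id, a single pass suffices, just as in the merge step of merge sort; the epoch comparison is only needed when the same id shows up in both tables.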
And that’s it! Next time we will talk about the write-ahead log in this series.