My experience of versioning with a hybrid logical clock in Rust
A while back I wrote about implementing a Hybrid Logical Clock (HLC) API in Rust. I then wanted to check whether it was actually usable. The use case I chose for the HLC is versioning a load balancer, and this article is about that experience. The load balancer itself is not implemented, but one could combine it with my Reverse Proxy to test it further.
The basic ideas in my test case are as follows:
- The load balancer uses `actix_web` to run async handler functions on a backend thread pool.
- Let's imagine that we are collecting statistics, such as outstanding queue sizes of downstream services, and making load-balancing decisions based on them.
- There is a manager thread (`syncer`).
- Each worker thread has an `Arc` copy of the `syncer` and reads these stats from the global map of downstream services.
- Once a backend pod is chosen and the request sent, the worker function sends the updated stat to the `syncer`, which then decides whether to update the global stats by comparing the timestamps of the different updates.
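The flow above can be sketched with plain `std` threads and an `mpsc` channel instead of `actix_web`. All the names and shapes here (`Syncer`, `BackendStat`, the `(secs, nanos, counter)` timestamp tuple) are my assumptions for illustration, not the repo's types:

```rust
use std::collections::HashMap;
use std::sync::{mpsc, Arc, RwLock};
use std::thread;

// Hypothetical per-backend statistic; field names are assumptions.
#[derive(Clone, Debug)]
struct BackendStat {
    queue_size: u64,
    // (wall seconds, wall nanos, logical counter) from the HLC
    ts: (i64, i64, i64),
}

// Simplified stand-in for the syncer: a shared stats map the workers read
// and only the manager thread writes.
struct Syncer {
    stats: RwLock<HashMap<String, BackendStat>>,
}

fn main() {
    let syncer = Arc::new(Syncer { stats: RwLock::new(HashMap::new()) });
    let (tx, rx) = mpsc::channel::<(String, BackendStat)>();

    // Worker threads: read the shared stats, then send an updated stat
    // back to the manager over the channel.
    let mut handles = Vec::new();
    for i in 0u64..4 {
        let tx = tx.clone();
        let syncer = Arc::clone(&syncer);
        handles.push(thread::spawn(move || {
            let _snapshot = syncer.stats.read().unwrap().clone();
            let stat = BackendStat { queue_size: i, ts: (0, 0, i as i64) };
            tx.send(("pod-a".to_string(), stat)).unwrap();
        }));
    }
    drop(tx); // close the channel once only the workers hold senders

    // Manager loop: applies an update only when its HLC timestamp is
    // newer than the stored one, so stale updates are dropped.
    for (name, stat) in rx {
        let mut stats = syncer.stats.write().unwrap();
        let newer = match stats.get(&name) {
            Some(old) => old.ts < stat.ts,
            None => true,
        };
        if newer {
            stats.insert(name, stat);
        }
    }
    for h in handles {
        h.join().unwrap();
    }

    let ts = syncer.stats.read().unwrap().get("pod-a").map(|s| s.ts);
    println!("{:?}", ts); // the highest timestamp wins: Some((0, 0, 3))
}
```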
Please note that the code doesn't do anything useful and probably has bugs; the focus has been on the usability of the HLC API.
Of course, we could also have a use case where we measure the response time of requests made to downstream services with the HLC.
The `timer.rs` part (`SysTime`) has not changed much, except that I have implemented `Eq` and `Ord`, though these are probably not strictly needed.
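For a timestamp-like type, deriving the comparison traits is usually enough. A minimal sketch, assuming a `SysTime` made of seconds and nanoseconds (the real type in the repo may differ):

```rust
// Sketch of deriving comparison traits on a SysTime-like type.
// Derived Ord compares fields in declaration order: secs, then nanos.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct SysTime {
    secs: i64,
    nanos: i64,
}

fn main() {
    let a = SysTime { secs: 10, nanos: 5 };
    let b = SysTime { secs: 10, nanos: 7 };
    assert!(a < b); // same secs, so nanos decide
    assert_eq!(a.max(b), b);
    println!("ok");
}
```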
The `hlc::Hlc`, though, had to be moved from a `RefCell` to a `RwLock`, which probably makes it slower depending on how many threads are running. A workaround might be to change the `hlc::Hlc::timestamp` function to use the `try_write` method, and to return the currently held value of `self.timestamp` via `try_read` if `try_write` returns `TryLockError::WouldBlock`.
```rust
pub(crate) fn timestamp(&self) -> (i64, i64, i64) {
    let ctime = self.curtime.load();
    let mut ltime = self.timestamp.write().unwrap();
    if ctime == ((*ltime).0, (*ltime).1) {
        (*ltime).2 += 1;
    } else {
        (*ltime).0 = ctime.0;
        (*ltime).1 = ctime.1;
        (*ltime).2 = 0;
    }
    ((*ltime).0, (*ltime).1, (*ltime).2)
}
```
However, this does not solve the problem completely, since `try_read` could also return `TryLockError::WouldBlock` while a successful `try_write` is underway. A solution could then be to return `(ctime.0, ctime.1, 0)`. While this might look like a hack, it carries an inherent guarantee: it is bound to be the lowest clock value any other thread can obtain at or after this instant. That is, if another thread successfully completes even the `try_read` variant, the value it reads is guaranteed to be no lower than `(ctime.0, ctime.1, 0)`.
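Putting the two workarounds together, a non-blocking `timestamp` could look roughly like this. The `Hlc` shape here is simplified (in particular, `curtime` is modeled as a plain value rather than an atomically loaded one), so treat it as a sketch of the idea rather than the repo's implementation:

```rust
use std::sync::{RwLock, TryLockError};

// Simplified Hlc: curtime is a plain value here; in the real type it
// is an atomically loaded wall-clock reading.
struct Hlc {
    curtime: (i64, i64),
    timestamp: RwLock<(i64, i64, i64)>,
}

impl Hlc {
    fn timestamp(&self) -> (i64, i64, i64) {
        let ctime = self.curtime;
        match self.timestamp.try_write() {
            Ok(mut ltime) => {
                if ctime == (ltime.0, ltime.1) {
                    // Same wall time as stored: bump the logical counter.
                    ltime.2 += 1;
                } else {
                    // Wall clock moved: reset the counter.
                    *ltime = (ctime.0, ctime.1, 0);
                }
                *ltime
            }
            Err(TryLockError::WouldBlock) => match self.timestamp.try_read() {
                Ok(ltime) => *ltime,
                // A writer is in progress: (ctime.0, ctime.1, 0) is a
                // lower bound on what any concurrent caller can observe.
                Err(_) => (ctime.0, ctime.1, 0),
            },
            Err(_) => panic!("poisoned lock"),
        }
    }
}

fn main() {
    let hlc = Hlc { curtime: (100, 0), timestamp: RwLock::new((100, 0, 4)) };
    assert_eq!(hlc.timestamp(), (100, 0, 5)); // same wall time: counter bumped
    let hlc2 = Hlc { curtime: (101, 0), timestamp: RwLock::new((100, 0, 4)) };
    assert_eq!(hlc2.timestamp(), (101, 0, 0)); // clock moved: counter reset
    println!("ok");
}
```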
I have generally avoided the `hlc::Hlc::update_timestamp` method, since I now consider its implementation to be conceptually faulty; it requires some more thought.
In the `syncer` module, I have defined a `Backend` and a `Service`. One can imagine these to be the same as those defined by Kubernetes, fetching information from the service mesh I have described in another blog post. The workhorse, though, is the `SyncMgr::run` function. It loops over a receiver asynchronously and tries to update the service map based on an HLC comparison. The line `self.services[ind] = svc.clone();` might look buggy, but one needs to remember that only the `SyncMgr` makes this update.
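That single-writer update step can be sketched as a compare-and-replace over the service list. The `Service` shape and the `apply` helper here are my assumptions for illustration, not the repo's definitions:

```rust
// Hypothetical Service: a name plus the HLC timestamp of its last update.
#[derive(Clone, Debug)]
struct Service {
    name: String,
    ts: (i64, i64, i64),
}

struct SyncMgr {
    services: Vec<Service>,
}

impl SyncMgr {
    // Only the SyncMgr ever calls this, so overwriting in place by index
    // is safe: there is exactly one writer.
    fn apply(&mut self, svc: &Service) {
        match self.services.iter().position(|s| s.name == svc.name) {
            // Known service with an older timestamp: replace it.
            Some(ind) if self.services[ind].ts < svc.ts => {
                self.services[ind] = svc.clone();
            }
            // Unknown service: add it.
            None => self.services.push(svc.clone()),
            // Stale update (timestamp not newer): drop it.
            _ => {}
        }
    }
}

fn main() {
    let mut mgr = SyncMgr { services: vec![] };
    mgr.apply(&Service { name: "svc-a".into(), ts: (1, 0, 0) });
    mgr.apply(&Service { name: "svc-a".into(), ts: (1, 0, 2) }); // newer: applied
    mgr.apply(&Service { name: "svc-a".into(), ts: (1, 0, 1) }); // stale: dropped
    assert_eq!(mgr.services[0].ts, (1, 0, 2));
    println!("ok");
}
```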
The full code can be found at this repo.