My experience of versioning with a hybrid logical clock in Rust

A while back I wrote about implementing a Hybrid Logical Clock (HLC) API in Rust. I then wanted to check whether that API is actually usable. The use case I chose for the HLC is versioning the state of a load balancer, and this article is about that experience. The load balancer itself is not implemented, but one could combine it with my Reverse Proxy to test it further.

The basic idea of my test case is as follows:

  1. The load balancer uses actix_web to run async handler functions on a backend thread pool.
  2. Let’s imagine that we are collecting statistics, such as the outstanding queue sizes of downstream services, and making load-balancing decisions based on them.
  3. There is a manager thread (syncer).
  4. Each worker thread has an Arc copy of the syncer and reads these stats from the global map of downstream services.
  5. Once a backend pod is chosen and the request has been sent, the worker function sends the updated stat to the syncer, which then decides whether to update the global stats by comparing the timestamps of the different updates (the worker side is sketched right after this list).
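
To make the flow concrete, here is a minimal sketch of the worker side, assuming a tokio mpsc channel between the workers and the syncer. StatUpdate, handle_request, and the field names are placeholders for illustration, not the actual types in the repo.

use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use tokio::sync::mpsc;

/// Stat sample a worker reports after forwarding a request downstream.
#[derive(Debug, Clone)]
struct StatUpdate {
    service: String,
    queue_len: usize,
    // HLC timestamp attached by the worker: (wall secs, wall nanos, counter).
    hlc: (i64, i64, i64),
}

/// What each worker does per request: consult the shared stats to pick a
/// backend, forward the request (elided), then report the new queue size.
async fn handle_request(
    stats: Arc<RwLock<HashMap<String, StatUpdate>>>,
    tx: mpsc::Sender<StatUpdate>,
) {
    // Pick the service with the shortest outstanding queue.
    let target = {
        let map = stats.read().unwrap();
        map.values()
            .min_by_key(|s| s.queue_len)
            .map(|s| s.service.clone())
            .unwrap_or_else(|| "svc-a".to_string())
    };
    // ... forward the request to `target` here ...
    // The HLC timestamp would come from the shared clock; this one is made up.
    let _ = tx
        .send(StatUpdate { service: target, queue_len: 3, hlc: (1, 0, 0) })
        .await;
}

#[tokio::main]
async fn main() {
    let stats = Arc::new(RwLock::new(HashMap::new()));
    let (tx, mut rx) = mpsc::channel(64);
    handle_request(Arc::clone(&stats), tx).await;
    // The syncer end of this channel is sketched further down, with SyncMgr::run.
    println!("{:?}", rx.recv().await);
}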

Please note that the code doesn’t actually do anything and probably has bugs; the focus has been on the usability of the HLC API.

Of course, we could also have a use case where we measure the response time of requests made to downstream services with the HLC.

The timer.rs part (SysTime) has not changed much, except that I have implemented Eq and Ord, although that is probably not strictly needed.
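
For illustration, a derived ordering is enough if SysTime is essentially a (seconds, nanoseconds) pair; the real field layout in timer.rs may differ, so this is only a sketch of what those impls could look like.

// Derived Ord compares fields in declaration order: secs first, then nanos.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct SysTime {
    secs: i64,
    nanos: i64,
}

fn main() {
    let a = SysTime { secs: 10, nanos: 5 };
    let b = SysTime { secs: 10, nanos: 7 };
    assert!(a < b);
    println!("{:?} < {:?}", a, b);
}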

The hlc::Hlc, though, had to be moved from a RefCell to a RwLock, which probably makes it slower depending on how many threads are running. A workaround might be to change the hlc::Hlc::timestamp function to use try_write, and to return the currently held value of self.timestamp via try_read if try_write returns TryLockError::WouldBlock. The current implementation uses a plain blocking write:

pub(crate) fn timestamp(&self) -> (i64, i64, i64) {
    // Current physical time as a (seconds, nanoseconds) pair.
    let ctime = self.curtime.load();
    // Blocks until the write lock on the last issued timestamp is acquired.
    let mut ltime = self.timestamp.write().unwrap();
    if ctime == (ltime.0, ltime.1) {
        // Same physical time as the last timestamp: bump the logical counter.
        ltime.2 += 1;
    } else {
        // Physical time has advanced: adopt it and reset the counter.
        ltime.0 = ctime.0;
        ltime.1 = ctime.1;
        ltime.2 = 0;
    }
    (ltime.0, ltime.1, ltime.2)
}

However, this does not solve the problem completely, since try_read could also return TryLockError::WouldBlock while a successful try_write is underway. A solution could then be to return (ctime.0, ctime.1, 0). While this might look like a hack, it comes with an inherent guarantee: it is bound to be the lowest clock time any other thread will obtain at or after this instant. That is, if another thread completed even the try_read fallback successfully, the value it saw cannot be lower than (ctime.0, ctime.1, 0).
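
Putting the two fallbacks together, a sketch of that non-blocking variant could look like the following. The Hlc struct here is stubbed down to just the timestamp field, and ctime is passed in as a parameter instead of being loaded from self.curtime, purely for illustration.

use std::sync::{RwLock, TryLockError};

pub struct Hlc {
    // Latest HLC timestamp: (wall seconds, wall nanos, logical counter).
    timestamp: RwLock<(i64, i64, i64)>,
}

impl Hlc {
    pub(crate) fn timestamp_nonblocking(&self, ctime: (i64, i64)) -> (i64, i64, i64) {
        match self.timestamp.try_write() {
            Ok(mut ltime) => {
                // Same logic as the blocking version above.
                if ctime == (ltime.0, ltime.1) {
                    ltime.2 += 1;
                } else {
                    *ltime = (ctime.0, ctime.1, 0);
                }
                *ltime
            }
            // A writer is in progress: try to return the currently held value.
            Err(TryLockError::WouldBlock) => match self.timestamp.try_read() {
                Ok(ltime) => *ltime,
                // Even the read would block: fall back to the floor value,
                // which no other thread can observe as lower.
                Err(_) => (ctime.0, ctime.1, 0),
            },
            Err(TryLockError::Poisoned(e)) => panic!("hlc lock poisoned: {e}"),
        }
    }
}

fn main() {
    let hlc = Hlc { timestamp: RwLock::new((0, 0, 0)) };
    println!("{:?}", hlc.timestamp_nonblocking((1, 0)));
}

The only behavioural difference from the blocking version is that a contended call may return a slightly stale timestamp, but never one below the floor value.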

I have generally excluded the use of the hlc::Hlc::update_timestamp method, since I now consider its implementation conceptually faulty; it requires some more thought.

In the syncer module, I have defined a Backend and a Service. One can imagine these to be the same as those defined by Kubernetes, fetching information from the service mesh I have described in another blog. The workhorse, though, is the SyncMgr::run function. It loops over a receiver asynchronously and tries to update the service map based on an HLC comparison. The line self.services[ind] = svc.clone(); might look buggy, but one needs to remember that only the SyncMgr makes this update.
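
For reference, here is a rough sketch of the shape of SyncMgr::run as described above, again assuming a tokio mpsc receiver; the Backend and Service fields shown are guesses for illustration and differ from the real definitions in the repo.

use tokio::sync::mpsc;

#[derive(Debug, Clone, PartialEq)]
pub struct Backend {
    pub addr: String,
}

#[derive(Debug, Clone)]
pub struct Service {
    pub name: String,
    pub backends: Vec<Backend>,
    pub queue_len: usize,
    // HLC timestamp of the stat: (wall seconds, wall nanos, logical counter).
    pub hlc: (i64, i64, i64),
}

pub struct SyncMgr {
    services: Vec<Service>,
    rx: mpsc::Receiver<Service>,
}

impl SyncMgr {
    pub async fn run(&mut self) {
        // Loop over the receiver; only this task mutates `self.services`,
        // so the in-place overwrite below is safe despite how it looks.
        while let Some(svc) = self.rx.recv().await {
            match self.services.iter().position(|s| s.name == svc.name) {
                // Accept the update only if its HLC timestamp is newer.
                Some(ind) if svc.hlc > self.services[ind].hlc => {
                    self.services[ind] = svc.clone();
                }
                Some(_) => {} // stale update: drop it
                None => self.services.push(svc),
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(16);
    let mut mgr = SyncMgr { services: Vec::new(), rx };
    tx.send(Service {
        name: "svc-a".into(),
        backends: vec![Backend { addr: "10.0.0.1:8080".into() }],
        queue_len: 2,
        hlc: (1, 0, 0),
    })
    .await
    .unwrap();
    drop(tx); // closing the sender ends the run loop
    mgr.run().await;
    println!("{:?}", mgr.services);
}

Because run is the only place that writes to self.services, the in-place overwrite needs no further synchronization.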

The full code can be found at this repo.

--

Ratnadeep Bhattacharya (https://www.rdeebee.com/)

Distributed Systems researcher and engineer (grad student) at The George Washington University!