Multi-Leader Election with Rust and etcd
This is the 13th (unlucky?) article in my “Data Mesh” series, and the first where I start implementing the idea. Figuring out multi-leader election was a tough journey for me. It notched up my distributed systems skills, and I thought it would be worth writing up as a stand-alone article.
It took me roughly a week after my idea article to figure out that systems like Consul and etcd help with leader election. With Go in particular this seems easier, and there are several articles detailing the use case. For Rust, both the Consul and etcd crates seemed a little lacking. Among the Consul crates, I liked the ones from Roblox and HashiCorp, but my struggles with both were great enough that I dropped the Consul-based approach, in part for that reason. The other reason is that I realized I might be able to use etcd to build the distributed lock too, rather than running two different systems.
I decided to use this crate for an etcd-based approach. The crate has a leader-election API, which I had assumed was part of the etcd v3 concurrency API, but I had some difficulty getting it to work properly. It also turns out that this is not the use case the crate's authors had in mind. However, I did find a workaround using their lock API.
Anyhow, on to the code then, starting with the requirements.
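The original dependency list was embedded as a gist, so here is a sketch of what a setup like this typically pulls in; crate versions (and the exact feature flags) are my assumptions, not the article's manifest:

```toml
[dependencies]
# etcd client exposing the lock and lease APIs
etcd-client = "0.10"
# async runtime for the etcd client
tokio = { version = "1", features = ["full"] }
# structured logging/tracing
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
# serializing ServiceNode into etcd values
serde = { version = "1", features = ["derive"] }
serde_json = "1"
```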

And we trace through the code using the tracing framework.

Next I defined a couple of structs and types to arrange our data.

We use the ServiceNode struct to hold information about the leaders in the group, while ServerNode holds the node's own data. The Key type alias is there to make the code self-documenting.

I implemented Drop for ServerNode. The idea here is that when the program exits, any locks and leases are released.

There is a lot to unpack in the new method for ServerNode.
- We assume the pods are deployed as a StatefulSet, so we get pod names like rdb-election-0.
- This allows us to use the group size variable to decide which group a pod belongs to (lines 42–59).
- The election_key_prefixes are used to detect whether the group already has leaders, since these keys hold the locks. This should become clearer later.
- lease_ttl determines how long the lease on a lock is held.
- The leader_keys are the keys under which the leaders store their ServiceNode information.

This function runs whenever a node comes online. It continually checks the existence and liveness of the election keys, and it also makes sure that the same node does not claim more than one leader slot in the same group.

The campaign function makes the current node one of the group's leaders:
- Get a new lease.
- Create a new lock, identified by the argument to the function, and associate the lease with it.
- Put the node's ServiceNode into the key for the leader.
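With the `etcd-client` crate, those three steps look roughly like this. This is an illustration only, not the article's exact code: it needs a running etcd, and the error handling and value encoding are simplified.

```rust
use etcd_client::{Client, LockOptions};

// Sketch of campaigning for a leader slot via the lock API.
async fn campaign(
    client: &mut Client,
    election_key: &str,
    leader_key: &str,
    node_json: &str,
    lease_ttl: i64,
) -> Result<(), etcd_client::Error> {
    // 1. Get a new lease.
    let lease = client.lease_grant(lease_ttl, None).await?;
    // 2. Take the lock named by `election_key`, tied to that lease,
    //    so the lock is dropped automatically if the lease expires.
    let _lock = client
        .lock(election_key, Some(LockOptions::new().with_lease(lease.id())))
        .await?;
    // 3. Publish this node's ServiceNode under the leader key.
    client.put(leader_key, node_json, None).await?;
    Ok(())
}
```

Tying the lock to the lease is what gives us failure detection for free: if the leader dies and stops refreshing, the lease expires and the lock is released.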

The keep_alive function is run by the leaders to keep the lease alive.
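A sketch of what that refresh loop can look like with `etcd-client` and `tokio`; again an illustration requiring a live etcd, and refreshing at half the TTL is my choice of margin, not necessarily the article's:

```rust
use etcd_client::Client;
use std::time::Duration;

// Periodically refresh the lease so the lock never expires while
// the leader is healthy.
async fn keep_alive(
    client: &mut Client,
    lease_id: i64,
    ttl_secs: u64,
) -> Result<(), etcd_client::Error> {
    let (mut keeper, mut responses) = client.lease_keep_alive(lease_id).await?;
    loop {
        // Send a keep-alive request, then read the server's response.
        keeper.keep_alive().await?;
        if let Some(resp) = responses.message().await? {
            println!("lease {} refreshed, ttl {}s", lease_id, resp.ttl());
        }
        tokio::time::sleep(Duration::from_secs(ttl_secs / 2)).await;
    }
}
```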

The get_leaders
function is used by the non-leaders to collect and store information about the leaders.
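With `etcd-client` this amounts to a prefix scan over the leader keys; a hedged sketch (decoding the raw values back into ServiceNode is elided here):

```rust
use etcd_client::{Client, GetOptions};

// Fetch the current leaders' published values under a key prefix.
async fn get_leaders(
    client: &mut Client,
    leader_key_prefix: &str,
) -> Result<Vec<String>, etcd_client::Error> {
    let resp = client
        .get(leader_key_prefix, Some(GetOptions::new().with_prefix()))
        .await?;
    Ok(resp
        .kvs()
        .iter()
        .filter_map(|kv| kv.value_str().ok().map(str::to_owned))
        .collect())
}
```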

Finally, the main function. It boils down to the loop on lines 44 to 52: the node continually checks for the leaders or becomes one of them.
Testing can be done by deploying the service using server.yaml. Alternatively, it can be tested by running commands like these in different terminal windows:
TRACE_LEVEL=info NODE=Server-1 DBNAME=RDeeBee ADDRESS=192.168.10.10 ETCD=localhost:30327 LEASE_TTL=10 SLEEP_TIME=2 GROUP_SIZE=4 cargo run
TRACE_LEVEL=info NODE=Server-2 DBNAME=RDeeBee ADDRESS=192.168.10.11 ETCD=localhost:30327 LEASE_TTL=10 SLEEP_TIME=2 GROUP_SIZE=4 cargo run
TRACE_LEVEL=info NODE=Server-3 DBNAME=RDeeBee ADDRESS=192.168.10.13 ETCD=localhost:30327 LEASE_TTL=10 SLEEP_TIME=2 GROUP_SIZE=4 cargo run
This will start three processes mocking different hostnames and IP addresses.
Also, check out my article on distributed ordering.
Next I will start working on integrating this with the data mesh.