Safe Global State in Rust: Raw Pointers aboard!
It is pretty common in almost all large projects, at least the ones I have seen, to use a global state of usually non-trivial size. Typically, the state variables are global, static, spread across the codebase, initialised by functions written specifically for that purpose, and then used all over the codebase as well. In other words, a royal mess, or maybe not, depending on your background!
Anyhow, I recently started migrating an existing project, written in colloquial (to system engineers, in case you are wondering) C, to the (oh-so!) modern Rust. And pretty much immediately, I ran into a quagmire: the global variables (not the constants)! I tried the “obvious” (to whom, really!) approach first and defined them as globals. The Rust compiler promptly smacked me down. As with most things C, the global variables had raw pointers (Rust makes them sound so dirty!) all over them, and I couldn’t really do away with them because many of these pointers were for memory caches created by an underlying C library that I was not migrating. So the Rust compiler, assuming that global variables are all about exchanging data between threads (well, in this case, there are threads), told me straight up that raw pointers are not Sync (capisce? That raw pointers are neither Sync nor Send is in itself a hotly debated topic that I strongly encourage the reader to look up).
The next step was to prowl Stack Overflow and the different forums and channels in the wild west of the Rust community for advice. Now, and this, in hindsight, is surprising to me, the overwhelming majority suggested that I do away with the globals and redefine those variables as Arc<Mutex<_>> or Arc<RwLock<_>>, depending on what I need.
let var_name: Arc<RwLock<type_containing_raw_pointer>> = Arc::new(RwLock::new(Default::default()));
But that’s not an easy job — wherever I declare these variables, which in itself is problematic, I now manually need to ensure that these variables are always in scope. That sounded two alarm bells for me:
- The need to manually move around all those variables would obviously be extremely unwieldy.
- The need to manually ensure that “globals” are not dropped by curly braces.
Neither of these two requirements came across as something the designers of Rust would reasonably expect from programmers. But what about using Options — surely they solve everything:
static VAR_NAME: Option<type_containing_raw_pointer> = None;
This or other variations wrapping the Option<_> inside Arc<_> or Arc<RwLock<_>> also, unfortunately, do not work and the Rust compiler makes the same complaints.
So started my research into Rust globals and what follows is that story. My apologies for taking so long to get to the point, but I thought the context, required for the article, mandated the prologue.
Maintaining global state composed of raw pointers is, as it turns out, not as exceptional as I was led to believe. So much so, in fact, that there exist several ways to declare, initialise and maintain global state containing !Sync and !Send types in Rust programs. I will briefly introduce them in the rest of this article.
The first option is to use
thread_local! {
...
}
This gives every thread its own independent copy of each variable, so the Rust compiler knows the values are never shared between threads and no longer asks for Sync. However, this would not work for me as I needed these variables to be accessible across threads.
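To make the per-thread behaviour concrete, here is a minimal sketch using only the standard library. The names `CACHE_PTR`, `set_cache`, `cache_is_set` and `thread_local_demo` are made up for illustration, and a bare `*mut u8` stands in for the real pointer-carrying type.

```rust
use std::cell::Cell;
use std::ptr;
use std::thread;

thread_local! {
    // Every thread gets its own CACHE_PTR, initialised to null on first use.
    // Cell allows mutation through the shared reference that `with` hands out.
    static CACHE_PTR: Cell<*mut u8> = Cell::new(ptr::null_mut());
}

// Store a pointer in the calling thread's copy only.
fn set_cache(p: *mut u8) {
    CACHE_PTR.with(|c| c.set(p));
}

// Report whether the calling thread's copy holds a non-null pointer.
fn cache_is_set() -> bool {
    CACHE_PTR.with(|c| !c.get().is_null())
}

// Returns (what the setting thread sees, what a freshly spawned thread sees).
fn thread_local_demo() -> (bool, bool) {
    let mut byte = 0u8;
    set_cache(&mut byte);
    let here = cache_is_set();
    let there = thread::spawn(cache_is_set).join().unwrap();
    // Reset so that no dangling pointer is left behind once `byte` goes away.
    set_cache(ptr::null_mut());
    (here, there)
}
```

Calling `thread_local_demo()` shows the limitation: the thread that set the pointer sees it, while the spawned thread starts from its own null copy.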
A second option that I played with for a while is to wrap all these !Sync and !Send types within another struct (because neither the original type nor the traits are implemented in the current crate) and implement these marker traits for those structs. I admit that I have not actually tried this out but it might look something like this:
pub struct wrapper_type {
raw: type_containing_raw_pointers,
}
unsafe impl std::marker::Sync for wrapper_type {}
unsafe impl std::marker::Send for wrapper_type {}
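These are marker traits, so they have no methods to implement. The unsafe keyword is the programmer's promise that the type really can be shared and sent across threads; the compiler does not verify it.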
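For what it is worth, a compilable sketch of this wrapper idea might look like the following. Everything in it is an assumption made for illustration: `RawCache` stands in for the foreign pointer-carrying type, `WrapperType`, `STATE`, `install`, `read_byte` and `wrapper_demo` are hypothetical names, and a leaked `Box` plays the role of memory owned by the C library. It also assumes a toolchain where `RwLock::new` can be used in a static initialiser (Rust 1.63 or later).

```rust
use std::ptr;
use std::sync::RwLock;
use std::thread;

// Hypothetical stand-in for a type from another crate that holds a raw pointer.
pub struct RawCache {
    data: *mut u8,
}

// Local newtype, so that the marker traits can be implemented in this crate.
pub struct WrapperType {
    raw: RawCache,
}

// SAFETY (assumption): the pointer is only dereferenced while the RwLock is
// held, and the memory behind it may be touched from any thread.
unsafe impl Send for WrapperType {}
unsafe impl Sync for WrapperType {}

// A plain global static now compiles, because WrapperType is Send + Sync.
static STATE: RwLock<WrapperType> = RwLock::new(WrapperType {
    raw: RawCache { data: ptr::null_mut() },
});

pub fn install(p: *mut u8) {
    STATE.write().unwrap().raw.data = p;
}

pub fn read_byte() -> Option<u8> {
    let guard = STATE.read().unwrap();
    if guard.raw.data.is_null() {
        None
    } else {
        // SAFETY: in this sketch the pointer always comes from a leaked Box.
        Some(unsafe { *guard.raw.data })
    }
}

// Installs a pointer on the calling thread and reads it back from another one.
pub fn wrapper_demo() -> Option<u8> {
    let leaked: &'static mut u8 = Box::leak(Box::new(7u8));
    install(leaked);
    thread::spawn(read_byte).join().unwrap()
}
```

The lock only serialises access from the Rust side; whether the underlying C library tolerates calls from several threads is something the compiler cannot check.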
A third approach, that has me excited, was provided by the Fragile crate. Fragile’s current version is 1.0.0, so I believe it is production ready too. And defining global variables with Fragile is as easy as
use fragile::Fragile;
static mut VAR_NAME: Option<Fragile<type_containing_raw_pointers>> = None;
Fragile, technically, does not solve the problem entirely: it allows a value to be sent to a different thread, but attempting to access the value on a “non-owning” thread will fail (get() panics, while try_get() returns an error). I am still trying and testing this crate, and I think it needs all reads and writes to happen on the thread that created the value. That looks fine for now, but if it proves too cumbersome or unwieldy then I might go back to option 2.
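To illustrate the idea of a value that may travel between threads but only be read by its owner, here is a standard-library-only sketch. It is not the fragile crate's code or API: `ThreadBound`, its `try_get` returning an `Option`, and `thread_bound_demo` are hypothetical names, and the sketch ignores the question of dropping the value on a foreign thread.

```rust
use std::thread::{self, ThreadId};

// Remembers which thread created the value and refuses access from others.
pub struct ThreadBound<T> {
    owner: ThreadId,
    value: T,
}

// SAFETY (assumption): the value is only ever handed out on the owning
// thread, and this sketch is only used with types that have no destructor
// (such as raw pointers), so dropping it elsewhere does nothing.
unsafe impl<T> Send for ThreadBound<T> {}
unsafe impl<T> Sync for ThreadBound<T> {}

impl<T> ThreadBound<T> {
    pub fn new(value: T) -> Self {
        Self { owner: thread::current().id(), value }
    }

    // Some(&value) on the owning thread, None everywhere else.
    pub fn try_get(&self) -> Option<&T> {
        if thread::current().id() == self.owner {
            Some(&self.value)
        } else {
            None
        }
    }
}

// Returns (access allowed on the owner, access allowed on another thread).
pub fn thread_bound_demo() -> (bool, bool) {
    let bound = ThreadBound::new(std::ptr::null_mut::<u8>());
    let on_owner = bound.try_get().is_some();
    // Moving the wrapper to another thread is fine; reading it there is not.
    let on_other = thread::spawn(move || bound.try_get().is_some())
        .join()
        .unwrap();
    (on_owner, on_other)
}
```

The check happens at runtime, so a mistake shows up as a refused access rather than a compile error.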
Finally, in case you want or need your static variables to be initialised at runtime, you can use the lazy_static crate. lazy_static is at version 1.4.0, so I believe it is production ready too, and it is simple to use. The only caveat is that lazy_static doesn’t allow mutable statics; any mutation has to go through interior mutability, such as a Mutex or RwLock around the value.
use lazy_static::lazy_static;
use fragile::Fragile;
lazy_static! {
static ref VAR_NAME: Option<Fragile<type_containing_raw_pointer>> = None;
}
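The same initialise-on-first-use pattern can also be sketched without any external crate. The example below swaps lazy_static for `std::sync::OnceLock` (available from Rust 1.70, as far as I know) and adds a `Mutex` so the value can still be changed later. `Settings`, `SETTINGS`, `cache_size`, `set_cache_size` and `lazy_init_demo` are hypothetical names chosen for illustration.

```rust
use std::sync::{Mutex, OnceLock};

// Hypothetical state whose initial value is only known at runtime.
pub struct Settings {
    pub cache_size: usize,
}

// The static itself is immutable; the Mutex inside provides the mutability.
static SETTINGS: OnceLock<Mutex<Settings>> = OnceLock::new();

// Runs the initialiser on the first call only; later calls reuse the value.
fn settings() -> &'static Mutex<Settings> {
    SETTINGS.get_or_init(|| Mutex::new(Settings { cache_size: 64 }))
}

pub fn cache_size() -> usize {
    settings().lock().unwrap().cache_size
}

pub fn set_cache_size(n: usize) {
    settings().lock().unwrap().cache_size = n;
}

// Returns the value before and after a single update.
pub fn lazy_init_demo() -> (usize, usize) {
    let before = cache_size();
    set_cache_size(128);
    (before, cache_size())
}
```

A type holding raw pointers would still need one of the wrappers discussed earlier before it could sit inside such a static.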
Hopefully, this answers some questions around Rust globals. I would like to hear from you, and would especially like to hear if you happen to know more about this.
Update 1: I feel that this article is “incomplete” in that it leaves out a multitude of nuances just hanging there. The following Stack Overflow discussion adds some nice points.
https://stackoverflow.com/questions/27791532/how-do-i-create-a-global-mutable-singleton
However, what is significant in the above discussion is that while it takes reasonable effort to maintain a read-only global state in Rust, the language does force one to bend over backwards to keep up with the weirdness that non-trivial global mutable state seems to be.
As of now, the last two approaches described above seem the easiest to me in order to maintain a global mutable state. In terms of the second approach, I think it might be possible to wrap up all required states in a single struct, in main (for binaries) or some kind of init function (for libraries), provide an `impl` for that struct to get/set the different sub-structs and then pass a reference to that struct around. Other functions and methods then can simply work with whatever they need, possibly without the code being too awkward.
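A rough sketch of that single-struct idea, with no statics at all, could look like this. The sub-structs `NetworkState` and `CacheState`, the container `AppState` and the helper functions are all hypothetical; a real project would put its own state in their place.

```rust
// Hypothetical sub-states that used to be separate globals.
#[derive(Default)]
pub struct NetworkState {
    pub retries: u32,
}

#[derive(Default)]
pub struct CacheState {
    pub entries: usize,
}

// One struct that owns everything the program used to keep in globals.
#[derive(Default)]
pub struct AppState {
    network: NetworkState,
    cache: CacheState,
}

impl AppState {
    // Would be called once from main (binaries) or an init function (libraries).
    pub fn init() -> Self {
        Self::default()
    }

    pub fn network(&self) -> &NetworkState {
        &self.network
    }

    pub fn set_network(&mut self, network: NetworkState) {
        self.network = network;
    }

    pub fn cache_mut(&mut self) -> &mut CacheState {
        &mut self.cache
    }
}

// Other functions borrow the state instead of reaching for a global.
fn record_entry(state: &mut AppState) {
    state.cache_mut().entries += 1;
}

fn retries(state: &AppState) -> u32 {
    state.network().retries
}

// Returns (configured retries, number of recorded cache entries).
pub fn app_state_demo() -> (u32, usize) {
    let mut state = AppState::init();
    state.set_network(NetworkState { retries: 3 });
    record_entry(&mut state);
    record_entry(&mut state);
    (retries(&state), state.cache.entries)
}
```

The borrow checker then enforces who may read or write the state at any point, at the cost of threading one extra parameter through the call chain.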
However, bear in mind that the struct approach does not solve the issue of raw pointers not being thread-safe. It’s just that the compiler no longer knows that calls across thread boundaries might be made, so those errors are just waiting around the corner. To at least catch such accesses at runtime, one might define the global state struct in either of the following manners:
struct GlobalState {
raw_fragile: Fragile<type_containing_raw_pointer>,
...
}
Using Option<_> might be even safer!
struct GlobalState {
raw_fragile_option: Option<Fragile<type_containing_raw_pointer>>,
...
}