Rust Thread-Safety Patterns
I wanted to write a short post to highlight one of main features I'm excited about from Rust and how I think it's differentiated from other languages, namely it's thread-safety. I want to build up to show a neat pattern which demonstrates the power of Rust's type system, but before that we'll cover some basics. I'll assume some interest in Rust but not necessarily experience with anything other than C++.
Coming from a language such as C++ a lot of focus is initially directed towards the memory-safe nature of Rust and the lack of garbage collection via the borrow checking mechanism. This is understandable given that the borrow checker is rather unique; however, I believe it provides less of 'killer feature' for experienced C++ devs than the thread-safety guarantees that Rust can provide.
Rust leverages its type system to prevent types from being used in a thread-unsafe manner. It does
this through two marker
traits, Send
and Sync
. For a C++ developer these can be thought of as
similar to type-traits in C++ but they should understand that traits are used much more heavily and
ergonomically in Rust, as they form the primary way to describe behaviours that a type may have such
as Concepts in C++20.
The Send
marker trait is used to signify that a value can be safely moved between threads, which
is true for most types. The compiler and the type system will enforce this requirement when a value
is passed to an API that might cause it to be moved onto a different thread.
Correspondingly the Sync
marker trait is used to signify that the value can be used from the
context of multiple threads. Rust will prevent any value without this trait from being accessed in a
context it can't determine to be necessarily single-threaded.
For both these marker traits, it's useful to know that they also compose together, such that any
type constructed from only Sync
types will be Sync
itself by default. For most users, and
especially when writing safe Rust, you'll not implement Send
or Sync
yourself and instead rely
on composition and standard library types that introduce these marker trait properties.
The most obvious of these types introducing Sync
to start with is the
std::sync::Mutex
type, here illustrated
with a contrived example:
let locked_data: Mutex<HashMap<Key, Value>> = Mutex::new(HashMap::new());
let mut data = locked_data.lock().unwrap();
data.insert(key, value);
To unpack what is happening here: the Mutex
type in Rust is generic over a type, ie Mutex<T>
,
and even if T
is not Sync
itself, the Mutex<T>
type will be Sync
. It has an API and
implementation that can provide those guarantees for the wrapped type. It's up to the Mutex author
to ensure that using unsafe Rust.
What is nice here from an API design is that the Mutex<T>
always wraps the data it protects, so it becomes impossible to access the value outside of how the Mutex's API provides the guarantees it needs to for Sync
. Normally the access is gated through the lock()
method, which give or take some error-handling, returns a MutexGuard
which enforces the underlying locking and unlocking happens through RAII. The value being protected is only accessible during the lifetime of this guard object.
Between the Sync
requirement for safe multi-threaded access and design of the above API design
enforcing the underlying type is only accessible when certain run-time requirements are met, this
ensures we always have thread-safety provided by the type system.
Compare that to a similar example in C++:
std::mutex lock;
std::unordered_map<Key, Value> map;
std::lock_guard<std::mutex> lock_guard(lock);
map.insert({key, value});
Here there is no additional protection of the map
value, and nothing ties the lock object to the
map. If the locking were accidentally omited the compiler would not notice. Of course, there's
always a variety of tools that might help detect it: an experienced code reviewer, some static
analysis tools or sanitizers. Conversely in Rust this would always be an immediate compilation
error, probably informing you that the map type didn't have the Sync
type.
There are of course other thread-safety problems you can still run into in Rust, such as deadlock, but just removing this class of error is a great step forward and means you're free to concentrate on other parts of the program.
There are design implications that will be different in programming Rust versus C++ because of these
constraints. A simple example is that if you want one lock guarding multiple data, you'd separate
the data into its own type and have it as the T
in the the Mutex<T>
rather than having the mutex
in the same object as the list of data.
A more complicated pattern is one I came across and wanted to get to walk us through, because it shows some more power that Rust has when carefully crafted. This pattern allows iteration over a collection while a lock is being held.
We'll start by defining our collection of values:
pub struct ValueMap {
values: Arc<Mutex<HashMap<Key, Value>>>,
}
As a sidenote here,
Arc<T>
can be thought of like astd::shared_ptr
, and is used here to ensure that there can be multiple references to the same memory (the mutex) alive from multiple threads, and the reference counting determines the lifetime, not solely the borrow-checker.
Next we'll define a helper type that just wraps a MutexGuard object with a defined lifetime 'a
.
This lifetime is an annotation that here is basically ensuring that the LockedValueMap
will last
at least as long as the values referenced inside the MutexGuard
. This means that
pub struct LockedValueMap<'a> {
items: MutexGuard<'a, HashMap<Key, Value>>,
}
Next we'll add a function lock()
onto the original ValueMap
that locks the ValueMap
's mutex
and returns the above LockedValueMap
type bound to the lifetime of the contents of locked guard object.
impl ValueMap {
pub fn lock(&self) -> LockedValueMap {
LockedValueMap {
items: self.values.lock().unwrap(),
}
}
}
On the LockedValueMap
we also introduce a method returning an object that implements the Iterator
trait from the enclosed (now unlocked) value:
impl<'a> LockedValueMap<'a> {
pub fn iter(&self) -> impl Iterator<Item = (&Key, &Value)> {
self.items.iter()
}
}
These pieces acting together allow the following type of rich iteration:
let x = map
.lock() // Lock the collection
.iter() // Produce an iterator
.filter(|x| *x.0 > 100) // Only keys greater than 100
.fold(0, |acc, x| acc + x.1); // Fold the values together
This is clear and succinct and allows full use of the powerful Iterator
type in Rust, while
ensuring that the Mutex
is held for the duration of the iteration. If the programmer attempted to
do anything such as storing one of a iterated references into another datastructure which has a
lifetime longer than the 'a
, the borrow-checker would prevent it, ensuring no dangling or
un-mutex'd references are allowed.
For me, as a recovering C++ programmer, these thread-safety features make programming concurrent Rust programs much easier and safer than how we'd traditionally do it without resorting to a completely different model such as golang/channels.