Skip to content

doublethink.co.uk

Rust Thread-Safety Patterns

I wanted to write a short post to highlight one of main features I'm excited about from Rust and how I think it's differentiated from other languages, namely it's thread-safety. I want to build up to show a neat pattern which demonstrates the power of Rust's type system, but before that we'll cover some basics. I'll assume some interest in Rust but not necessarily experience with anything other than C++.

Coming from a language such as C++ a lot of focus is initially directed towards the memory-safe nature of Rust and the lack of garbage collection via the borrow checking mechanism. This is understandable given that the borrow checker is rather unique; however, I believe it provides less of 'killer feature' for experienced C++ devs than the thread-safety guarantees that Rust can provide.

Rust leverages its type system to prevent types from being used in a thread-unsafe manner. It does this through two marker traits, Send and Sync. For a C++ developer these can be thought of as similar to type-traits in C++ but they should understand that traits are used much more heavily and ergonomically in Rust, as they form the primary way to describe behaviours that a type may have such as Concepts in C++20.

The Send marker trait is used to signify that a value can be safely moved between threads, which is true for most types. The compiler and the type system will enforce this requirement when a value is passed to an API that might cause it to be moved onto a different thread.

Correspondingly the Sync marker trait is used to signify that the value can be used from the context of multiple threads. Rust will prevent any value without this trait from being accessed in a context it can't determine to be necessarily single-threaded.

For both these marker traits, it's useful to know that they also compose together, such that any type constructed from only Sync types will be Sync itself by default. For most users, and especially when writing safe Rust, you'll not implement Send or Sync yourself and instead rely on composition and standard library types that introduce these marker trait properties.

The most obvious of these types introducing Sync to start with is the std::sync::Mutex type, here illustrated with a contrived example:

let locked_data: Mutex<HashMap<Key, Value>> = Mutex::new(HashMap::new());
let mut data = locked_data.lock().unwrap();
data.insert(key, value);

To unpack what is happening here: the Mutex type in Rust is generic over a type, ie Mutex<T>, and even if T is not Sync itself, the Mutex<T> type will be Sync. It has an API and implementation that can provide those guarantees for the wrapped type. It's up to the Mutex author to ensure that using unsafe Rust.

What is nice here from an API design is that the Mutex<T> always wraps the data it protects, so it becomes impossible to access the value outside of how the Mutex's API provides the guarantees it needs to for Sync. Normally the access is gated through the lock() method, which give or take some error-handling, returns a MutexGuard which enforces the underlying locking and unlocking happens through RAII. The value being protected is only accessible during the lifetime of this guard object.

Between the Sync requirement for safe multi-threaded access and design of the above API design enforcing the underlying type is only accessible when certain run-time requirements are met, this ensures we always have thread-safety provided by the type system.

Compare that to a similar example in C++:

std::mutex lock;
std::unordered_map<Key, Value> map;

std::lock_guard<std::mutex> lock_guard(lock);
map.insert({key, value});

Here there is no additional protection of the map value, and nothing ties the lock object to the map. If the locking were accidentally omited the compiler would not notice. Of course, there's always a variety of tools that might help detect it: an experienced code reviewer, some static analysis tools or sanitizers. Conversely in Rust this would always be an immediate compilation error, probably informing you that the map type didn't have the Sync type.

There are of course other thread-safety problems you can still run into in Rust, such as deadlock, but just removing this class of error is a great step forward and means you're free to concentrate on other parts of the program.

There are design implications that will be different in programming Rust versus C++ because of these constraints. A simple example is that if you want one lock guarding multiple data, you'd separate the data into its own type and have it as the T in the the Mutex<T> rather than having the mutex in the same object as the list of data.

A more complicated pattern is one I came across and wanted to get to walk us through, because it shows some more power that Rust has when carefully crafted. This pattern allows iteration over a collection while a lock is being held.

We'll start by defining our collection of values:

pub struct ValueMap {
    values: Arc<Mutex<HashMap<Key, Value>>>,
}

As a sidenote here, Arc<T> can be thought of like a std::shared_ptr, and is used here to ensure that there can be multiple references to the same memory (the mutex) alive from multiple threads, and the reference counting determines the lifetime, not solely the borrow-checker.

Next we'll define a helper type that just wraps a MutexGuard object with a defined lifetime 'a. This lifetime is an annotation that here is basically ensuring that the LockedValueMap will last at least as long as the values referenced inside the MutexGuard. This means that

pub struct LockedValueMap<'a> {
    items: MutexGuard<'a, HashMap<Key, Value>>,
}

Next we'll add a function lock() onto the original ValueMap that locks the ValueMap's mutex and returns the above LockedValueMap type bound to the lifetime of the contents of locked guard object.

impl ValueMap {
    pub fn lock(&self) -> LockedValueMap {
        LockedValueMap {
            items: self.values.lock().unwrap(),
        }
    }
}

On the LockedValueMap we also introduce a method returning an object that implements the Iterator trait from the enclosed (now unlocked) value:

impl<'a> LockedValueMap<'a> {
    pub fn iter(&self) -> impl Iterator<Item = (&Key, &Value)> {
        self.items.iter()
    }
}

These pieces acting together allow the following type of rich iteration:

let x = map
         .lock()                        // Lock the collection
         .iter()                        // Produce an iterator
         .filter(|x| *x.0 > 100)        // Only keys greater than 100
         .fold(0, |acc, x| acc + x.1);  // Fold the values together

This is clear and succinct and allows full use of the powerful Iterator type in Rust, while ensuring that the Mutex is held for the duration of the iteration. If the programmer attempted to do anything such as storing one of a iterated references into another datastructure which has a lifetime longer than the 'a, the borrow-checker would prevent it, ensuring no dangling or un-mutex'd references are allowed.

For me, as a recovering C++ programmer, these thread-safety features make programming concurrent Rust programs much easier and safer than how we'd traditionally do it without resorting to a completely different model such as golang/channels.

More things I've been up to

Public information on one of my team's projects in recent years, bringing RabbitMQ into wide use within Bloomberg. The proxy for AMQP mentioned in this is one of my more recent projects where I've got to code.

Modern Hashtables

As it has been a decade since last learning any compsci I've recently been looking over some more advanced and modern hashtable variations that weren't taught much back and came across a blog series by Emmanuel Goossaert that covered these advancements in open addressed hashtables that's worth highlighting:

Robin Hood Linear Probing

A slight variation on linear probing where when dealing with collisions the item to be inserted is updated with an existing item and the existing item's slot is used if the original item to insert would have a further distance from its intended bucket. Thus stealing from the rich (nearer their intended bucket) and giving to the poor (the more displaced). Main article and a follow up specifically about deletions.

Hopscotch Hashing

A variation where the entry to be inserted will be placed within the neighbourhood of the intended bucket, when the neighbourhood is full an item is displaced recursively until all the items have found a place in their respective neighbourhoods. This algorithm is likely to be very friendly to cache performance, similar to linear probing. Main article.

Cuckoo Hashing

A collision resolution that has a list of distinct hashes to check for each key on lookup, and on insertion guarantees value will be inserted into one of them by displacing one of the existing values (which then needs to find a home).

Article and another discussing the ability to do lock-free cuckoo hashes.

None of these are likely to be make it into a general purpose implementation such as std::unordered_map, that will likely used chained collision resolution, but they are interesting to know for specialized applications and where an open addressed hashtable is most suited (disk and shared memory implementations).

What I've been up to

Since 2005 I've been working in the software infrastructure department in Bloomberg. Bloomberg doesn't give a lot away about its internal technology or what we do with it, but occasionally things are made publically available. There is an increasing presence on github, most notably with our version of the STL and some basic libraries following our coding standards. More specifically to my own work there's a video after the fold of a talk by a colleague at jsconf 2011 about the internal development environment we built a few years ago for the developers building the Bloomberg terminal: