billythedummy's silly diary


2023/10/08

Using Unsafe Rust? You'll Probably Trigger Undefined Behaviour

TLDR: unsafe rust has many UB pitfalls. Use miri to ensure you're not falling into any of them

Inspired by TigerBeetle's design of no runtime memory allocations, I decided to build a zero runtime memory allocation data structure in rust. It seemed like a least-recently-used (LRU) cache was a natural fit for such a requirement, since they typically involve a fixed capacity, beyond which the least-recently-used element was evicted to make room for a new one. Furthermore, I had some needs for one in another project, so all the stars were aligned to make const-lru happen.

What the stars were not aligned on, however, was the amount of undefined behaviour (UB) I would inadvertently introduce in the initial development of const-lru.

Uninitialized memory is a UB minefield

Since the capacity of the LRU cache was designed to be fixed at compile-time, I decided to store the required data in a couple of const-generic array fields. In other words, something like:


pub struct ConstLru<K, V, const CAP: usize> {
    keys: [K; CAP],
    values: [V; CAP],
    ... // other fields
}
      

This was all fine and dandy for small values of CAP, but at larger values, this was causing runtime stack overflows. Box::new(ConstLru::new()) did not help either because semantically this meant "create a new ConstLru on the stack and then move that ConstLru into a newly allocated region in the heap", so the stack overflow still occurred in the first step.

Since I wanted to keep the crate no-std and not dependent on alloc, I decided to create a new unsafe function, init_at_alloc(ptr: *mut Self), to allow users to create large CAP ConstLrus at pre-allocated memory. This allows for stuff like:


let layout = Layout::new::<ConstLru<u32, u16, 10_000, u16>>();
let container: Box<ConstLru<u32, u16, 10_000, u16>> = unsafe {
    let ptr = alloc(layout) as *mut ConstLru<u32, u16, 10_000, u16>;
    ConstLru::init_at_alloc(ptr);
    Box::from_raw(ptr)
};
      

I could then run a common init procedure for both stack and pointer initializations in order to set the correct starting state for the struct's fields. Surely a private fn init(&mut self), where the pointer initialization is:


pub unsafe fn init_at_alloc(ptr: *mut Self) {
    let mut_ref = &mut *ptr;
    mut_ref.init();
}
      

and the stack initialization is:


pub fn new() -> Self {
    let mut res: MaybeUninit<Self> = MaybeUninit::uninit();
    unsafe {
        Self::init_at_alloc(res.as_mut_ptr());
        res.assume_init()
    }
}
      

would work, right?

Pointers may not always become references

WRONG! After digging a little deeper into the docs for MaybeUninit, it turns out that turning a pointer into a reference before the memory pointed to is initialized is undefined behaviour!

Oh no. Naturally, I had to change the function signature from fn init(&mut self) to fn init(ptr: *mut Self), which meant init_at_alloc(ptr: *mut Self) became the common init procedure instead. It was kinda ugly changing all the self.field = init_value to addr_of_mut!((*ptr).field).write(init_value), but it worked.

Always use addr_of_mut! to initialize uninitialized struct fields

The issue was resolved but the experience left me distraught. Rust, for all its virtues of stringent compile-time checking had failed to save me from myself. Besides, how many more instances of UB were left in my code? Surely there was a way to detect and fix them without going through the docs of every unsafe feature I was using.

Just use miri

Thankfully, there was. After some googling, I found out about miri, and it immediately proved to be a godsend. Here are some other instances of UB it caught:

Once you assume_init(), all fields MUST be initialized

In order to speed up the lookup process, ConstLru keeps a binary search index array as one of its fields, bs_index. Since the array consists of primitive unsigned integer values, I decided not to encapsulate the values in a MaybeUninit. As the program does not ever use any values in the array without initializing them, I thought it was fine to leave the field uninitialized when the ConstLru was created. After all, it was just a bunch of unused random integers. However, this is UB according to miri:


  error: Undefined Behavior: constructing invalid value at .value.bs_index[11]: encountered uninitialized bytes
  --> /home/user/const-lru/src/lib.rs:72:13
   |
72 |             res.assume_init()
   |             ^^^^^^^^^^^^^^^^^ constructing invalid value at .value.bs_index[11]: encountered uninitialized bytes
      

In hindsight, this is perfectly reasonable UB, since the compiler does not, and probably should not, treat primitive integers as a special case for allowing uninitialized fields; imagine what would happen if instead of a primitive integer, this was some other type with complex Drop behaviour like Vec. If this wasn't UB, the compiler will generate code that attempts to free memory at random addresses read from the uninitialized bytes!

The fix for this UB was simple: just initialize the values. Though this does introduce some overhead at initialization.

&array[i] and &mut array[i] only borrows the single i-th element of the array

The above statement sounds obvious, so how can it result in UB? Well, what if you wanted to use ptr::copy() to shift an array by 1 element, like in Vec::remove()?


// shift everything between [bs_i, len) right
let bs_i_ptr: *mut I = &mut self.bs_index[bs_i];
unsafe {
    ptr::copy(bs_i_ptr, bs_i_ptr.add(1), len - bs_i);
}
      

bs_i_ptr only mutably borrows self.bs_index[bs_i]. However, the ptr::copy() is attempting to mutate not self.bs_index[bs_i] but self.bs_index[bs_i+1..len+1] instead! miri catches this:


  error: Undefined Behavior: attempting a write access using <154249> at alloc50498[0x1c], but that tag does not exist in the borrow stack for this location
  --> /home/user/const-lru/src/lib.rs:348:17
    |
348 |                 ptr::copy(bs_i_ptr, bs_i_ptr.add(1), l - bs_i);
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |                 |
    |                 attempting a write access using <154249> at alloc50498[0x1c], but that tag does not exist in the borrow stack for this location
    |                 this error occurs as part of an access at alloc50498[0x1c..0x1d]
      

Admittedly, this is not immediately obvious since the multiple writes are somewhat obscured by the ptr::copy() call. I only understood the miri error message after chancing upon this rust forum post

The "fix" for this UB wasn't really a fix. Since ptr::copy() is inherently an unsafe function, it makes sense to just express the procedure using fully unsafe types, i.e. pointers instead of references:


// shift everything between [bs_i, len) right
unsafe {
    let bs_i_ptr = self.bs_index.as_mut_ptr().add(bs_i);
    ptr::copy(bs_i_ptr, bs_i_ptr.add(1), len - bs_i);
}
      

Conclusion

Rust is a safe language but unsafe rust is a totally different beast. Undefined behaviour abounds when you're dealing with raw pointers and uninitialized memory, but somebody has to do it to create the nice safe APIs. Luckily, miri makes avoiding unintentional UB a lot easier.