[Day43] Read Rust Atomics and Locks - Read-Modify-Write in the x86 Processor

by Mara Bos

At Topic: Chapter 7 Read-Modify-Write Operations

Recall

Compare-and-Exchange Operations

Compare-and-Exchange checks if the atomic value is equal to a given value, and only if that is the case does it replace it with a new value (as a single operation)
compare_exchange_weak: Similar to compare_exchange, but the difference is that the weak version may still sometimes leave the value untouched and return an Err, even though the atomic value matched the expected value.
We could use Compare-and-Exchange to implement all the other atomic operations by putting it in a loop to retry if it did change.

// source: https://doc.rust-lang.org/std/sync/atomic/struct.AtomicUsize.html#examples-11
use std::sync::atomic::{AtomicUsize, Ordering};

let some_var = AtomicUsize::new(5);

// some_var compares to the first argument (5) and if it is equal, it will be replaced by the second argument (10)
assert_eq!(some_var.compare_exchange(5, 10,
                                     Ordering::Acquire,
                                     Ordering::Relaxed), Ok(5));
assert_eq!(some_var.load(Ordering::Relaxed), 10);

Notes

An example of read-modify-write operation in x86-64 and ARM64:

// Rust code
pub fn a(x: &mut i32) {
    *x += 10;
}

// Compiled x86-64
a:
    add dword ptr [rdi], 10
    ret

// Compiled ARM64
a:
    ldr w8, [x0]
    add w8, w8, #10
    str w8, [x0]
    ret

While loading and storing happens in different steps in ARM64, it is clear that the instructions are not atomic.
x86-64's add instruction will be separated by the process into several micro-instructions behind the scenes, so it is not atomic as well.

If instructions run on a single core, the atomic issue is not a problem, as switching a processor core between threads generally only happens between instructions. But if multiple cores are involved, it is crucial to consider the multiple steps of a read-modify-write operation.

x86 systems provide an instruction prefix called lock as a modifier to make instructions like add atomic.

a:
    lock add dword ptr [rdi], 10
    ret

In modern processors, The lock prefix blocks the using memory while allowing other cores to operate on unrelated memory.
The lock prefix can only be applied to a few instructions, including add, sub, and, not, or, and xor.
The xchg instruction (exchange), used for atomic swaps, implicitly behaves as if it has a lock prefix, ensuring it is always atomic.

A fetch_add example in x86-64:

pub fn a(x: &AtomicI32) -> i32 {
    x.fetch_add(10, Relaxed)
}

a:
    mov eax, 10
    lock xadd dword ptr [rdi], eax
    ret

Instead of the add instruction, the xadd (exchange and add) instruction is used, as it places the original value of x into a register (eax) before performing the addition.

add can provide a little of useful to next instructions though, such as whether the updated value was zero or negative.

xadd (O), xchf (O), xsub (X), xand (X), xor (X)
For subtraction, this isn't problematic since xadd can be used with negative values.
For and, or and xor(Bitwise Operations) => bts (bit test and set), btr (bit test and reset), and btc (bit test and complement)
When operations affect more than one bit, they cannot be represented by a single x86-64 instruction.
The x86-64 architecture supports a lock-prefixable cmpxchg (compare and exchange) instruction, which can be used to implement atomic operations that are not directly supported by the x86-64 instruction such as fetch_or:

pub fn a(x: &AtomicI32) -> i32 {
    x.fetch_or(10, Relaxed)
}

a:
    mov eax, dword ptr [rdi]      ; Load the value from the atomic variable (pointed to by rdi) into eax
.L1:
    mov ecx, eax                  ; Copy the value in eax to ecx (prepare for OR operation)
    or ecx, 10                    ; Perform a bitwise OR on ecx with 10, storing the result in ecx
    lock cmpxchg dword ptr [rdi], ecx  ; Atomically compare and exchange:
                                       ; - Compare the value in eax with the value at the memory address rdi
                                       ; - If they are equal, store ecx (new value) at eax
                                       ; - If not, store the current value at rdi in eax
    jne .L1                       ; If the comparison failed (values were not equal), jump back to .L1 to retry
    ret                           ; Return from the function, with the original value in eax

On x86-64, there is no difference between compare_exchange and compare_exchange_weak. Both compile down to a lock cmpxchg instruction.

[Day43] Read Rust Atomics and Locks - Read-Modify-Write in the x86 Processor

Recall

Compare-and-Exchange Operations

Notes

References