[Day44] Read Rust Atomics and Locks - Load-linked and Store-Conditional Instructions on RISC
by Mara Bos
At Topic: Chapter 7: Load-linked and Store-Conditional Instructions
Notes
Background: A compare-and-exchange
loop is the most common way to implement atomic operations.
- The closest thing to a
compare-and-exchange
loop on a RISC architecture is a load-linked/store-conditional loop (aka. LL/SC loop). - Load-linked (LL): Similar to regular load
- Store-conditional (SC): Similar to regular store, but conditional
- LL/SC should be used together and target the same memory location.
- SC refuses to store if memory has been overwritten since LL.
- If SC fails, the loop will simply retry. => Promise of atomicity
- ARM64 always lacks single-instruction atomic
fetch-and-modify
orcompare-and-exchange
operations. - Due to its design, ARM64 separates load and store steps from calculations and comparisons.
- ARM64 uses
ldxr
(load exclusive register) andstxr
(store exclusive register) for load-linked and store-conditional operations.
clrex
(clear exclusive) is available as an alternative tostxr
to stop tracking memory writes without storing any data.
Here is an example for x.fetch_add(10, Relaxed)
in ARM64:
; non-atomic code to add 10 in function a
a:
ldr w8, [x0]
add w8, w8, #10
str w8, [x0]
ret
; x.fetch_add(10, Relaxed) in function a
a:
.L1:
ldxr w8, [x0] ; Load the value at address pointed to by x0 into register w8 (LL).
add w9, w8, #10 ; Add 10 to the value in w8 and store the result in w9.
stxr w10, w9, [x0] ; Attempt to store the value in w9 back to the address pointed to by x0 (SC).
; The result of the store is written to w10. 0 = success, 1 = fail.
cbnz w10, .L1 ; If the store failed (w10 != 0), branch back to .L1 to retry the loop.
ret ; Return from the function.
- Compilers generally minimize the number of instructions in LL/SC patterns to reduce the risk of LL/SC loops failing repeatedly or spinning indefinitely.
ARMv8.1 Atomic Instructions
- ARMv8.1 introduced new CISC-style atomic instructions in ARM64 for common atomic operations.
- E.g.
ldadd
(load and add) forfetch_add
, without the need for LL/SC loops. - E.g. instructions for
fetch_max
- E.g.
cas
(compare and swap) coressponds tocompare_exchange
. - These new instructions can be more performant than LL/SC patern and good for some specialized hardware.
Compare-and-exchange on ARM
compare_exchange_weak
:
pub fn a(x: &AtomicI32) { x.compare_exchange_weak(5, 6, Relaxed, Relaxed); }
=>
a:
ldxr w8, [x0] ; Load the value at memory address in x0 into register w8 (LL)
cmp w8, #5 ; Compare the value in w8 with 5
b.ne .L1 ; If w8 is not equal to 5, branch to label .L1
mov w8, #6 ; If comparison succeeded (w8 == 5), move the value 6 into register w8
stxr w9, w8, [x0] ; (SC) Try to store the value in w8 to the address in x0
; and set w9 based on success (0 = success, 1 = failure)
ret ; Return
.L1: ; Handling the failure case
clrex ; Abort the LL/SC pattern
ret ; Return
Normally,
compare_exchange_weak
is used in a loop to retry on failure. In this example, we only call it once and ignore the return value to focus on the relevant assembly.
stxr
is allowed to have false negatives that it might fail even if the memory location has not been changed by another thread. (This is whycompare_exchange_weak
is called weak.)
compare_exchange
:
pub fn a(x: &AtomicI32) { x.compare_exchange(5, 6, Relaxed, Relaxed); }
=>
a:
mov w8, #6 ; Move the value 6 into register w8 (this will be used for the store)
.L1:
ldxr w9, [x0] ; Load the value from memory at address x0 into register w9 (LL)
cmp w9, #5 ; Compare the value in w9 with 5
b.ne .L2 ; If w9 is not equal to 5, branch to label .L2 to handle failure
stxr w9, w8, [x0] ; (SC) Attempt to store w8 (value 6) at memory address in x0
; w9 will be set to 0 on success or 1 on failure
cbnz w9, .L1 ; If w9 is non-zero (store failed), branch back to .L1 to retry
ret ; Return
.L2: ; Handling the failure case (comparison failed)
clrex ; Abort the LL/SC pattern
ret ; Return
To compare with compare_exchange_weak
, compare_exchange
has:
- An extra branch
- An extra
chnz
(compare and branch on nonzero) instruction - The
mov
instruction is outside the loop to keep the loop as short as possible.