What do you get if you cross Weak Memory with Transactional Memory?

What follows is a summary of the main contributions of a paper I wrote with Nathan Chong and Tyler Sorensen for the PLDI 2018 conference.

This project studies two features of a modern computer, one called out-of-order execution and one called transactional memory.

  • Out-of-order execution is where a computer chooses, for performance reasons, to perform its instructions in an order different from that written by the programmer. Usually out-of-order execution is invisible, but occasionally it can affect the outcome of multi-threaded programs. (Out-of-order execution is one instance of a phenomenon called ‘weak memory’, which is the phrase we use in the paper.)
  • Transactional memory is a feature that allows a programmer to group a sequence of instructions into a ‘transaction’. The entire transaction behaves as if it is a single, uninterruptible instruction.

In recent years, both out-of-order execution and transactional memory have been thoroughly studied in isolation; our work focuses on how these features interact with each other, because feature interactions are notoriously complicated, and are hence a fertile source of interesting bugs. This particular interaction is especially interesting because out-of-order execution gives the computer more freedom (since it allows the computer to rearrange instructions) while transactional memory gives the computer less freedom (since all the components of a transaction must execute together). How should these opposing forces be resolved?

Our work involved building mathematical models that capture out-of-order execution and transactional memory, together. We built models for Intel’s x86 processors, IBM’s Power processors, and Arm’s v8 processors. Our models were informed by referring to technical manuals that describe how the features are supposed to work, by interviewing the architects who designed the features, and, ultimately, by empirically testing them against a range of real machines.

Our empirical testing was limited to the x86 and the Power machines because Arm processors do not actually support transactional memory — at least, not at the moment. Nevertheless, Arm has confirmed that its researchers are actively looking into how transactional memory support could be added to its future processors. (And if/when it is added, we will have a model ready to describe how it works.)

Although we could not check our Arm model against Arm processors, we were able to carry out an interesting analysis of our Arm model. We fed it into an automatic tool we developed in previous work, and discovered a bug. More precisely, we found that the way Arm processors do out-of-order execution is fundamentally incompatible with one aspect of transactional memory called ‘lock elision’.

This is a curious result because it reveals a bug in a processor that not only has not been manufactured yet, but has not even been designed yet!

What is the bug?

In the remainder of this post, I will explain the bug we found using a toilet cubicle as an illustrative example.

The following list of instructions describes the procedure followed by a person who wishes to use the toilet.

  • U1. Wait until the sign above the door handle says “vacant”.
  • U2. Enter the cubicle and lock the door behind you.
  • U3. Check that there is enough loo roll.
  • U4. If there is, do your business (and wash your hands, of course).
  • U5. Exit the cubicle.

The following list of instructions describes the procedure followed by a person who is minded to steal loo roll.

  • T1. Wait until the sign above the door handle says “vacant”.
  • T2. Enter the cubicle and lock the door behind you.
  • T3. Steal all the loo roll.
  • T4. Run away.

These two procedures have been carefully designed so that there can be no clashes between the toilet user and the loo roll thief. If there is a sufficient supply of loo roll in the cubicle when the toilet user enters it, then they will successfully go about their business; if there is not, then they will refrain from doing their business in this cubicle. Either way, no awkward situations should ever arise.

Let us now suppose that the loo roll thief uses a technique called transactional lock elision. This is a technique invented by Ravi Rajwar and Jim Goodman in 2001, and it is one of the main ways that transactions are used in practice. The idea is that by introducing a transaction, one can avoid the faff of actually locking and unlocking the cubicle door.

If the thief applies transactional lock elision, their procedure reduces to just a single instruction:

  • T. If the sign above the door handle says “vacant”, steal all the loo roll and run away.

All is still fine: even though the thief does not actually lock or unlock the cubicle, there can still be no problems for the ordinary toilet user, because the thief only enters when the cubicle is vacant.

However, let us now also suppose that the toilet user is permitted to carry out their instructions out-of-order (just like an Arm processor). One of the reorderings that Arm processors are allowed to do is to swap instructions U2 and U3. If the toilet user does this, then the following unfortunate sequence of events can occur:

  1. The toilet user sees that the cubicle is vacant, but before entering, they check that there is enough loo roll.
  2. While the toilet user is not looking for a moment, the thief sneaks in and steals all the loo roll.
  3. The toilet user enters and locks the cubicle, does their business, and discovers to their great surprise that there is no loo roll!

Importantly, were it not for transactional lock elision, this instruction reordering would be invisible. In other words: current Arm processors are not wrong to allow this reordering. The regrettable situation only arises when you consider the combination of out-of-order execution and transactional memory, which is exactly what we have done in our work.

A disastrous lack of loo roll – or, to return to the computing world: a failure of mutual exclusion – is a serious problem. If Arm is going to support transactional memory in its multiprocessors in the future, it must make changes to its rules about which instructions can execute out-of-order, so that cubicle doors must be locked without delay.

Leave a comment