Hardware Support For Transactional Memory

2003-2004: Synchronization between processors running on a parallel machine is one of the most difficult aspects of parallel programming. In shared-memory parallel machines, the use of locks is often required for correct program execution. However, locking is not very natural for programmers and thus can often lead to undesirable results such as deadlock when used incorrectly. For this reason, programmers often use very conservative locking in the interest of correctness or ease of implementation. Moreover, locking is a very conservative mechanism by nature and thus can lead to poor performance and high resource overhead.

Ideally, the programmer should have the ability to invoke several operations on shared data in an atomic fashion without having to worry about locking or performance issues. Transactional memory achieves this goal. With transactional memory, the software on an individual processor is ensured, by the hardware, that a set of memory accesses is either performed atomically or not at all. This notion is very similar to that of a transaction in database terms. Transactional memory can replace locks altogether resulting in a more intuitive programming environment as well as a performance increase.

I propose a design for hardware transactional memory where the transaction size is not bounded by a specialized hardware buffer such as a cache. I describe an unbounded transactional memory system called UTM (unbounded transactional memory) that exploits the perceived common case where transactions are small but still supports transactions of arbitrary size. As in previous hardware transactional memory systems, UTM uses the cache to store speculative state and uses the cache coherency protocol to detect conflicting transactions. Unlike previous hardware systems, UTM allows the speculative state to overflow from the cache into main memory, thereby allowing the transaction to grow beyond the size limitation of the cache. The clean semantics of UTM allow nested transaction support, nontransactional instructions, immediate aborts, a processor snapshot, and context-switching support; all features not found in previous hardware transactional systems. UTM was implemented in a detailed simulator, and experimental results show that it can be integrated with existing hardware straightforwardly while still performing better than conventional synchronization techniques.

For my Master's thesis, I designed and evaluated hardware support for transactional memory on shared-memory supercomputers. This research was funded in part by the National Science Foundation Grant ACI-032497, in part by the Singapore-MIT Alliance, and in part by Silicon Graphics, Incorporated.


IEEE MICRO 2006 Top Picks from Microarchitecture Conferences paper: pdf
11th International Symposium on High Performance Computer Architecture (HPCA-11) 2005 paper: pdf
MIT EECS Master's thesis: pdf
MIT EECS MasterWorks 2004 Talk: html
MIT CSAIL Abstract: pdf
Boston Area Computer Architecture Workshop (BARC) 2004 Talk: html
MIT Computer Architecture Workshop (CAW) 2003 Talk: html
MIT EECS Master's thesis proposal: pdf