DCSIMG
PDC 2009 Day 2: The State of Parallel Programming - All Your Base Are Belong To Us

All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

PDC 2009 Day 2: The State of Parallel Programming

Burton Smith’s session on the state of parallel programming was standing-room only – I’m sitting on the floor with some chairs blocking my view of the presentation :-)

Generally, Burton Smith lays out a theory of parallel programming that I tried to cover in the notes below.

Imperative languages prefer putting values in parameters, and they are prone to data races which are rather hard to detect considering the amount of possible paths.

Pure functional languages avoid variables – they compute new constants from their old values and provide means for efficient reclamation of old constants. Consider Excel, for example, where cells can be calculated from other cells, and so on.

No variables implies no mutable state at all. Unfortunately, mutable state is crucial for efficiency – especially to prevent always copying all the data whenever it’s passed around. Monads provide mutable state (and input/output operations) to pure functional languages, but they are not inherently parallel.

The fundamental problem is maintaining invariants of state – conservation laws of one’s program (e.g. whenever a reference points to something, the something points back to the original reference – like in a linked list; or a graph that must be acyclic, and so forth). These properties, these invariants, are crucial for a program’s correct execution with parallelism. We have to invent something that describes these properties, because we’re not used to declaring them.

Where do invariants come from? Well, we can sometimes generate invariants from code. At times we can also generate code from invariants – here’s the invariant and the termination condition, generate the code for me please :-) We can also write invariants and code separately and have the compiler or runtime check their preservation (as with Code Contracts for .NET and the C++ annotation language used by the secure CRT). Finally, we can add to the languages a capacity to make sure that the transaction covers the invariant’s domain. What this means for debugging is that there’s a need for programs that check the object’s invariants, so that we can at least check when invariants fail.

Updates performed to object state perturb but then restore an invariant, and this is what proofs of correctness of programs are based on. For parallelism, updates must be isolated to make sure they do not interfere with one another. Updates must also be atomic. Both of these properties (atomic and isolated) bring transactional semantics to our vocabulary.

Invariants provide a general commutativity property. If p and q preserve an invariant I, and p and q don’t interfere with each other, then the parallel execution {p || q} also preserves I. The definition of “don’t interfere” with each other is more complex, and it’s possible to prove that if they execute in isolation and atomically (as transactions), then they do not interfere. Even if the operations do not commute with respect to the state of the object, they do commute (can be reordered) with respect to the invariant (this is not absolute commutativity). This leads to a weaker form of determinism called “good non-determinism”, and something like an operating system cannot allow a stronger form of determinism without losing a great deal of performance.

Assume we want to implement a hash table. The invariant is that an item is in the set iff its insertion followed all removals. There are also some invariants on the structure of the actual buckets in the hash table. Parallel insertions and removals need only maintain the conjunction of these invariants, and do not require deterministic state. (E.g. the order of items in a bucket is not specified and we have no interest in it.)

If we isolate some loads and stores so that they are also atomic, we might still have a situation where they cover only a part of an invariant (e.g. copying around more data than the system’s data bus can perform in a single bus transfer). The domain of the invariant is not necessarily covered by the individual transactions. (E.g. a bank account transfer can be protected separately on each account, but that wouldn’t preserve the bigger, more important, invariant.)

SQL transactions achieve consistency via atomicity and isolation, but they are not general-purpose; operating systems use locks for isolation and perform lock ordering to prevent deadlock. There’s a need for a more general-purpose parallel language that could handle both types of applications.

Implementing isolation means one of the following techniques or a nested combination of several of them: Proving that concurrent state updates are isolated; Locking – where deadlock has to be handled somehow; Buffering updates; Partitioning information (e.g. quicksort); Serializing execution and not allowing parallelism (e.g. COM STA).

Some existing languages provide isolation features – MPI, Erlang and others. Some allow only one thread per address space, some allow only serial execution and lock changes, and there are additional approaches.

Atomicity means “all or nothing”, which requires undo in many scenarios. Isolation without atomicity is not very interesting, but atomicity is vital even in serial executions in lieu of exceptions. The obvious implementation techniques are explicit compensation and logging (restore), and both are challenging for distributed computing and for input/output that often can’t be compensated or restored (non-repeatable, non-testable, non-compensatable actions).

Exceptions threaten atomicity because they require undo in the middle or an aborted state update.

Transactional memory means transaction semantics on a language or framework level, and there’s obviously lot of optimization to do because atomicity and isolation require a huge amount of CPU cycles. There are of course traditional ways to achieve transactional semantics (via compensation, implicit undo, and so forth).

Burton Smith’s opinion is that pure functional languages with transactions are the way to go with regard to parallel programming, and this is the direction Microsoft is heading. Implementations of isolation and atomicity are important and must be optimized for efficiency, and hopefully the hardware architecture will support these things in the future.

Additionally, the von Neumann programming model needs to be replaced – putting band-aids on issues such as coherent cache and memory access is definitely not going to bring us forward to the next generation of parallel programming.

Comments

zinc said:

Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently. There are several different forms of parallel computing: bit-level, instruction level, data, and task parallelism.

# November 19, 2009 6:13 PM

Josh said:

strangely enough, there's another big computing conference going on in the states at the same time as the PDC - sc09.supercomputing.org/.../SC09Schedule1.pdf

# November 19, 2009 10:48 PM

Posts about Programming from google blogs as of November 19, 2009 « tryfly.com said:

Pingback from  Posts about Programming from google blogs as of November 19, 2009 «  tryfly.com

# November 20, 2009 1:47 AM

Hirscher, Borssen win parallel slalom in Moscow | Sports News … | Skiing Leisure Knowledge said:

Pingback from  Hirscher, Borssen win parallel slalom in Moscow | Sports News … | Skiing Leisure Knowledge

# November 22, 2009 8:56 PM
Leave a Comment

(required) 

(required) 

(optional)

(required) 


Enter the numbers above: