(or how to build a supercomputer in 21 hours)
- A small introduction to message passing and MPAPI
- Source code and binaries can be downloaded at CodePlex
How do we write secure, robust, scalable, mutilthreaded software? There are several approaches, and no definitive answer to this question. But the recent advent of multicore processors have put a new emphasis on how we as programmers write software that can utilize this new hardware architecture so that the processors are not idle and we get maximum performance.
Traditionally we write sequential programs. That is what programmers are taught, and it is (relatively) easy to understand the flow in the software. Programmers are also taught that they should avoid multithreading whenever possible, because chances are that bugs will be present – indeed one gets the feeling that bugs are inherent in multithreaded software. And the fact is that getting multithreading right in imperative languages like C++, Java or C# is extremely hard; the primitives provided in these languages for handling threads and their synchronization are not adequate. It looks simple on paper, but writing multithreaded software is rarely as trivial as the textbook examples.
The problem is shared state. All threads have access to the same memory locations so several threads can concurrently read and update the same memory. That is a recipe for disaster, as you no doubt realize. The solution to this is using locks, either as semaphores, monitors, mutexes or some other construct. But using locks can be very hard to get right – race conditions and deadlocks lurks right beneath the surface. As Simon Peyton Jones from Microsoft Research put it in the book ”Beautiful code” by Greg Wilson, ”locks are bad”. Either you take too few locks, too many lock, take the wrong locks, or take them in the wrong order. Furthermore, error recovery is very hard, because the programmer has to guarantee that any error does not leave the system in an inconsistent state. I have been a developer and architect on several enterprise level systems with a panic solution that restarts the application if an unknown error occurs and we cannot guarantee the state of the system. That is not by choice as much as by necessity – the complexity of systems rapidly grow out of control when dealing with multithreading.
About Message Passing
Message passing is a different approach to concurrency than we are used to when using modern, imperative languages like C++, C# or Java. In those languages we use shared state concurrency, where all threads in the same process space has access to the same areas of memory (i.e. variables etc.). This present a problem with how we synchonize the threads to avoid inconsistent memory, and it is usually done with locks, semaphores, monitors or other constructs. As explained in the preface this is not as trivial as one might think, and there are always one or more problems in the synchonization that can be very hard to detect, leading to unstable systems and a lot of time spend on debugging and rewriting parts of the software.
In message passing concurrency each thread has access only to its own state. This helps programmers write more robust software since the need to synchonize memory no longer exists. The only means for a thread to communicate with other threads is by sending them messages , and state is not transferred between threads in these messages.
This approach also has the advantage of working perfectly in a distributed environment, since the simplest solution to letting processes and threads on physically disparate computers communicate is by using message passing. This is already being done in existing frameworks – MPI (Message Passing Interface), MS-MPI (Microsoft Message Passing Interface), Beowulf clusters and others – used in high performance computing (HPC).
About Message Passing API (MPAPI)
I wrote the MPAPI framework for 3 reasons:
- I was looking for a way to simplify the development of multithreaded applications without having to worry about locks and other thread synchronization mechanisms. The problems with thread synchronization are becoming more and more apparent as we move from single core to multicore CPUs. The majority of software today is written with a single core in mind, and hence does not fully utilize the multiple cores in modern CPUs.
- I spend a couple of weeks investigating the functional programming language Erlang. Erlang has from the beginning been designed with parallel and distributed computing in mind. The solution the engineers at Ericsson came up with is very simple and elegant, and I wanted to investigate how this would work in a .NET context.
- For some time I have been researching genetic algorithms and genetic programming. This is a particularly compute intensive branch of computer science, and I wanted a framework that would easily let me write software that could scale on multiple computers.
The framework is now in a state where it can be released, although it is still in some kind of flux as I gain more and more experience with developing software with it.
- Worker : A worker is the main entity of the MPAPI framework. This is where the programmer specific code is written, and a worker has all the primitives needed to communicate with other workers in the cluster. A worker runs in its own background thread inside a node, and receives messages in an asychronous manner through its mailbox. A worker has an id which is unique within the node. Together with the node id this identifies the worker in the cluster.
- Node : A node is the entity that runs one or more workers. The node is responsible for dispatching messages to the right receivers, and for maintaining connections to all other nodes in the cluster. There is typically one node per computer, and each node has some primitives that enables Worker-code to access information about the cluster which it is part of. Each node is assigned a unique id within the cluster.
- Message : Messages are the means by which workers communicate. A message consists of:
- An Id, which is unique for each message. This is mainly used to trace messages through the cluster, and should be disregarded.
- A message type which distinquishes it from other types.
- A message level, which indicates if the message has been send by the system of from user code.
- Content, which is the payload of the message. This is used to transfer state to and from workers.
- Registration Server : This server is used to register and unregister nodes in the cluster. It is also responsible for notifying existing nodes when a node registereds or unregisteres. The registration server is distributed as a standalone executable with the MPAPI framework, but is not necessary to start if the software runs only one node on a local machine.
One of the design goals was to provide programmers with a simple interface to write multithreaded, single-computer applications using message passing. Since the majority of software is not distributed, this was a core design principle.
Figure 1 : A local node with multiple worker threads.
But since message passing is very useful when controlling applications on multiple computers, and since some systems require more computing power than is ever possible with a single computer, I designed the distributed logic into the framework as well. In the MPAPI framework this provides the programmer with a slightly modified set of primitives, but basically he or she is unaware of the distributed aspect.
A cluster build with MPAPI consist of a main node, which controls the cluster, a number of sub nodes, which are the real work horses of the cluster, and a registration server. The registration server binds the cluster together by allowing nodes to register and unregister with the cluster, and existing nodes in the cluster is notified of such events. Communication between nodes is not handled by the registration server, but directly from node to node.
Figure 2 : A cluster with a main node, a number of sub nodes, and a registration server.
The registration server is a standalone executable which is distributed with the framework.
An application written in MPAPI implements a number of workers. Each worker communicates with each other through the node. The messages are not propagated down through the Remoting layer unless two communicating workers are on two different nodes.
Figure 3 : The layered structure of a MPAPI application.
This asynchronous design enables programmers to write more robust in the likes of a micro kernel.
A micro kernel is a special operating system (OS) design, where the OS itself it nothing more than a message layer; drivers and everything that comprise an OS are implemented as separate entities that communicate through messaging. Linux, Unix and Microsoft Windows are examples of operating systems that are monolithic, not micro kernels. MK design is an old concept, but lately even Microsoft has realised the potential to write robust and secure operating systems with this approach – their newly released Singularity research operating system, that showcases how such operating systems might (or should?) be implemented in the future, is based on a micro kernel design.
Figure 4 : A micro kernel OS, where everything – applications, drivers, etc. – communicates via message passing.
The following goals where formulated prior to starting development.
- The framework must be compatible with both Microsoft.NET and Mono.NET. That is why I wrote the RemotingLite framework.
- Writing single-computer or multiple-computer (cluster) applications must be equally simple. Making an application function in a distributed environment must be trivial, and the framework must not function differently from a programmers perspective when in local- or distributed mode.
- Performance is paramount. Thus I have gone to great lengths to optimize all aspects of the framework, from serialization of messages, to controlling threads that perform all the asynchronous operations of the framework.
- The programming paradigm must be simple, the number of primitives small, and the approach must be somewhat similar to what is done in the functional programming language Erlang. Simplicity is a key design goal.
- Software systems written with MPAPI must be highly reliable, scalable and robust. This is accomplished by preventing errors from any given worker to affect all other workers – a crashing worker will only notify about it, if another worker is monitoring it.
Well, the framework will always be updated as I find bugs and new features to implement. But this whole exercise has been to prove that it is possible to use message passing in .NET, and that it will help developers write more robust code easier than with regular thread synchronization mechanisms. Personally I think Dijkstra was wrong, and since Microsoft uses a micro kernel approach in their new operating system Singularity, which is to serve as a research platform for future operating systems, I think more have come to the same conclusion. Surely message passing simplifies writing software, even in a distributed environment. The engineers at Ericsson, who developed Erlang, must be laughing.
I will now continue my work with genetic programming, where the MPAPI framework is just a means to an end.