In this small article I will show you how to use the power of functional programming to write data structures for parallel processing that scale well on any number of cores.
What is one of the primary concerns for programmers right now? Writing high-performance code that scales well regardless of whether the software is running on a CPU with one, two or 1000 cores.
Today's software does not scale well, and one of the major obstacles is making it run efficiently on multiple cores without encountering the problems that are bound to arise when programming with multiple threads. When it comes to parallel processing, data parallelism is a great tool for writing highly scalable software.
In functional programming the list is one of the most fundamental data structures. It is also well suited to operations such as iteration and mapping, and it is very easy to process in parallel. I will show you a few examples written in F# – a functional programming language for the .NET platform with roots in ML and OCaml. I will demonstrate the principles with a simple data type (PList) that can be used in a wide variety of ways for processing data, and which utilizes all the cores in your computer without you having to worry about it.
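To give a feel for the idea, here is a minimal sketch of what such a type could look like. The name PList is taken from the post, but the representation and the parallel map below are my assumptions, built on F#'s standard Async.Parallel, which schedules the work on the thread pool so the caller never touches threads directly:

```fsharp
// A hedged sketch of a parallel list type. The single-case union and
// the use of Async.Parallel are assumptions, not the post's actual code.
type PList<'a> = PList of 'a list

module PList =
    let ofList xs = PList xs

    // Apply f to every element in parallel. Async.Parallel fans the
    // work out across the thread pool and preserves element order,
    // so the result is the same as a sequential List.map.
    let map (f : 'a -> 'b) (PList xs) =
        xs
        |> List.map (fun x -> async { return f x })
        |> Async.Parallel
        |> Async.RunSynchronously
        |> Array.toList
        |> PList

// Usage: squares of 1..10, computed in parallel.
let (PList squares) = PList.ofList [1 .. 10] |> PList.map (fun x -> x * x)
```

The point of hiding the parallelism behind an ordinary map function is exactly the one the article makes: the caller writes the same code on a 1-core and a 1000-core machine.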
Finally I will indulge in a bit of theory on extending this data structure to span multiple machines connected in a computing cluster.
I recently discovered this webcast called “Erlang – software for a concurrent world” by Joe Armstrong. It was recorded at a JAOO conference. Yes, I wrote “a”… I can’t find any info on which one it is, but I strongly suspect it is from either London, Sydney or Brisbane in 2008.
Anyway, the webcast is about an hour long, and is really interesting.
Remember to check out Joe Armstrong’s blog if you are interested in Erlang and functional programming.
After weeks of writing, rewriting and testing the code for the Message Passing API (MPAPI) it is finally finished.
MPAPI is a framework that enables programmers to write concurrent, parallel and/or distributed software systems – in essence building cluster computers. I started writing it for a couple of reasons:
- My research into genetic programming (GP) was stalling a bit due to limited computing resources. Since I have a few computers around the house I wanted to enlist them in the huge computations that are necessary in GP, and I wanted to write a framework for building such distributed systems.
- For years I have written multithreaded (concurrent, parallel) software using the usual constructs for that in C++, Java and C#. Time and again I have debugged such applications, and I have yet to see a multithreaded program that is bug-free. It is simply too complex to write such applications using the normal synchronization mechanisms in languages with shared-state concurrency. After writing a bit of Erlang code I was profoundly pleased with the way that language handles concurrency, and the ideas I have learned from that language have been incorporated into MPAPI.
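MPAPI's actual API is not shown here, but the Erlang idea it borrows – processes that share nothing and communicate only by sending messages to each other's mailboxes – can be sketched in F# with the standard MailboxProcessor type. The Msg type and the counter agent below are hypothetical names of my own, purely for illustration:

```fsharp
// A hedged sketch of Erlang-style message passing in .NET, using F#'s
// built-in MailboxProcessor as the mailbox. No locks, no shared state:
// the agent owns its counter and other threads can only send messages.
type Msg =
    | Add of int
    | Get of AsyncReplyChannel<int>

let counter =
    MailboxProcessor.Start(fun inbox ->
        let rec loop total = async {
            let! msg = inbox.Receive()
            match msg with
            | Add n ->
                return! loop (total + n)
            | Get reply ->
                reply.Reply total
                return! loop total
        }
        loop 0)

counter.Post(Add 5)
counter.Post(Add 37)
let total = counter.PostAndReply Get  // 42
```

Because the mutable state lives only inside the agent's recursive loop, there is nothing for two threads to race on – which is the property that makes this style so much easier to get right than lock-based code.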
How do we write secure, robust, scalable, multithreaded software? There are several approaches, and no definitive answer to this question. But the recent advent of multicore processors has put a new emphasis on how we as programmers write software that can utilize this new hardware architecture, so that the processors are not idle and we get maximum performance.
Traditionally we write sequential programs. That is what programmers are taught, and it is easy to understand the flow in the software. Programmers are also taught to avoid multithreading whenever possible, because chances are that bugs will be present – indeed one gets the feeling that bugs are inherent in multithreaded software. And the fact is that getting multithreading right in imperative languages like C++, Java or C# is extremely hard; the primitives these languages provide for handling threads and their communication are not adequate. It might look simple on paper, but multithreaded software is rarely as simple as the textbook examples.
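The classic illustration of why shared-state threading is so fragile takes only a few lines. This is a deliberately broken sketch (my own example, not from the post): two threads increment a shared counter without any synchronization, and increments are silently lost because the read-modify-write is not atomic:

```fsharp
open System.Threading

// Shared mutable state: the root of the problem.
let mutable counter = 0

let work () =
    for _ in 1 .. 100000 do
        // Read, add, write back - three steps that another thread
        // can interleave with, losing one of the two increments.
        counter <- counter + 1

let t1 = Thread(ThreadStart work)
let t2 = Thread(ThreadStart work)
t1.Start(); t2.Start()
t1.Join(); t2.Join()

// Almost always prints less than the expected 200000.
printfn "%d" counter
```

The code compiles, usually "works" in small tests, and fails nondeterministically under load – exactly the kind of bug that is nearly impossible to reproduce in a debugger, and the reason the textbook examples are misleading.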