Concurrent Programming

What will we cover?

What concurrency is and why you might (and usually should not) use it; the difference between processes and threads; how Python's Global Interpreter Lock affects threading; and the basics of the multiprocessing and threading modules.

Caveat: I have a confession. I thought long and hard about whether to include this topic in the tutor. In the end I decided it was too important a topic to miss out. The reason for my dilemma is that concurrency in general and threading in particular is a very difficult thing to get right and far too many people jump in, thinking it will solve their problems, only to make things worse. When you get it right, it is a very powerful tool, but consider it a weapon of last resort. In my professional life I have worked on many projects that used concurrency, in one form or another, but in my personal programming I have never yet found a need for it. Maybe that says something about when it is appropriate, or maybe just about the kinds of projects I write for myself.

Concurrent Programming - what and why?

Concurrent programming simply means a program that has elements running in parallel. We do this all the time on our computers: the OS manages the various programs that we start, scheduling them to run in turn and spreading the load across processors and cores. There is nothing new about the concept of dividing a task into chunks and having each chunk worked on separately. That's what teams were invented for. Ultimately, it is a performance-boosting technique.

In computing terms we have been doing this kind of thing for decades, especially on servers. For example, a web server receives web requests and processes them. But if it fully processed each request before starting the next, the server's input queue would soon overflow on a busy web site. So instead, the server simply reads the request from the queue; if it's for a simple HTML file it returns it, and for anything more complex it spins off a sub-process to deal with the request and goes back to read the next one.

We already saw how to create processes from our Python code in the OS topic using the subprocess module. We saw an alternative approach in the IPC topic, where we used fork to create a child process that was a clone of its parent. Both of these techniques work fine where you want a few completely separate processes, but they are not ideal where you need many such processes (think of the web server handling thousands of requests per minute). One problem is that starting a new process is quite a slow operation (on computer time scales), and each process also carries a lot of overhead in terms of memory use, etc. That's where threads come in.
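As a quick reminder, here is a minimal sketch of spawning a separate process with subprocess (the command shown is purely illustrative; subprocess.run needs Python 3.5 or later):

import subprocess

# Run a command as a separate OS process and wait for it to finish.
proc = subprocess.run(['python3', '-c', 'print("hello from a child process")'],
                      stdout=subprocess.PIPE)
print(proc.stdout.decode())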

Threads are like micro-processes that actually run inside our parent process, all sharing its memory space and code. The name comes from the idea that your code runs top to bottom (with a few jumps and loops along the way). It's a bit like a cotton reel unrolling if you hold onto the end of the thread and let the reel drop. Each parallel thread is like a new cotton reel being dropped: an independent execution path through the same code.

We've hinted that concurrency has something to do with performance and, in particular, performance in the face of high data loads. In truth that is about the only reason for using it. If you have a problem with scalability and you have tweaked the basic performance (for a single transaction or data unit) till it squeaks then you may need to think about dividing the load.

Choosing a concurrency solution

Having established that concurrency is a way to improve performance in the face of large processing loads, how do you decide whether to use multiple processes or multiple threads to handle a given task? In Python, one factor dominates the decision:

Python has a major issue with its approach to threading, in the shape of a complex internal mechanism called the Global Interpreter Lock (or GIL). The effect of this lock is to prevent purely computational threads from running as efficiently as we'd like. Some commentators have gone so far as to say that you should not use threading in Python. That's overstating things: threads are a great way to deal with anything that has to block for I/O operations - and that means most things. It's only purely computational tasks that have any kind of problem (and, as we will see in the example below, it's not all bad news there either). For those cases where you will hit problems, use the multiprocessing module instead. It is very similar to use and will do the same job, but using processes rather than threads - and processes, as mentioned, are a bit heavier on your computer's resources.
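If you want to see the GIL's effect for yourself, here is a small experiment you can try (just a sketch; exact timings will vary with your machine, but on standard CPython the threaded run should be no faster than the sequential one, and is often slightly slower):

import threading, time

def spin(n):
    # Pure computation - no I/O, so under the GIL two of these
    # can never truly run at the same time.
    while n:
        n -= 1

COUNT = 10000000

start = time.time()
spin(COUNT)
spin(COUNT)
print('sequential: %s' % (time.time() - start))

start = time.time()
t1 = threading.Thread(target=spin, args=(COUNT,))
t2 = threading.Thread(target=spin, args=(COUNT,))
t1.start(); t2.start()
t1.join(); t2.join()
print('threaded:   %s' % (time.time() - start))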

Starting with Concurrent Programming

OK, having decided we definitely want to use concurrency, let's look at the code needed to make the magic happen. There are two thread-related modules in the standard library: _thread and threading. The latter is a higher-level module built on top of _thread, and it is the only one you need to consider. For multiprocessing there is the multiprocessing module, which works almost exactly like threading.

Defining the problem

Let's consider a simplistic project where we want to calculate the factorial of each of the first N numbers. We can write a function for the factorial calculation and then put it in a loop and store the results. It would look like this:

import time, sys

factorials = []    # results collected by do_it()

def fact(n):
    if n < 2: return 1

    # A float keeps each multiplication a constant-time operation
    # (Python integers would grow without bound and slow down);
    # very large n will overflow the result to inf.
    result = 1.0
    for x in range(2, n+1):
        result *= x
    return result

def do_it(f, lo, hi):
    # Apply f to every value in [lo, hi) and store the results.
    global factorials
    for n in range(lo, hi):
        factorials.append(f(n))

if __name__ == "__main__":
    hi = int(sys.argv[1]) + 1
    start = time.time()

    do_it(fact, 1, hi)
    print('Time for %s: %s' % (hi, time.time() - start))

Note that we created a do_it function that wraps up the outer loop and the calls to fact. We don't really need that here, but it will prove useful later when we look at concurrency.

We've made the number of iterations a command line argument and added a timer to show how long it takes. Save it as no_threads.py and, for the first 100 numbers, run it like this:

$ python3 no_threads.py 100

If you run that you'll find that the time taken is much less than a second - such is the power of modern CPUs. But what if we want to do the same for the first 1000 numbers? That might take a little longer. Try it and see. On my computer it took the times below for 100, 1000, and 10,000 iterations:

$ python3 no_threads.py 100
Time for 100: 0.0006973743438720703
$ python3 no_threads.py 1000
Time for 1000: 0.06539225578308105
$ python3 no_threads.py 10000
Time for 10000: 6.800917863845825

Things look OK up to 1000, but then the figures suddenly get much worse. We probably want to improve performance for these higher values. This is where concurrency comes in.

Adding concurrency to the mix

This is a purely computational task so, using the guidelines above, we opt to use multiprocessing rather than threading.

What we want to do is check how high the input number is and, if it is more than 1000, split the processing between two processes. We know the higher values take longer to compute (fact(n) performs roughly n multiplications), so rather than splitting the range equally between the processes we will divide it (arbitrarily) at the 75% point. The code will look like this:

import time, sys
import multiprocessing

factorials = []

def fact(n):
    if n < 2: return 1

    result = 1.0
    for x in range(2, n+1):
        result *= x
    return result

def do_it(f, lo, hi):
    global factorials
    for n in range(lo, hi):
        factorials.append(f(n))

if __name__ == "__main__":
    hi = int(sys.argv[1]) + 1

    start = time.time()
    if hi > 1000:
        # Split the range at the 75% point: [1, hi_1) and [hi_1, hi)
        hi_1 = int(hi * 0.75)
        p1 = multiprocessing.Process(target=do_it, args=(fact, 1, hi_1))
        p2 = multiprocessing.Process(target=do_it, args=(fact, hi_1, hi))
        p1.start()
        p2.start()
        p1.join()
        p2.join()
    else:
        do_it(fact, 1, hi)

    print('Time for %s: %s' % (hi, time.time() - start))

Notice that we create two Process objects, passing the function we want to run (i.e. do_it) plus the required arguments in a tuple. We then call start to set the processes running. Finally, we use the join method to make the main program wait for both processes to complete.

$ python3 processes.py 100
Time for 100: 0.0006771087646484375
$ python3 processes.py 1000
Time for 1000: 0.06577491760253906
$ python3 processes.py 10000
Time for 10000: 3.690302610397339

Notice that by splitting the load we have almost halved the time to process the data. But also notice that it was not a straightforward exercise: the split was not equal (and if you try changing the multiplier you will see that the time rises either way).
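In fact we can roughly estimate the best split point. Since fact(n) performs about n multiplications, the total work for the range 1..N grows like N²/2, and the lower portion 1..s accounts for about s²/2 of it. The two processes get equal work when s²/2 = N²/4, i.e. when s = N/√2 ≈ 0.71×N - which is why a split around 75% works so much better than one at 50%.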

So it looks like concurrency gave a significant improvement. But in fact there is a major flaw in our program. If you were to print out factorials, the list of results, you would find that it was empty! Why is this? It's because each sub-process is a clone of the parent and therefore has its own local factorials list, which is lost when the process ends. So, when using processes, we need to find a way to pass data back to the parent, and that usually involves writing to shared memory (check out the mmap module) or to a database or file. But all of these involve I/O operations, so we might as well use threads, since they can share memory! It's our very own brand of Catch-22. Thankfully, in the real world, we very rarely need to parallelize processing that has absolutely no I/O operations, so threading is usually an acceptable option.
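For completeness: the multiprocessing module itself also offers a Queue class for sending results back to the parent (behind the scenes it moves the data through a pipe, so the I/O point above still applies). A minimal sketch:

import multiprocessing

def worker(q, lo, hi):
    # Compute in the child process, then send the results back
    # to the parent through the queue.
    q.put([n * n for n in range(lo, hi)])

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q, 1, 6))
    p.start()
    print(q.get())     # prints [1, 4, 9, 16, 25]
    p.join()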

We'll now look at a somewhat contrived example using threads just to compare the code structure with the multiprocessing example above.

Using threading

In this example we will create a helper thread that updates a variable, theTime, every second. The main program will simply loop forever, printing the variable every time it changes. To ensure the threads run properly we introduce a time delay using time.sleep, which the interpreter treats like an I/O operation.

import time, threading

theTime = 0

def getTime():
    # The thread's task: update theTime once a second. The sleep
    # lets the interpreter run other threads meanwhile.
    global theTime
    while True:
        theTime = time.time()
        time.sleep(1)

def main():
    global theTime
    current = theTime

    thrd = threading.Thread(target=getTime)
    thrd.start()

    # Poll for changes every hundredth of a second.
    while True:
        if current != theTime:
            current = theTime
            print(time.asctime(time.localtime(current)))
        time.sleep(0.01)

if __name__ == "__main__":
    main()

There shouldn't be anything too strange in there. We create a function to encapsulate the thread's task, making sure that there is a time.sleep operation (or any other I/O function) within it. We then create a thread, assigning our function as the target, exactly like we did with the processes above. We start the thread running in the background and then go into the main loop, which checks the variable every hundredth of a second. To terminate the program you will need to type Ctrl-C (or use your OS task manager to kill the process).

If you run that you should see something like:

$ python3 clock.py 
Sat Dec 30 11:30:33 2017
Sat Dec 30 11:30:34 2017
Sat Dec 30 11:30:35 2017
Sat Dec 30 11:30:36 2017
...

Of course we could have achieved that more easily without using threads, but it gives you an idea of what minimal threading code looks like. If you compare it with the multiprocessing code above you will see that they are nearly identical in structure. The key difference is that the thread is able to update the theTime global variable, which can also be seen by the main function in the parent "process". You can now go back and substitute threads for processes in the previous example and compare results. You will likely find that the threaded version gives no time advantage over the non-threaded version.
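If you want to try that experiment, here is a sketch of what the substitution looks like (the same fact and do_it as before, with threading.Thread standing in for multiprocessing.Process):

import time, sys, threading

factorials = []

def fact(n):
    if n < 2: return 1

    result = 1.0
    for x in range(2, n+1):
        result *= x
    return result

def do_it(f, lo, hi):
    for n in range(lo, hi):
        factorials.append(f(n))

if __name__ == "__main__":
    hi = int(sys.argv[1]) + 1
    start = time.time()
    if hi > 1000:
        hi_1 = int(hi * 0.75)
        t1 = threading.Thread(target=do_it, args=(fact, 1, hi_1))
        t2 = threading.Thread(target=do_it, args=(fact, hi_1, hi))
        t1.start()
        t2.start()
        t1.join()
        t2.join()
    else:
        do_it(fact, 1, hi)
    # Both threads share the one factorials list, so unlike the
    # process version the results really are collected - but the
    # GIL means no speed-up for this purely computational task.
    print('Time for %s: %s' % (hi, time.time() - start))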

I'm not going to say any more about concurrency here. It is a complex topic with many subtle traps waiting to bite you. My best advice is don't use it unless you have to. If you do need to use it then take this tutorial as a starting point and go and read some of the more technical and in-depth tutorials on the web. Hopefully this has given you an awareness of what is possible and some of the potential pitfalls to be wary of.

Things to remember

- Concurrency is a performance technique of last resort; tune single-task performance first.
- Python's GIL means threads give no speed-up for purely computational work.
- Use threads for tasks that block on I/O; use the multiprocessing module for pure computation.
- Processes do not share memory, so you need an explicit mechanism to pass results back to the parent.
- threading.Thread and multiprocessing.Process share the same basic interface: create with a target and args, then start and join.
