Performance overhead of thread synchronization

One of the main problems with multithreaded application development is synchronizing access to shared data. Failing to do so can result in data corruption. On the other hand, over-synchronizing costs performance. But how slow is synchronization, really?

Edit: As discussed in the comments section, ReaderWriterLockSlim was accidentally left out of the comparison. See the separate follow-up post for that.

Parallelization is very tricky, and proper optimization always requires measuring. But even with the complexities of reality, it’s valuable for a developer to understand the rough performance implications of various solutions. To that end, I crafted a test scenario to try a few methods out.

The test

Essentially, the idea was to do a very simple operation without synchronization and then apply synchronization primitives around it. The whole simulation is run in just a single thread. Its purpose is not to measure the performance of multiple threads, but to gauge the impact of synchronization.

This is important. In many applications, synchronization protects against situations that occur very rarely, i.e. threads seldom touch the same piece of data at the same time anyway. In that sense, this test scenario isn’t much different from many real-world applications.

On the other hand, some pieces of data may be hit very frequently, and queueing for access to them is common. Thus, some real-world applications may benefit more from minimizing the time a resource is kept locked than from choosing the best synchronization primitive.
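To illustrate the point about keeping a resource locked for as short a time as possible, here is a minimal sketch. It is not part of the measured test; the names cacheLock, cache, ExpensiveComputation and Store are hypothetical and exist only for this example. The slow work happens outside the lock, and only the shared update happens inside it.

```csharp
using System;
using System.Collections.Generic;

class ShortCriticalSection
{
    // Hypothetical shared state, guarded by cacheLock.
    static readonly object cacheLock = new object();
    static readonly Dictionary<int, double> cache = new Dictionary<int, double>();

    // Stand-in for any computation that touches no shared data.
    static double ExpensiveComputation(int key)
    {
        double result = 0;
        for (int i = 1; i <= 100000; ++i) {
            result += Math.Sqrt(key + i);
        }
        return result;
    }

    static void Store(int key)
    {
        // Do the slow part outside the lock...
        double value = ExpensiveComputation(key);

        // ...and hold the lock only for the shared update.
        lock (cacheLock) {
            cache[key] = value;
        }
    }

    static void Main()
    {
        Store(42);
        lock (cacheLock) {
            Console.WriteLine("Cached entries: {0}", cache.Count);
        }
    }
}
```

Had the computation been placed inside the lock statement, every other thread calling Store would queue for the whole duration of the computation rather than just for the dictionary write.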

With all that said, here’s the key segment of the code:

double sum = 0;
for (int loops = 0; loops < LoopCount; ++loops) {
    int j = 0;
    DateTime start = DateTime.Now;
    for (int i = 0; i < LoopLength; ++i) {
        j++;
    }
    DateTime end = DateTime.Now;

    sum += (end - start).TotalMilliseconds;
    GC.Collect();
}
Console.WriteLine("Average execution time: {0:0} ms", sum / LoopCount);

The variables LoopCount and LoopLength are set to 5 and 10,000,000, respectively. So, the variable j is incremented one by one to 10,000,000 five times, and the average duration of these runs is reported. Some testing has shown this setup to produce reliable enough results.
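If you want to repeat the measurement yourself, one possible variation is to time the loop with System.Diagnostics.Stopwatch instead of DateTime.Now, since the system clock behind DateTime.Now is updated fairly coarsely on many systems. The sketch below is that variation, not the harness that produced the numbers in this post; the class name TimingSketch is made up for the example. Note also that an optimizing build may simplify or remove an empty counting loop like this one, so treat the absolute numbers with care.

```csharp
using System;
using System.Diagnostics;

class TimingSketch
{
    const int LoopCount = 5;
    const int LoopLength = 10000000;

    static void Main()
    {
        double sum = 0;
        for (int loops = 0; loops < LoopCount; ++loops) {
            int j = 0;

            // Stopwatch uses a high-resolution timer when one is available.
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < LoopLength; ++i) {
                j++;
            }
            watch.Stop();

            sum += watch.Elapsed.TotalMilliseconds;

            // Collect between runs, as in the original loop.
            GC.Collect();
        }
        Console.WriteLine("Average execution time: {0:0} ms", sum / LoopCount);
    }
}
```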

Various approaches

The basic code above does no locking, and is therefore the fastest competitor. No surprises there. The other tested alternatives are the C# lock statement (equivalent to using the System.Threading.Monitor class) and the System.Threading.ReaderWriterLock class. Since this case happens to handle a very simple operation (incrementing an integer), I also tested System.Threading.Interlocked.

Each test was done by wrapping the “j++” statement in some code or, in the case of the Interlocked scenario, replacing it.

object lockObject = new object();

// Loop structure cut out

        lock (lockObject) {
            j++;
        }

The C# lock statement, above, is the simplest and most generic of the locking approaches. Behind the scenes, it translates to a Monitor.Enter/Monitor.Exit pair in which the Exit call sits in a finally block, so the same mechanism is available to any .NET language, not just C#.
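As a rough sketch of that translation, this is approximately what the compilers of the .NET 3.5 era generated for a lock statement. Newer C# compilers use a slightly different pattern based on the Monitor.Enter overload that takes a lockTaken flag. The class name LockExpansion is made up for this example.

```csharp
using System;
using System.Threading;

class LockExpansion
{
    static readonly object lockObject = new object();
    static int j = 0;

    static void Main()
    {
        // Approximately what "lock (lockObject) { j++; }" expands to.
        Monitor.Enter(lockObject);
        try {
            j++;
        }
        finally {
            // Runs even if the body throws, so the monitor is always released.
            Monitor.Exit(lockObject);
        }

        Console.WriteLine(j); // prints 1
    }
}
```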

If you need more complex locking, get yourself a ReaderWriterLock. It handles the scenario where multiple simultaneous readers are OK as long as nobody is writing, and where only a single thread may write at a time.

ReaderWriterLock rwLock = new ReaderWriterLock();

// …

        rwLock.AcquireWriterLock(TimeSpan.Zero);
        j++;
        rwLock.ReleaseWriterLock();
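The snippet above is stripped down for benchmarking. In code that actually runs on several threads, you would typically release the lock in a finally block and think about the timeout: the Acquire methods throw an ApplicationException if the lock cannot be obtained within the given time, so a zero timeout fails immediately under contention. The following is a minimal sketch under those assumptions; the names ReaderWriterSketch, sharedValue, Read and Write are hypothetical.

```csharp
using System;
using System.Threading;

class ReaderWriterSketch
{
    static readonly ReaderWriterLock rwLock = new ReaderWriterLock();
    static int sharedValue = 0;

    // Any number of threads may hold the reader lock at the same time.
    static int Read()
    {
        rwLock.AcquireReaderLock(Timeout.Infinite);
        try {
            return sharedValue;
        }
        finally {
            rwLock.ReleaseReaderLock();
        }
    }

    // The writer lock is exclusive: no readers and no other writers.
    static void Write(int value)
    {
        rwLock.AcquireWriterLock(Timeout.Infinite);
        try {
            sharedValue = value;
        }
        finally {
            rwLock.ReleaseWriterLock();
        }
    }

    static void Main()
    {
        Write(5);
        Console.WriteLine(Read()); // prints 5
    }
}
```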

In the special case of a simple atomic operation on a single variable (incrementing, exchanging a value, or compare-and-swap), you can also use the Interlocked class:

System.Threading.Interlocked.Increment(ref j);
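For reference, here is a small sketch of a few other Interlocked operations alongside Increment. It runs on a single thread purely to show the return values; the variable names are made up for the example.

```csharp
using System;
using System.Threading;

class InterlockedSketch
{
    static void Main()
    {
        int counter = 0;

        // Increment and Add return the new value.
        int afterIncrement = Interlocked.Increment(ref counter); // counter: 1
        int afterAdd = Interlocked.Add(ref counter, 10);          // counter: 11

        // Exchange stores a new value and returns the old one.
        int previous = Interlocked.Exchange(ref counter, 100);    // counter: 100

        // CompareExchange stores 0 only if counter currently equals 100,
        // and returns the value it found either way.
        int seen = Interlocked.CompareExchange(ref counter, 0, 100);

        Console.WriteLine("{0} {1} {2} {3} {4}",
            afterIncrement, afterAdd, previous, seen, counter);
        // prints "1 11 11 100 0"
    }
}
```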

How fast are they?

Fixing the execution time of the non-locking implementation as a reference point (execution time = 1), I got the following table of relative execution times (smaller is faster):

Method                      Relative execution time
Non-locking                 1
lock statement / Monitor    18
ReaderWriterLock            93 *
Interlocked                 8

*) Acquiring and releasing a reader lock performs pretty much the same as a writer lock.

The results are pretty clear: Don’t lock if you don’t have to. Wrapping a monitor around the increment operator slows this loop to roughly 1/18th of its original speed. Of course, the relative locking overhead shrinks as the locked operation becomes heavier, so most practical scenarios won’t see such dramatic differences between the different models.
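To put a rough number on how the overhead shrinks, here is a back-of-the-envelope sketch. It assumes a deliberately simple model that was not measured: the lock adds a fixed cost per iteration, and the cost of one unsynchronized increment is the unit. Under that assumption, a measured ratio of 18 implies about 17 units of overhead for the monitor.

```csharp
using System;

class OverheadModel
{
    static void Main()
    {
        // Assumption: costs are expressed in units of one unsynchronized
        // increment, and the lock adds a fixed 17 units per iteration.
        const double lockOverhead = 17.0;

        double[] workSizes = { 1, 10, 100, 1000 };
        foreach (double work in workSizes) {
            double slowdown = (work + lockOverhead) / work;
            Console.WriteLine("work = {0,5}: about {1:0.00}x slower", work, slowdown);
        }
    }
}
```

In this model, a locked operation ten times heavier than the increment is about 2.7 times slower than its unlocked version, and one a hundred times heavier only about 1.17 times slower.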

That said, don’t even think about skimping on thread safety if your application actually has a multi-threading scenario. Any data corruption issues you may face are extremely harmful and notoriously complex to debug. But when you can choose between various approaches to thread synchronization, choosing a speedier method instead of a slow one may give you quite nice benefits. In particular, it’s important to know when to choose Interlocked operations over a full-blown monitor.

These tests were performed on .NET 3.5 SP1. I will look back into this matter once .NET 4.0 leaves beta stage – its new synchronization primitives are worth another round of testing.

December 17, 2009   Posted in: .NET

3 Responses

  1. LenardG - December 17, 2009

    Why not try ReaderWriterLockSlim, that is present in .NET 3.5? It should be more lightweight and faster than the traditional ReaderWriterLock, having been implemented in managed code (as opposed to ReaderWriterLock).

  2. Jouni Heikniemi - December 17, 2009

    To be honest, that's because I didn't think it was in 3.5. I just recently combed through the TPL additions in .NET 4.0, and I mentally grouped all the System.Threading.*Slim classes to the "coming in 4.0" bucket (a lot more slimminess coming in .NET 4.0!). Thus, an unfortunate error on my part. That warrants an update to this post at some point. Thanks!

  3. Heikniemi Hardcoded » ReaderWriterLockSlim performance - December 29, 2009

    [...] while ago I blogged about the performance of various thread synchronization primitives. Due to the insufficient accuracy of my memory cells, I forgot ReaderWriterLockSlim out of the [...]
