When naively profiling multi-threaded applications the time spent waiting for mutexes is not necessarily visible in the generated output. However lock contention can have a big impact on the runtime behaviour of applications. On Linux valgrind's drd can be used to track down mutex contention. Unfortunately running applications under valgrind/drd slows them down massively, often having the effect o