When looking at system performance, it is often useful to get concrete numbers on what the system is actually doing. OProfile can give you interesting details such as listings of every function called, call graphs, and timing information for a live workload on a running system. When the profile shows frequent calls to something like spin_lock() or up_read(), though, you're going to need to do further