I need to improve the throughput of the system. We have already gone through the usual cycle of optimization and achieved a 1.5x improvement in throughput. I am now wondering whether I can use Cachegrind output to improve throughput further. Can somebody point me to how to get started with this? My understanding is that the most frequently used data should be kept small enough so that i