Contributors
Vish Viswanathan, Karthik Kumar, Thomas Willhalm, Sri Sakthivelu, Sharanyan Srikanthan

Introduction

An important factor in determining application performance is the time required for the application to fetch data from the processor’s cache hierarchy and from the memory subsystem. In a multi-socket system where Non-Uniform Memory Access (NUMA) is enabled, local memory latencies and cr