by Sumit Tandon

Problem

Netflix has a number of high-throughput, low-latency mid-tier services. In one of these services, it was observed that a huge surge in traffic within a very short span of time left the machines CPU-starved and unresponsive. This led to a bad experience for the clients of this service, which would get a mix of read and connect timeouts. Read