TL;DR: DNNs are not the only models that can fit random labels in a setting where the correct labels can be fitted with small generalization error. The recent paper “Understanding deep learning requires rethinking generalization” by Chiyuan Zhang et al. is attracting a lot of attention for the unintuitive and surprising observation that standard DNN models can (over)fit even random labels. I think the paper deserves the attention.
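
To make the TL;DR concrete, here is a minimal sketch of the randomization test applied to a non-DNN model. The dataset, the RBF-kernel SVM, and the hyperparameters are my own illustrative assumptions, not the paper's setup (the paper trains CNNs on CIFAR-10 and ImageNet); the point is only that a sufficiently high-capacity classical model also fits random labels while fitting true labels with good test accuracy.

```python
# A minimal randomization-test sketch: the same high-capacity, non-DNN model
# fits the true labels with good test accuracy, yet also fits purely random
# labels. Dataset, model, and hyperparameters are illustrative assumptions,
# not the paper's setup.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y_true = load_digits(return_X_y=True)
# Replace every label with a uniformly random class, so any fit is memorization.
y_random = np.random.default_rng(0).integers(0, 10, size=len(y_true))

def fit_and_score(labels):
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
    # An RBF-kernel SVM with a large C has enough capacity to interpolate
    # the training set (the inputs are distinct, so interpolation is possible).
    clf = SVC(kernel="rbf", C=1e6, gamma="scale").fit(X_tr, y_tr)
    return clf.score(X_tr, y_tr), clf.score(X_te, y_te)

print("true labels   (train, test):", fit_and_score(y_true))    # both high
print("random labels (train, test):", fit_and_score(y_random))  # ~1.0 train, ~0.1 test
```

The same model class interpolates both labelings; only the test accuracy tells them apart. That is exactly why the ability to fit random labels, by itself, is not something unique to DNNs.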