Abstract Although RGB-D sensors have enabled major breakthroughs for several vision tasks, such as 3D reconstruction, we haven not achieved a similar performance jump for high-level scene understanding. Perhaps one of the main reasons for this is the lack of a benchmark of reasonable size with 3D annotations for training and 3D metrics for evaluation. In this paper, we present an RGB-D benchmark s