A multitask benchmarking framework comprising complementary data modalities at city scale, registered across different representations, and enriched with human- and machine-generated annotations. It includes 27,745 high-resolution 360° images with human-curated annotations, 3D point clouds from aerial and street-level LIDAR and from Structure-from-Motion and Multiview-Stereo reconstructions, geo-anchored based