We present a novel, challenging, high-resolution stereo dataset of indoor scenes annotated with dense and accurate ground-truth disparities. Peculiar to our dataset is the presence of several specular and transparent surfaces, i.e., the main causes of failure for state-of-the-art stereo networks. Our acquisition pipeline leverages a novel deep space-time stereo framework which allows for easy and accurate labeling with sub-pixel precision. We release a total of 419 samples collected in 64 different scenes and annotated with dense ground-truth disparities. Each sample includes a high-resolution pair (12 Mpx) as well as an unbalanced pair (Left: 12 Mpx, Right: 1.1 Mpx). Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. We evaluate state-of-the-art deep networks on our dataset, highlighting their limitations in addressing the open challenges in stereo and drawing hints for future research.
@inproceedings{zamaramirez2022booster,
  title={Open Challenges in Deep Stereo: the Booster Dataset},
  author={Zama Ramirez, Pierluigi and Tosi, Fabio and Poggi, Matteo and Salti, Samuele and Di Stefano, Luigi and Mattoccia, Stefano},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  note={CVPR},
  year={2022}
}
Estimating depth from images nowadays yields outstanding results, both in terms of in-domain accuracy and generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. To address them, we propose a novel dataset that includes accurate and dense ground-truth labels at high resolution, featuring scenes containing several specular and transparent surfaces. Our acquisition pipeline leverages a novel deep space-time stereo framework, enabling easy and accurate labeling with sub-pixel precision. The dataset is composed of 606 samples collected in 85 different scenes. Each sample includes both a high-resolution pair (12 Mpx) and an unbalanced stereo pair (Left: 12 Mpx, Right: 1.1 Mpx). Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. We divide the dataset into a training set and two testing sets, the latter devoted to the evaluation of stereo and monocular depth estimation networks, respectively, to highlight the open challenges and future research directions in this field.
@article{zamaramirez2024booster,
  author={Zama Ramirez, Pierluigi and Costanzino, Alex and Tosi, Fabio and Poggi, Matteo and Salti, Samuele and Mattoccia, Stefano and Di Stefano, Luigi},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={Booster: A Benchmark for Depth From Images of Specular and Transparent Surfaces},
  year={2024},
  volume={46},
  number={1},
  pages={85--102},
  doi={10.1109/TPAMI.2023.3323858}
}
Booster point clouds (downsampled for web visualization).
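For readers who want to build similar point clouds from a disparity map, the following is a minimal sketch of standard rectified-stereo triangulation (Z = f · B / d). It is not the authors' visualization code: the function name `disparity_to_points`, the calibration parameters (`focal_px`, `baseline_m`, `cx`, `cy`), and the `stride` subsampling are all assumptions made for illustration, and real calibration values must come from the dataset's own files.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def disparity_to_points(
    disparity: Sequence[Sequence[float]],
    focal_px: float,    # assumed focal length in pixels (hypothetical parameter)
    baseline_m: float,  # assumed stereo baseline in metres (hypothetical parameter)
    cx: float,          # assumed principal point, x coordinate in pixels
    cy: float,          # assumed principal point, y coordinate in pixels
    stride: int = 1,    # keep every `stride`-th pixel to thin out the cloud
) -> List[Point]:
    """Back-project a disparity map (in pixels) to 3D points in metres."""
    points: List[Point] = []
    for v in range(0, len(disparity), stride):
        row = disparity[v]
        for u in range(0, len(row), stride):
            d = row[u]
            if not d > 0:  # skip missing values (zero, negative, or NaN)
                continue
            z = focal_px * baseline_m / d  # depth from disparity
            x = (u - cx) * z / focal_px    # pinhole back-projection
            y = (v - cy) * z / focal_px
            points.append((x, y, z))
    return points


# Toy 2x2 disparity map; the top-left pixel has no ground truth.
toy = [[0.0, 10.0], [20.0, 40.0]]
cloud = disparity_to_points(toy, focal_px=1000.0, baseline_m=0.1, cx=0.5, cy=0.5)
```

A larger `stride` gives a sparser cloud, which is one simple way to downsample for display; whether the point clouds shown here were produced this way is not stated in the source.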