RGB-Multispectral Matching
Dataset, Learning Methodology, Evaluation

Fabio Tosi*, Pierluigi Zama Ramirez*, Matteo Poggi*
Samuele Salti, Stefano Mattoccia, Luigi Di Stefano
*Equal Contribution

University of Bologna

Published at CVPR 2022

We gratefully acknowledge the funding support of Huawei Technologies Oy (Finland).

PAPER



We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images with very different resolutions by solving for stereo matching correspondences. To this end, we introduce a novel RGB-MS dataset covering 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels in the form of disparity maps. To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting an additional RGB camera, required only during training data acquisition. In this setup, we can conveniently learn cross-modal matching in the absence of ground-truth labels by distilling knowledge from an easier RGB-RGB matching task based on a collection of about 11K unlabeled image triplets. Experiments show that the proposed pipeline sets a good performance bar (1.16 pixels average registration error) for future research on this novel, challenging task.
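As a rough illustration of how an average error in pixels can be measured against semi-dense ground truth, the sketch below computes the mean absolute disparity error over annotated pixels only. It is an assumption that this matches the "average registration error" quoted above; the function name and the convention that a ground-truth value of 0 marks an unlabeled pixel are hypothetical and not taken from the paper.

```python
import numpy as np

def mean_abs_disparity_error(pred, gt, valid=None):
    """Mean absolute error in pixels, restricted to annotated ground-truth pixels.

    Assumption: gt == 0 marks pixels without a label (semi-dense ground truth).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        valid = gt > 0
    if not valid.any():
        return float("nan")  # nothing to evaluate on
    return float(np.abs(pred[valid] - gt[valid]).mean())

# Toy example: one of the four pixels has no ground truth and is ignored.
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
gt = np.array([[1.5, 0.0], [2.0, 4.0]])
print(mean_abs_disparity_error(pred, gt))
```

Masking matters here because averaging over unlabeled pixels would otherwise count the placeholder value as a real disparity.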

CITATION

@inproceedings{tosi2022rgbms,
    title={RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation},
    author={Tosi, Fabio and Zama Ramirez, Pierluigi and Poggi, Matteo and Salti, Samuele and Di Stefano, Luigi and Mattoccia, Stefano},
    booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
    note={CVPR},
    year={2022},
}

GROUND TRUTH ACQUISITION


[Figure panels: RGB, active stereo (4112x3008); RGB, passive (4112x3008); 2nd RGB, active stereo (4112x3008); RGB, passive, warped (3222x1605); GT depth (4112x3008); GT depth, warped (3222x1605)]

Given a set of active RGB-RGB stereo pairs, we compute the ground-truth disparity aligned with the left image of the RGB-RGB stereo system with our space-time stereo algorithm. Then, we warp it to be aligned with the left image of the RGB-MS stereo system.
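The warping step can be pictured with a generic reprojection sketch: convert disparity to depth, back-project each labeled pixel to 3D, move it into the destination camera frame, and project it again, keeping the nearest surface when several points land on the same pixel. This is not the authors' implementation. The intrinsics `K_src` and `K_dst`, the rigid transform `R`, `t`, the focal length and the baseline are all hypothetical inputs that would come from calibration, and zero is assumed to mark missing labels.

```python
import numpy as np

def disparity_to_depth(disp, focal_px, baseline):
    """Depth = focal * baseline / disparity; non-positive disparities stay 0 (invalid)."""
    disp = np.asarray(disp, dtype=np.float64)
    depth = np.zeros(disp.shape, dtype=np.float64)
    valid = disp > 0
    depth[valid] = focal_px * baseline / disp[valid]
    return depth

def warp_depth(depth_src, K_src, K_dst, R, t, dst_shape):
    """Forward-warp a depth map into another view (assumes K matrices end in [0, 0, 1])."""
    h, w = depth_src.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = depth_src > 0
    z = depth_src[valid].astype(np.float64)
    # Homogeneous pixel coordinates of the labeled pixels, shape 3 x N.
    pix = np.stack([u[valid], v[valid], np.ones_like(z)])
    # Back-project to 3D in the source frame, then move to the destination frame.
    pts = (np.linalg.inv(K_src) @ pix) * z
    pts = R @ pts + np.asarray(t, dtype=np.float64).reshape(3, 1)
    proj = K_dst @ pts
    front = pts[2] > 0  # keep only points in front of the destination camera
    u_d = np.rint(proj[0, front] / proj[2, front]).astype(int)
    v_d = np.rint(proj[1, front] / proj[2, front]).astype(int)
    z_d = pts[2, front]
    H, W = dst_shape
    inside = (u_d >= 0) & (u_d < W) & (v_d >= 0) & (v_d < H)
    # Z-buffer: when several points hit one pixel, keep the closest one.
    out = np.full((H, W), np.inf)
    np.minimum.at(out, (v_d[inside], u_d[inside]), z_d[inside])
    out[np.isinf(out)] = 0.0  # pixels that received no point stay unlabeled
    return out

# Toy check: a fronto-parallel plane at depth 2 seen by a camera shifted along x.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
plane = np.full((5, 5), 2.0)
same_view = warp_depth(plane, K, K, np.eye(3), np.zeros(3), (5, 5))
shifted = warp_depth(plane, K, K, np.eye(3), np.array([0.02, 0.0, 0.0]), (5, 5))
```

A warped depth map can be turned back into disparity for the destination rig with the same `focal * baseline / depth` relation, using that rig's own focal length and baseline. Forward warping leaves holes where no source point lands, which is consistent with ground truth being semi-dense.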

DATASET EXAMPLES


[Figure panels: RGB (3222x1605); MS (510x254, 10 visible spectrum wavelengths); Depth (3222x1605); Point Cloud]

NETWORK


Given an unbalanced stereo pair composed of a reference high-resolution image L and a target multi-spectral low-resolution image R, our network estimates a disparity map aligned with L by combining cross-spectral cost probabilities computed by a stereo backbone and deep features from L obtained by the feature extractor.
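To give a feel for the resolution gap in such an unbalanced pair, the sketch below brings a disparity map from the low-resolution grid (510x254) to the high-resolution one (3222x1605). It is not the proposed network, only a naive nearest-neighbor baseline written for illustration, and the function name is hypothetical. The one fact it relies on is that disparities are horizontal pixel offsets, so their values must be multiplied by the width ratio when the grid changes.

```python
import numpy as np

def upsample_disparity(disp_lr, dst_shape):
    """Nearest-neighbor upsampling of a disparity map, with values rescaled by the width ratio."""
    h_src, w_src = disp_lr.shape
    h_dst, w_dst = dst_shape
    # For each destination row/column, pick the source row/column it falls into.
    rows = np.minimum((np.arange(h_dst) * h_src) // h_dst, h_src - 1)
    cols = np.minimum((np.arange(w_dst) * w_src) // w_dst, w_src - 1)
    # Disparity is measured in pixels along x, so it scales with the width.
    return disp_lr[np.ix_(rows, cols)] * (w_dst / w_src)

# A constant 10-pixel disparity at MS resolution, expressed on the RGB grid.
disp_ms = np.full((254, 510), 10.0)
disp_rgb = upsample_disparity(disp_ms, (1605, 3222))
print(disp_rgb.shape, disp_rgb[0, 0])
```

With the image sizes listed on this page the width ratio is 3222 / 510, roughly 6.3, so a one-pixel matching error on the low-resolution grid corresponds to more than six pixels on the high-resolution one.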

QUALITATIVE RESULTS


[Figure panels: RGB, different lighting conditions (3222x1605); MS (510x254, 10 visible spectrum wavelengths); GT depth (3222x1605); Network predictions (3222x1605)]

US

Fabio Tosi, Post Doc, University of Bologna, fabio.tosi5@unibo.it
Pierluigi Zama Ramirez, Post Doc, University of Bologna, pierluigi.zama@unibo.it
Matteo Poggi, Assistant Professor, University of Bologna, m.poggi@unibo.it
Samuele Salti, Professor, University of Bologna, samuele.salti@unibo.it
Stefano Mattoccia, Professor, University of Bologna, stefano.mattoccia@unibo.it
Luigi Di Stefano, Full Professor, University of Bologna, luigi.distefano@unibo.it