Cross-Spectral Neural Radiance Fields

Matteo Poggi*, Pierluigi Zama Ramirez*, Fabio Tosi*
Samuele Salti, Stefano Mattoccia, Luigi Di Stefano
*Equal Contribution

University of Bologna

Published at 3DV 2022

We gratefully acknowledge the funding support of Huawei Technologies Oy (Finland).

PAPER



We propose X-NeRF, a novel method to learn a Cross-Spectral scene representation given images captured from cameras with different light spectrum sensitivity, based on the Neural Radiance Fields formulation. X-NeRF optimizes camera poses across spectra during training and exploits Normalized Cross-Device Coordinates (NXDC) to render images of different modalities from arbitrary viewpoints, which are aligned and at the same resolution. Experiments on 16 forward-facing scenes, featuring color, multi-spectral and infrared images, confirm the effectiveness of X-NeRF at modeling Cross-Spectral scene representations.

CITATION

@inproceedings{poggi2022xnerf,
    title={Cross-Spectral Neural Radiance Fields},
    author={Poggi, Matteo and Zama Ramirez, Pierluigi and Tosi, Fabio and Salti, Samuele and Di Stefano, Luigi and Mattoccia, Stefano},
    booktitle={Proceedings of the International Conference on 3D Vision},
    note={3DV},
    year={2022},
}

VIDEO


OVERVIEW


We extend vanilla NeRF to learn a cross-spectral representation of the scene. By capturing images from different viewpoints with a rig featuring cameras sensitive to different modalities (e.g., color, infrared, etc.), we train a Cross-Spectral Neural Radiance Field (X-NeRF) to render any of the modalities from any of the viewpoints. Thanks to Normalized Cross-Device Coordinates (NXDC), X-NeRF learns to render aligned modalities. Training starts from RGB camera poses alone, while the relative poses of the other sensors are learned during optimization. By leveraging recent advances in neural rendering, we also implement a variant capable of faster convergence (X-DVGO).
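Since X-NeRF builds on the NeRF formulation, each pixel of any modality is rendered by compositing samples along a ray. Below is a minimal numpy sketch of the classical NeRF volume-rendering step; the function and variable names are ours for illustration, not taken from the paper's code, and the only X-NeRF-specific aspect hinted at is that the number of color channels depends on the sensor (3 for RGB, 1 for IR, 10 for MS).

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite per-sample densities and radiances along one ray (standard NeRF).

    sigmas: (N,) volume densities predicted by the network
    colors: (N, C) per-sample radiance; C depends on the modality
            (e.g., 3 for RGB, 1 for IR, 10 for MS)
    deltas: (N,) distances between consecutive samples along the ray
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)       # opacity of each ray segment
    trans = np.cumprod(1.0 - alphas + 1e-10)      # accumulated transmittance
    trans = np.concatenate([[1.0], trans[:-1]])   # shift: transmittance *before* sample i
    weights = alphas * trans                      # contribution of each sample
    return weights @ colors                       # (C,) rendered pixel value

# A fully opaque first sample dominates the rendered value:
pixel = render_ray(np.array([1e9, 0.0, 0.0]),
                   np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]),
                   np.ones(3))
```

The same compositing applies to every spectrum; what changes across modalities is the radiance head and, in X-NeRF, the NXDC mapping that keeps the rendered images aligned and at a common resolution.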

ACQUISITION SETUP


To validate our proposal, we collect an in-house dataset featuring 16 indoor scenes. We acquire images with a cross-spectral rig comprising a color camera (RGB, 12 Mpx), a 10-band multi-spectral camera (MS, 0.1 Mpx) and an infrared camera (IR, 1 Mpx).

DATASET EXAMPLES


Sample frames from the rig: IR (1023x1023), RGB (4112x3008), MS (510x254).

QUALITATIVE RESULTS


Modalities rendered by X-NeRF from novel viewpoints: IR, RGB, MS and Depth.

US

Matteo Poggi
Assistant Professor
University of Bologna
m.poggi@unibo.it
Pierluigi Zama Ramirez
Post Doc
University of Bologna
pierluigi.zama@unibo.it
Fabio Tosi
Post Doc
University of Bologna
fabio.tosi5@unibo.it
Samuele Salti
Professor
University of Bologna
samuele.salti@unibo.it
Stefano Mattoccia
Professor
University of Bologna
stefano.mattoccia@unibo.it
Luigi Di Stefano
Full Professor
University of Bologna
luigi.distefano@unibo.it