We propose X-NeRF, a novel method based on the Neural Radiance Fields formulation that learns a Cross-Spectral scene representation from images captured by cameras sensitive to different light spectra. X-NeRF optimizes camera poses across spectra during training and exploits Normalized Cross-Device Coordinates (NXDC) to render aligned, same-resolution images of the different modalities from arbitrary viewpoints. Experiments on 16 forward-facing scenes, featuring color, multi-spectral and infrared images, confirm the effectiveness of X-NeRF at modeling Cross-Spectral scene representations.
@inproceedings{poggi2022xnerf,
  title     = {Cross-Spectral Neural Radiance Fields},
  author    = {Poggi, Matteo and Zama Ramirez, Pierluigi and Tosi, Fabio and Salti, Samuele and Di Stefano, Luigi and Mattoccia, Stefano},
  booktitle = {Proceedings of the International Conference on 3D Vision},
  note      = {3DV},
  year      = {2022},
}
We extend the vanilla NeRF to learn a cross-spectral representation of the scene. By capturing images with a rig featuring cameras sensitive to different modalities (e.g., color, infrared, etc.) from different viewpoints, we train a Cross-Spectral Neural Radiance Field (X-NeRF) to render any of the modalities from any of the viewpoints. Thanks to Normalized Cross-Device Coordinates (NXDC), X-NeRF learns to render aligned modalities. The training procedure starts from the RGB camera poses alone, while the relative poses of the other sensors are learned during optimization. By leveraging recent advances in neural rendering, we also implement a variant capable of faster convergence (X-DVGO).
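As a concrete illustration of the pose-learning step, the PyTorch-style sketch below keeps one learnable SE(3) offset per non-RGB sensor, initialized at identity and composed with the known RGB pose at every rendering step, so that the photometric loss on that sensor's images backpropagates into its relative pose. The names (RelativePose, axis_angle_to_matrix) and the axis-angle parameterization are our assumptions for illustration, not the authors' implementation, and the NXDC ray mapping itself is omitted.

import torch
import torch.nn as nn

def axis_angle_to_matrix(rvec: torch.Tensor) -> torch.Tensor:
    # Rodrigues' formula: differentiable axis-angle (3,) -> rotation matrix (3, 3).
    theta = torch.sqrt((rvec * rvec).sum() + 1e-12)  # epsilon keeps gradients finite at zero
    k = rvec / theta
    zero = rvec.new_zeros(())
    K = torch.stack([
        torch.stack([zero, -k[2], k[1]]),
        torch.stack([k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0], zero]),
    ])
    eye = torch.eye(3, dtype=rvec.dtype, device=rvec.device)
    return eye + torch.sin(theta) * K + (1.0 - torch.cos(theta)) * (K @ K)

class RelativePose(nn.Module):
    # Hypothetical module: learnable SE(3) offset of one auxiliary sensor
    # w.r.t. the RGB camera, initialized at identity and updated by the loss.
    def __init__(self):
        super().__init__()
        self.rvec = nn.Parameter(torch.zeros(3))  # axis-angle rotation
        self.tvec = nn.Parameter(torch.zeros(3))  # translation

    def forward(self, c2w_rgb: torch.Tensor) -> torch.Tensor:
        # Compose the known RGB camera-to-world pose (3, 4) with the learned
        # offset, yielding the sensor's camera-to-world pose (3, 4).
        R = axis_angle_to_matrix(self.rvec)
        offset = torch.cat([R, self.tvec[:, None]], dim=1)   # (3, 4)
        bottom = c2w_rgb.new_tensor([[0.0, 0.0, 0.0, 1.0]])  # (1, 4)
        offset_h = torch.cat([offset, bottom], dim=0)        # (4, 4)
        c2w_h = torch.cat([c2w_rgb, bottom], dim=0)          # (4, 4)
        return (c2w_h @ offset_h)[:3]

# Usage: the returned pose is differentiable w.r.t. rvec/tvec, so rays cast
# through it let the rendering loss refine the sensor's relative pose.
pose_ir = RelativePose()
c2w_ir = pose_ir(torch.eye(4)[:3])  # identity RGB pose as a placeholder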
To validate our proposal, we collect an in-house dataset featuring 16 indoor scenes. We acquire images with a cross-spectral rig composed of a color camera (RGB, 12 Mpx), a 10-band multi-spectral camera (MS, 0.1 Mpx) and an infrared camera (IR, 1 Mpx).