NTIRE 2024: HR Depth from Images of Specular and Transparent Surfaces


We are delighted to inform you that the Booster dataset will be employed in the HR Depth from Images of Specular and Transparent Surfaces Challenge, part of the NTIRE 2024 workshop held in conjunction with CVPR 2024!

INTRODUCTION

Depth estimation has a long history in computer vision and has been intensively studied for decades.
Deep learning has succeeded in this field as well, with modern deep networks achieving remarkably low error rates on established datasets such as KITTI and Middlebury.
Does this evidence suggest that, thanks to deep learning, depth estimation is a solved problem?
Definitely not! It is time for the community to focus on the open challenges left unsolved in the field. In particular, the Booster dataset highlights two of them:
  • Non-Lambertian surfaces, such as those of transparent or reflective objects
  • High-resolution images
This challenge aims to foster the development of next-generation monocular and stereo depth networks capable of reasoning at a higher level, and thus of yielding accurate, high-resolution 3D reconstructions of objects that are challenging, yet of common use.

CHALLENGES DESCRIPTION

This challenge aims to estimate high-resolution disparity or depth maps from stereo or monocular images, respectively.
There will be two tracks, both hosted on CodaLab servers. Participants can join one or both tracks. The two tracks are:
  • TRACK 1 - STEREO
  • TRACK 2 - MONO

The challenges will be divided into two phases:
  1. Development: During this period, the participants will develop a model for the selected track (Stereo or Mono). The model can be trained using the Booster training split and any additional data, and can be evaluated on the official validation set of each track.
  2. Test: During this period, the participants can submit the predictions of their model on the official test set. The disparity/depth maps will be evaluated by the organizers with the quantitative metrics described below.


DATASETS

TRAINING DATA [DOWNLOAD]


The training set is composed of 38 different indoor scenes containing transparent or reflective objects. Each scene was acquired under several illumination conditions, for a total of 228 training images at 4112x3008 resolution. For images belonging to the training split, we release high-resolution stereo images, material segmentation, left and right disparity ground truth, occlusion masks, and calibration parameters.

Notes on training data:
  • We do not restrict submitted methods from using additional training data. If used, the source and amount must be indicated.
  • We do not restrict submitted methods from using pretrained networks. If used, details must be provided.
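Since both disparity ground truth and calibration parameters are released, metric depth can be recovered from the training data through the standard stereo relation depth = focal × baseline / disparity. Below is a minimal Python sketch of this conversion; the focal length and baseline arguments are placeholders and must be read from the released calibration files.

import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    # Convert a disparity map (pixels) to metric depth via depth = f * B / d.
    # focal_px: horizontal focal length in pixels (from the calibration files).
    # baseline_m: stereo baseline in meters (from the calibration files).
    # Invalid (zero or negative) disparities are mapped to a depth of 0.
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth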

VALIDATION DATA TRACK 1 - STEREO [DOWNLOAD]


The validation set is composed of 3 different indoor scenes containing transparent or reflective objects. Each scene was acquired under five different illumination conditions, for a total of 15 validation images at 4112x3008 resolution. For images belonging to the validation split, only the left and right images and the calibration parameters are released.

VALIDATION DATA TRACK 2 - MONO [DOWNLOAD]


The validation set is composed of 3 different indoor scenes containing transparent or reflective objects. Each scene was acquired under five different illumination conditions, for a total of 15 validation images at 4112x3008 resolution. For images belonging to the validation split, only the monocular RGB images and the calibration parameters are released.

TEST DATA [TRACK 1 - STEREO][TRACK 2 - MONO]


To rank the submitted models, we test them on a separate test set. As for the validation sets, only the RGB images and the calibration parameters are released. The participants are required to apply their models to the released images and submit their high-resolution results to the server. Note that the images in the test set cannot be used for training.

DEV KIT [DOWNLOAD]


We provide some useful scripts to read and visualize Booster data.
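As a rough idea of what such a script might look like, the snippet below loads a 32-bit .npy map (the same format used for submissions) and displays it; the file name and colormap are illustrative only, and the dev kit remains the authoritative reference for reading the released data.

import numpy as np
import matplotlib.pyplot as plt

# Load a 32-bit disparity/depth map stored as .npy (illustrative path).
disp = np.load("Mirror3/0000.npy").astype(np.float32)

plt.imshow(disp, cmap="magma")  # colormap choice is arbitrary
plt.colorbar(label="disparity (px)")
plt.show()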

SUBMISSION AND EVALUATION

SERVERS [Codalab Stereo Track] [Codalab Mono Track]


We use CodaLab servers for online submissions in the development and test phases, evaluating results on the validation set and the test set, respectively. After the test phase, the final results and the source code (both training and test) must be submitted via email (boosterbenchmark@gmail.com).

EVALUATION METRICS


STEREO
We take inspiration from Middlebury 2014 and compute the percentage of pixels having disparity errors larger than a threshold τ (bad-τ). Given the very high resolution featured by our dataset, we compute the bad-2, bad-4, bad-6, and bad-8 error rates. Finally, we also measure the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). All the metrics introduced so far are computed either on all valid pixels (All), on pixels belonging to transparent or mirror surfaces (Class ToM), or on other types of materials (Class Other). For all metrics, lower is better. All results are reported using the left image as reference. To rank submissions, we use only the bad-2 error averaged on pixels belonging to ToM surfaces; other metrics might be used to declare the final winner of the competition.
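For reference, the following Python sketch shows one way to compute the bad-τ, MAE, and RMSE metrics described above over a given selection mask (All, ToM, or Other); variable names are ours, and the evaluation code running on the server remains authoritative.

import numpy as np

def stereo_metrics(pred, gt, mask, taus=(2, 4, 6, 8)):
    # pred, gt: disparity maps (float32), left image as reference.
    # mask: boolean map selecting the pixels to evaluate
    #       (e.g., valid pixels belonging to ToM surfaces).
    err = np.abs(pred[mask] - gt[mask])
    metrics = {"bad-%d" % t: 100.0 * np.mean(err > t) for t in taus}
    metrics["MAE"] = err.mean()
    metrics["RMSE"] = np.sqrt(np.mean(err ** 2))
    return metrics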

MONO
IMPORTANT: We evaluate predictions in the depth domain, i.e., closer objects have smaller values. As monocular networks estimate depth up to a scale factor, we first compute a scale and shift to match the ranges of predictions and ground truth.
Then, we compute the absolute error relative to the ground-truth value (Abs Rel.) and the percentage of pixels whose maximum between the prediction/ground-truth and ground-truth/prediction ratios is lower than a threshold (δi, with i being 1.05, 1.15, and 1.25). We also measure the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). All the metrics introduced so far are computed either on all valid pixels (All), on pixels belonging to transparent or mirror surfaces (Class ToM), or on other types of materials (Class Other). For MAE, RMSE, and Abs Rel., lower is better; for δi, higher is better. To rank submissions, we use only δ1.05 averaged on pixels belonging to ToM surfaces; other metrics might be used to declare the final winner of the competition.
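A minimal sketch of this protocol follows, assuming a least-squares fit for the scale and shift; the organizers' exact alignment procedure may differ.

import numpy as np

def align_and_eval(pred, gt, mask, thresholds=(1.05, 1.15, 1.25)):
    # Fit s, t minimizing ||s * pred + t - gt|| over the selected pixels
    # (a least-squares choice assumed here), then score the aligned map.
    p = pred[mask].astype(np.float64)
    g = gt[mask].astype(np.float64)
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    p = s * p + t

    err = np.abs(p - g)
    ratio = np.maximum(p / g, g / p)  # assumes strictly positive depths
    metrics = {"delta_%.2f" % th: 100.0 * np.mean(ratio < th)
               for th in thresholds}
    metrics["AbsRel"] = np.mean(err / g)
    metrics["MAE"] = err.mean()
    metrics["RMSE"] = np.sqrt(np.mean(err ** 2))
    return metrics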

SUBMISSION - PREDICTIONS FORMAT


STEREO
The .npy files should contain 32-bit disparity values for the 4112x3008 images and should refer to the left image, matching its resolution.

MONO
The .npy files should contain 32-bit depth values for the 4112x3008 images and should refer to the left image, matching its resolution. Note: we evaluate depth maps up to scale and shift factors.
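For instance, a lower-resolution prediction can be brought to the required resolution and data type as follows; note that disparity values (stereo track) must be rescaled by the horizontal upsampling factor, while depth values (mono track) must not. This is a sketch under our own naming, using OpenCV only for resizing.

import cv2
import numpy as np

def save_prediction(pred, path, is_disparity, width=4112, height=3008):
    # Upsample to full resolution and store as a 32-bit float .npy.
    scale_x = width / pred.shape[1]
    up = cv2.resize(pred, (width, height), interpolation=cv2.INTER_LINEAR)
    if is_disparity:
        up = up * scale_x  # disparities scale with image width
    np.save(path, up.astype(np.float32))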

SUBMISSION - DEVELOPMENT PHASE


During the development phase, the participants submit their results on the validation set to get feedback from the CodaLab server.
The validation set should only be used for evaluation and analysis purposes but NOT for training.
The submitted zip files must follow this structure: $scene/$img_basepath.npy, e.g., Mirror3/0000.npy (see the packaging sketch below).
An example of submissions for the "Track 1 - Stereo" on the validation set can be found [ HERE ].
An example of submissions for the "Track 2 - Mono" on the validation set can be found [ HERE ].
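A minimal way to produce such an archive (folder and file names are illustrative) is sketched below.

import zipfile
from pathlib import Path

# Pack predictions laid out as $scene/$img_basepath.npy, e.g.
# Mirror3/0000.npy, preserving the relative paths inside the zip.
root = Path("predictions")  # illustrative local folder
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for npy in sorted(root.rglob("*.npy")):
        zf.write(npy, arcname=str(npy.relative_to(root)))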

SUBMISSION - TEST PHASE


During the test phase, the participants submit their results on the test set to the CodaLab server.
The test results will not be visible to other participants during this phase.
The test set CANNOT be used for training.
The submitted zip files must follow the same structure as in the development phase: $scene/$img_basepath.npy, e.g., Mirror3/0000.npy.

FINAL SUBMISSION


After the test phase, the participants will submit a zip file (containing the fact sheet, source code, and final results) to the official submission account (boosterbenchmark@gmail.com) by email.
The final submission must comply with the following rules:
  • The submitted results must come from the same method that generated the last submission to CodaLab; we will check for consistency, and inconsistent submissions are invalid.
  • Both the testing source code (or an executable) and the model weights must be submitted. We will run the test code to reproduce the results: reproducibility is a necessary condition. Training code does not have to be included. The code and the model might be posted on the NTIRE 2024 website.
  • A factsheet describing the method must be submitted. The factsheet format is provided here. Participants must submit a compiled PDF file and the TeX source of the factsheet, with enough method details and an overview figure of the method. This helps in writing the challenge summary report.

EMAIL FORMAT
Please use the following format to submit your final results, fact sheet, code, and model (with trained parameters).
to: boosterbenchmark@gmail.com;
cc: your_team_members
title: [NTIRE 2024: HR Depth from Images of Specular and Transparent Surfaces] - [Team_name]
body should include:
1) the challenge name (including track id)
2) team name
3) team leader's name, affiliation, and email address
4) team members' names, affiliations, and email addresses
5) user name on the NTIRE 2024 CodaLab leaderboard (if any)
6) executable/source code attached or download links.
7) fact sheet attached (template available here: https://it.overleaf.com/read/hmskzfgfzzph#e8fee3)
8) download link to the results


IMPORTANT CHALLENGES DATES

  • 2024-01-21: Release of training and validation data;
  • 2024-02-01: Validation server online;
  • 2024-03-15 (extended from 2024-03-07): Final test data release, validation server closed;
  • 2024-03-21 (extended from 2024-03-14): Test result submission deadline;
  • 2024-03-22 (extended from 2024-03-15): Fact sheet / code / model submission deadline;
  • 2024-03-24 (extended from 2024-03-17): Test preliminary score release to the participants;
  • 2024-04-05 (extended from 2024-03-28): Paper submission deadline for entries from the challenges;
  • 2024-04-13: Camera-ready paper submission deadline.


NEWS AND UPDATES

  • 2024-03-15: Test Data Released! Test Phase Begins!
  • 2024-02-28: Extended Workshop and Challenge Deadlines!
  • 2024-01-22: Training and validation data have been released.
  • 2023-12: Workshop proposal has been accepted.