Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations

CVPR 2023

Huawei Noah’s Ark Lab
[Teaser figure, MPFER pipeline: 4 noisy inputs or 12 clean inputs → depth-wise encoding → Multiplane Feature Representation → view-wise rendering → synthesized outputs]

Our Multiplane Features Encoder-Renderer (MPFER) reimagines the MPI pipeline by moving the multiplane representation to feature space.
Encoding a scene from 4 input views at 500x800 resolution takes about 1.2s on a V100 GPU. Rendering a novel view then takes about 0.01s.
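
The gap between the two timings reflects the encode-once, render-many structure of the pipeline: the multiplane feature representation is computed a single time per scene, and each novel view only pays the cost of the lightweight renderer. The PyTorch sketch below illustrates this pattern with minimal stand-in modules; MPFEncoder, MPFRenderer and all sizes are illustrative assumptions, not the released code.

import torch
import torch.nn as nn

N, D, C, H, W = 4, 8, 16, 128, 160   # views, planes, channels, resolution (small illustrative sizes)

class MPFEncoder(nn.Module):
    """Depth-wise stage: fuses the N views at each depth plane independently
    (the D planes sit on the batch axis). Stand-in for the paper's Unet64."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(N * 3, C, 3, padding=1)

    def forward(self, psv):           # psv: (D, N*3, H, W)
        return self.net(psv)          # mpf: (D, C, H, W)

class MPFRenderer(nn.Module):
    """View-wise stage: fuses features across all D depths for one target
    view. Stand-in for the paper's Unet64."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(D * C, 3, 3, padding=1)

    def forward(self, mpf_proj):      # mpf_proj: (D, C, H, W), projected to one view
        return self.net(mpf_proj.reshape(1, D * C, H, W))  # (1, 3, H, W)

encoder, renderer = MPFEncoder(), MPFRenderer()
psv = torch.randn(D, N * 3, H, W)    # N input views warped onto D planes
mpf = encoder(psv)                   # expensive step, run once per scene
for _ in range(10):                  # cheap step, run once per novel view
    frame = renderer(mpf)            # re-projection of the MPF to the target pose omitted here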

Abstract

While current multi-frame restoration methods combine information from multiple input images using 2D alignment techniques, recent advances in novel view synthesis are paving the way for a new paradigm relying on volumetric scene representations. In this work, we introduce the first 3D-based multi-frame denoising method that significantly outperforms its 2D-based counterparts with lower computational requirements. Our method extends the multiplane image (MPI) framework for novel view synthesis by introducing a learnable encoder-renderer pair manipulating multiplane representations in feature space. The encoder fuses information across views and operates in a depth-wise manner while the renderer fuses information across depths and operates in a view-wise manner. The two modules are trained end-to-end and learn to separate depths in an unsupervised way, giving rise to Multiplane Feature (MPF) representations. Experiments on the Spaces and Real Forward-Facing datasets as well as on raw burst data validate our approach for view synthesis, multi-frame denoising, and view synthesis under noisy conditions.

Method

method.png

MPFER. Input views are forward-warped into plane sweep volumes (PSVs), which are processed depth-wise by the encoder (Unet64). The resulting multiplane feature representation (MPF) can then be back-projected to an arbitrary number of novel views, or to the same views as the inputs, allowing the integration of a skip connection (illustrated here). The renderer (Unet64) processes the projected MPFs on a per-view basis, producing the final synthesized or denoised outputs.
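
For concreteness, here is one standard way to construct the plane sweep volume the encoder consumes: each source view is warped onto fronto-parallel planes of a reference camera, with one plane-induced homography per depth. The function name, calling convention and warping details below are assumptions for illustration; the paper's implementation may differ.

import torch
import torch.nn.functional as F

def plane_sweep_volume(src, K_src, K_ref, R, t, depths, out_hw):
    """Warp one source image (1, 3, Hs, Ws) onto fronto-parallel planes of
    the reference camera, one slice per depth d, using the standard
    plane-induced homography H_d = K_src (R + t e3^T / d) K_ref^{-1}."""
    H, W = out_hw
    # homogeneous pixel grid of the reference view, shape (3, H*W)
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().reshape(3, -1)
    e3 = torch.tensor([[0.0, 0.0, 1.0]])
    slices = []
    for d in depths:
        Hd = K_src @ (R + (t.reshape(3, 1) @ e3) / d) @ torch.inverse(K_ref)
        uvw = Hd @ pix                              # (3, H*W)
        uv = uvw[:2] / uvw[2:].clamp(min=1e-6)      # perspective divide
        # normalize coordinates to [-1, 1] for grid_sample
        u = 2 * uv[0] / (src.shape[-1] - 1) - 1
        v = 2 * uv[1] / (src.shape[-2] - 1) - 1
        grid = torch.stack([u, v], -1).reshape(1, H, W, 2)
        slices.append(F.grid_sample(src, grid, align_corners=True))
    return torch.cat(slices, 0)                     # (D, 3, H, W)

Stacking the per-view PSVs along the channel axis yields the (D, N*3, H, W) tensor processed depth-wise by the encoder, with the D planes acting as a batch dimension.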

Results

We evaluate our model in four scenarios: (1) novel view synthesis on the Spaces dataset, (2) multi-frame denoising on the Spaces dataset, (3) multi-frame denoising on the Real Forward-Facing dataset, and (4) novel view synthesis under noisy conditions on the Real Forward-Facing dataset (see the paper for details). Qualitative comparisons with baseline methods are shown below.
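
The "Gain 20" label in the denoising experiments refers to synthetic signal-dependent noise added to the clean inputs. Below is a minimal sketch of the shot/read-noise model commonly used in the burst denoising literature; the base constants and the exact gain scaling are illustrative assumptions, not the paper's calibration.

import torch

def add_gain_noise(clean, gain=20.0, read_base=1e-3, shot_base=1e-4):
    """Heteroscedastic Gaussian noise: the variance has a constant (read)
    term and a term linear in the signal (shot), both scaled by the gain.
    read_base and shot_base are illustrative, not the paper's parameters."""
    variance = (gain * read_base) ** 2 + (gain * shot_base) * clean.clamp(min=0.0)
    return clean + torch.randn_like(clean) * variance.sqrt()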

Synthesis on Spaces (Large baseline)

Scenes: Scene_000, Scene_024, Scene_052, Scene_063, Scene_073.
Methods compared: Soft3D*, DeepView*, MPINet, MPINet-dw, MPINet-dw-it, MPFER-64, and ground truth.

Denoising on Spaces (Gain 20)

Scenes: Scene_000, Scene_024, Scene_052, Scene_063, Scene_073.
Methods compared: noisy input, VBM4D, BPN, BasicVSR++, DeepRep, MPFER-64, and ground truth.

Denoising on Real Forward-Facing scenes (Gain 20)

Scenes: Fern, Flower, Fortress, Horns, Leaves, Orchids, Room, T-rex.
Methods compared: noisy input, IBRNet-N*, NAN*, MPFER-N, MPFER-C, and ground truth.

Synthesis under noisy conditions on Real Forward-Facing scenes (Gain 20)

Scenes: Fern, Flower, Fortress, Horns, Leaves, Orchids, Room, T-rex.
Methods compared: IBRNet*, IBRNet-N*, NAN*, MPFER, and ground truth.

BibTeX

@inproceedings{tanay2023efficient,
  author    = {Tanay, Thomas and Leonardis, Ales and Maggioni, Matteo},
  title     = {Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations},
  booktitle = {CVPR},
  year      = {2023},
}