NeurIPS 2025 Spotlight

VF-Bench

A Unified Solution to Video Fusion:
From Multi-Frame Learning to Benchmarking

1 ETH Zürich   2 Xi'an Jiaotong University   3 Shanghai Jiao Tong University   4 Nanjing University
UniVF: first unified video fusion framework
VF-Bench: first comprehensive video fusion benchmark
Flicker-free & temporally coherent fusion
Unified spatial & temporal evaluation protocol
New state-of-the-art on VF-Bench

Overview of the main contributions of this paper: the Video Fusion Benchmark (VF-Bench), the Unified Video Fusion framework (UniVF), dedicated training losses, and an evaluation protocol for video fusion.

Abstract

The real world is dynamic, yet most image fusion methods process static frames independently, ignoring temporal correlations in videos and leading to flickering and temporal inconsistency. To address this, we propose Unified Video Fusion (UniVF), a novel framework that leverages multi-frame learning and optical flow-based feature warping to produce informative, temporally coherent fused videos. To support its development, we also introduce Video Fusion Benchmark (VF-Bench), the first comprehensive benchmark covering four video fusion tasks: multi-exposure, multi-focus, infrared-visible, and medical fusion. VF-Bench provides high-quality, well-aligned video pairs obtained through synthetic data generation and rigorous curation from existing datasets, together with a unified evaluation protocol that jointly assesses the spatial quality and temporal consistency of video fusion. Extensive experiments show that UniVF achieves state-of-the-art results across all tasks on VF-Bench.
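To make the flicker problem concrete, the following is a minimal sketch of a crude temporal-stability proxy: the mean absolute intensity change between consecutive frames. It is not the VF-Bench evaluation protocol. The function name `mean_temporal_difference` and the toy frames are assumptions introduced purely for illustration.

```python
def mean_temporal_difference(frames):
    """Average absolute intensity change between consecutive frames.

    frames: list of T frames, each an H x W nested list of floats in [0, 1].
    Hypothetical helper for illustration; not a metric from the paper.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    total, count = 0.0, 0
    for prev, curr in zip(frames, frames[1:]):
        for row_prev, row_curr in zip(prev, curr):
            for p, c in zip(row_prev, row_curr):
                total += abs(c - p)
                count += 1
    return total / count


# Toy 1x2 "videos" of a static scene: one stable, one with a brightness flicker.
steady = [[[0.5, 0.5]], [[0.5, 0.5]], [[0.5, 0.5]]]
flicker = [[[0.5, 0.5]], [[0.75, 0.75]], [[0.5, 0.5]]]

print(mean_temporal_difference(steady))   # 0.0
print(mean_temporal_difference(flicker))  # 0.25
```

Genuine scene motion also raises this raw difference, so a practical temporal-consistency measure would typically compensate for motion before comparing frames.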


How UniVF Works

UniVF architecture

Detailed illustration of our UniVF architecture.
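The abstract mentions optical flow-based feature warping. As a rough illustration of the general idea only, and not the UniVF implementation, here is a dependency-free sketch of backward warping: each output location samples a neighboring frame's feature map at the position indicated by the flow, using bilinear interpolation. The function name `warp_with_flow`, the `(dx, dy)` flow convention, and the border clamping are assumptions; a real model would operate on multi-channel tensors in a deep learning library.

```python
import math


def warp_with_flow(feat, flow):
    """Backward-warp a single-channel feature map using a dense flow field.

    feat: H x W nested list of numbers (features of a neighboring frame).
    flow: H x W nested list of (dx, dy); output pixel (x, y) samples
          feat at (x + dx, y + dy). Out-of-range coordinates are clamped.
    Hypothetical helper for illustration; not the authors' code.
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Sampling position in the neighboring frame, clamped to the border.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            # Bilinear interpolation between the four surrounding samples.
            top = (1 - ax) * feat[y0][x0] + ax * feat[y0][x1]
            bottom = (1 - ax) * feat[y1][x0] + ax * feat[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out


# Toy example: every pixel's flow points one pixel to the right.
neighbor_feat = [[0, 1, 2, 3],
                 [4, 5, 6, 7]]
flow = [[(1.0, 0.0)] * 4 for _ in range(2)]
aligned = warp_with_flow(neighbor_feat, flow)
print(aligned)  # [[1.0, 2.0, 3.0, 3.0], [5.0, 6.0, 7.0, 7.0]]
```

Once a neighboring frame's features are aligned to the current frame in this way, they can be combined with the current frame's features, which is the general motivation for warping before multi-frame fusion.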

How VF-Bench Was Built

VF-Bench data pipeline

The proposed data generation paradigms for (a) multi-exposure video pairs and (b) multi-focus video pairs in VF-Bench.

VF-Bench Gallery


Quantitative Comparison

Quantitative evaluation results for the Multi-Exposure Fusion and Multi-Focus Fusion tasks. Red and blue highlights indicate the highest and second-highest scores, respectively.

Comparison with other methods

Quantitative evaluation results for the Infrared-Visible Fusion and Medical Video Fusion tasks. Red and blue highlights indicate the highest and second-highest scores, respectively.

Comparison with other methods

Quantitative evaluation results for the low-resolution Multi-Exposure Fusion (540p) and Multi-Focus Fusion (480p) tasks. Red and blue highlights indicate the highest and second-highest scores, respectively.

Comparison with other methods

Refer to the main paper linked above for more details on qualitative, quantitative, and ablation studies.


Qualitative Video Comparison

Qualitative Image Comparison

Citation

BibTeX
@InProceedings{zhao2025unified,
  title     = {A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking},
  author    = {Zhao, Zixiang and Bai, Haowen and Ke, Bingxin and Cui, Yukun and Deng, Lilun
               and Zhang, Yulun and Zhang, Kai and Schindler, Konrad},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2025}
}