CamPilot: Improving Camera Control in Video Diffusion Model with Efficient Camera Reward Feedback

Wenhang Ge1,3*† Guibao Shen1,3*† Jiawei Feng1,* Luozhou Wang1
Hao Lu1 Xingye Tian3 Xin Tao3 Ying-Cong Chen1,2‡
1HKUST(GZ)   2HKUST   3Kling Team, Kuaishou Technology
* Equal Contribution    ‡ Corresponding Author
† This work was conducted during the authors' internship at Kling.
CamPilot Teaser
Our model is a unified framework for world-consistent video generation and scene reconstruction. Top: it generates 3D-consistent scene videos for world exploration by following custom camera trajectories. Bottom: it reconstructs high-quality 3D scenes from the generated video frames in a feed-forward manner.
Abstract

We propose CamPilot, a novel framework that achieves precise camera control in video generation. By leveraging a camera-aware 3D decoder based on 3D Gaussian Splatting (3DGS), CamPilot efficiently evaluates geometric consistency and provides robust reward signals for feedback learning. This approach overcomes the computational bottlenecks of traditional methods and enables strict adherence to camera trajectories.
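As a rough illustration of what a render-based reward signal can look like, the sketch below scores a rendered frame against a reference frame with a masked L1 difference and negates it to obtain a reward. The function names `masked_l1` and `camera_reward`, the flattened-list pixel layout, and the choice of L1 are assumptions made for this example; the page only states that the objective is a mask-aware difference between rendered and ground-truth videos.

```python
def masked_l1(rendered, reference, mask, eps=1e-8):
    """Mean absolute difference over pixels where mask is non-zero.

    rendered, reference, mask: flat lists of floats with equal length.
    All names here are hypothetical and not taken from the CamPilot code.
    """
    if not (len(rendered) == len(reference) == len(mask)):
        raise ValueError("inputs must have the same length")
    weighted_error = 0.0
    mask_total = 0.0
    for r, g, m in zip(rendered, reference, mask):
        weighted_error += m * abs(r - g)  # masked-out pixels contribute nothing
        mask_total += m
    return weighted_error / (mask_total + eps)  # eps guards an all-zero mask


def camera_reward(rendered, reference, mask):
    """Higher is better: the negated masked difference (assumed form)."""
    return -masked_l1(rendered, reference, mask)
```

In this sketch a pixel with mask value 0 is ignored entirely, so regions that cannot be compared reliably do not affect the score.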

CamPilot Pipeline
Overview of our framework. It consists of (a) a camera-controlled I2V model, where we inject Plücker embeddings as the camera condition using ControlNet; (b) a camera-aware 3D decoder that decodes latents into 3DGS, supporting rendering for reward computation; and (c) camera reward optimization that minimizes the mask-aware difference between rendered videos and ground-truth ones.
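For readers unfamiliar with Plücker embeddings, the following is a minimal sketch of how a per-pixel 6-channel camera condition is commonly built: each pixel's ray is described by its moment o × d and its unit direction d. The pinhole intrinsics, the camera-to-world rotation convention, the pixel-center offset of 0.5, and the function name `plucker_embedding` are assumptions for illustration and may differ from the authors' implementation.

```python
import math


def plucker_embedding(height, width, fx, fy, cx, cy, rot_c2w, cam_center):
    """Return a height x width grid of 6-tuples (o x d, d), one per pixel.

    rot_c2w: 3x3 camera-to-world rotation as nested lists (assumed convention).
    cam_center: camera origin o in world coordinates.
    """
    ox, oy, oz = cam_center
    grid = []
    for v in range(height):
        row = []
        for u in range(width):
            # Ray through the pixel center in camera coordinates (pinhole model).
            d_cam = ((u + 0.5 - cx) / fx, (v + 0.5 - cy) / fy, 1.0)
            # Rotate into world coordinates and normalize to unit length.
            d_world = [sum(rot_c2w[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
            norm = math.sqrt(sum(c * c for c in d_world))
            dx, dy, dz = (c / norm for c in d_world)
            # Moment vector m = o x d; together (m, d) identifies the ray.
            m = (oy * dz - oz * dy, oz * dx - ox * dz, ox * dy - oy * dx)
            row.append((m[0], m[1], m[2], dx, dy, dz))
        grid.append(row)
    return grid
```

Because the moment is a cross product with d, it is always orthogonal to the direction, which gives a quick sanity check on any implementation.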

Results on RealEstate10K.
From Left to Right: [ GT Video, Camera Trajectory, Generated Video, Rendered Video ]

Out-of-Distribution (OOD) Results
From Left to Right: [ Input Image, Camera Trajectory, Generated Video, Rendered Video ]

Ablation Studies
From Left to Right: [ Input Image, Generated Video, Rendered Video ]

w/o ReFL (1)
w/ ReFL (1)
w/o ReFL (2)
w/ ReFL (2)

Results with Different Scales
From Left to Right: [ Input Image, Generated Video, Rendered Video ]

Scale 1
Scale 2
Scale 3
Scale 4