ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation

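ActCam is a zero-shot video generation method built on a pretrained image-to-video diffusion model. It transfers character motion from a driving video into a new scene while controlling the camera's intrinsic and extrinsic parameters per frame. Depth and pose conditions are applied in two phases over the denoising steps, which balances scene structure against high-frequency detail. Across multiple benchmarks, it improves camera adherence and motion fidelity over existing pose-control methods. Because it controls camera and motion jointly and precisely without any additional training, it is useful for video generation and AI-based content creation.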

https://arxiv.org/abs/2605.06667

#videogeneration #diffusionmodel #3dmotioncontrol #zeroshot #computervision


For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new scene and enables per-frame control of intrinsic and extrinsic camera parameters. ActCam builds on any pretrained image-to-video diffusion model that accepts conditioning in terms of scene depth and character pose. Given a source video with a moving character and a target camera motion, ActCam generates pose and depth conditions that remain geometrically consistent across frames. We then run a single sampling process with a two-phase conditioning schedule: early denoising steps condition on both pose and sparse depth to enforce scene structure, after which depth is dropped and pose-only guidance refines high-frequency details without over-constraining the generation. We evaluate ActCam on multiple benchmarks spanning diverse character motions and challenging viewpoint changes. We find that, compared to pose-only control and other pose and camera methods, ActCam improves camera adherence and motion fidelity, and is preferred in human evaluations, especially under large viewpoint changes. Our results highlight that careful camera-consistent conditioning and staged guidance can enable strong joint camera and motion control without training. Project page: https://elkhomar.github.io/actcam/.
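The two ideas in the abstract, per-frame camera control and the two-phase conditioning schedule, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the number of denoising steps, where depth is dropped, or how the conditions enter the model, so `depth_fraction`, `two_phase_schedule`, and `project_joint` are hypothetical names, and a standard pinhole camera model is assumed for the projection.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class StepConditions:
    """Which control signals are active at one denoising step."""
    step: int
    use_pose: bool
    use_depth: bool


def two_phase_schedule(num_steps: int, depth_fraction: float) -> List[StepConditions]:
    """Pose + depth for the early steps, pose only afterwards.

    depth_fraction is a hypothetical knob (not from the paper): the share
    of early denoising steps that still receive the sparse depth condition.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    if not 0.0 <= depth_fraction <= 1.0:
        raise ValueError("depth_fraction must lie in [0, 1]")
    switch_step = round(num_steps * depth_fraction)
    return [
        StepConditions(step=i, use_pose=True, use_depth=i < switch_step)
        for i in range(num_steps)
    ]


def project_joint(
    point_world: Sequence[float],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float]:
    """Project one 3D joint into one frame with that frame's camera.

    Assumed pinhole model: x_cam = R @ x_world + t, then
    u = fx * x / z + cx and v = fy * y / z + cy.
    """
    x, y, z = (
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if z <= 0:
        raise ValueError("joint is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    for cond in two_phase_schedule(num_steps=10, depth_fraction=0.3):
        print(cond)
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(project_joint((1, 2, 4), identity, (0, 0, 0), 100, 100, 50, 60))
```

Re-projecting the same 3D joints with each frame's own intrinsics and extrinsics is one way pose conditions could stay geometrically consistent along a target camera path. The schedule only records which signals are active at each step. How a given diffusion model consumes them depends on that model's conditioning interface.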
