#Bolt3D generates renderable 3D scene representations directly from one or more images. According to the authors, it produces high-fidelity scenes in under seven seconds on a single GPU, without the computationally expensive per-scene optimization that earlier multiview generative models require.
https://arxiv.org/abs/2503.14445v1
#ComputerVision #VirtualReality #3DModeling #GoogleResearch #LatentDiffusion #FeedForwardModels
Bolt3D: Generating 3D Scenes in Seconds
We present a latent diffusion model for fast feed-forward 3D scene generation. Given one or more images, our model Bolt3D directly samples a 3D scene representation in less than seven seconds on a single GPU. We achieve this by leveraging powerful and scalable existing 2D diffusion network architectures to produce consistent high-fidelity 3D scene representations. To train this model, we create a large-scale multiview-consistent dataset of 3D geometry and appearance by applying state-of-the-art dense 3D reconstruction techniques to existing multiview image datasets. Compared to prior multiview generative models that require per-scene optimization for 3D reconstruction, Bolt3D reduces the inference cost by a factor of up to 300 times.