Abstract

Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs. Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D assets and efficient architectures for denoising 3D data, neither of which currently exists. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.
![DreamFusion: Text-to-3D using 2D Diffusion](https://dreamfusion3d.github.io/assets/images/dreamfusion_samples.png)
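To make the idea concrete, here is a minimal sketch, not the paper's implementation, of one optimization step in the spirit of DreamFusion's score-distillation approach: a rendered view of the 3D model is noised, a frozen 2D diffusion model predicts that noise, and the residual is pushed back into the 3D parameters. `render_view`, `denoiser`, and `alphas_cumprod` are hypothetical stand-ins, text conditioning is omitted for brevity, and the weighting `w(t) = 1 - alpha_bar_t` is one common choice rather than a claim about the paper's exact schedule.

```python
import torch

def sds_step(params, render_view, denoiser, alphas_cumprod, optimizer):
    """One score-distillation-style update (sketch with hypothetical interfaces):
    render the 3D model, diffuse the render, and use a frozen 2D diffusion
    model's denoising residual as a gradient on the 3D parameters."""
    x = render_view(params)                        # differentiable render, (1, C, H, W)
    t = torch.randint(20, 980, (1,))               # random diffusion timestep
    a = alphas_cumprod[t].view(1, 1, 1, 1)         # noise-schedule term alpha_bar_t
    eps = torch.randn_like(x)
    z_t = a.sqrt() * x + (1.0 - a).sqrt() * eps    # forward-diffused render
    with torch.no_grad():
        eps_hat = denoiser(z_t, t)                 # frozen model's noise prediction
    w = 1.0 - a                                    # assumed weighting w(t)
    grad = w * (eps_hat - eps)                     # denoiser Jacobian is skipped
    optimizer.zero_grad()
    x.backward(gradient=grad)                      # inject grad at the image, flow into params
    optimizer.step()
```

Repeating this step over many random camera poses is, at a high level, how a randomly initialized 3D representation is pulled toward renders that the frozen 2D model considers likely for the given text prompt, without ever training on 3D data.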