We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions, and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high-fidelity video. Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.
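To make the "spacetime patches" idea concrete, here is a minimal sketch of how a video latent tensor could be chopped into flattened spacetime patch tokens for a transformer. The function name, patch sizes, and latent shape are hypothetical illustrations; the report does not specify these details.

```python
import numpy as np

def spacetime_patchify(latent, t_patch=2, p=4):
    """Split a video latent of shape (T, H, W, C) into flattened
    spacetime patches of size (t_patch, p, p), one token per patch.

    Shapes and patch sizes here are illustrative assumptions, not
    the values used by Sora.
    """
    T, H, W, C = latent.shape
    assert T % t_patch == 0 and H % p == 0 and W % p == 0
    # Carve each axis into (num_patches, patch_size) pairs.
    x = latent.reshape(T // t_patch, t_patch, H // p, p, W // p, p, C)
    # Bring the three patch-content axes next to each other.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # Flatten: rows are patch tokens, columns are patch contents.
    tokens = x.reshape(-1, t_patch * p * p * C)
    return tokens

# Example: an 8-frame, 32x32, 4-channel latent.
latent = np.zeros((8, 32, 32, 4), dtype=np.float32)
tokens = spacetime_patchify(latent)
print(tokens.shape)  # (256, 128): 4*8*8 patches, each 2*4*4*4 values
```

Because both images (T = t_patch, i.e. a single temporal patch) and videos of any duration, resolution, and aspect ratio reduce to a variable-length sequence of identical tokens, one transformer can train on all of them jointly.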
*Video generation models as world simulators*