This episode explores World-R1, a post-training method for improving 3D consistency in text-to-video generation without redesigning the underlying model architecture. It explains how the approach combines reinforcement learning, pretrained 3D reconstruction critics, vision-language rewards, and camera-motion-focused prompt data to push generated videos toward more stable geometry under viewpoint changes. The discussion highlights why this matters for scene persistence, occlusion handling, and camera motion, especially if video models are ever to serve as usable world models rather than merely visually plausible clip generators. Listeners should find it interesting because it digs into a concrete attempt to make today's impressive but fragile video systems behave more like coherent simulated worlds.

Sources:

1. World-R1: Reinforcing 3D Constraints for Text-to-Video Generation — Weijie Wang, Xiaoxuan He, Youping Gu, Yifan Yang, Zeyu Zhang, Yefei He, Yanbo Ding, Xirui Hu, Donny Y. Chen, Zhiyuan He, Yuqing Yang, Bohan Zhuang, 2026. http://arxiv.org/abs/2604.24764
2. Make-A-Video: Text-to-Video Generation without Text-Video Data — Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, Yaniv Taigman, 2022. https://scholar.google.com/scholar?q=Make-A-Video:+Text-to-Video+Generation+without+Text-Video+Data
3. Imagen Video: High Definition Video Generation with Diffusion Models — Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans, 2022. https://scholar.google.com/scholar?q=Imagen+Video:+High+Definition+Video+Generation+with+Diffusion+Models
4. Lumiere: A Space-Time Diffusion Model for Video Generation — Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri, 2024. https://scholar.google.com/scholar?q=Lumiere:+A+Space-Time+Diffusion+Model+for+Video+Generation
5. Video generation models as world simulators — Tim Brooks, Bill Peebles, and OpenAI collaborators, 2024. https://scholar.google.com/scholar?q=Video+generation+models+as+world+simulators
6. CameraCtrl: Enabling Camera Control for Text-to-Video Generation — Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang, 2024. https://scholar.google.com/scholar?q=CameraCtrl:+Enabling+Camera+Control+for+Text-to-Video+Generation
7. WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance — Chenxi Song, Yanming Yang, Tong Zhao, Ruibo Li, Chi Zhang, 2025. https://scholar.google.com/scholar?q=WorldForge:+Unlocking+Emergent+3D/4D+Generation+in+Video+Diffusion+Model+via+Training-Free+Guidance
8. FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction — Yixiang Dai, Fan Jiang, Chiyu Wang, Mu Xu, Yonggang Qi, 2025. https://scholar.google.com/scholar?q=FantasyWorld:+Geometry-Consistent+World+Modeling+via+Unified+Video+and+3D+Prediction
9. 3D and 4D World Modeling: A Survey — Lingdong Kong, Wesley Yang, Jianbiao Mei, Youquan Liu, Ao Liang, Dekai Zhu, and many coauthors, 2025. https://scholar.google.com/scholar?q=3D+and+4D+World+Modeling:+A+Survey
10. Wan: Open and Advanced Large-Scale Video Generative Models — WanTeam, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, Jianyuan Zeng, Jiayu Wang, Jingfeng Zhang, Jingren Zhou, Jinkai Wang, Jixuan Chen, and many others, 2025. https://scholar.google.com/scholar?q=Wan:+Open+and+Advanced+Large-Scale+Video+Generative+Models
11. Flow-GRPO: Training Flow Matching Models via Online RL — Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, Wanli Ouyang, 2025. https://scholar.google.com/scholar?q=Flow-GRPO:+Training+Flow+Matching+Models+via+Online+RL
12. Depth Anything 3: Recovering the Visual Space from Any Views — Haotong Lin, Sili Chen, Junhao Liew, Donny Y. Chen, Zhenyu Li, Guang Shi, Jiashi Feng, Bingyi Kang, 2025. https://scholar.google.com/scholar?q=Depth+Anything+3:+Recovering+the+Visual+Space+from+Any+Views
13. VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward — Zhaochong An, Orest Kupyn, Theo Uscidda, Andrea Colaco, Karan Ahuja, Serge Belongie, Mar Gonzalez-Franco, Marta Tintore Gazulla, 2026. https://scholar.google.com/scholar?q=VGGRPO:+Towards+World-Consistent+Video+Generation+with+4D+Latent+Reward
14. WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion — Hanyang Kong, Xingyi Yang, Xiaoxu Zheng, Xinchao Wang, 2025. https://scholar.google.com/scholar?q=WorldWarp:+Propagating+3D+Geometry+with+Asynchronous+Video+Diffusion
15. Pre-Trained Video Generative Models as World Simulators — Haoran He, Yang Zhang, Liang Lin, Zhongwen Xu, and collaborators, 2025. https://scholar.google.com/scholar?q=Pre-Trained+Video+Generative+Models+as+World+Simulators
16. Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors — authors not visible in the snippet; recent, likely 2025-2026. https://scholar.google.com/scholar?q=Towards+Realistic+and+Consistent+Orbital+Video+Generation+via+3D+Foundation+Priors
17. Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding — authors not visible in the snippet; recent, likely 2025-2026. https://scholar.google.com/scholar?q=Generation+Models+Know+Space:+Unleashing+Implicit+3D+Priors+for+Scene+Understanding
18. Is a Picture Worth a Thousand Words? Delving into Spatial Reasoning for Vision-Language Models — authors not visible in the snippet; recent, likely 2024-2025. https://scholar.google.com/scholar?q=Is+a+Picture+Worth+a+Thousand+Words?+Delving+into+Spatial+Reasoning+for+Vision-Language+Models
19. Fast Multi-View Consistent 3D Editing with Video Priors — authors not visible in the snippet; recent, likely 2024-2025. https://scholar.google.com/scholar?q=Fast+Multi-View+Consistent+3D+Editing+with+Video+Priors
20. AI Post Transformers: LeWorldModel: Stable Joint-Embedding World Models from Pixels — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-03-25-leworldmodel-stable-joint-embedding-worl-650f9f.mp3
21. AI Post Transformers: DreamerV3 World Models Across 150 Tasks — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-20-dreamerv3-world-models-across-150-tasks-af5edb.mp3

Interactive Visualization: World-R1 Improves 3D Consistency in Text-to-Video
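
To make the reward-combination idea in the episode a bit more concrete, here is a minimal sketch of how a GRPO-style post-training step might blend a 3D-reconstruction consistency reward with a vision-language reward into group-relative advantages. This is an illustration in the spirit of the Flow-GRPO and World-R1 papers listed above, not their actual implementation: the reward stubs, function names, and weights are all hypothetical placeholders.

```python
# Conceptual sketch only: how a GRPO-style update might combine a reward
# from a pretrained 3D reconstruction critic with a vision-language reward.
# All names, weights, and reward stubs below are hypothetical placeholders.

import numpy as np


def reconstruction_reward(video: np.ndarray) -> float:
    """Placeholder for a 3D-reconstruction consistency score, e.g. how well a
    pretrained multi-view geometry model explains the generated frames.
    Here it just returns a dummy value in [0, 1]."""
    return float(np.clip(video.mean(), 0.0, 1.0))


def vlm_reward(video: np.ndarray, prompt: str) -> float:
    """Placeholder for a vision-language reward scoring prompt faithfulness."""
    return float(np.clip(video.std(), 0.0, 1.0))


def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    """GRPO-style advantages: normalize each rollout's reward against the
    group of rollouts generated for the same prompt (no value network)."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)


def score_group(videos: list[np.ndarray], prompt: str,
                w_3d: float = 0.7, w_vlm: float = 0.3) -> np.ndarray:
    """Blend the two reward signals with hypothetical weights, then compute
    the group-relative advantages that would weight the policy update."""
    rewards = np.array([
        w_3d * reconstruction_reward(v) + w_vlm * vlm_reward(v, prompt)
        for v in videos
    ])
    return group_relative_advantages(rewards)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Eight rollouts for one camera-motion prompt: (frames, H, W, channels).
    group = [rng.random((16, 32, 32, 3)) for _ in range(8)]
    adv = score_group(group, "orbit slowly around a stone fountain")
    print("advantages:", np.round(adv, 3))
```

The group-relative normalization is the point of the sketch: because each rollout is scored only against its siblings for the same prompt, no learned value function is needed, which is what makes this style of RL practical as a lightweight post-training step on top of an existing video model.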