This episode explores the position paper Robots Need More Than VLAs & World Models and its claim that the main bottleneck in robotics may be grounding: turning raw physical behavior into robot-usable signals such as actions, contacts, task phases, goals, and rewards. It explains why vision-language-action models, world models, and reward models play different roles, and why simply scaling policy transformers cannot recover supervision that was never captured in the data. The discussion also digs into cross-embodiment learning and task-preserving retargeting, focusing on how humans and different robots can share useful experience despite mismatched bodies, sensors, and action spaces. A standout example is EgoMimic, which uses egocentric human video, 3D hand tracking, cross-domain alignment, and joint human-robot training to improve long-horizon real-robot manipulation, giving listeners a concrete picture of what might actually unlock broader robot generalization.

Sources:
1. Robots Need More than VLA and World Models — Elis Karcini, Faisal Mehrban, Quang Nguyen, Mac Schwager, Arash Ajoudani, Cesar Cadena, Jan Peters, Marco Hutter, Haitham Bou-Ammar, 2026 http://arxiv.org/abs/2606.06556
2. RT-1: Robotics Transformer for Real-World Control at Scale — Anthony Brohan, Noah Brown, Chelsea Finn, Sergey Levine, et al., 2022 https://scholar.google.com/scholar?q=RT-1:+Robotics+Transformer+for+Real-World+Control+at+Scale
3. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control — Anthony Brohan, Noah Brown, Danny Driess, Karol Hausman, Chelsea Finn, Sergey Levine, et al., 2023 https://scholar.google.com/scholar?q=RT-2:+Vision-Language-Action+Models+Transfer+Web+Knowledge+to+Robotic+Control
4. OpenVLA: An Open-Source Vision-Language-Action Model — Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Chelsea Finn, Sergey Levine, et al., 2024 https://scholar.google.com/scholar?q=OpenVLA:+An+Open-Source+Vision-Language-Action+Model
5. π0: A Vision-Language-Action Flow Model for General Robot Control — Kevin Black, Noah Brown, Danny Driess, Karol Hausman, Sergey Levine, Chelsea Finn, et al., 2024 https://scholar.google.com/scholar?q=π0:+A+Vision-Language-Action+Flow+Model+for+General+Robot+Control
6. Deep reinforcement learning from human preferences — Paul Christiano, Jan Leike, Tom B. Brown, Dario Amodei, et al., 2017 https://scholar.google.com/scholar?q=Deep+reinforcement+learning+from+human+preferences
7. Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos — Annie S. Chen, Suraj Nair, Chelsea Finn, 2021 https://scholar.google.com/scholar?q=Learning+Generalizable+Robotic+Reward+Functions+from+"In-The-Wild"+Human+Videos
8. Language to Rewards for Robotic Skill Synthesis — Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Brian Ichter, Ted Xiao, Fei Xia, et al., 2023 https://scholar.google.com/scholar?q=Language+to+Rewards+for+Robotic+Skill+Synthesis
9. RoboReward: General-Purpose Vision-Language Reward Models for Robotics — Tony Lee, Andrew Wagenmaker, Karl Pertsch, Percy Liang, Sergey Levine, Chelsea Finn, 2026 https://scholar.google.com/scholar?q=RoboReward:+General-Purpose+Vision-Language+Reward+Models+for+Robotics
10. Open X-Embodiment: Robotic Learning Datasets and RT-X Models — Open X-Embodiment Collaboration; Abby O'Neill, Fei Xia, Chelsea Finn, Sergey Levine, et al., 2023 https://scholar.google.com/scholar?q=Open+X-Embodiment:+Robotic+Learning+Datasets+and+RT-X+Models
11. XSkill: Cross Embodiment Skill Discovery — Mengda Xu, Zhenjia Xu, Cheng Chi, Manuela Veloso, Shuran Song, 2023 https://scholar.google.com/scholar?q=XSkill:+Cross+Embodiment+Skill+Discovery
12. Cross-Embodiment Robot Manipulation Skill Transfer using Latent Space Alignment — Tianyu Wang, Dwait Bhatt, Xiaolong Wang, Nikolay Atanasov, 2024 https://scholar.google.com/scholar?q=Cross-Embodiment+Robot+Manipulation+Skill+Transfer+using+Latent+Space+Alignment
13. Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer — Gemini Robotics Team; Abbas Abdolmaleki, Anthony Brohan, Keerthana Gopalakrishnan, Ted Xiao, et al., 2025 https://scholar.google.com/scholar?q=Gemini+Robotics+1.5:+Pushing+the+Frontier+of+Generalist+Robots+with+Advanced+Embodied+Reasoning,+Thinking,+and+Motion+Transfer
14. Learning Latent Plans from Play — Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Sergey Levine, Pierre Sermanet, et al., 2019 https://scholar.google.com/scholar?q=Learning+Latent+Plans+from+Play
15. R3M: A Universal Visual Representation for Robot Manipulation — Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, Abhinav Gupta, 2022 https://scholar.google.com/scholar?q=R3M:+A+Universal+Visual+Representation+for+Robot+Manipulation
16. Zero-Shot Robot Manipulation from Passive Human Videos — Homanga Bharadhwaj, Abhinav Gupta, Shubham Tulsiani, Vikash Kumar, 2023 https://scholar.google.com/scholar?q=Zero-Shot+Robot+Manipulation+from+Passive+Human+Videos
17. GenSim: Generating Robotic Simulation Tasks via Large Language Models — Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Huazhe Xu, Xiaolong Wang, et al., 2023 https://scholar.google.com/scholar?q=GenSim:+Generating+Robotic+Simulation+Tasks+via+Large+Language+Models
18. π0: A Vision-Language-Action Flow Model for General Robot Control — Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, et al., 2024 https://scholar.google.com/scholar?q=$\pi_0$:+A+Vision-Language-Action+Flow+Model+for+General+Robot+Control
19. EgoMimic: Scaling Imitation Learning via Egocentric Video — Simar Kareer, Dhruv Patel, Ryan Punamiya, Pranay Mathur, Shuo Cheng, Chen Wang, Judy Hoffman, Danfei Xu, 2024 https://scholar.google.com/scholar?q=EgoMimic:+Scaling+Imitation+Learning+via+Egocentric+Video
20. LEGATO: Cross-Embodiment Imitation Using a Grasping Tool — Mingyo Seo, H. Andy Park, Shenli Yuan, Yuke Zhu, Luis Sentis, 2024 https://scholar.google.com/scholar?q=LEGATO:+Cross-Embodiment+Imitation+Using+a+Grasping+Tool
21. Rank2Reward: Learning Shaped Reward Functions from Passive Video — Daniel Yang, Davin Tjia, Jacob Berg, Dima Damen, Pulkit Agrawal, Abhishek Gupta, 2024 https://scholar.google.com/scholar?q=Rank2Reward:+Learning+Shaped+Reward+Functions+from+Passive+Video
22. Genie: Generative Interactive Environments — Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, et al., 2024 https://scholar.google.com/scholar?q=Genie:+Generative+Interactive+Environments
23. Neural Scaling Laws in Robotics — Sebastian Sartor, Neil Thompson, 2024 https://arxiv.org/abs/2405.14005
24. Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression — Junjie Wen et al., 2024 https://arxiv.org/abs/2412.03293
25. Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation — Ria Doshi et al., 2024 https://arxiv.org/abs/2408.11812
26. RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning — Lawrence Yunliang Chen et al., 2024 https://arxiv.org/abs/2409.03403
27. CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations — Anthony Liang et al., 2025 https://arxiv.org/abs/2505.04999
28. DayDreamer: World Models for Physical Robot Learning — Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel, 2022 https://arxiv.org/abs/2206.14176
29. Ctrl-World: A Controllable Generative World Model for Robot Manipulation — Yanjiang Guo et al., 2025 https://arxiv.org/abs/2510.10125
30. AI Post Transformers: DreamerV3 World Models Across 150 Tasks — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-20-dreamerv3-world-models-across-150-tasks-af5edb.mp3
31. AI Post Transformers: When LLM Judges Become Coin Flips — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-05-when-llm-judges-become-coin-flips-8b43ef.mp3
32. AI Post Transformers: SkillsBench for Evaluating Agent Skills — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-14-skillsbench-for-evaluating-agent-skills-58bb1e.mp3

Interactive Visualization: Robots Need More Than VLAs and World Models