AI Post Transformers

mcgrof

AI-generated podcast where hosts Hal Turing and Dr. Ada Shannon discuss the latest research papers and reports in machine learning, AI systems, and optimization. Featuring honest critical analysis, proper citations, and nerdy humor.

  1.

    Breaking the Prefix Barrier with Shared KV Cache

    This episode explores a 2026 paper proposing that multiple LLM agents should share transformer KV-cache state, not just text, so they can avoid repeatedly paying the prefill cost of rereading the same plans, critiques, and intermediate outputs. It explains the systems background behind prefix caching, vLLM’s PagedAttention, and SGLang, then focuses on why multi-agent workflows break the exact-prefix assumption and make segment-level reuse much harder. The discussion highlights the paper’s core technical tension: the idea is compelling, but reusing cached activations across different prompt positions is fragile because of positional encoding effects such as RoPE misalignment and attention behavior. Listeners would find it interesting because it connects a practical bottleneck in agent systems to deep transformer internals, while also questioning whether the paper truly delivers fine-grained semantic sharing or a narrower form of reusable output caching. A toy code sketch of the RoPE offset problem follows the source list.

    Sources:
    1. Breaking the Prefix Barrier with Shared KV Cache. https://openreview.net/forum?id=kgzBkyqg6Z
    2. Efficient Memory Management for Large Language Model Serving with PagedAttention — Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica, 2023. https://scholar.google.com/scholar?q=Efficient+Memory+Management+for+Large+Language+Model+Serving+with+PagedAttention
    3. SGLang: Efficient Execution of Structured Language Model Programs — Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng, 2024. https://scholar.google.com/scholar?q=SGLang:+Efficient+Execution+of+Structured+Language+Model+Programs
    4. CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion — Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang, 2025. https://scholar.google.com/scholar?q=CacheBlend:+Fast+Large+Language+Model+Serving+for+RAG+with+Cached+Knowledge+Fusion
    5. KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems — Hancheng Ye, Zhengqi Gao, Mingyuan Ma, Qinsi Wang, Yuzhe Fu, Ming-Yu Chung, Yueqian Lin, Zhijian Liu, Jianyi Zhang, Danyang Zhuo, Yiran Chen, 2025. https://scholar.google.com/scholar?q=KVCOMM:+Online+Cross-context+KV-cache+Communication+for+Efficient+LLM-based+Multi-agent+Systems
    6. EPIC: Efficient Position-Independent Caching for Serving Large Language Models — Junhao Hu, Wenrui Huang, Weidong Wang, Haoyi Wang, Tiancheng Hu, Qin Zhang, Hao Feng, Xusheng Chen, Yizhou Shan, Tao Xie, 2025. https://scholar.google.com/scholar?q=EPIC:+Efficient+Position-Independent+Caching+for+Serving+Large+Language+Models
    7. KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows — Zaifeng Pan, Ajjkumar Patel, Zhengding Hu, Yipeng Shen, Yue Guan, Wan-Lu Li, Lianhui Qin, Yida Wang, Yufei Ding, 2025. https://scholar.google.com/scholar?q=KVFlow:+Efficient+Prefix+Caching+for+Accelerating+LLM-Based+Multi-Agent+Workflows
    8. DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving — Yuhan Liu, Yuyang Huang, Jiayi Yao, Shaoting Feng, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, Junchen Jiang, Shan Lu, Madan Musuvathi, Esha Choukse, 2024. https://scholar.google.com/scholar?q=DroidSpeak:+KV+Cache+Sharing+for+Cross-LLM+Communication+and+Multi-LLM+Serving
    9. TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing — Zhuohang Bian, Feiyang Wu, Chengrui Zhang, Hangcheng Dong, Yun Liang, Youwei Zhuo, 2026. https://scholar.google.com/scholar?q=TokenDance:+Scaling+Multi-Agent+LLM+Serving+via+Collective+KV+Cache+Sharing
    10. HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse — authors unclear from Scholar snippet, 2025. https://scholar.google.com/scholar?q=HyperRAG:+Enhancing+Quality-Efficiency+Tradeoffs+in+Retrieval-Augmented+Generation+with+Reranker+KV-Cache+Reuse
    11. ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation — authors unclear from Scholar snippet, 2025. https://scholar.google.com/scholar?q=ProphetKV:+User-Query-Driven+Selective+Recomputation+for+Efficient+KV+Cache+Reuse+in+Retrieval-Augmented+Generation
    12. Cache-craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation — authors unclear from Scholar snippet, 2025. https://scholar.google.com/scholar?q=Cache-craft:+Managing+Chunk-Caches+for+Efficient+Retrieval-Augmented+Generation
    13. AttentionStore: Cost-Effective Attention Reuse Across Multi-Turn Conversations in Large Language Model Serving — authors unclear from Scholar snippet, 2025. https://scholar.google.com/scholar?q=AttentionStore:+Cost-Effective+Attention+Reuse+Across+Multi-Turn+Conversations+in+Large+Language+Model+Serving
    14. BAT: Efficient Generative Recommender Serving with Bipartite Attention — authors unclear from Scholar snippet, 2025. https://scholar.google.com/scholar?q=BAT:+Efficient+Generative+Recommender+Serving+with+Bipartite+Attention
    15. AI Post Transformers: Efficient KV Cache Sharing for Multi-LoRA Agents — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-22-efficient-kv-cache-sharing-for-multi-lor-afda05.mp3
    16. AI Post Transformers: TokenDance for Multi-Agent KV Cache Sharing — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-22-tokendance-for-multi-agent-kv-cache-shar-aa9b99.mp3
    17. AI Post Transformers: KV Cache TTL for Multi-Turn Agent Scheduling — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-09-kv-cache-ttl-for-multi-turn-agent-schedu-996bf1.mp3
    18. AI Post Transformers: Speculative Decoding in Real vLLM Serving — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-04-speculative-decoding-in-real-vllm-servin-6f4e2b.mp3
    19. AI Post Transformers: Test-time Scaling for Multi-Agent Collaborative Reasoning — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-22-test-time-scaling-for-multi-agent-collab-082570.mp3

    Interactive Visualization: Breaking the Prefix Barrier with Shared KV Cache
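
    A minimal sketch of the RoPE offset problem discussed above, in Python with NumPy. The helper names and dimensions are illustrative assumptions, not code from the paper: under rotary position embeddings, a cached key is baked to the position it was computed at, so naively reusing it at a different offset changes the query-key score that a full prefill would have produced.

        # Toy illustration: RoPE makes cached keys position-dependent, so
        # segment-level KV reuse across prompt offsets is fragile.
        import numpy as np

        def rope_rotate(x, pos, base=10000.0):
            """Apply rotary position embedding to vector x at position pos."""
            half = x.shape[-1] // 2
            freqs = base ** (-np.arange(half) / half)  # per-pair rotation rates
            angles = pos * freqs
            cos, sin = np.cos(angles), np.sin(angles)
            x1, x2 = x[..., :half], x[..., half:]
            return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

        rng = np.random.default_rng(0)
        k = rng.normal(size=64)   # unrotated key for one cached token
        q = rng.normal(size=64)   # a query issued later, in a different prompt

        # Key cached while its segment sat at position 10; the same text now
        # lands at position 200 in another agent's prompt.
        k_cached = rope_rotate(k, pos=10)
        k_correct = rope_rotate(k, pos=200)

        score_reused = rope_rotate(q, 250) @ k_cached    # naive cross-prompt reuse
        score_correct = rope_rotate(q, 250) @ k_correct  # what full prefill computes
        print(score_reused, score_correct)  # differ: the relative offset changed

    Because RoPE scores depend on the relative distance between query and key positions (240 in the reused case versus 50 in the recomputed one), the cached keys must be re-rotated or recomputed, which is exactly the fragility the episode dwells on.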

  2.

    Reverse-Mode Differentiation Across AD and Neural Nets

    This episode explores Paul Werbos’s 2004 review of reverse differentiation and argues that reverse-mode automatic differentiation, backpropagation, hand-coded adjoints, and adjoint circuits are largely the same core idea expressed in different technical communities. It explains the mechanics of automatic differentiation and reverse mode in clear terms, then traces how these methods diverged historically and why that fragmentation slowed progress in fields like neural networks, control, and scientific computing. The discussion highlights Werbos’s main claim that better integrated, derivative-aware software could make advanced nonlinear modeling and intelligent control far more practical, while also questioning how much evidence supports that agenda beyond synthesis and historical interpretation. Listeners would find it interesting for its sharp distinction between gradients as infrastructure versus models or optimizers, and for its perspective on how today’s differentiable programming ecosystem was once a contested software vision. A toy reverse-mode tape sketch follows the source list.

    Sources:
    1. Reverse-Mode Differentiation Across AD and Neural Nets. https://www.werbos.com/AD2004.pdf
    2. A Simple Automatic Derivative Evaluation Program — R. E. Wengert, 1964. https://scholar.google.com/scholar?q=A+Simple+Automatic+Derivative+Evaluation+Program
    3. Taylor Expansion of the Accumulated Rounding Error — Seppo Linnainmaa, 1976. https://scholar.google.com/scholar?q=Taylor+Expansion+of+the+Accumulated+Rounding+Error
    4. The Complexity of Partial Derivatives — Walter Baur and Volker Strassen, 1983. https://scholar.google.com/scholar?q=The+Complexity+of+Partial+Derivatives
    5. Automatic Differentiation in Machine Learning: a Survey — Atılım Güneş Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, Jeffrey Mark Siskind, 2018. https://scholar.google.com/scholar?q=Automatic+Differentiation+in+Machine+Learning:+a+Survey
    6. Learning Representations by Back-Propagating Errors — David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams, 1986. https://scholar.google.com/scholar?q=Learning+Representations+by+Back-Propagating+Errors
    7. Backpropagation Through Time: What It Does and How to Do It — Paul J. Werbos, 1990. https://scholar.google.com/scholar?q=Backpropagation+Through+Time:+What+It+Does+and+How+to+Do+It
    8. Backpropagation Applied to Handwritten Zip Code Recognition — Yann LeCun, Bernhard Boser, John S. Denker, Don Henderson, Richard E. Howard, Wayne Hubbard, Lawrence D. Jackel, 1989. https://scholar.google.com/scholar?q=Backpropagation+Applied+to+Handwritten+Zip+Code+Recognition
    9. Gradient-Based Learning Applied to Document Recognition — Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, 1998. https://scholar.google.com/scholar?q=Gradient-Based+Learning+Applied+to+Document+Recognition
    10. Neuro-Dynamic Programming: An Overview — Dimitri P. Bertsekas, John N. Tsitsiklis, 1995. https://scholar.google.com/scholar?q=Neuro-Dynamic+Programming:+An+Overview
    11. Neuro-Dynamic Programming — Dimitri P. Bertsekas, John N. Tsitsiklis, 1996. https://scholar.google.com/scholar?q=Neuro-Dynamic+Programming
    12. Approximate Dynamic Programming and Reinforcement Learning — Lucian Bușoniu, Bart De Schutter, Robert Babuška, 2010. https://scholar.google.com/scholar?q=Approximate+Dynamic+Programming+and+Reinforcement+Learning
    13. An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application — Hugo P. Simão, Jeff Day, Abraham P. George, Ted Gifford, John Nienow, Warren B. Powell, 2009. https://scholar.google.com/scholar?q=An+Approximate+Dynamic+Programming+Algorithm+for+Large-Scale+Fleet+Management:+A+Case+Application
    14. Some New Tools for Prediction and Analysis in the Behavioral Sciences — Paul J. Werbos, 1974. https://scholar.google.com/scholar?q=Some+New+Tools+for+Prediction+and+Analysis+in+the+Behavioral+Sciences
    15. The Difficulty of Learning Long-Term Dependencies with Gradient Descent is Officially Overcome — Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, Jürgen Schmidhuber, 2001. https://scholar.google.com/scholar?q=The+Difficulty+of+Learning+Long-Term+Dependencies+with+Gradient+Descent+is+Officially+Overcome
    16. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation — Andreas Griewank, 2000. https://scholar.google.com/scholar?q=Evaluating+Derivatives:+Principles+and+Techniques+of+Algorithmic+Differentiation
    17. Differential Dynamic Programming — David H. Jacobson, David Q. Mayne, 1970. https://scholar.google.com/scholar?q=Differential+Dynamic+Programming
    18. Backpropagation-free training of deep physical neural networks — Ali Momeni, Babak Rahmani, Matthieu Mallejac, Philipp Del Hougne, Romain Fleury, 2023. https://scholar.google.com/scholar?q=Backpropagation-free+training+of+deep+physical+neural+networks
    19. Fully forward mode training for optical neural networks — Zhiwei Xue, Tiankuang Zhou, Zhihao Xu, Shaoliang Yu, Qionghai Dai, Lu Fang, 2024. https://scholar.google.com/scholar?q=Fully+forward+mode+training+for+optical+neural+networks
    20. Brain-like training of a pre-sensor optical neural network with a backpropagation-free algorithm — Zheng Huang, Conghe Wang, Caihua Zhang, Wanxin Shi, Shukai Wu, Sigang Yang, Hongwei Chen, 2025. https://scholar.google.com/scholar?q=Brain-like+training+of+a+pre-sensor+optical+neural+network+with+a+backpropagation-free+algorithm
    21. Backpropagation-Free Deep Learning with Recursive Local Representation Alignment — Alexander G. Ororbia, Ankur Mali, Daniel Kifer, C. Lee Giles, 2023. https://scholar.google.com/scholar?q=Backpropagation-Free+Deep+Learning+with+Recursive+Local+Representation+Alignment
    22. Exploring the Promise and Limits of Real-Time Recurrent Learning — Kazuki Irie, Anand Gopalakrishnan, Jürgen Schmidhuber, 2023. https://scholar.google.com/scholar?q=Exploring+the+Promise+and+Limits+of+Real-Time+Recurrent+Learning
    23. Real-Time Recurrent Reinforcement Learning — Julian Lemmel, Radu Grosu, 2023/2025. https://scholar.google.com/scholar?q=Real-Time+Recurrent+Reinforcement+Learning
    24. Second-order forward-mode optimization of recurrent neural networks for neuroscience — Youjing Yu, Rui Xia, Qingxi Ma, Máté Lengyel, Guillaume Hennequin, 2024. https://scholar.google.com/scholar?q=Second-order+forward-mode+optimization+of+recurrent+neural+networks+for+neuroscience
    25. Dynamic predictive coding: A model of hierarchical sequence learning and prediction in the neocortex — Linxing Preston Jiang, Rajesh P. N. Rao, 2024. https://scholar.google.com/scholar?q=Dynamic+predictive+coding:+A+model+of+hierarchical+sequence+learning+and+prediction+in+the+neocortex
    26. Predictive coding networks for temporal prediction — Beren Millidge, Mufeng Tang, Mahyar Osanlouy, Nicol S. Harper, Rafal Bogacz, 2024. https://scholar.google.com/scholar?q=Predictive+coding+networks+for+temporal+prediction
    27. Where is the error? Hierarchical predictive coding through dendritic error computation — Fabian A. Mikulasch, Lucas Rudelt, Michael Wibral, Viola Priesemann, 2023. https://scholar.google.com/scholar?q=Where+is+the+error?+Hierarchical+predictive+coding+through+dendritic+error+computation
    28. AI Post Transformers: Long Short-Term Memory and Vanishing Gradients — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-19-long-short-term-memory-and-vanishing-gra-72448c.mp3
    29. AI Post Transformers: When Spectral Gradient Updates Help Deep Learning — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-04-when-spectral-gradient-updates-help-deep-9c8441.mp3
    30. AI Post Transformers: ASI-Evolve for Data, Architectures, and RL — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-05-asi-evolve-for-data-architectures-and-rl-197b2b.mp3

    Interactive Visualization: Reverse-Mode Differentiation Across AD and Neural Nets
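
    To make the episode's central identity concrete, here is a toy tape-based reverse-mode AD in Python. It is a sketch of the general adjoint recursion, not Werbos's notation or any production AD library: the forward pass records each operation's local partial derivatives, and one reverse sweep over the recorded graph accumulates adjoints, which is exactly what backpropagation does for neural networks.

        # Toy reverse-mode automatic differentiation on a recorded graph.
        class Var:
            def __init__(self, value, parents=()):
                self.value = value
                self.parents = parents  # pairs of (parent Var, local partial)
                self.grad = 0.0

            def __add__(self, other):
                return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

            def __mul__(self, other):
                return Var(self.value * other.value,
                           [(self, other.value), (other, self.value)])

        def backward(out):
            # Topologically order the recorded graph, then sweep once in
            # reverse, accumulating adjoints via the chain rule.
            order, seen = [], set()
            def visit(v):
                if id(v) not in seen:
                    seen.add(id(v))
                    for parent, _ in v.parents:
                        visit(parent)
                    order.append(v)
            visit(out)
            out.grad = 1.0
            for node in reversed(order):
                for parent, local in node.parents:
                    parent.grad += node.grad * local

        x, y = Var(2.0), Var(3.0)
        z = x * y + x          # z = xy + x
        backward(z)
        print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0

    The reverse sweep costs a small constant multiple of the forward pass regardless of how many inputs there are, which is the Baur-Strassen style cheap-gradient property the episode's sources formalize.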

  3.

    Scaling Test-Time Compute for Reasoning Models

    This episode explores how test-time compute should be allocated in large language models, using a recent study that compares parallel sampling, majority voting, shortest- and longest-trace selection, and beam-style search under a common evaluation setup. It explains the paper’s central argument that there is no single best inference-time strategy: some model families behave like short-horizon reasoners that benefit from several concise attempts, while others act like long-horizon reasoners that can make productive use of longer sequential reasoning. The discussion also examines how the authors benchmark eight open models across demanding datasets such as AIME and GPQA Diamond, and why harder problems reveal whether extra trace length produces real progress or just more verbose failure. Listeners would find it interesting because it turns a vague idea of “letting models think longer” into a concrete engineering question about how reasoning systems should spend their runtime budget. A small code sketch of these selection rules follows the source list.

    Sources:
    1. The Art of Scaling Test-Time Compute for Large Language Models — Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty, 2025. http://arxiv.org/abs/2512.02008
    2. Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters — Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar, 2024. https://scholar.google.com/scholar?q=Scaling+LLM+Test-Time+Compute+Optimally+can+be+More+Effective+than+Scaling+Model+Parameters
    3. Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning — Michael Hassid, Gabriel Synnaeve, Yossi Adi, Roy Schwartz, 2025. https://scholar.google.com/scholar?q=Don't+Overthink+it.+Preferring+Shorter+Thinking+Chains+for+Improved+LLM+Reasoning
    4. Inverse Scaling in Test-Time Compute — Aryo Pradipta Gema, Alexander Hagele, Runjin Chen, Andy Arditi, Jacob Goldman-Wetzler, Kit Fraser-Taliente, Henry Sleight, Linda Petrini, Julian Michael, Beatrice Alex, Pasquale Minervini, Yanda Chen, Joe Benton, Ethan Perez, 2025. https://scholar.google.com/scholar?q=Inverse+Scaling+in+Test-Time+Compute
    5. Qwen3 Technical Report — An Yang and the Qwen team, 2025. https://scholar.google.com/scholar?q=Qwen3+Technical+Report
    6. Self-Consistency Improves Chain of Thought Reasoning in Language Models — Xuezhi Wang et al., 2023. https://scholar.google.com/scholar?q=Self-Consistency+Improves+Chain+of+Thought+Reasoning+in+Language+Models
    7. Tree of Thoughts: Deliberate Problem Solving with Large Language Models — Shunyu Yao et al., 2023. https://scholar.google.com/scholar?q=Tree+of+Thoughts:+Deliberate+Problem+Solving+with+Large+Language+Models
    8. Graph of Thoughts: Solving Elaborate Problems with Large Language Models — Maciej Besta et al., 2024. https://scholar.google.com/scholar?q=Graph+of+Thoughts:+Solving+Elaborate+Problems+with+Large+Language+Models
    9. short-m@k — Ranit Hassid et al., 2025. https://scholar.google.com/scholar?q=short-m@k
    10. Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning — approx. recent process-verifier work, exact authors not confirmed from snippet, 2025. https://scholar.google.com/scholar?q=Rewarding+Progress:+Scaling+Automated+Process+Verifiers+for+LLM+Reasoning
    11. Improving LLM Reasoning Through Scaling Inference Computation With Collaborative Verification — approx. recent collaborative-verification work, exact authors not confirmed from snippet, 2025. https://scholar.google.com/scholar?q=Improving+LLM+Reasoning+Through+Scaling+Inference+Computation+With+Collaborative+Verification
    12. Graph of Verification: Structured Verification of LLM Reasoning With Directed Acyclic Graphs — approx. recent verification-structure work, exact authors not confirmed from snippet, 2025. https://scholar.google.com/scholar?q=Graph+of+Verification:+Structured+Verification+of+LLM+Reasoning+With+Directed+Acyclic+Graphs
    13. Dynamic Parallel Tree Search for Efficient LLM Reasoning — approx. recent tree-search work, exact authors not confirmed from snippet, 2025. https://scholar.google.com/scholar?q=Dynamic+Parallel+Tree+Search+for+Efficient+LLM+Reasoning
    14. Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls — approx. recent tree-search analysis work, exact authors not confirmed from snippet, 2025. https://scholar.google.com/scholar?q=Don't+Get+Lost+in+the+Trees:+Streamlining+LLM+Reasoning+by+Overcoming+Tree+Search+Exploration+Pitfalls
    15. REST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search — approx. recent MCTS/process-reward work, exact authors not confirmed from snippet, 2025. https://scholar.google.com/scholar?q=REST-MCTS*:+LLM+Self-Training+via+Process+Reward+Guided+Tree+Search
    16. Large Language Models Cannot Self-Correct Reasoning Yet — approx. recent self-correction evaluation work, exact authors not confirmed from snippet, 2024. https://scholar.google.com/scholar?q=Large+Language+Models+Cannot+Self-Correct+Reasoning+Yet
    17. AI Post Transformers: Test-Time Scaling — Hal Turing & Dr. Ada Shannon, 2025. https://podcast.do-not-panic.com/episodes/test-time-scaling/
    18. AI Post Transformers: Benchmarking Test-Time Scaling for General LLM Agents — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-22-benchmarking-test-time-scaling-for-gener-8f14f9.mp3
    19. AI Post Transformers: Agentic Aggregation for Long-Horizon AI Tasks — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-22-agentic-aggregation-for-long-horizon-ai-4c1a71.mp3
    20. AI Post Transformers: TUMIX Multi-Agent Test-Time Scaling with Tools — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-22-tumix-multi-agent-test-time-scaling-with-40671c.mp3
    21. AI Post Transformers: Test-time Scaling for Multi-Agent Collaborative Reasoning — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-22-test-time-scaling-for-multi-agent-collab-082570.mp3
    22. AI Post Transformers: The Art of Scaling Reinforcement Learning Compute for LLMs — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/the-art-of-scaling-reinforcement-learning-compute-for-llms/
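
    The strategies the episode compares reduce to simple selection rules over k sampled traces. The sketch below is a hedged illustration in which sample_trace is a stand-in for a model call, not any paper's API: majority voting is self-consistency, while shortest- and longest-trace selection capture the short- versus long-horizon reasoner distinction.

        # Toy comparison of inference-time selection rules over k samples.
        from collections import Counter
        import random

        def sample_trace(problem, rng):
            """Stand-in for one sampled chain of thought: (trace, answer)."""
            n_steps = rng.randint(3, 30)
            trace = " ".join(f"step{i}" for i in range(n_steps))
            answer = rng.choice(["A", "A", "B"])  # toy answer distribution
            return trace, answer

        def allocate_test_time_compute(problem, k=8, strategy="majority", seed=0):
            rng = random.Random(seed)
            traces = [sample_trace(problem, rng) for _ in range(k)]
            if strategy == "majority":   # self-consistency style voting
                return Counter(a for _, a in traces).most_common(1)[0][0]
            if strategy == "shortest":   # prefer concise reasoning attempts
                return min(traces, key=lambda t: len(t[0]))[1]
            if strategy == "longest":    # prefer long sequential reasoning
                return max(traces, key=lambda t: len(t[0]))[1]
            raise ValueError(strategy)

        for s in ("majority", "shortest", "longest"):
            print(s, allocate_test_time_compute("toy problem", strategy=s))

    The study's point is that which of these rules wins depends on the model family and problem difficulty, so the runtime budget (k and per-trace length) is itself a tunable knob rather than a fixed recipe.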

  4.

    Stabilizing Efficient Reasoning with Step-Level Advantage Selection

    This episode explores a 2026 paper on making reasoning models faster and cheaper by shortening their chain-of-thought without sacrificing too much accuracy. It explains the paper’s core claim that short-context RL post-training can itself push models toward concise reasoning, while also unpacking the instability this creates and how the proposed Step-Level Advantage Selection method is meant to stabilize training. The discussion places that idea in the broader arc from chain-of-thought prompting and self-consistency to today’s industry-facing reasoning-budget controls, framing efficient reasoning as a test-time compute management problem rather than a new model architecture. Listeners would find it interesting for its skeptical, engineering-focused look at whether shorter reasoning traces are a real advance or just a fragile optimization hidden behind benchmark gains. A hedged sketch of step-level advantage selection follows the source list.

    Sources:
    1. Stabilizing Efficient Reasoning with Step-Level Advantage Selection — Han Wang, Xiaodong Yu, Jialian Wu, Jiang Liu, Ximeng Sun, Mohit Bansal, Zicheng Liu, 2026. http://arxiv.org/abs/2604.24003
    2. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models — Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou, 2022. https://scholar.google.com/scholar?q=Chain-of-Thought+Prompting+Elicits+Reasoning+in+Large+Language+Models
    3. Self-Consistency Improves Chain of Thought Reasoning in Language Models — Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou, 2022. https://scholar.google.com/scholar?q=Self-Consistency+Improves+Chain+of+Thought+Reasoning+in+Language+Models
    4. Let’s Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs — Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam, 2023. https://scholar.google.com/scholar?q=Let’s+Sample+Step+by+Step:+Adaptive-Consistency+for+Efficient+Reasoning+and+Coding+with+LLMs
    5. Training Language Models to Reason Efficiently — Daman Arora, Andrea Zanette, 2025. https://scholar.google.com/scholar?q=Training+Language+Models+to+Reason+Efficiently
    6. DeepScaleR: Effective RL Scaling of Reasoning Models via Iterative Context Lengthening — Michael Luo, Sijun Tan, Justin Wong, Xiaoxiang Shi, William Y. Tang, Manan Roongta, Colin Cai, Jeffrey Luo, Li Erran Li, Raluca Ada Popa, Ion Stoica, 2025. https://scholar.google.com/scholar?q=DeepScaleR:+Effective+RL+Scaling+of+Reasoning+Models+via+Iterative+Context+Lengthening
    7. L1: Controlling How Long a Reasoning Model Thinks with Reinforcement Learning — Pranjal Aggarwal, Sean Welleck, 2025. https://scholar.google.com/scholar?q=L1:+Controlling+How+Long+a+Reasoning+Model+Thinks+with+Reinforcement+Learning
    8. ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning — Bairu Hou, Yang Zhang, Jiabao Ji, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang, 2025. https://scholar.google.com/scholar?q=ThinkPrune:+Pruning+Long+Chain-of-Thought+of+LLMs+via+Reinforcement+Learning
    9. LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization — Xingyu Wu, Yuchen Yan, Shangke Lyu, Linjuan Wu, Yiwen Qiu, Yongliang Shen, Weiming Lu, Jian Shao, Jun Xiao, Yueting Zhuang, 2025. https://scholar.google.com/scholar?q=LAPO:+Internalizing+Reasoning+Efficiency+via+Length-Adaptive+Policy+Optimization
    10. Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning — Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xiong-Hui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin, 2025. https://scholar.google.com/scholar?q=Beyond+the+80/20+Rule:+High-Entropy+Minority+Tokens+Drive+Effective+Reinforcement+Learning+for+LLM+Reasoning
    11. Do NOT Think That Much for 2+3=? On the Overthinking of Long Reasoning Models — Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 2025. https://scholar.google.com/scholar?q=Do+NOT+Think+That+Much+for+2+3=?+On+the+Overthinking+of+Long+Reasoning+Models
    12. QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning — Fanqi Wan et al., 2025. https://scholar.google.com/scholar?q=QwenLong-L1:+Towards+Long-Context+Large+Reasoning+Models+with+Reinforcement+Learning
    13. LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts — Siyuan Wang et al., 2025. https://scholar.google.com/scholar?q=LoongRL:+Reinforcement+Learning+for+Advanced+Reasoning+over+Long+Contexts
    14. LongR: Unleashing Long-Context Reasoning via Reinforcement Learning with Dense Utility Rewards — Bowen Ping et al., 2026. https://scholar.google.com/scholar?q=LongR:+Unleashing+Long-Context+Reasoning+via+Reinforcement+Learning+with+Dense+Utility+Rewards
    15. SSVPO: Effective Step-Level Credit Assignment for RL Training of Language Models — Yugu Li, Zehong Cao, Jianglin Qiao, Siyi Hu, 2026. https://scholar.google.com/scholar?q=SSVPO:+Effective+Step-Level+Credit+Assignment+for+RL+Training+of+Language+Models
    16. GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning — Yao Zhang et al., 2025. https://scholar.google.com/scholar?q=GroundedPRM:+Tree-Guided+and+Fidelity-Aware+Process+Reward+Modeling+for+Step-Level+Reasoning
    17. Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning — Miles Turpin, Andy Arditi, Marvin Li, Joe Benton, Julian Michael, 2025. https://scholar.google.com/scholar?q=Teaching+Models+to+Verbalize+Reward+Hacking+in+Chain-of-Thought+Reasoning
    18. Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort — Xinpeng Wang, Nitish Joshi, Barbara Plank, Rico Angell, He He, 2025. https://scholar.google.com/scholar?q=Is+It+Thinking+or+Cheating?+Detecting+Implicit+Reward+Hacking+by+Measuring+Reasoning+Effort
    19. AI Post Transformers: DeepSeek-V4 and Practical Million-Token Context — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-25-deepseek-v4-and-practical-million-token-6f4de1.mp3
    20. AI Post Transformers: AgenticQwen and Small Industrial Tool Agents — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-27-agenticqwen-and-small-industrial-tool-ag-dc676d.mp3
    21. AI Post Transformers: World-R1 Improves 3D Consistency in Text-to-Video — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-28-world-r1-improves-3d-consistency-in-text-f065d9.mp3
    22. AI Post Transformers: Experience-Based Learning Beyond Human Data — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-19-experience-based-learning-beyond-human-d-b0caa4.mp3

    Interactive Visualization: Stabilizing Efficient Reasoning with Step-Level Advantage Selection
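
    The paper's exact Step-Level Advantage Selection rule is not reproduced here. The hedged sketch below only illustrates the generic ingredients it builds on: group-relative (GRPO-style) advantages per sampled trace, plus a step-level mask that keeps only steps whose signal looks informative. The threshold rule and value estimates are assumptions for illustration, not the paper's method.

        # Hedged sketch: group-relative advantages with a step-level mask.
        import numpy as np

        def group_relative_advantages(rewards):
            """GRPO-style: normalize each trace's reward within its group."""
            r = np.asarray(rewards, dtype=float)
            return (r - r.mean()) / (r.std() + 1e-6)

        def select_steps(step_values, trace_advantage, tau=0.5):
            """Toy step-level selection (assumption): credit a step only when
            the value estimate moves enough, masking the rest out of the
            policy-gradient loss to reduce noisy length-related updates."""
            deltas = np.diff(step_values, prepend=step_values[0])
            mask = np.abs(deltas) > tau
            return trace_advantage * mask

        rewards = [1.0, 0.0, 1.0, 1.0]                      # group of 4 traces
        adv = group_relative_advantages(rewards)
        step_values = np.array([0.1, 0.2, 0.9, 0.95, 1.0])  # trace 0 value ests
        print(select_steps(step_values, adv[0]))  # only the big jump is credited

    The design intuition, as the episode frames it, is that trace-level rewards alone give unstable credit when traces shrink, so some form of per-step filtering is needed to keep short-context RL from collapsing.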

  5.

    Stochastic KV Routing for Cache Sharing

    This episode explores Apple’s Stochastic KV Routing approach for reducing transformer inference costs by letting some layers reuse key-value caches from earlier layers instead of storing separate caches at every depth. It explains why KV cache memory becomes a major bottleneck for long-context autoregressive decoding, and why depth-wise cache sharing is a different idea from token eviction or temporal compression. The discussion connects the paper to Grouped Query Attention and other prior cache-sharing methods, highlighting Apple’s main argument: train models with stochastic cross-layer routing so they can tolerate many cache-retention layouts and then expose a practical serving-time knob for trading memory use against model quality. A listener would find it interesting because it ties a concrete systems problem in LLM deployment to a training strategy that could make large models more flexible under real hardware constraints. A minimal routing sketch follows the source list.

    Sources:
    1. Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing — Anastasiia Filippova, David Grangier, Marco Cuturi, João Monteiro, 2026. http://arxiv.org/abs/2604.22782
    2. Fast Transformer Decoding: One Write-Head is All You Need — Noam Shazeer, 2019. https://scholar.google.com/scholar?q=Fast+Transformer+Decoding:+One+Write-Head+is+All+You+Need
    3. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints — Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebron, Sumit Sanghai, 2023. https://scholar.google.com/scholar?q=GQA:+Training+Generalized+Multi-Query+Transformer+Models+from+Multi-Head+Checkpoints
    4. Layer-Condensed KV Cache for Efficient Inference of Large Language Models — Haoyi Wu, Kewei Tu, 2024. https://scholar.google.com/scholar?q=Layer-Condensed+KV+Cache+for+Efficient+Inference+of+Large+Language+Models
    5. MiniCache: KV Cache Compression in Depth Dimension for Large Language Models — Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang, 2024. https://scholar.google.com/scholar?q=MiniCache:+KV+Cache+Compression+in+Depth+Dimension+for+Large+Language+Models
    6. Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing — Anastasiia Filippova, David Grangier, Marco Cuturi, Joao Monteiro, 2026. https://scholar.google.com/scholar?q=Stochastic+KV+Routing:+Enabling+Adaptive+Depth-Wise+Cache+Sharing
    7. Reducing Transformer Key-Value Cache Size with Cross-Layer Attention — William Brandon, Mayank Mishra, Aniruddha Nrusimha, Rameswar Panda, Jonathan Ragan-Kelley, 2024. https://scholar.google.com/scholar?q=Reducing+Transformer+Key-Value+Cache+Size+with+Cross-Layer+Attention
    8. XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference — Joao Monteiro, Etienne Marcotte, Pierre-Andre Noel, Valentina Zantedeschi, David Vazquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian, 2024. https://scholar.google.com/scholar?q=XC-Cache:+Cross-Attending+to+Cached+Context+for+Efficient+LLM+Inference
    9. KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing — not identifiable from snippet, recent. https://scholar.google.com/scholar?q=KVSharer:+Efficient+Inference+via+Layer-Wise+Dissimilar+KV+Cache+Sharing
    10. Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity — not identifiable from snippet, recent. https://scholar.google.com/scholar?q=Compressing+KV+Cache+for+Long-Context+LLM+Inference+with+Inter-Layer+Attention+Similarity
    11. CaR: An Efficient KV Cache Reuse System for Large Language Model Inference — not identifiable from snippet, recent. https://scholar.google.com/scholar?q=CaR:+An+Efficient+KV+Cache+Reuse+System+for+Large+Language+Model+Inference
    12. Compute or Load KV Cache? Why Not Both? — not identifiable from snippet, recent. https://scholar.google.com/scholar?q=Compute+or+Load+KV+Cache?+Why+Not+Both?
    13. AI Post Transformers: Lookahead Q-Cache for Consistent KV Eviction — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-03-25-lookahead-q-cache-for-consistent-kv-evic-d97b09.mp3
    14. AI Post Transformers: Prefill-as-a-Service for Cross-Datacenter KV Cache — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-19-prefill-as-a-service-for-cross-datacente-7560be.mp3
    15. AI Post Transformers: Splitwise: Phase-Split LLM Inference — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-03-26-splitwise-phase-split-llm-inference-e8945b.mp3
    16. AI Post Transformers: FengHuang for Rack-Scale LLM Inference Memory — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-12-fenghuang-for-rack-scale-llm-inference-m-62708e.mp3
    17. AI Post Transformers: Mamba-3 for Efficient Sequence Modeling — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-16-mamba-3-for-efficient-sequence-modeling-97a22a.mp3
    18. AI Post Transformers: Speculative Decoding in Real vLLM Serving — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-04-speculative-decoding-in-real-vllm-servin-6f4e2b.mp3

    Interactive Visualization: Stochastic KV Routing for Cache Sharing
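
    A minimal sketch of depth-wise stochastic routing, with all names assumed rather than taken from Apple's code: during training, each layer either stores its own KV cache or reads the cache of an earlier layer that kept one, with the layout resampled per batch, so serving can later pick a cheap retention layout the model has already learned to tolerate.

        # Toy depth-wise KV routing: sample which layers keep their own cache.
        import torch

        n_layers = 8

        def sample_kv_layout(p_share=0.5):
            """layout[i] is the layer whose KV cache layer i reads;
            layout[i] == i means the layer stores its own cache."""
            layout, owners = [], []
            for i in range(n_layers):
                if owners and torch.rand(()) < p_share:
                    # Reuse the cache of a random earlier cache-owning layer.
                    j = int(torch.randint(0, len(owners), ()).item())
                    layout.append(owners[j])
                else:
                    layout.append(i)
                    owners.append(i)
            return layout

        def forward_with_layout(x, layers, layout):
            # `layer.compute_kv` and `layer.attend` are hypothetical stand-ins
            # for real attention modules; only the routing logic matters here.
            kv_cache = {}
            for i, layer in enumerate(layers):
                src = layout[i]
                if src == i:
                    kv_cache[i] = layer.compute_kv(x)  # stores its own K,V
                x = layer.attend(x, kv_cache[src])     # possibly reused cache
            return x

        # At serving time the layout becomes a knob: fewer distinct caches
        # means less KV memory, at a quality cost training has priced in.
        print(sample_kv_layout())

    The serving-time appeal is that a single trained checkpoint can then be deployed with, say, half the KV footprint on memory-constrained hardware, rather than retraining a separate model per layout.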

  6.

    Deep Learning in Spiking Neural Networks

    This episode explores how spiking neural networks differ from conventional deep networks by representing information as time-based spikes rather than continuous activations, and why that makes them both appealing and difficult to train. It breaks down the main research directions in the field, including local spike-timing-dependent plasticity, direct gradient-based training of spiking models, and ANN-to-SNN conversion, emphasizing that these approaches solve different problems and should not be treated as interchangeable. The discussion highlights the paper’s core argument that spiking models may help bridge neuroscience, energy-efficient neuromorphic hardware, and competitive machine learning, while also questioning whether claims about progress often blur together engineering wins, biological realism, and benchmark performance. Listeners would find it interesting for its clear explanation of the ANN-SNN gap, the tradeoffs behind biologically inspired AI, and the unresolved question of whether spiking systems can become both practical and scientifically meaningful. A short surrogate-gradient sketch follows the source list.

    Sources:
    1. Deep Learning in Spiking Neural Networks — Amirhossein Tavanaei, Masoud Ghodrati, Saeed Reza Kheradpisheh, Timothee Masquelier, Anthony S. Maida, 2018. http://arxiv.org/abs/1804.08150
    2. Networks of Spiking Neurons: The Third Generation of Neural Network Models — Wolfgang Maass, 1997. https://scholar.google.com/scholar?q=Networks+of+Spiking+Neurons:+The+Third+Generation+of+Neural+Network+Models
    3. Deep Learning in Spiking Neural Networks — Amirhossein Tavanaei, Masoud Ghodrati, Saeed Reza Kheradpisheh, Timothee Masquelier, Anthony Maida, 2019. https://scholar.google.com/scholar?q=Deep+Learning+in+Spiking+Neural+Networks
    4. Surrogate Gradient Learning in Spiking Neural Networks: Bringing the Power of Gradient-Based Optimization to Spiking Neural Networks — Emre O. Neftci, Hesham Mostafa, Friedemann Zenke, 2019. https://scholar.google.com/scholar?q=Surrogate+Gradient+Learning+in+Spiking+Neural+Networks:+Bringing+the+Power+of+Gradient-Based+Optimization+to+Spiking+Neural+Networks
    5. Backpropagation and the Brain — Timothy P. Lillicrap, Adam Santoro, Luke Marris, Colin J. Akerman, Geoffrey Hinton, 2020. https://scholar.google.com/scholar?q=Backpropagation+and+the+Brain
    6. Fast-Classifying, High-Accuracy Spiking Deep Networks Through Weight and Threshold Balancing — Peter U. Diehl, Daniel Neil, Jonathan Binas, Matthew Cook, Shih-Chii Liu, Michael Pfeiffer, 2015. https://scholar.google.com/scholar?q=Fast-Classifying,+High-Accuracy+Spiking+Deep+Networks+Through+Weight+and+Threshold+Balancing
    7. Training Deep Spiking Neural Networks Using Backpropagation — Jun Haeng Lee, Tobi Delbruck, Michael Pfeiffer, 2016. https://scholar.google.com/scholar?q=Training+Deep+Spiking+Neural+Networks+Using+Backpropagation
    8. Conversion of Continuous-Valued Deep Networks to Efficient Event-Driven Networks for Image Classification — Bodo Rueckauer, Iulia-Alexandra Lungu, Yuhuang Hu, Michael Pfeiffer, Shih-Chii Liu, 2017. https://scholar.google.com/scholar?q=Conversion+of+Continuous-Valued+Deep+Networks+to+Efficient+Event-Driven+Networks+for+Image+Classification
    9. Spatio-Temporal Backpropagation for Training High-Performance Spiking Neural Networks — Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, Luping Shi, 2018. https://scholar.google.com/scholar?q=Spatio-Temporal+Backpropagation+for+Training+High-Performance+Spiking+Neural+Networks
    10. SLAYER: Spike Layer Error Reassignment in Time — Sumit Bam Shrestha, Garrick Orchard, 2018. https://scholar.google.com/scholar?q=SLAYER:+Spike+Layer+Error+Reassignment+in+Time
    11. A spiking neural network with continuous local learning for robust online brain machine interface — approx. unknown from snippet, recent. https://scholar.google.com/scholar?q=A+spiking+neural+network+with+continuous+local+learning+for+robust+online+brain+machine+interface
    12. Adaptive deep spiking neural network with global-local learning via balanced excitatory and inhibitory mechanism — approx. unknown from snippet, recent. https://scholar.google.com/scholar?q=Adaptive+deep+spiking+neural+network+with+global-local+learning+via+balanced+excitatory+and+inhibitory+mechanism
    13. Delay learning based on temporal coding in spiking neural networks — approx. unknown from snippet, recent. https://scholar.google.com/scholar?q=Delay+learning+based+on+temporal+coding+in+spiking+neural+networks
    14. Temporal-coded spiking neural networks with dynamic firing threshold: Learning with event-driven backpropagation — approx. unknown from snippet, 2021. https://scholar.google.com/scholar?q=Temporal-coded+spiking+neural+networks+with+dynamic+firing+threshold:+Learning+with+event-driven+backpropagation
    15. Spikingformer: Spike-driven residual learning for transformer-based spiking neural network — approx. unknown from snippet, recent. https://scholar.google.com/scholar?q=Spikingformer:+Spike-driven+residual+learning+for+transformer-based+spiking+neural+network
    16. Sstformer: Bridging spiking neural network and memory support transformer for frame-event based recognition — approx. unknown from snippet, recent. https://scholar.google.com/scholar?q=Sstformer:+Bridging+spiking+neural+network+and+memory+support+transformer+for+frame-event+based+recognition
    17. TE-Spikformer: Temporal-enhanced spiking neural network with transformer — approx. unknown from snippet, recent. https://scholar.google.com/scholar?q=TE-Spikformer:+Temporal-enhanced+spiking+neural+network+with+transformer
    18. NeuronSpark: A Spiking Neural Network Language Model with Selective State Space Dynamics — approx. unknown from snippet, recent. https://scholar.google.com/scholar?q=NeuronSpark:+A+Spiking+Neural+Network+Language+Model+with+Selective+State+Space+Dynamics
    19. SpikingSSMs: Learning long sequences with sparse and parallel spiking state space models — approx. unknown from snippet, recent. https://scholar.google.com/scholar?q=SpikingSSMs:+Learning+long+sequences+with+sparse+and+parallel+spiking+state+space+models
    20. Delays in Spiking Neural Networks: A State Space Model Approach — approx. unknown from snippet, recent. https://scholar.google.com/scholar?q=Delays+in+Spiking+Neural+Networks:+A+State+Space+Model+Approach
    21. AI Post Transformers: Directly Trained Spiking DQNs for Atari — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-20-directly-trained-spiking-dqns-for-atari-947693.mp3
    22. AI Post Transformers: SpikingBrain: Brain-Inspired LLMs for Efficient Long-Context Processing — Hal Turing & Dr. Ada Shannon, 2025. https://podcast.do-not-panic.com/episodes/spikingbrain-brain-inspired-llms-for-efficient-long-context-processing/

    Interactive Visualization: Deep Learning in Spiking Neural Networks
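
    To ground the training difficulty the episode keeps returning to, here is a hedged PyTorch sketch of a leaky integrate-and-fire (LIF) neuron trained with a surrogate gradient, in the spirit of the Neftci et al. survey cited above: the spike itself is a hard, non-differentiable threshold, and backprop only works because its derivative (a Dirac delta) is swapped for a smooth bump on the backward pass.

        # Toy LIF neuron with a surrogate gradient through the spike.
        import torch

        class SurrogateSpike(torch.autograd.Function):
            @staticmethod
            def forward(ctx, v):
                ctx.save_for_backward(v)
                return (v > 0).float()          # hard threshold: spike or not

            @staticmethod
            def backward(ctx, grad_out):
                (v,) = ctx.saved_tensors
                # Replace the true derivative with a smooth, peaked surrogate.
                surrogate = 1.0 / (1.0 + 10.0 * v.abs()) ** 2
                return grad_out * surrogate

        def lif_step(v, x, beta=0.9, threshold=1.0):
            """One LIF update: leaky integration, spike, soft reset."""
            v = beta * v + x
            spike = SurrogateSpike.apply(v - threshold)
            v = v - spike * threshold
            return v, spike

        v = torch.zeros(4)
        x = torch.tensor([0.3, 0.8, 1.2, 2.0], requires_grad=True)
        v, s = lif_step(v, x)
        s.sum().backward()     # gradients flow through the surrogate only
        print(s, x.grad)

    This is the "direct gradient-based training" branch of the episode's taxonomy; STDP-style local learning and ANN-to-SNN conversion sidestep this non-differentiability in entirely different ways, which is exactly why the hosts warn against treating the three as interchangeable.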

  7.

    FPGA Neural Network Accelerators for Space

    This episode explores a survey of FPGA-based neural network accelerators for space applications, focusing on what the literature actually demonstrates rather than treating “space AI” as a single vague category. It explains why FPGAs are appealing for onboard inference, including tight control over hardware, energy efficiency, and the ability to support vision, autonomy, compression, navigation, and selective downlink under severe space constraints like limited bandwidth, latency, power, and thermal limits. The discussion also emphasizes a central argument of the paper: evidence for true space-ready systems is thinner than the hype suggests, with a need to distinguish lab demos on commercial boards from hardware that can handle radiation and mission-critical fault tolerance. Listeners would find it interesting because it connects modern AI hardware design to the harsh realities of spacecraft engineering and shows how a careful survey can reveal where the field is mature, where it is overstating its progress, and what technical gaps still matter most. A small fault-tolerance sketch follows the source list.

    Sources:
    1. FPGA-Based Neural Network Accelerators for Space Applications: A Survey — Pedro Antunes, Artur Podobas, 2025. http://arxiv.org/abs/2504.16173
    2. An FPGA-Based Hardware Accelerator for CNNs Inference on Board Satellites: Benchmarking with Myriad 2-Based Solution for the CloudScout Case Study — Emilio Rapuano, Gabriele Meoni, Tommaso Pacini, Gianmarco Dinelli, Gianluca Furano, Gianluca Giuffrida, Luca Fanucci, 2021. https://scholar.google.com/scholar?q=An+FPGA-Based+Hardware+Accelerator+for+CNNs+Inference+on+Board+Satellites:+Benchmarking+with+Myriad+2-Based+Solution+for+the+CloudScout+Case+Study
    3. Reconfigurable Framework for Resilient Semantic Segmentation for Space Applications — Sebastian Sabogal, Alan George, Gary Crum, 2021. https://scholar.google.com/scholar?q=Reconfigurable+Framework+for+Resilient+Semantic+Segmentation+for+Space+Applications
    4. Systematic Reliability Evaluation of FPGA Implemented CNN Accelerators — Zhen Gao, Shihui Gao, Yi Yao, Qiang Liu, Shulin Zeng, Guangjun Ge, Yu Wang, Anees Ullah, Pedro Reviriego, 2023. https://scholar.google.com/scholar?q=Systematic+Reliability+Evaluation+of+FPGA+Implemented+CNN+Accelerators
    5. Online continual streaming learning for embedded space applications — Van-Tam Nguyen, Alaa Mazouz, 2024. https://scholar.google.com/scholar?q=Online+continual+streaming+learning+for+embedded+space+applications
    6. A Quarter of a Century of Neuromorphic Architectures on FPGAs -- an Overview — Wiktor J. Szczerek, Artur Podobas, 2025. https://scholar.google.com/scholar?q=A+Quarter+of+a+Century+of+Neuromorphic+Architectures+on+FPGAs+--+an+Overview
    7. Hardware platforms enabling edge AI for space applications: A critical review — unknown, approximate recent review authors, 2024 or 2025. https://scholar.google.com/scholar?q=Hardware+platforms+enabling+edge+AI+for+space+applications:+A+critical+review
    8. Study of Radiation Effects on FPGA and GPU based Neural Networks Accelerator Designs — unknown, approximate recent systems authors, 2024 or 2025. https://scholar.google.com/scholar?q=Study+of+Radiation+Effects+on+FPGA+and+GPU+based+Neural+Networks+Accelerator+Designs
    9. A Radiation-Hardened Neuromorphic Imager with Self-Healing Spiking Pixels and Unified Spiking Neural Network for Space Robotics — unknown, approximate neuromorphic hardware authors, 2024 or 2025. https://scholar.google.com/scholar?q=A+Radiation-Hardened+Neuromorphic+Imager+with+Self-Healing+Spiking+Pixels+and+Unified+Spiking+Neural+Network+for+Space+Robotics
    10. Onboard Optimization and Learning: A Survey — unknown, approximate recent survey authors, 2024 or 2025. https://scholar.google.com/scholar?q=Onboard+Optimization+and+Learning:+A+Survey
    11. Review on hardware devices and software techniques enabling neural network inference onboard satellites — unknown, approximate recent review authors, 2024 or 2025. https://scholar.google.com/scholar?q=Review+on+hardware+devices+and+software+techniques+enabling+neural+network+inference+onboard+satellites
    12. AI Post Transformers: FlatAttention for Tile-Based Accelerator Inference — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-04-flatattention-for-tile-based-accelerator-56e6ca.mp3
    13. AI Post Transformers: AWQ: On-Device LLM Compression and Acceleration — Hal Turing & Dr. Ada Shannon, 2025. https://podcast.do-not-panic.com/episodes/awq-on-device-llm-compression-and-acceleration/
    14. AI Post Transformers: KVSwap for Disk-Aware Long-Context On-Device Inference — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-16-kvswap-for-disk-aware-long-context-on-de-f3c15e.mp3

    Interactive Visualization: FPGA Neural Network Accelerators for Space
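
    The survey's fault-tolerance concern can be illustrated in miniature with the classic scheme radiation-tolerant FPGA designs lean on: triple modular redundancy (TMR). The Python sketch below is a generic illustration, not code from any surveyed accelerator; it runs three redundant copies of a computation, injects one simulated single-event upset, and majority-votes it away bit by bit.

        # Toy TMR: one bit flip in one replica cannot corrupt the voted output.
        import numpy as np

        def bitwise_majority(a, b, c):
            """Per-bit vote over three redundant integer outputs."""
            return (a & b) | (a & c) | (b & c)

        def tmr_inference(run_copy, x, rng=np.random.default_rng(0)):
            outs = [run_copy(x) for _ in range(3)]
            # Simulate a single-event upset flipping one bit in one replica.
            victim = int(rng.integers(0, 3))
            outs[victim] ^= 1 << int(rng.integers(0, 8))
            return bitwise_majority(*outs)

        # Hypothetical stand-in for an int8 accelerator result.
        quantized_logit = lambda x: int(x * 31) & 0xFF
        print(tmr_inference(quantized_logit, 2.5))  # upset voted away: 77

    The cost is the point the survey presses on: tripled logic, routing, and power, which is why lab demos on commercial boards say little about whether a design survives an actual radiation environment.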

  8.

    World-R1 Improves 3D Consistency in Text-to-Video

    This episode explores World-R1, a post-training method for improving 3D consistency in text-to-video generation without redesigning the underlying model architecture. It explains how the approach combines reinforcement learning, pretrained 3D reconstruction critics, vision-language rewards, and camera-motion-focused prompt data to push generated videos toward more stable geometry under viewpoint changes. The discussion highlights why this matters for scene persistence, occlusion, and camera motion, especially if video models are ever to serve as usable world models rather than just visually plausible clip generators. Listeners would find it interesting because it digs into a concrete attempt to make today’s impressive but fragile video systems behave more like coherent simulated worlds. A sketch of the reward loop follows the source list.

    Sources:
    1. World-R1: Reinforcing 3D Constraints for Text-to-Video Generation — Weijie Wang, Xiaoxuan He, Youping Gu, Yifan Yang, Zeyu Zhang, Yefei He, Yanbo Ding, Xirui Hu, Donny Y. Chen, Zhiyuan He, Yuqing Yang, Bohan Zhuang, 2026. http://arxiv.org/abs/2604.24764
    2. Make-A-Video: Text-to-Video Generation without Text-Video Data — Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, Yaniv Taigman, 2022. https://scholar.google.com/scholar?q=Make-A-Video:+Text-to-Video+Generation+without+Text-Video+Data
    3. Imagen Video: High Definition Video Generation with Diffusion Models — Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans, 2022. https://scholar.google.com/scholar?q=Imagen+Video:+High+Definition+Video+Generation+with+Diffusion+Models
    4. Lumiere: A Space-Time Diffusion Model for Video Generation — Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri, 2024. https://scholar.google.com/scholar?q=Lumiere:+A+Space-Time+Diffusion+Model+for+Video+Generation
    5. Video generation models as world simulators — Tim Brooks, Bill Peebles, Conor Holmes, Will DePue, Alex Payne, Robin Rombach, Patrick Esser, Jon Barron, Bhargav Chan, and OpenAI collaborators, 2024. https://scholar.google.com/scholar?q=Video+generation+models+as+world+simulators
    6. CameraCtrl: Enabling Camera Control for Text-to-Video Generation — Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang, 2024. https://scholar.google.com/scholar?q=CameraCtrl:+Enabling+Camera+Control+for+Text-to-Video+Generation
    7. WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance — Chenxi Song, Yanming Yang, Tong Zhao, Ruibo Li, Chi Zhang, 2025. https://scholar.google.com/scholar?q=WorldForge:+Unlocking+Emergent+3D/4D+Generation+in+Video+Diffusion+Model+via+Training-Free+Guidance
    8. FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction — Yixiang Dai, Fan Jiang, Chiyu Wang, Mu Xu, Yonggang Qi, 2025. https://scholar.google.com/scholar?q=FantasyWorld:+Geometry-Consistent+World+Modeling+via+Unified+Video+and+3D+Prediction
    9. 3D and 4D World Modeling: A Survey — Lingdong Kong, Wesley Yang, Jianbiao Mei, Youquan Liu, Ao Liang, Dekai Zhu and many coauthors, 2025. https://scholar.google.com/scholar?q=3D+and+4D+World+Modeling:+A+Survey
    10. Wan: Open and Advanced Large-Scale Video Generative Models — WanTeam, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, Jianyuan Zeng, Jiayu Wang, Jingfeng Zhang, Jingren Zhou, Jinkai Wang, Jixuan Chen, and many others, 2025. https://scholar.google.com/scholar?q=Wan:+Open+and+Advanced+Large-Scale+Video+Generative+Models
    11. Flow-GRPO: Training Flow Matching Models via Online RL — Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, Wanli Ouyang, 2025. https://scholar.google.com/scholar?q=Flow-GRPO:+Training+Flow+Matching+Models+via+Online+RL
    12. Depth Anything 3: Recovering the Visual Space from Any Views — Haotong Lin, Sili Chen, Junhao Liew, Donny Y. Chen, Zhenyu Li, Guang Shi, Jiashi Feng, Bingyi Kang, 2025. https://scholar.google.com/scholar?q=Depth+Anything+3:+Recovering+the+Visual+Space+from+Any+Views
    13. VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward — Zhaochong An, Orest Kupyn, Theo Uscidda, Andrea Colaco, Karan Ahuja, Serge Belongie, Mar Gonzalez-Franco, Marta Tintore Gazulla, 2026. https://scholar.google.com/scholar?q=VGGRPO:+Towards+World-Consistent+Video+Generation+with+4D+Latent+Reward
    14. WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion — Hanyang Kong, Xingyi Yang, Xiaoxu Zheng, Xinchao Wang, 2025. https://scholar.google.com/scholar?q=WorldWarp:+Propagating+3D+Geometry+with+Asynchronous+Video+Diffusion
    15. Pre-Trained Video Generative Models as World Simulators — Haoran He, Yang Zhang, Liang Lin, Zhongwen Xu, and collaborators, 2025. https://scholar.google.com/scholar?q=Pre-Trained+Video+Generative+Models+as+World+Simulators
    16. Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors — authors not visible in the snippet, recent, likely 2025-2026. https://scholar.google.com/scholar?q=Towards+Realistic+and+Consistent+Orbital+Video+Generation+via+3D+Foundation+Priors
    17. Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding — authors not visible in the snippet, recent, likely 2025-2026. https://scholar.google.com/scholar?q=Generation+Models+Know+Space:+Unleashing+Implicit+3D+Priors+for+Scene+Understanding
    18. Is a Picture Worth a Thousand Words? Delving into Spatial Reasoning for Vision-Language Models — authors not visible in the snippet, recent, likely 2024-2025. https://scholar.google.com/scholar?q=Is+a+Picture+Worth+a+Thousand+Words?+Delving+into+Spatial+Reasoning+for+Vision-Language+Models
    19. Fast Multi-View Consistent 3D Editing with Video Priors — authors not visible in the snippet, recent, likely 2024-2025. https://scholar.google.com/scholar?q=Fast+Multi-View+Consistent+3D+Editing+with+Video+Priors
    20. AI Post Transformers: LeWorldModel: Stable Joint-Embedding World Models from Pixels — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-03-25-leworldmodel-stable-joint-embedding-worl-650f9f.mp3
    21. AI Post Transformers: DreamerV3 World Models Across 150 Tasks — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-20-dreamerv3-world-models-across-150-tasks-af5edb.mp3

    Interactive Visualization: World-R1 Improves 3D Consistency in Text-to-Video
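
    A hedged sketch of the reward loop the episode describes, with every function name a stand-in rather than World-R1's API: sample several videos per prompt, score each with a frozen 3D-geometry critic blended with a vision-language reward, and turn the scores into group-relative advantages for a Flow-GRPO-style policy update.

        # Toy reward loop: geometry critic + VLM reward -> group advantages.
        import numpy as np

        def reward(video, geometry_critic, vlm_scorer, w_geo=0.7):
            """Blend geometric consistency with prompt adherence."""
            geo = geometry_critic(video)  # e.g., reprojection consistency
            sem = vlm_scorer(video)       # e.g., text-video alignment
            return w_geo * geo + (1 - w_geo) * sem

        def grpo_step(sample_videos, prompt, geometry_critic, vlm_scorer, k=4):
            videos = [sample_videos(prompt) for _ in range(k)]
            r = np.array([reward(v, geometry_critic, vlm_scorer) for v in videos])
            adv = (r - r.mean()) / (r.std() + 1e-6)  # group-relative advantages
            return list(zip(videos, adv))            # feeds the policy update

        # Toy stand-ins so the sketch runs end to end.
        rng = np.random.default_rng(0)
        fake_sample = lambda p: rng.normal(size=(8, 16, 16))   # 8 "frames"
        fake_geo = lambda v: float(abs(v.mean()) * 10)         # varies per sample
        fake_vlm = lambda v: 0.5
        for video, a in grpo_step(fake_sample, "orbiting camera", fake_geo, fake_vlm):
            print(round(float(a), 2))

    The interesting design choice the hosts flag is that the critics stay frozen: the video policy is being optimized against fixed geometric judges, which keeps the reward stable but also bounds how much "3D understanding" the reward can actually certify.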
