AI Post Transformers

mcgrof

AI-generated podcast where hosts Hal Turing and Dr. Ada Shannon discuss the latest research papers and reports in machine learning, AI systems, and optimization. Featuring honest critical analysis, proper citations, and nerdy humor.

  1. 22H AGO

    Breiman's Two Cultures of Statistical Modeling

    This episode explores Leo Breiman’s “Statistical Modeling: The Two Cultures” as a sharp argument about a core divide in data science: building explicit probabilistic models to explain the world versus training algorithms that simply predict well on new data. It traces how that split maps onto inference versus prediction, connects Breiman’s critique to earlier ideas from Tukey and Box, and shows how later work such as Shmueli’s formalized the distinction. The discussion also grounds the debate in concrete methods, from linear regression and Cox models to CART, bagging, random forests, and neural networks, highlighting why algorithmic approaches gained ground on messy, high-dimensional problems. Listeners would find it interesting because it explains a foundational argument that still shapes modern machine learning, while also probing where interpretable classical models remain essential in areas like medicine, policy, and reliability.

    Sources:

    1. Breiman's Two Cultures of Statistical Modeling https://www2.math.uu.se/~thulin/mm/breiman.pdf
    2. The Future of Data Analysis — John W. Tukey, 1962 https://scholar.google.com/scholar?q=The+Future+of+Data+Analysis
    3. Science and Statistics — George E. P. Box, 1976 https://scholar.google.com/scholar?q=Science+and+Statistics
    4. Regression Models and Life-Tables — D. R. Cox, 1972 https://scholar.google.com/scholar?q=Regression+Models+and+Life-Tables
    5. Statistical Modeling: The Two Cultures — Leo Breiman, 2001 https://scholar.google.com/scholar?q=Statistical+Modeling:+The+Two+Cultures
    6. Bagging Predictors — Leo Breiman, 1996 https://scholar.google.com/scholar?q=Bagging+Predictors
    7. Random Forests — Leo Breiman, 2001 https://scholar.google.com/scholar?q=Random+Forests
    8. Greedy Function Approximation: A Gradient Boosting Machine — Jerome H. Friedman, 2001 https://scholar.google.com/scholar?q=Greedy+Function+Approximation:+A+Gradient+Boosting+Machine
    9. Clinical versus Actuarial Judgment — Robyn M. Dawes, David Faust, Paul E. Meehl, 1989 https://scholar.google.com/scholar?q=Clinical+versus+Actuarial+Judgment
    10. To Explain or To Predict? — Galit Shmueli, 2010 https://scholar.google.com/scholar?q=To+Explain+or+To+Predict?
    11. Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning — Tal Yarkoni, Jacob Westfall, 2017 https://scholar.google.com/scholar?q=Choosing+Prediction+Over+Explanation+in+Psychology:+Lessons+From+Machine+Learning
    12. Classification and Regression Trees — Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone, 1984 https://scholar.google.com/scholar?q=Classification+and+Regression+Trees
    13. Induction of Decision Trees — J. R. Quinlan, 1986 https://scholar.google.com/scholar?q=Induction+of+Decision+Trees
    14. A Random Forest Guided Tour — Gérard Biau, Erwan Scornet, 2016 https://scholar.google.com/scholar?q=A+Random+Forest+Guided+Tour
    15. Learning Representations by Back-Propagating Errors — David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams, 1986 https://scholar.google.com/scholar?q=Learning+Representations+by+Back-Propagating+Errors
    16. Multilayer Feedforward Networks Are Universal Approximators — Kurt Hornik, Maxwell Stinchcombe, Halbert White, 1989 https://scholar.google.com/scholar?q=Multilayer+Feedforward+Networks+Are+Universal+Approximators
    17. ImageNet Classification with Deep Convolutional Neural Networks — Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, 2012 https://scholar.google.com/scholar?q=ImageNet+Classification+with+Deep+Convolutional+Neural+Networks
    18. Deep Learning — Yann LeCun, Yoshua Bengio, Geoffrey Hinton, 2015 https://scholar.google.com/scholar?q=Deep+Learning
    19. Arcing Classifiers — Leo Breiman, 1998 https://scholar.google.com/scholar?q=Arcing+Classifiers
    20. Generalized Additive Models — Trevor Hastie and Robert Tibshirani, 1990 https://scholar.google.com/scholar?q=Generalized+Additive+Models
    21. The Elements of Statistical Learning — Trevor Hastie, Robert Tibshirani, and Jerome Friedman, 2001 https://scholar.google.com/scholar?q=The+Elements+of+Statistical+Learning
    22. Sparse Neural Additive Model: Interpretable Deep Learning with Feature Selection via Group Sparsity — Shiyun Xu, Zhiqi Bu, Pratik Chaudhari, Ian J. Barnett, 2022/2023 https://scholar.google.com/scholar?q=Sparse+Neural+Additive+Model:+Interpretable+Deep+Learning+with+Feature+Selection+via+Group+Sparsity
    23. Neural Additive Models for Location Scale and Shape: A Framework for Interpretable Neural Regression Beyond the Mean — Anton Frederik Thielmann, Rene-Marcel Kruse, Thomas Kneib, Benjamin Safken, 2024 https://scholar.google.com/scholar?q=Neural+Additive+Models+for+Location+Scale+and+Shape:+A+Framework+for+Interpretable+Neural+Regression+Beyond+the+Mean
    24. Conformal Prediction: A Data Perspective — Xiaofan Zhou, Baiting Chen, Yu Gui, Lu Cheng, 2024 https://scholar.google.com/scholar?q=Conformal+Prediction:+A+Data+Perspective
    25. Large language model validity via enhanced conformal prediction methods — John J. Cherian, Isaac Gibbs, Emmanuel J. Candes, 2024 https://scholar.google.com/scholar?q=Large+language+model+validity+via+enhanced+conformal+prediction+methods
    26. CPSign: conformal prediction for cheminformatics modeling — Staffan Arvidsson McShane, Ulf Norinder, Jonathan Alvarsson, Ernst Ahlberg, Lars Carlsson, Ola Spjuth, 2024 https://scholar.google.com/scholar?q=CPSign:+conformal+prediction+for+cheminformatics+modeling
    27. Open Problems in Mechanistic Interpretability — Lee Sharkey et al., 2025 https://scholar.google.com/scholar?q=Open+Problems+in+Mechanistic+Interpretability
    28. AI Post Transformers: Evaluating LLM Embeddings for Psychometric Personality Prediction — Hal Turing & Dr. Ada Shannon https://podcast.do-not-panic.com/episodes/evaluating-llm-embeddings-for-psychometric-personality-prediction/
    29. AI Post Transformers: Introducing RTEB: Retrieval Embedding Benchmark — Hal Turing & Dr. Ada Shannon https://podcast.do-not-panic.com/episodes/introducing-rteb-retrieval-embedding-benchmark/
    30. AI Post Transformers: Information Bottleneck-based Causal Attention for Medical Image Recognition — Hal Turing & Dr. Ada Shannon https://podcast.do-not-panic.com/episodes/information-bottleneck-based-causal-attention-for-medical-image-recognition/

    Interactive Visualization: Breiman's Two Cultures of Statistical Modeling
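
    The two cultures are easy to see side by side in code. Below is a minimal sketch using scikit-learn on synthetic data (the dataset and its nonlinearity are invented for illustration, not taken from Breiman's paper): the data-modeling culture fits an explicit linear model and reads its coefficients as a story about the mechanism, while the algorithmic culture fits a random forest and is judged purely by held-out predictive accuracy.

        # Data-modeling culture: explicit linear model, interpreted via coefficients.
        # Algorithmic culture: random forest, judged only by held-out prediction.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.linear_model import LinearRegression
        from sklearn.metrics import r2_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        n = 2000
        X = rng.normal(size=(n, 5))
        # Ground truth contains a nonlinearity the linear model cannot express.
        y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 3.0 * np.sin(X[:, 2]) + rng.normal(scale=0.5, size=n)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        lin = LinearRegression().fit(X_tr, y_tr)
        rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

        print("linear coefficients:", lin.coef_)  # the interpretable "explanation"
        print("linear R^2:", r2_score(y_te, lin.predict(X_te)))
        print("forest R^2:", r2_score(y_te, rf.predict(X_te)))  # typically higher here

    On data like this the forest wins on prediction while the linear model keeps its readable coefficients, which is exactly the trade the episode examines.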

  2. 22H AGO

    Experience-Based Learning Beyond Human Data

    This episode explores a 2025 DeepMind and University of Alberta preprint arguing that AI is reaching the limits of learning from human-generated data and that the next major advances will come from agents learning through interaction and feedback in environments. It explains the shift from static pretraining to grounded reinforcement learning, defining key ideas like long-term reward optimization, self-play, world models, and why this paradigm has powered systems such as AlphaGo Zero, AlphaZero, MuZero, and theorem-proving agents in verifier-rich domains like math, code, and games. The discussion also stresses the practical obstacles that have kept RL from dominating mainstream AI—expensive data collection, sparse rewards, instability, and safety concerns—and questions whether this “era of experience” will extend broadly or remain strongest in environments where success can be automatically checked. Listeners would find it interesting for its clear breakdown of a major proposed shift in AI research and its skeptical take on whether the evidence really supports such a sweeping roadmap.

    Sources:

    1. Experience-Based Learning Beyond Human Data https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf
    2. https://www.lesswrong.com/posts/TCGgiJAinGgcMEByt/the-era-of-experience-has-an-unsolved-technical-alignment
    3. Reinforcement Learning: An Introduction — Richard S. Sutton, Andrew G. Barto, 1998; 2nd edition 2018 https://scholar.google.com/scholar?q=Reinforcement+Learning:+An+Introduction
    4. Human-level control through deep reinforcement learning — Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu and others, 2015 https://scholar.google.com/scholar?q=Human-level+control+through+deep+reinforcement+learning
    5. Deep Reinforcement Learning: An Overview — Yuxi Li, 2017 https://scholar.google.com/scholar?q=Deep+Reinforcement+Learning:+An+Overview
    6. Mastering Diverse Domains through World Models — Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver, 2020 https://scholar.google.com/scholar?q=Mastering+Diverse+Domains+through+World+Models
    7. AlphaProof and AlphaGeometry 2 — DeepMind et al., 2024 https://scholar.google.com/scholar?q=AlphaProof+and+AlphaGeometry+2
    8. Mastering the Game of Go without Human Knowledge — David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, Demis Hassabis, 2017 https://scholar.google.com/scholar?q=Mastering+the+Game+of+Go+without+Human+Knowledge
    9. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm — David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis, 2018 https://scholar.google.com/scholar?q=Mastering+Chess+and+Shogi+by+Self-Play+with+a+General+Reinforcement+Learning+Algorithm
    10. Reward is Enough — David Silver, Satinder Singh, Doina Precup, Richard S. Sutton, 2021 https://scholar.google.com/scholar?q=Reward+is+Enough
    11. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models — DeepSeek-AI et al., 2024 https://scholar.google.com/scholar?q=DeepSeekMath:+Pushing+the+Limits+of+Mathematical+Reasoning+in+Open+Language+Models
    12. Reinforcement Learning from Human Feedback: Learning Dynamic Choices via Human Preferences — Paul Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei, 2017 https://scholar.google.com/scholar?q=Reinforcement+Learning+from+Human+Feedback:+Learning+Dynamic+Choices+via+Human+Preferences
    13. Agent Lightning: Train ANY AI Agents with Reinforcement Learning — Yunfan Luo, et al., 2025 https://scholar.google.com/scholar?q=Agent+Lightning:+Train+ANY+AI+Agents+with+Reinforcement+Learning
    14. Beyond human data: Scaling self-training for problem-solving with language models — approx. recent LLM self-training authors, 2024-2025 https://scholar.google.com/scholar?q=Beyond+human+data:+Scaling+self-training+for+problem-solving+with+language+models
    15. Personalizing reinforcement learning from human feedback with variational preference learning — approx. recent preference-learning authors, 2024-2025 https://scholar.google.com/scholar?q=Personalizing+reinforcement+learning+from+human+feedback+with+variational+preference+learning
    16. Online iterative reinforcement learning from human feedback with general preference model — approx. recent RLHF authors, 2024-2025 https://scholar.google.com/scholar?q=Online+iterative+reinforcement+learning+from+human+feedback+with+general+preference+model
    17. Efficient preference-based reinforcement learning using learned dynamics models — approx. recent model-based preference RL authors, 2023-2025 https://scholar.google.com/scholar?q=Efficient+preference-based+reinforcement+learning+using+learned+dynamics+models
    18. Refining Large Language Models with Self-Generated Data Through Iterative Training — approx. recent self-generated data / iterative training authors, 2024-2025 https://scholar.google.com/scholar?q=Refining+Large+Language+Models+with+Self-Generated+Data+Through+Iterative+Training
    19. Co-evolved Self-Critique: Enhancing Large Language Models with Self-Generated Data — approx. recent self-critique authors, 2024-2025 https://scholar.google.com/scholar?q=Co-evolved+Self-Critique:+Enhancing+Large+Language+Models+with+Self-Generated+Data
    20. Agentic reward modeling: Integrating human preferences with verifiable correctness signals for reliable reward systems — approx. recent reward-modeling authors, 2024-2025 https://scholar.google.com/scholar?q=Agentic+reward+modeling:+Integrating+human+preferences+with+verifiable+correctness+signals+for+reliable+reward+systems
    21. Crossing the reward bridge: Expanding RL with verifiable rewards across diverse domains — approx. recent RLVR authors, 2024-2025 https://scholar.google.com/scholar?q=Crossing+the+reward+bridge:+Expanding+RL+with+verifiable+rewards+across+diverse+domains
    22. AI Post Transformers: Experiential Reinforcement Learning: Internalizing Reflection for Better Policy Training — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/experiential-reinforcement-learning-internalizing-reflection-for-better-policy-t/
    23. AI Post Transformers: MEMSEARCHER: Reinforcement Learning for LLM Memory Management — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-memsearcher-reinforcement-learning-for-l-e9ad84.mp3
    24. AI Post Transformers: MetaClaw: Just Talk and Continual Agent Adaptation — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-31-metaclaw-meta-learning-agents-in-the-wil-ab324c.mp3
    25. AI Post Transformers: Memory Intelligence Agents for Deep Research — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-09-memory-intelligence-agents-for-deep-rese-cd39e3.mp3
    26. AI Post Transformers: Memory Sparse Attention for 100M-Token Scaling — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-07-memory-sparse-attention-for-100m-token-s-377cff.mp3
    27. AI Post Transformers: Recursive Language Models for Arbitrarily Long Prompts — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-recursive-language-models-for-arbitraril-fbcd1c.mp3
    28. AI Post Transformers: Simple Self-Distillation for Better Code Generation — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-02-simple-self-distillation-for-better-code-cc88e0.mp3
    29. AI Post Transformers: IMO-Bench for Robust Mathematical Reasoning — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-imo-bench-for-robust-mathematical-reason-143489.mp3
    30. AI Post Transformers: AI Agent Traps and Prompt Injection — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-02-ai-agent-traps-and-prompt-injection-7ce4ba.mp3
    31. AI Post Transformers: ASI-Evolve for Data, Architectures, and RL — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-05-asi-evolve-for-data-architectures-and-rl-197b2b.mp3

    Interactive Visualization: Experience-Based Learning Beyond Human Data
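
    As a concrete picture of what learning from experience means in the episode's sense, here is a minimal sketch of tabular Q-learning on a toy chain environment (the environment and its reward are invented for illustration): the training signal comes from the agent's own interaction and long-term reward, not from a fixed human-generated corpus.

        import numpy as np

        n_states, n_actions = 6, 2          # actions: 0 = left, 1 = right
        Q = np.zeros((n_states, n_actions))
        alpha, gamma, eps = 0.1, 0.95, 0.2
        rng = np.random.default_rng(0)

        def step(s, a):
            # Reward only at the right end of the chain; the episode ends there.
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            return s2, float(s2 == n_states - 1), s2 == n_states - 1

        for episode in range(500):
            s = 0
            for _ in range(200):            # cap episode length for safety
                tied = Q[s, 0] == Q[s, 1]   # break ties randomly before learning kicks in
                a = int(rng.integers(n_actions)) if tied or rng.random() < eps else int(np.argmax(Q[s]))
                s2, r, done = step(s, a)
                # The "data" is generated by the agent's own behaviour, and the update
                # bootstraps toward long-term reward rather than imitating a corpus.
                Q[s, a] += alpha * (r + gamma * (0.0 if done else Q[s2].max()) - Q[s, a])
                s = s2
                if done:
                    break

        print(np.argmax(Q, axis=1))  # learned policy: "right" in every non-terminal state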

  3. 1D AGO

    DeepSeek-V4 and Practical Million-Token Context

    This episode explores DeepSeek-V4’s claim that million-token context windows may finally be practical for real-world use, not just benchmark demos. It explains how the model combines hybrid attention, mixture-of-experts routing, and manifold-constrained hyper-connections to reduce the usual memory and compute costs of long-context transformers while trying to preserve reasoning, coding, and agent performance. The discussion places these design choices in context by comparing them with earlier long-context approaches like Transformer-XL, Longformer, and Big Bird, and by separating headline model size from the more meaningful question of active runtime cost. Listeners would find it interesting because the episode treats the paper not as a simple scaling story, but as a broader systems argument about whether extreme context length can become genuinely useful without hidden tradeoffs.

    Sources:

    1. DeepSeek-V4 and Practical Million-Token Context https://cas-bridge.xethub.hf.co/xet-bridge-us/69e864fd6b68f7e6cfc63ca3/4def459c20d33bee897605e5149c7e19d52c49ad592e547dc0ee24044bced2ce?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20260425%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260425T183145Z&X-Amz-Expires=3600&X-Amz-Signature=f1688b314e79308272cdbfb7c43f2928eb41aa605779c0ce3104d3377e763d53&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=68488f482f229c24e59d66a0&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27DeepSeek_V4.pdf%3B+filename%3D%22DeepSeek_V4.pdf%22%3B&response-content-type=application%2Fpdf&x-amz-checksum-mode=ENABLED&x-id=GetObject&Expires=1777145505&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc3NzE0NTUwNX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82OWU4NjRmZDZiNjhmN2U2Y2ZjNjNjYTMvNGRlZjQ1OWMyMGQzM2JlZTg5NzYwNWU1MTQ5YzdlMTlkNTJjNDlhZDU5MmU1NDdkYzBlZTI0MDQ0YmNlZDJjZSoifV19&Signature=PduE%7ECXWKurfPe2KmS3YcJQpJcnQmkzyHq%7Ej3dWzYkifRocYIo45kbcjzBIjHTG71uetaQ0rFSPxR27syyAX0bjtIZBIS6d7T7A42ay-uJs0uNjN5mBasT2aQftQBeryDU3bXApQWNVmAxl-kzPcx4WfzWUpZAtFJp4LWZcwLUfBR2Qu%7EZuWg5W6gxJhPKjIfx8MZSdJdzsTz7swo8fX22zwuAxsaMPWU9S-F%7EbNzmvPgcM1OSXm-SZfIzNNmPd3Mi0KcTWf57maRGRzRuDBnkQa4GDj7SdqduSO5g2bfAUzTesQPomRfkqOL1rjvwzcRGLuE1mHgK%7EJxN5EecEqIg__&Key-Pair-Id=K2L8F4GPSG1IFC
    2. DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models — DeepSeek-AI et al., 2025 https://scholar.google.com/scholar?q=DeepSeek-V3.2:+Pushing+the+Frontier+of+Open+Large+Language+Models
    3. mHC: Manifold-Constrained Hyper-Connections — Zhenda Xie, Yixuan Wei, Huanqi Cao, Chenggang Zhao, Chengqi Deng, Jiashi Li, Damai Dai, Huazuo Gao, Jiang Chang, Liang Zhao, Shangyan Zhou, Zhean Xu, Zhengyan Zhang, Wangding Zeng, Shengding Hu, Yuqing Wang, Jingyang Yuan, Lean Wang, Wenfeng Liang, 2025 https://scholar.google.com/scholar?q=mHC:+Manifold-Constrained+Hyper-Connections
    4. Hyper-Connections — Defa Zhu, Hongzhi Huang, Zihao Huang, Yutao Zeng, Yunyao Mao, Banggu Wu, Qiyang Min, Xun Zhou, 2025 https://scholar.google.com/scholar?q=Hyper-Connections
    5. Muon is Scalable for LLM Training — Jingyuan Liu et al., 2025 https://scholar.google.com/scholar?q=Muon+is+Scalable+for+LLM+Training
    6. LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks — Yushi Bai, Shangqing Tu, Jiajie Zhang, Hao Peng, Xiaozhi Wang, Xin Lv, Shulin Cao, Jiazheng Xu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li, 2024 https://scholar.google.com/scholar?q=LongBench+v2:+Towards+Deeper+Understanding+and+Reasoning+on+Realistic+Long-context+Multitasks
    7. MCP-Atlas: A Large-Scale Benchmark for Tool-Use Competency with Real MCP Servers — Chaithanya Bandi, Ben Hertzberg, Geobio Boo, Tejas Polakam, Jeff Da, Sami Hassaan, Manasi Sharma, Andrew Park, Ernesto Hernandez, Dan Rambado, Ivan Salazar, Rafael Cruz, Chetan Rane, Ben Levin, Brad Kenstler, Bing Liu, 2026 https://scholar.google.com/scholar?q=MCP-Atlas:+A+Large-Scale+Benchmark+for+Tool-Use+Competency+with+Real+MCP+Servers
    8. KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse — Jingbo Yang et al., 2025 https://scholar.google.com/scholar?q=KVLink:+Accelerating+Large+Language+Models+via+Efficient+KV+Cache+Reuse
    9. HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse — Yuwei An et al., 2025 https://scholar.google.com/scholar?q=HyperRAG:+Enhancing+Quality-Efficiency+Tradeoffs+in+Retrieval-Augmented+Generation+with+Reranker+KV-Cache+Reuse
    10. ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation — Shihao Wang et al., 2026 https://scholar.google.com/scholar?q=ProphetKV:+User-Query-Driven+Selective+Recomputation+for+Efficient+KV+Cache+Reuse+in+Retrieval-Augmented+Generation
    11. When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training — Haonan Wang et al., 2024 https://scholar.google.com/scholar?q=When+Precision+Meets+Position:+BFloat16+Breaks+Down+RoPE+in+Long-Context+Training
    12. LongAttn: Selecting Long-context Training Data via Token-level Attention — Longyun Wu et al., 2025 https://scholar.google.com/scholar?q=LongAttn:+Selecting+Long-context+Training+Data+via+Token-level+Attention
    13. On the Convergence of Gradient Descent on Learning Transformers with Residual Connections — Zhen Qin et al., 2025 https://scholar.google.com/scholar?q=On+the+Convergence+of+Gradient+Descent+on+Learning+Transformers+with+Residual+Connections
    14. ResiDual: Transformer with Dual Residual Connections — Shufang Xie et al., 2023 https://scholar.google.com/scholar?q=ResiDual:+Transformer+with+Dual+Residual+Connections
    15. HyperAttention: Long-context Attention in Near-Linear Time — Insu Han et al., 2023 https://scholar.google.com/scholar?q=HyperAttention:+Long-context+Attention+in+Near-Linear+Time
    16. AI Post Transformers: DeepSeek-V3: A Technical Report — Hal Turing & Dr. Ada Shannon, 2025 https://podcast.do-not-panic.com/episodes/deepseek-v3-a-technical-report/
    17. AI Post Transformers: Memory Sparse Attention for 100M-Token Scaling — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-07-memory-sparse-attention-for-100m-token-s-377cff.mp3
    18. AI Post Transformers: Ring-linear: Efficient Hybrid Architecture for Long-Context Reasoning — Hal Turing & Dr. Ada Shannon, 2025 https://podcast.do-not-panic.com/episodes/ring-linear-efficient-hybrid-architecture-for-long-context-reasoning/
    19. AI Post Transformers: Nemotron 3 Super Hybrid Mamba-Transformer MoE — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-19-nemotron-3-super-hybrid-mamba-transforme-31ac75.mp3
    20. AI Post Transformers: GLM-5: Transitioning from Vibe Coding to Agentic Engineering — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/glm-5-transitioning-from-vibe-coding-to-agentic-engineering/
    21. AI Post Transformers: Speculative Decoding in Real vLLM Serving — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-speculative-decoding-in-real-vllm-servin-6f4e2b.mp3

    Interactive Visualization: DeepSeek-V4 and Practical Million-Token Context
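
    Why million-token context is usually considered impractical comes down to KV-cache arithmetic. The sketch below uses illustrative model dimensions (not DeepSeek-V4's actual configuration) to show the dense-attention memory bill at one million tokens, and how much smaller the attended working set becomes if a hybrid scheme keeps a dense recent window plus a sparse selection of earlier tokens.

        # Back-of-envelope KV-cache sizing; all dimensions are assumptions for illustration.
        def kv_cache_gib(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
            # 2x for keys and values; bf16 = 2 bytes per element.
            return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem / 2**30

        full = kv_cache_gib(seq_len=1_000_000, n_layers=60, n_kv_heads=8, head_dim=128)
        print(f"dense KV cache @ 1M tokens: {full:.1f} GiB per sequence")  # ~228.9 GiB

        # If a hybrid scheme attends densely over a recent window plus a sparse
        # selection of earlier blocks, the KV actually touched per decoding step
        # shrinks roughly in proportion.
        window, selected = 8_192, 2_048
        hybrid = kv_cache_gib(seq_len=window + selected, n_layers=60, n_kv_heads=8, head_dim=128)
        print(f"KV touched per step under the hybrid scheme: {hybrid:.2f} GiB")  # ~2.34 GiB

    The two printed numbers are the whole story in miniature: the full cache still has to live somewhere, but the memory traffic per generated token can be orders of magnitude smaller, which is the "active runtime cost" distinction the episode draws.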

  4. 1D AGO

    Kimi K2.5 and Visual Agent Swarms

    This episode explores Moonshot AI’s Kimi K2.5, a 2026 multimodal model that aims to improve text reasoning, visual understanding, and agent-style task execution within a single open system. It explains the paper’s two main bets: training text and vision jointly from early stages rather than bolting vision on later, and using an external “agent swarm” orchestration layer to split wide-search tasks across parallel sub-agents. The discussion compares these ideas to earlier vision-language systems and multi-agent frameworks, while also questioning whether the reported gains in quality, latency, and cross-modal robustness are fully supported by the evidence. Listeners would find it interesting for its clear breakdown of where the real novelty lies: not a new transformer architecture, but a systems design argument about how future AI models may combine multimodal learning with distributed task coordination.

    Sources:

    1. Kimi K2.5: Visual Agentic Intelligence — Kimi Team, Tongtong Bai, Yifan Bai, Yiping Bao, S. H. Cai, Yuan Cao, Y. Charles, H. S. Che, Cheng Chen, Guanduo Chen, Huarong Chen, Jia Chen, Jiahao Chen, Jianlong Chen, Jun Chen, Kefan Chen, Liang Chen, Ruijue Chen, Xinhao Chen, Yanru Chen, Yanxu Chen, Yicun Chen, Yimin Chen, Yingjiang Chen, Yuankun Chen, Yujie Chen, Yutian Chen, Zhirong Chen, Ziwei Chen, Dazhi Cheng, Minghan Chu, Jialei Cui, Jiaqi Deng, Muxi Diao, Hao Ding, Mengfan Dong, Mengnan Dong, Yuxin Dong, Yuhao Dong, Angang Du, Chenzhuang Du, Dikang Du, Lingxiao Du, Yulun Du, Yu Fan, Shengjun Fang, Qiulin Feng, Yichen Feng, Garimugai Fu, Kelin Fu, Hongcheng Gao, Tong Gao, Yuyao Ge, Shangyi Geng, Chengyang Gong, Xiaochen Gong, Zhuoma Gongque, Qizheng Gu, Xinran Gu, Yicheng Gu, Longyu Guan, Yuanying Guo, Xiaoru Hao, Weiran He, Wenyang He, Yunjia He, Chao Hong, Hao Hu, Jiaxi Hu, Yangyang Hu, Zhenxing Hu, Ke Huang, Ruiyuan Huang, Weixiao Huang, Zhiqi Huang, Tao Jiang, Zhejun Jiang, Xinyi Jin, Yu Jing, Guokun Lai, Aidi Li, C. Li, Cheng Li, Fang Li, Guanghe Li, Guanyu Li, Haitao Li, Haoyang Li, Jia Li, Jingwei Li, Junxiong Li, Lincan Li, Mo Li, Weihong Li, Wentao Li, Xinhang Li, Xinhao Li, Yang Li, Yanhao Li, Yiwei Li, Yuxiao Li, Zhaowei Li, Zheming Li, Weilong Liao, Jiawei Lin, Xiaohan Lin, Zhishan Lin, Zichao Lin, Cheng Liu, Chenyu Liu, Hongzhang Liu, Liang Liu, Shaowei Liu, Shudong Liu, Shuran Liu, Tianwei Liu, Tianyu Liu, Weizhou Liu, Xiangyan Liu, Yangyang Liu, Yanming Liu, Yibo Liu, Yuanxin Liu, Yue Liu, Zhengying Liu, Zhongnuo Liu, Enzhe Lu, Haoyu Lu, Zhiyuan Lu, Junyu Luo, Tongxu Luo, Yashuo Luo, Long Ma, Yingwei Ma, Shaoguang Mao, Yuan Mei, Xin Men, Fanqing Meng, Zhiyong Meng, Yibo Miao, Minqing Ni, Kun Ouyang, Siyuan Pan, Bo Pang, Yuchao Qian, Ruoyu Qin, Zeyu Qin, Jiezhong Qiu, Bowen Qu, Zeyu Shang, Youbo Shao, Tianxiao Shen, Zhennan Shen, Juanfeng Shi, Lidong Shi, Shengyuan Shi, Feifan Song, Pengwei Song, Tianhui Song, Xiaoxi Song, Hongjin Su, Jianlin Su, Zhaochen Su, Lin Sui, Jinsong Sun, Junyao Sun, Tongyu Sun, Flood Sung, Yunpeng Tai, Chuning Tang, Heyi Tang, Xiaojuan Tang, Zhengyang Tang, Jiawen Tao, Shiyuan Teng, Chaoran Tian, Pengfei Tian, Ao Wang, Bowen Wang, Chensi Wang, Chuang Wang, Congcong Wang, Dingkun Wang, Dinglu Wang, Dongliang Wang, Feng Wang, Hailong Wang, Haiming Wang, Hengzhi Wang, Huaqing Wang, Hui Wang, Jiahao Wang, Jinhong Wang, Jiuzheng Wang, Kaixin Wang, Linian Wang, Qibin Wang, Shengjie Wang, Shuyi Wang, Si Wang, Wei Wang, Xiaochen Wang, Xinyuan Wang, Yao Wang, Yejie Wang, Yipu Wang, Yiqin Wang, Yucheng Wang, Yuzhi Wang, Zhaoji Wang, Zhaowei Wang, Zhengtao Wang, Zhexu Wang, Zihan Wang, Zizhe Wang, Chu Wei, Ming Wei, Chuan Wen, Zichen Wen, Chengjie Wu, Haoning Wu, Junyan Wu, Rucong Wu, Wenhao Wu, Yuefeng Wu, Yuhao Wu, Yuxin Wu, Zijian Wu, Chenjun Xiao, Jin Xie, Xiaotong Xie, Yuchong Xie, Yifei Xin, Bowei Xing, Boyu Xu, Jianfan Xu, Jing Xu, Jinjing Xu, L. H. Xu, Lin Xu, Suting Xu, Weixin Xu, Xinbo Xu, Xinran Xu, Yangchuan Xu, Yichang Xu, Yuemeng Xu, Zelai Xu, Ziyao Xu, Junjie Yan, Yuzi Yan, Guangyao Yang, Hao Yang, Junwei Yang, Kai Yang, Ningyuan Yang, Ruihan Yang, Xiaofei Yang, Xinlong Yang, Ying Yang, Yi Yang, Yi Yang, Zhen Yang, Zhilin Yang, Zonghan Yang, Haotian Yao, Dan Ye, Wenjie Ye, Zhuorui Ye, Bohong Yin, Chengzhen Yu, Longhui Yu, Tao Yu, Tianxiang Yu, Enming Yuan, Mengjie Yuan, Xiaokun Yuan, Yang Yue, Weihao Zeng, Dunyuan Zha, Haobing Zhan, Dehao Zhang, Hao Zhang, Jin Zhang, Puqi Zhang, Qiao Zhang, Rui Zhang, Xiaobin Zhang, Y. Zhang, Yadong Zhang, Yangkun Zhang, Yichi Zhang, Yizhi Zhang, Yongting Zhang, Yu Zhang, Yushun Zhang, Yutao Zhang, Yutong Zhang, Zheng Zhang, Chenguang Zhao, Feifan Zhao, Jinxiang Zhao, Shuai Zhao, Xiangyu Zhao, Yikai Zhao, Zijia Zhao, Huabin Zheng, Ruihan Zheng, Shaojie Zheng, Tengyang Zheng, Junfeng Zhong, Longguang Zhong, Weiming Zhong, M. Zhou, Runjie Zhou, Xinyu Zhou, Zaida Zhou, Jinguo Zhu, Liya Zhu, Xinhao Zhu, Yuxuan Zhu, Zhen Zhu, Jingze Zhuang, Weiyu Zhuang, Ying Zou, Xinxing Zu, 2026 http://arxiv.org/abs/2602.02276
    2. Large Language Model Based Multi-agents: A Survey of Progress and Challenges — Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang, 2024 https://scholar.google.com/scholar?q=Large+Language+Model+Based+Multi-agents:+A+Survey+of+Progress+and+Challenges
    3. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework — Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, Chi Wang, et al., 2023 https://scholar.google.com/scholar?q=AutoGen:+Enabling+Next-Gen+LLM+Applications+via+Multi-Agent+Conversation+Framework
    4. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework — Sirui Hong, Xiawu Zheng, Jiaqi Chen, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, et al., 2023 https://scholar.google.com/scholar?q=MetaGPT:+Meta+Programming+for+A+Multi-Agent+Collaborative+Framework
    5. Kimi K2.5: Visual Agentic Intelligence — Kimi Team (including Tongtong Bai, Yifan Bai, Yiping Bao, et al.), 2026 https://scholar.google.com/scholar?q=Kimi+K2.5:+Visual+Agentic+Intelligence
    6. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning — Richard S. Sutton, Doina Precup, Satinder Singh, 1999 https://scholar.google.com/scholar?q=Between+MDPs+and+Semi-MDPs:+A+Framework+for+Temporal+Abstraction+in+Reinforcement+Learning
    7. The Option-Critic Architecture — Pierre-Luc Bacon, Jean Harb, Doina Precup, 2017 https://scholar.google.com/scholar?q=The+Option-Critic+Architecture
    8. ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL — Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar, 2024 https://scholar.google.com/scholar?q=ArCHer:+Training+Language+Model+Agents+via+Hierarchical+Multi-Turn+RL
    9. BrowseComp: a Simple Yet Challenging Benchmark for Browsing Agents — Jason Wei, Zhiqing Sun, Siawsh Papay, Sam McKinney, et al., 2025 https://scholar.google.com/scholar?q=BrowseComp:+a+Simple+Yet+Challenging+Benchmark+for+Browsing+Agents
    10. WideSearch: Benchmarking Agentic Broad Info-Seeking — Runjing Wong, Jiawei Wang, Jiahui Zhao, et al., 2025 https://scholar.google.com/scholar?q=WideSearch:+Benchmarking+Agentic+Broad+Info-Seeking
    11. ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization — Xiaokang Wu, Kaixuan Li, Yuxin Zhao, et al., 2025 https://scholar.google.com/scholar?q=ReSum:+Unlocking+Long-Horizon+Search+Intelligence+via+Context+Summarization
    12. OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments — Tianbao Xie, Ding Zhang, Junnan Chen, et al., 2024 https://scholar.google.com/scholar?q=OSWorld:+Benchmarking+Multimodal+Agents+for+Open-Ended+Tasks+in+Real+Computer+Environments
    13. Qwen3-VL Technical Report — Sheng Bai, Yicong Cai, Ruijie Chen, et al., 2025 https://scholar.google.com/scholar?q=Qwen3-VL+Technical+Report
    14. Thinking with images for multimodal reasoning: Foundations, methods, and future frontiers — not confirmed in provided snippet, recent https://scholar.google.com/scholar?q=Thinking+with+images+for+multimodal+reasoning:+Foundations,+methods,+and+future+frontiers
    15. Vision-deepresearch: Incentivizing deepresearch capability in multimodal large language models — not confirmed in provided snippet, recent https://scholar.google.com/scholar?q=Vision-deepresearch:+Incentivizing+deepresearch+capability+in+multimodal+large+language+models
    16. Momentor: Advancing video large language model with fine-grained temporal reasoning — not confirmed in provided snippet, recent https://scholar.google.com/scholar?q=Momentor:+Advancing+video+large+language+model+with+fine-grained+temporal+reasoning
    17. Temporal reasoning transfer from text to video — not confirmed in provided snippet, recent https://scholar.google.com/scholar?q=Temporal+reasoning+transfer+from+text+to+video
    18. Videoinsta: Zero-shot long video understanding via informative spatial-temporal reasoning with LLMs — not confirmed in provided snippet, recent https://scholar.google.com/scholar?q=Videoinsta:+Zero-shot+long+video+understanding+via+informative+spatial-temporal+reasoning+with+LLMs
    19. AI Post Transformers: Test-time Scaling for Multi-Agent Collaborative Reasoning — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-22-test-time-scaling-for-multi-agent-collab-082570.mp3
    20. AI Post Transformers: Experimental Comparison of Agentic and Enhanced RAG — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-14-experimental-comparison-of-agentic-and-e-37d8bc.mp3
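
    The orchestration idea is independent of any particular model. Below is a minimal sketch of the fan-out/fan-in "agent swarm" pattern in asyncio; call_agent is a hypothetical stand-in for a real model-plus-tools call, and nothing here reflects Kimi K2.5's actual API.

        import asyncio

        async def call_agent(subquery: str) -> str:
            # Stands in for model inference plus tool-call latency in a sub-agent.
            await asyncio.sleep(0.1)
            return f"findings for {subquery!r}"

        async def swarm(task: str, subqueries: list[str]) -> str:
            # Fan out: each sub-agent works one shard of the search space concurrently.
            results = await asyncio.gather(*(call_agent(q) for q in subqueries))
            # Fan in: the orchestrator merges the shards into one answer.
            return f"{task}: " + "; ".join(results)

        print(asyncio.run(swarm("survey long-context papers",
                                ["attention variants", "KV caching", "benchmarks"])))

    The point of the pattern is that wide-search latency becomes roughly the slowest sub-agent rather than the sum of all of them, which is where the paper's latency claims would have to come from.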

  5. 1D AGO

    Muon Is Scalable for LLM Training

    This episode explores whether the Muon optimizer can truly scale from promising small experiments to frontier-style language model training, and what it would mean if its reported roughly 2x compute-efficiency gain over AdamW holds up. It examines how Muon differs from standard optimizer setups by applying momentum plus orthogonalization to matrix-shaped hidden-layer weights while keeping embeddings and one-dimensional parameters on AdamW, making the method a deliberately hybrid training recipe rather than a pure drop-in replacement. The discussion digs into the paper’s central technical claims around scaling laws, optimizer attribution, and systems implementation, with particular attention to evidence that added weight decay and per-parameter update scaling are essential for long-run stability and preventing runaway magnitudes in bf16 training. A listener would find it interesting because the conversation goes beyond headline gains to ask whether Muon really shifts the compute-optimal frontier for LLM training or whether the result depends on a carefully engineered package whose practical value lies in how all the pieces work together.

    Sources:

    1. Muon is Scalable for LLM Training — Jingyuan Liu, Jianlin Su, Xingcheng Yao, Zhejun Jiang, Guokun Lai, Yulun Du, Yidao Qin, Weixin Xu, Enzhe Lu, Junjie Yan, Yanru Chen, Huabin Zheng, Yibo Liu, Shaowei Liu, Bohong Yin, Weiran He, Han Zhu, Yuzhi Wang, Jianzhou Wang, Mengnan Dong, Zheng Zhang, Yongsheng Kang, Hao Zhang, Xinran Xu, Yutao Zhang, Yuxin Wu, Xinyu Zhou, Zhilin Yang, 2025 http://arxiv.org/abs/2502.16982
    2. Muon: An optimizer for hidden layers in neural networks — Keller Jordan, Yuchen Jin, Vlado Boza, Jiacheng You, Franz Cesista, Laker Newhouse, Jeremy Bernstein, 2024 https://scholar.google.com/scholar?q=Muon:+An+optimizer+for+hidden+layers+in+neural+networks
    3. Training Compute-Optimal Large Language Models — Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford and others, 2022 https://scholar.google.com/scholar?q=Training+Compute-Optimal+Large+Language+Models
    4. Scalable Second Order Optimization for Deep Learning — Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer, 2020 https://scholar.google.com/scholar?q=Scalable+Second+Order+Optimization+for+Deep+Learning
    5. A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks at-Scale — Hao-Jun Michael Shi, Erik M. Nitanda, Animesh Garg, Rohan Anil and others, 2023 https://scholar.google.com/scholar?q=A+Distributed+Data-Parallel+PyTorch+Implementation+of+the+Distributed+Shampoo+Optimizer+for+Training+Neural+Networks+at-Scale
    6. Modular Duality in Deep Learning — Jeremy Bernstein, Laker Newhouse and collaborators, 2024 https://scholar.google.com/scholar?q=Modular+Duality+in+Deep+Learning
    7. Adam-mini: Use Fewer Learning Rates To Gain More — Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun, 2024 https://scholar.google.com/scholar?q=Adam-mini:+Use+Fewer+Learning+Rates+To+Gain+More
    8. Small-scale proxies for large-scale transformer training instabilities — approx. recent transformer-scaling/optimization authors, recent https://scholar.google.com/scholar?q=Small-scale+proxies+for+large-scale+transformer+training+instabilities
    9. An Empirical Study of μP Learning Rate Transfer — approx. μP / maximal-update-parameterization authors, recent https://scholar.google.com/scholar?q=An+Empirical+Study+of+μP+Learning+Rate+Transfer
    10. DeepNet: Scaling Transformers to 1,000 Layers — approx. DeepNet authors, 2022 https://scholar.google.com/scholar?q=DeepNet:+Scaling+Transformers+to+1,000+Layers
    11. Why Do We Need Weight Decay in Modern Deep Learning? — approx. recent optimization authors, recent https://scholar.google.com/scholar?q=Why+Do+We+Need+Weight+Decay+in+Modern+Deep+Learning?
    12. Orthogonal Subspace Learning for Language Model Continual Learning — approx. continual-learning / language-model authors, recent https://scholar.google.com/scholar?q=Orthogonal+Subspace+Learning+for+Language+Model+Continual+Learning
    13. AI Post Transformers: When Spectral Gradient Updates Help Deep Learning — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-when-spectral-gradient-updates-help-deep-9c8441.mp3

    Interactive Visualization: Muon Is Scalable for LLM Training
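
    For readers who want the mechanics, here is a sketch of the core Muon step on a single matrix-shaped weight: momentum accumulation followed by a Newton-Schulz iteration that approximately orthogonalizes the update. The iteration coefficients follow Keller Jordan's public reference implementation; the paper's additions (weight decay, per-parameter update scaling, Nesterov momentum, and the AdamW path for embeddings and 1-D parameters) are deliberately omitted, so treat this as an illustration rather than the full recipe.

        import torch

        def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
            # Quintic Newton-Schulz iteration that pushes the singular values of G
            # toward 1, i.e. approximately orthogonalizes the update matrix.
            a, b, c = 3.4445, -4.7750, 2.0315
            X = G / (G.norm() + 1e-7)        # normalize so the iteration converges
            transposed = X.size(0) > X.size(1)
            if transposed:                   # work with the short side for efficiency
                X = X.T
            for _ in range(steps):
                A = X @ X.T
                X = a * X + (b * A + c * A @ A) @ X
            return X.T if transposed else X

        def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
            momentum_buf.mul_(beta).add_(grad)     # plain momentum accumulation
            update = newton_schulz(momentum_buf)   # replace the update's spectrum with ~1s
            weight.add_(update, alpha=-lr)

        # Toy usage on one hidden-layer weight matrix.
        W = torch.randn(256, 128)
        buf = torch.zeros_like(W)
        muon_step(W, torch.randn_like(W), buf)

    The orthogonalization is why Muon only applies to 2-D weights: "make all singular values of the update equal" has no natural meaning for an embedding row or a bias vector, which is exactly why the recipe keeps those on AdamW.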

  6. 1D AGO

    ScoutAttention for Efficient KV Cache Offloading

    This episode explores ScoutAttention, a systems paper on speeding up long-context LLM inference by managing KV cache growth more intelligently instead of treating it as a pure memory-capacity problem. It explains why long prompts, retrieval-heavy inputs, and extended reasoning traces make decoding increasingly constrained by memory traffic, and why simply offloading cache data to CPU memory can still leave GPUs stalled. The discussion focuses on the paper’s core idea: keep dense, high-speed attention on GPU, let the CPU handle only a pruned sparse set of offloaded KV blocks, and use layer-ahead CPU pre-computation plus asynchronous overlap to reduce waiting. Listeners would find it interesting because it frames transformer inference as a hardware scheduling problem and shows how throughput gains can come from smarter coordination between GPU compute, CPU compute, and data movement rather than from changing the model itself.

    Sources:

    1. ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference — Qiuyang Zhang, Kai Zhou, Ding Tang, Kai Lu, Cheng Li, Zhenyu Yang, Peng Xu, Jiguang Wan, 2026 http://arxiv.org/abs/2603.27138
    2. InfiniGen — authors and year not specified in the provided excerpt https://scholar.google.com/scholar?q=InfiniGen
    3. HGCA — authors and year not specified in the provided excerpt https://scholar.google.com/scholar?q=HGCA
    4. OpenAI o1 — OpenAI, year not specified in the provided excerpt https://scholar.google.com/scholar?q=OpenAI+o1
    5. DeepSeek-R1 — DeepSeek, year not specified in the provided excerpt https://scholar.google.com/scholar?q=DeepSeek-R1
    6. Retrieval-Augmented Generation — authors and year not specified in the provided excerpt https://scholar.google.com/scholar?q=Retrieval-Augmented+Generation
    7. KV Cache Offloading for Context-Intensive Tasks — Andrey Bocharnikov, Ivan Ermakov, Denis Kuznedelev, Vyacheslav Zhdanovskiy, Yegor Yershov, 2026 https://scholar.google.com/scholar?q=KV+Cache+Offloading+for+Context-Intensive+Tasks
    8. In-context KV-Cache Eviction for LLMs via Attention-Gate — Zihao Zeng, Bokai Lin, Tianqi Hou, Hao Zhang, Zhijie Deng, 2024 https://scholar.google.com/scholar?q=In-context+KV-Cache+Eviction+for+LLMs+via+Attention-Gate
    9. LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation — Jinwoo Ahn, Ingyu Seong, Akhil Kedia, Junhan Kim, Hyemi Jang, Kangwook Lee, Yongkweon Jeon, 2026 https://scholar.google.com/scholar?q=LookaheadKV:+Fast+and+Accurate+KV+Cache+Eviction+by+Glimpsing+into+the+Future+without+Generation
    10. MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention — Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu, 2024 https://scholar.google.com/scholar?q=MInference+1.0:+Accelerating+Pre-filling+for+Long-Context+LLMs+via+Dynamic+Sparse+Attention
    11. SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention — Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang, 2024 https://scholar.google.com/scholar?q=SampleAttention:+Near-Lossless+Acceleration+of+Long+Context+LLM+Inference+with+Adaptive+Structured+Sparse+Attention
    12. FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference — Xunhao Lai, Jianqiao Lu, Yao Luo, Yiyuan Ma, Xun Zhou, 2025 https://scholar.google.com/scholar?q=FlexPrefill:+A+Context-Aware+Sparse+Attention+Mechanism+for+Efficient+Long-Sequence+Inference
    13. MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool — Cunchen Hu, Heyang Huang, Junhao Hu, Jiang Xu, Xusheng Chen, Tao Xie, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan, 2024 https://scholar.google.com/scholar?q=MemServe:+Context+Caching+for+Disaggregated+LLM+Serving+with+Elastic+Memory+Pool
    14. Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads — Cunchen Hu, Heyang Huang, Liangliang Xu, Xusheng Chen, Jiang Xu, Shuang Chen, Hao Feng, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan, 2024 https://scholar.google.com/scholar?q=Inference+without+Interference:+Disaggregate+LLM+Inference+for+Mixed+Downstream+Workloads
    15. AI Post Transformers: RetrievalAttention for Long-Context LLM Inference — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-17-retrievalattention-for-long-context-llm-ddf566.mp3
    16. AI Post Transformers: Lookahead Q-Cache for Consistent KV Eviction — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-25-lookahead-q-cache-for-consistent-kv-evic-d97b09.mp3
    17. AI Post Transformers: KVSwap for Disk-Aware Long-Context On-Device Inference — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-16-kvswap-for-disk-aware-long-context-on-de-f3c15e.mp3
    18. AI Post Transformers: ContiguousKV for Faster LLM Prefill KV Reuse — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-20-contiguouskv-for-faster-llm-prefill-kv-r-59f545.mp3
    19. AI Post Transformers: Prefill-as-a-Service for Cross-Datacenter KV Cache — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-19-prefill-as-a-service-for-cross-datacente-7560be.mp3
    20. AI Post Transformers: FengHuang for Rack-Scale LLM Inference Memory — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-12-fenghuang-for-rack-scale-llm-inference-m-62708e.mp3

    Interactive Visualization: ScoutAttention for Efficient KV Cache Offloading
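
    The scheduling idea can be shown without any attention math. The sketch below is invented for illustration (not the paper's code): it stubs out the GPU's dense attention and the CPU's sparse attention over offloaded KV blocks, and shows the layer-ahead pattern, where the CPU starts the next layer's sparse work before the GPU runs the current layer, so the two overlap instead of serializing.

        from concurrent.futures import ThreadPoolExecutor
        import time

        def dense_attention_gpu(layer: int) -> str:
            time.sleep(0.02)   # stands in for on-GPU dense attention over hot KV
            return f"dense[{layer}]"

        def sparse_attention_cpu(layer: int) -> str:
            time.sleep(0.02)   # stands in for CPU attention over pruned offloaded KV blocks
            return f"sparse[{layer}]"

        n_layers = 4
        with ThreadPoolExecutor(max_workers=1) as cpu:
            pending = cpu.submit(sparse_attention_cpu, 0)   # pre-compute layer 0's sparse part
            for layer in range(n_layers):
                if layer + 1 < n_layers:                    # layer-ahead: queue next layer's CPU work
                    next_pending = cpu.submit(sparse_attention_cpu, layer + 1)
                dense = dense_attention_gpu(layer)          # overlaps with the CPU worker
                sparse = pending.result()                   # usually ready once dense finishes
                print("combine", dense, "+", sparse)
                if layer + 1 < n_layers:
                    pending = next_pending

    With equal per-layer costs the two pipelines fully overlap and total time approaches the longer of the two, which is the waiting-reduction argument the episode describes.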

  7. 3D AGO

    Agentic Aggregation for Long-Horizon AI Tasks

    This episode explores a Princeton paper on whether multiple long-running, tool-using AI agent trajectories can be combined more effectively by an “aggregator agent” that selectively inspects the full traces, rather than by simple answer voting or compressed summaries. It explains why aggregation gets much harder for long-horizon agentic tasks like web research, navigation, and software repair, where useful evidence is scattered across search queries, tool calls, observations, and partial plans instead of ending in a neat final answer. The discussion situates the work against self-consistency, repeated sampling, ReAct, and Tree of Thoughts, arguing that the real novelty is not parallel rollouts themselves but how to reason over archived trajectories after the runs are complete. Listeners would find it interesting because it gets at a practical bottleneck in scaling AI performance at inference time: where extra compute should be spent, and how to recover the one crucial clue buried inside a pile of messy agent logs.

    Sources:

    1. Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks — Yoonsang Lee, Howard Yen, Xi Ye, Danqi Chen, 2026 http://arxiv.org/abs/2604.11753
    2. Self-Consistency Improves Chain of Thought Reasoning in Language Models — Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou, 2023 https://scholar.google.com/scholar?q=Self-Consistency+Improves+Chain+of+Thought+Reasoning+in+Language+Models
    3. Tree of Thoughts: Deliberate Problem Solving with Large Language Models — Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas Griffiths, Yuan Cao, Karthik Narasimhan, 2023 https://scholar.google.com/scholar?q=Tree+of+Thoughts:+Deliberate+Problem+Solving+with+Large+Language+Models
    4. Large Language Monkeys: Scaling Inference Compute with Repeated Sampling — Charlie Snell and collaborators, 2024 https://scholar.google.com/scholar?q=Large+Language+Monkeys:+Scaling+Inference+Compute+with+Repeated+Sampling
    5. Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters — Anonymous/OpenAI-aligned line of work often associated with inference scaling discussions; exact authorship depends on version, 2024 https://scholar.google.com/scholar?q=Scaling+LLM+Test-Time+Compute+Optimally+can+be+More+Effective+than+Scaling+Model+Parameters
    6. ReAct: Synergizing Reasoning and Acting in Language Models — Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao, 2023 https://scholar.google.com/scholar?q=ReAct:+Synergizing+Reasoning+and+Acting+in+Language+Models
    7. WebArena: A Realistic Web Environment for Building Autonomous Agents — Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, et al., 2024 https://scholar.google.com/scholar?q=WebArena:+A+Realistic+Web+Environment+for+Building+Autonomous+Agents
    8. GAIA: a benchmark for General AI Assistants — Grégoire Mialon and collaborators, 2023 https://scholar.google.com/scholar?q=GAIA:+a+benchmark+for+General+AI+Assistants
    9. SWE-bench: Can Language Models Resolve Real-World GitHub Issues? — John Yang, Carlos E. Jimenez, Alexander Wettig, Shiyue Deng, et al., 2024 https://scholar.google.com/scholar?q=SWE-bench:+Can+Language+Models+Resolve+Real-World+GitHub+Issues?
    10. Toolformer: Language Models Can Teach Themselves to Use Tools — Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Jason Weston, Mike Lewis, 2023 https://scholar.google.com/scholar?q=Toolformer:+Language+Models+Can+Teach+Themselves+to+Use+Tools
    11. MRKL Systems: A Modular, Neuro-Symbolic Architecture That Combines Large Language Models, External Knowledge Sources and Discrete Reasoning — A. Karpas, Y. Levine, Y. M. Jang, et al., 2022 https://scholar.google.com/scholar?q=MRKL+Systems:+A+Modular,+Neuro-Symbolic+Architecture+That+Combines+Large+Language+Models,+External+Knowledge+Sources+and+Discrete+Reasoning
    12. Gorilla: Large Language Model Connected with Massive APIs — Patil, Zhang, Wang, et al., 2023 https://scholar.google.com/scholar?q=Gorilla:+Large+Language+Model+Connected+with+Massive+APIs
    13. Best-of-N Test-Time Scaling — Charlie Snell, et al., 2025 https://scholar.google.com/scholar?q=Best-of-N+Test-Time+Scaling
    14. Inference-Time Scaling for Generalist Reward Modeling / Search-based test-time scaling works cited as Brown et al. 2024, Welleck et al. 2024, Muennighoff et al. 2025, Zhao et al. 2025 — Various, 2024-2025 https://scholar.google.com/scholar?q=Inference-Time+Scaling+for+Generalist+Reward+Modeling+/+Search-based+test-time+scaling+works+cited+as+Brown+et+al.+2024,+Welleck+et+al.+2024,+Muennighoff+et+al.+2025,+Zhao+et+al.+2025
    15. BrowseComp — Jason Wei, et al., 2025 https://scholar.google.com/scholar?q=BrowseComp
    16. HLE — Phan, et al., 2025 https://scholar.google.com/scholar?q=HLE
    17. WebDancer or WebWalker-style web navigation/agent benchmarks and newer deep research benchmarks such as DeepResearch Bench — Various, 2024-2026 https://scholar.google.com/scholar?q=WebDancer+or+WebWalker-style+web+navigation/agent+benchmarks+and+newer+deep+research+benchmarks+such+as+DeepResearch+Bench
    18. Reflexion: Language Agents with Verbal Reinforcement Learning — Noah Shinn, Federico Cassano, et al., 2023 https://scholar.google.com/scholar?q=Reflexion:+Language+Agents+with+Verbal+Reinforcement+Learning
    19. Language Agent Tree Search / Planning with MCTS-style LLM agents — Various, 2023-2025 https://scholar.google.com/scholar?q=Language+Agent+Tree+Search+/+Planning+with+MCTS-style+LLM+agents
    20. iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference — approx. 2025 multi-agent debate authors, 2025 https://scholar.google.com/scholar?q=iMAD:+Intelligent+Multi-Agent+Debate+for+Efficient+and+Accurate+LLM+Inference
    21. GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion — approx. 2024/2025 multi-agent debate authors, 2024/2025 https://scholar.google.com/scholar?q=GroupDebate:+Enhancing+the+Efficiency+of+Multi-Agent+Debate+Using+Group+Discussion
    22. Improving Multi-Agent Debate with Sparse Communication Topology — approx. 2024/2025 multi-agent debate authors, 2024/2025 https://scholar.google.com/scholar?q=Improving+Multi-Agent+Debate+with+Sparse+Communication+Topology
    23. VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation — approx. 2025 verification/safety authors, 2025 https://scholar.google.com/scholar?q=VeriGuard:+Enhancing+LLM+Agent+Safety+via+Verified+Code+Generation
    24. Verifiability-First Agents: Provable Observability and Lightweight Audit Agents for Controlling Autonomous LLM Systems — approx. 2025 agent verification authors, 2025 https://scholar.google.com/scholar?q=Verifiability-First+Agents:+Provable+Observability+and+Lightweight+Audit+Agents+for+Controlling+Autonomous+LLM+Systems
    25. AI Post Transformers: DeepResearch Arena: Benchmarking LLMs' Research Abilities — Hal Turing & Dr. Ada Shannon, 2025 https://podcast.do-not-panic.com/episodes/deepresearch-arena-benchmarking-llms-research-abilities/
    26. AI Post Transformers: Experimental Comparison of Agentic and Enhanced RAG — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-14-experimental-comparison-of-agentic-and-e-37d8bc.mp3
    27. AI Post Transformers: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/adaptive-test-time-scaling-with-world-models-for-visual-spatial-reasoning/
    28. AI Post Transformers: Generalist Reward Modeling with Inference-Time Scaling — Hal Turing & Dr. Ada Shannon, 2025 https://podcast.do-not-panic.com/episodes/generalist-reward-modeling-with-inference-time-scaling/
    29. AI Post Transformers: Bloom: an open source tool for automated behavioral evaluations — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/bloom-an-open-source-tool-for-automated-behavioral-evaluations/

    Interactive Visualization: Agentic Aggregation for Long-Horizon AI Tasks
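
    A toy contrast makes the aggregation argument concrete. In the sketch below, the trajectories and the evidence heuristic are invented for illustration (the paper's aggregator is itself an LLM agent that selectively inspects traces, not a fixed scoring rule): majority voting over final answers picks the answer shared by two low-effort runs, while a trace-aware score lets the one well-grounded run win.

        from collections import Counter

        trajectories = [
            {"answer": "A", "trace": ["searched X", "tool error", "guessed"]},
            {"answer": "A", "trace": ["searched X", "timeout", "guessed"]},
            {"answer": "B", "trace": ["searched X", "opened primary source", "verified claim"]},
        ]

        # Baseline: self-consistency style majority vote over final answers only.
        vote = Counter(t["answer"] for t in trajectories).most_common(1)[0][0]

        # Trace-level aggregation: weight each run by evidence quality found in its
        # log, so one well-grounded trajectory can outvote several sloppy ones.
        def evidence_score(trace):
            return sum(step in ("opened primary source", "verified claim") for step in trace)

        best = max(trajectories, key=lambda t: evidence_score(t["trace"]))

        print("majority vote:", vote)               # -> A (two sloppy runs agree)
        print("trace-aware pick:", best["answer"])  # -> B (the verified run wins)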

  8. 3D AGO

    TokenDance for Multi-Agent KV Cache Sharing

    This episode explores TokenDance, a systems approach for serving many LLM-based agents more efficiently by collectively sharing transformer KV caches across synchronized conversation rounds. It explains why multi-agent workloads are fundamentally different from ordinary chat serving: agents persist across rounds, accumulate large KV caches, and often follow an “all-gather” pattern where each agent receives a mostly shared prompt plus its own private history, making standard prefix-based cache reuse ineffective. The discussion argues that the key innovation is shifting cache reuse from individual requests to the entire round of agents as a collective object, enabling memory savings and better scalability on the same GPU. Listeners interested in agent systems, inference infrastructure, and practical bottlenecks beyond model architecture will find it compelling for its concrete diagnosis of memory management as the real constraint.

    Sources:

    1. TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing — Zhuohang Bian, Feiyang Wu, Chengrui Zhang, Hangcheng Dong, Yun Liang, Youwei Zhuo, 2026 http://arxiv.org/abs/2604.03143
    2. TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing — Zhuohang Bian, Feiyang Wu, Chengrui Zhang, Hangcheng Dong, Yun Liang, Youwei Zhuo, 2026 https://scholar.google.com/scholar?q=TokenDance:+Scaling+Multi-Agent+LLM+Serving+via+Collective+KV+Cache+Sharing
    3. Efficient Memory Management for Large Language Model Serving with PagedAttention — Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Hao Zhang, et al., 2023 https://scholar.google.com/scholar?q=Efficient+Memory+Management+for+Large+Language+Model+Serving+with+PagedAttention
    4. SGLang: Efficient Execution of Structured Language Model Programs — Lianmin Zheng, Weizhe Chen, Ying Sheng, Tianqi Chen, Ion Stoica, and collaborators, 2024 https://scholar.google.com/scholar?q=SGLang:+Efficient+Execution+of+Structured+Language+Model+Programs
    5. Parrot: Efficient Serving of LLM-based Applications with Semantic Variable — Xiangyao Yu and collaborators, 2024 https://scholar.google.com/scholar?q=Parrot:+Efficient+Serving+of+LLM-based+Applications+with+Semantic+Variable
    6. vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention — Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, et al., 2023 https://scholar.google.com/scholar?q=vLLM:+Easy,+Fast,+and+Cheap+LLM+Serving+with+PagedAttention
    7. SGLang — SGLang team / related authors as cited in the paper, 2024 https://scholar.google.com/scholar?q=SGLang
    8. Parrot — Authors as cited in the paper, 2024 https://scholar.google.com/scholar?q=Parrot
    9. Autellix — Authors as cited in the paper, 2024 https://scholar.google.com/scholar?q=Autellix
    10. Tokencake — Authors as cited in the paper, 2024 https://scholar.google.com/scholar?q=Tokencake
    11. Generative Agents: Interactive Simulacra of Human Behavior — Joon Sung Park, Joseph O'Brien, Carrie Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein, 2023 https://scholar.google.com/scholar?q=Generative+Agents:+Interactive+Simulacra+of+Human+Behavior
    12. Position-independent KV-cache reuse papers cited as [10, 34-36] — Authors as cited in the paper, 2024-2026 https://scholar.google.com/scholar?q=Position-independent+KV-cache+reuse+papers+cited+as+[10,+34-36]
    13. OpenClaw — Authors as cited in the paper, 2024 https://scholar.google.com/scholar?q=OpenClaw
    14. MoltBook — Authors as cited in the paper, 2024 https://scholar.google.com/scholar?q=MoltBook
    15. DynTaskMAS: A Dynamic Task Graph-Driven Framework for Asynchronous and Parallel LLM-Based Multi-Agent Systems — approx. recent multi-agent systems authors, 2024/2025 https://scholar.google.com/scholar?q=DynTaskMAS:+A+Dynamic+Task+Graph-Driven+Framework+for+Asynchronous+and+Parallel+LLM-Based+Multi-Agent+Systems
    16. Kairos: Low-Latency Multi-Agent Serving with Shared LLMs and Excessive Loads in the Public Cloud — approx. recent systems authors, 2024/2025 https://scholar.google.com/scholar?q=Kairos:+Low-Latency+Multi-Agent+Serving+with+Shared+LLMs+and+Excessive+Loads+in+the+Public+Cloud
    17. CacheSlide: Unlocking Cross Position-Aware KV Cache Reuse for Accelerating LLM Serving — approx. recent LLM serving authors, 2024/2025 https://scholar.google.com/scholar?q=CacheSlide:+Unlocking+Cross+Position-Aware+KV+Cache+Reuse+for+Accelerating+LLM+Serving
    18. Where Matters More Than What: Decoding-Aligned KV Cache Compression via Position-Aware Pseudo Queries — approx. recent KV compression authors, 2024/2025 https://scholar.google.com/scholar?q=Where+Matters+More+Than+What:+Decoding-Aligned+KV+Cache+Compression+via+Position-Aware+Pseudo+Queries
    19. KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse — approx. recent KV reuse authors, 2024/2025 https://scholar.google.com/scholar?q=KVLink:+Accelerating+Large+Language+Models+via+Efficient+KV+Cache+Reuse
    20. HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse — approx. recent RAG authors, 2024/2025 https://scholar.google.com/scholar?q=HyperRAG:+Enhancing+Quality-Efficiency+Tradeoffs+in+Retrieval-Augmented+Generation+with+Reranker+KV-Cache+Reuse
    21. ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation — approx. recent RAG/KV authors, 2024/2025 https://scholar.google.com/scholar?q=ProphetKV:+User-Query-Driven+Selective+Recomputation+for+Efficient+KV+Cache+Reuse+in+Retrieval-Augmented+Generation
    22. Eigen Attention: Attention in Low-Rank Space for KV Cache Compression — approx. recent KV compression authors, 2024/2025 https://scholar.google.com/scholar?q=Eigen+Attention:+Attention+in+Low-Rank+Space+for+KV+Cache+Compression
    23. PALU: KV-Cache Compression with Low-Rank Projection — approx. recent systems/ML authors, 2024/2025 https://scholar.google.com/scholar?q=PALU:+KV-Cache+Compression+with+Low-Rank+Projection
    24. LORC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy — approx. recent KV compression authors, 2024/2025 https://scholar.google.com/scholar?q=LORC:+Low-Rank+Compression+for+LLMs+KV+Cache+with+a+Progressive+Compression+Strategy
    25. AI Post Transformers: CacheSlide: Position-Aware KV Cache Reuse for Agent LLMs — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-16-cacheslide-position-aware-kv-cache-reuse-cd59c7.mp3
    26. AI Post Transformers: ContiguousKV for Faster LLM Prefill KV Reuse — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-20-contiguouskv-for-faster-llm-prefill-kv-r-59f545.mp3
    27. AI Post Transformers: KV Cache TTL for Multi-Turn Agent Scheduling — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-09-kv-cache-ttl-for-multi-turn-agent-schedu-996bf1.mp3
    28. AI Post Transformers: Continuous Batching for LLM Inference: Throughput and Latency Gains — Hal Turing & Dr. Ada Shannon, 2025 https://podcast.do-not-panic.com/episodes/continuous-batching-for-llm-inference-throughput-and-latency-gains/
    29. AI Post Transformers: Speculative Decoding in Real vLLM Serving — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-speculative-decoding-in-real-vllm-servin-6f4e2b.mp3
    30. AI Post Transformers: Splitwise: Phase-Split LLM Inference — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-26-splitwise-phase-split-llm-inference-e8945b.mp3
    31. AI Post Transformers: FengHuang for Rack-Scale LLM Inference Memory — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-12-fenghuang-for-rack-scale-llm-inference-m-62708e.mp3
    32. AI Post Transformers: From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-22-from-prefix-cache-to-fusion-rag-9c5d39.mp3

    Interactive Visualization: TokenDance for Multi-Agent KV Cache Sharing
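
    The memory argument is simple arithmetic. With illustrative model dimensions (not taken from the TokenDance paper), the sketch below compares storing each agent's full KV cache separately against storing the round's shared prompt once plus each agent's private history, which is the kind of collective, round-level sharing the episode describes.

        # KV memory for an "all-gather" round: N agents, one large shared prompt,
        # small private histories. All dimensions are assumptions for illustration.
        def kv_gib(tokens, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
            # 2x for keys and values; bf16 = 2 bytes per element.
            return 2 * tokens * n_layers * n_kv_heads * head_dim * bytes_per_elem / 2**30

        n_agents, shared_tokens, private_tokens = 16, 30_000, 2_000

        duplicated = n_agents * kv_gib(shared_tokens + private_tokens)
        collective = kv_gib(shared_tokens) + n_agents * kv_gib(private_tokens)

        print(f"per-agent caches:  {duplicated:.1f} GiB")  # shared prompt stored 16 times (~62.5)
        print(f"collective cache:  {collective:.1f} GiB")  # shared prompt stored once (~7.6)

    The roughly 8x gap is what makes round-level sharing attractive; the hard systems part, which the paper addresses and the sketch ignores, is that the shared portion is not a strict token prefix for every agent, so ordinary prefix caching cannot capture it.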

Ratings & Reviews

3.7 out of 5 (3 Ratings)
