AI Post Transformers

mcgrof

Technology
Updated Daily

AI-generated podcast where hosts Hal Turing and Dr. Ada Shannon discuss the latest research papers and reports in machine learning, AI systems, and optimization. Featuring honest critical analysis, proper citations, and nerdy humor.

1d ago

Atlas: Test-Time Memory for Long Contexts

This episode explores Atlas, a 2025 paper on test-time memorization that asks whether a model with fixed recurrent memory can learn to update that memory during inference and rival Transformers on long-context recall and reasoning. It explains the core tradeoff between Transformer-style KV caches, which preserve near-exact token access at growing cost, and bounded recurrent memory, which must decide what to keep, compress, or forget. The discussion focuses on why earlier recurrent memory systems fell short, then breaks down Atlas's proposed fixes: evaluating memory updates against a window of recent tokens rather than only the newest token, using richer key representations, and learning stronger retention and optimizer-style write rules. Listeners get a clear view of why this matters for post-Transformer architectures, and why fixed-size memory remains both a promising direction and a stubborn bottleneck.

Sources:
1. ATLAS: Learning to Optimally Memorize the Context at Test Time — Ali Behrouz, Zeman Li, Praneeth Kacham, Majid Daliri, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni, 2025 http://arxiv.org/abs/2505.23735
2. Linear Transformers Are Secretly Fast Weight Programmers — Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber, 2021 https://scholar.google.com/scholar?q=Linear+Transformers+Are+Secretly+Fast+Weight+Programmers
3. Retentive Network: A Successor to Transformer for Large Language Models — Yutao Sun, Li Dong, Shaohan Huang, Furu Wei, et al., 2023 https://scholar.google.com/scholar?q=Retentive+Network:+A+Successor+to+Transformer+for+Large+Language+Models
4. Titans: Learning to Memorize at Test Time — Ali Behrouz, Peilin Zhong, Vahab Mirrokni, 2025 https://scholar.google.com/scholar?q=Titans:+Learning+to+Memorize+at+Test+Time
5. It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization — Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, Vahab Mirrokni, 2025 https://scholar.google.com/scholar?q=It's+All+Connected:+A+Journey+Through+Test-Time+Memorization,+Attentional+Bias,+Retention,+and+Online+Optimization
6. Learning to (Learn at Test Time): RNNs with Expressive Hidden States — Yu Sun et al., 2024 https://scholar.google.com/scholar?q=Learning+to+(Learn+at+Test+Time):+RNNs+with+Expressive+Hidden+States
7. RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval — Kaiyue Wen, Xingyu Dang, Kaifeng Lyu, 2024 https://scholar.google.com/scholar?q=RNNs+are+not+Transformers+(Yet):+The+Key+Bottleneck+on+In-Context+Retrieval
8. Test-time Regression: a Unifying Framework for Designing Sequence Models with Associative Memory — Ke Alexander Wang, Jiaxin Shi, Emily B. Fox, 2025 https://scholar.google.com/scholar?q=Test-time+Regression:+a+Unifying+Framework+for+Designing+Sequence+Models+with+Associative+Memory
9. BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack — Yuri Kuratov et al., 2024 https://scholar.google.com/scholar?q=BABILong:+Testing+the+Limits+of+LLMs+with+Long+Context+Reasoning-in-a-Haystack
10. SCBench: A KV Cache-Centric Analysis of Long-Context Methods — Yucheng Li et al., 2024 https://scholar.google.com/scholar?q=SCBench:+A+KV+Cache-Centric+Analysis+of+Long-Context+Methods
11. Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks — Zheng Wang et al., 2024 https://scholar.google.com/scholar?q=Model+Tells+You+Where+to+Merge:+Adaptive+KV+Cache+Merging+for+LLMs+on+Long-Context+Tasks
12. Forgetting Transformer: Softmax Attention with a Forget Gate — Zhixuan Lin et al., 2025 https://scholar.google.com/scholar?q=Forgetting+Transformer:+Softmax+Attention+with+a+Forget+Gate
13. Test-Time Training Done Right — Tianyuan Zhang et al., 2025 https://scholar.google.com/scholar?q=Test-Time+Training+Done+Right
14. Associative Recurrent Memory Transformer — Ivan Rodkin et al., 2024 https://scholar.google.com/scholar?q=Associative+Recurrent+Memory+Transformer
15. AI Post Transformers: Titans: Learning to Memorize at Test Time — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-20-titans-learning-to-memorize-at-test-time-054662.mp3
16. AI Post Transformers: Gated Delta Networks for Long-Context Retrieval — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-17-gated-delta-networks-for-long-context-re-706d85.mp3
17. AI Post Transformers: Parallelizing DeltaNet Linear Transformers over Sequence Length — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-18-parallelizing-deltanet-linear-transforme-2d0377.mp3
18. AI Post Transformers: Gated Linear Attention for Efficient Long Sequences — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-18-gated-linear-attention-for-efficient-lon-c858ab.mp3
19. AI Post Transformers: Muon Is Scalable for LLM Training — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-25-muon-is-scalable-for-llm-training-587ed8.mp3
20. AI Post Transformers: How Induction Heads Emerge in Transformers — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-03-how-induction-heads-emerge-in-transforme-a7bfcb.mp3

Interactive Visualization: Atlas: Test-Time Memory for Long Contexts

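
The Atlas description contrasts writing memory against only the newest token with writing it against a window of recent tokens. The following is a minimal sketch of that windowed idea only, assuming a plain linear associative memory, a squared-error objective, and a single gradient step per token. The names `windowed_write` and `matvec`, the window size, and the learning rate are all hypothetical, and none of the paper's richer keys, retention terms, or optimizer-style write rules are modeled here.

```python
from collections import deque


def matvec(M, k):
    """Multiply matrix M (list of rows) by vector k."""
    return [sum(m * kj for m, kj in zip(row, k)) for row in M]


def windowed_write(M, window, lr=0.1):
    """One gradient step on sum over (k, v) in window of 0.5 * ||M k - v||^2.

    With a window of length 1 this reduces to a newest-token-only write.
    """
    grad = [[0.0] * len(M[0]) for _ in M]
    for k, v in window:
        err = [p - t for p, t in zip(matvec(M, k), v)]
        for i, e in enumerate(err):
            for j, kj in enumerate(k):
                grad[i][j] += e * kj  # outer product of error and key
    return [[m - lr * g for m, g in zip(row, grow)] for row, grow in zip(M, grad)]


# Toy stream of (key, value) pairs; values are illustrative, not from the paper.
stream = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.0, 1.0], [1.0, 0.0]),
    ([1.0, 0.0], [0.0, 1.0]),
]

window = deque(maxlen=2)  # hypothetical window size
M = [[0.0, 0.0], [0.0, 0.0]]  # fixed-size memory, independent of stream length
for k, v in stream:
    window.append((k, v))
    M = windowed_write(M, window, lr=0.5)

print(matvec(M, [1.0, 0.0]))  # recall for the first key
```

The memory stays the same size however long the stream gets, which is the bounded-memory side of the tradeoff the episode describes. The window only changes which tokens each write is evaluated against.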
1d ago

Do Transformers Need Three Projections?

This episode explores whether transformers really need separate query, key, and value projections, treating the problem as weight tying inside attention rather than as a brand-new model design. It explains why KV-cache size and memory bandwidth are major bottlenecks for long-context, on-device decoding, then compares increasingly aggressive sharing schemes, especially the difference between tying keys and values versus tying queries and keys. The discussion emphasizes that the broader sweep happens at 300M parameters, while only the shared-K/V variant is carried to 1.2B scale and remains in contention against practical baselines like grouped-query and multi-query attention. Listeners get a concrete deployment tradeoff: shared K/V can reduce KV-cache memory by about 50 percent at roughly a 3.1 percent perplexity cost, making the episode especially interesting for anyone focused on efficient inference.

Sources:
1. Do Transformers Need Three Projections? Systematic Study of QKV Variants — Ali Kayyam, Anusha Madan Gopal, M. Anthony Lewis, 2026 http://arxiv.org/abs/2606.04032v2
2. Attention Is All You Need — Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, 2017 https://arxiv.org/abs/1706.03762
3. Fast Transformer Decoding: One Write-Head is All You Need — Noam Shazeer, 2019 https://arxiv.org/abs/1911.02150
4. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints — Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai, 2023 https://arxiv.org/abs/2305.13245
5. Do Transformers Need Three Projections? Systematic Study of QKV Variants — Ali Kayyam, Anusha Madan Gopal, M. Anthony Lewis, 2026 https://arxiv.org/abs/2606.04032
6. Using the Output Embedding to Improve Language Models — Ofir Press, Lior Wolf, 2017 https://arxiv.org/abs/1608.05859
7. Linformer: Self-Attention with Linear Complexity — Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma, 2020 https://scholar.google.com/scholar?q=Linformer:+Self-Attention+with+Linear+Complexity
8. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention — Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, Francois Fleuret, 2020 https://scholar.google.com/scholar?q=Transformers+are+RNNs:+Fast+Autoregressive+Transformers+with+Linear+Attention
9. Mamba: Linear-Time Sequence Modeling with Selective State Spaces — Albert Gu, Tri Dao, 2023 https://scholar.google.com/scholar?q=Mamba:+Linear-Time+Sequence+Modeling+with+Selective+State+Spaces
10. AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations — Qian Tao et al., 2024 https://arxiv.org/abs/2410.13212
11. LongHeads: Multi-Head Attention is Secretly a Long Context Processor — Yi Lu et al., 2024 https://arxiv.org/abs/2402.10685
12. MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads — Weihao Liu et al., 2025 https://arxiv.org/abs/2502.13963
13. Squeezed Attention: Accelerating Long Context Length LLM Inference — Coleman Hooper et al., 2024 https://arxiv.org/abs/2411.09688
14. Beyond Uniform Query Distribution: Key-Driven Grouped Query Attention — Zohaib Khan et al., 2024 https://arxiv.org/abs/2408.08454
15. Weight Decay Induces Low-Rank Attention Layers — Seijin Kobayashi et al., 2024 https://arxiv.org/abs/2410.23819
16. Dissecting Query-Key Interaction in Vision Transformers — Xu Pan et al., 2024 https://arxiv.org/abs/2405.14880
17. AI Post Transformers: Affordable Large-Scale Decoding Through Model-System Co-Design — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-19-affordable-large-scale-decoding-through-e1d7ed.mp3
18. AI Post Transformers: Memory-Bound, Not Bandwidth-Limited Batch-1 LLM Decode — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-02-memory-bound-not-bandwidth-limited-batch-114799.mp3
19. AI Post Transformers: Prefill-as-a-Service for Cross-Datacenter KV Cache — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-19-prefill-as-a-service-for-cross-datacente-7560be.mp3
20. AI Post Transformers: TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-25-turboquant-online-vector-quantiz-1967b7.mp3
21. AI Post Transformers: How Induction Heads Emerge in Transformers — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-03-how-induction-heads-emerge-in-transforme-a7bfcb.mp3

Interactive Visualization: Do Transformers Need Three Projections?

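
The roughly 50 percent KV-cache saving quoted for shared K/V follows from simple arithmetic: a standard cache stores two tensors per token (keys and values), while a tied variant would store one. The sketch below works that out under stated assumptions. The function `kv_cache_bytes` and every dimension in `cfg` are hypothetical illustrations, not figures from the paper, and the perplexity cost is not modeled.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len,
                   bytes_per_value=2, tensors_per_token=2):
    """Approximate KV-cache size in bytes for a single sequence.

    tensors_per_token=2 models separate K and V tensors;
    tensors_per_token=1 models a single tensor shared by keys and values.
    bytes_per_value=2 assumes 16-bit storage.
    """
    return (layers * kv_heads * head_dim * seq_len
            * bytes_per_value * tensors_per_token)


# Hypothetical model configuration, chosen only for illustration.
cfg = dict(layers=24, kv_heads=16, head_dim=64, seq_len=32_768)

separate = kv_cache_bytes(**cfg)
shared = kv_cache_bytes(**cfg, tensors_per_token=1)
saving = 1 - shared / separate

print(f"separate K and V: {separate / 2**20:.0f} MiB")
print(f"shared K/V:       {shared / 2**20:.0f} MiB")
print(f"saving:           {saving:.0%}")
```

Grouped-query and multi-query attention shrink the same product through `kv_heads` instead, which is why the episode treats them as the practical baselines to beat.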
1d ago

KumoRFM for In-Context Relational Learning

This episode explores KumoRFM, a 2025 proposal for a foundation model that can perform in-context learning directly on relational databases, aiming to handle tasks like churn prediction, fraud detection, recommendation, and forecasting without training a separate model for each schema and label. It explains how the approach represents warehouse data as heterogeneous graphs of rows and foreign-key relationships, using attention over local relational neighborhoods instead of flattening everything into handcrafted feature tables. The discussion focuses on the paper’s strongest claim, zero-shot transfer, and carefully separates true inference-time generalization from easier settings like continued pretraining on the target database or later fine-tuning on the target task. Listeners may find it interesting because the episode gets precise about what this system could change in enterprise ML, while also surfacing the practical caveats around task specification, temporal leakage, infrastructure cost, and how literal the "any database, any task" promise really is.

Sources:
1. KumoRFM for In-Context Relational Learning https://kumo.ai/research/kumo_relational_foundation_model.pdf
2. Modeling Relational Data with Graph Convolutional Networks — Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Max Welling, et al., 2017 https://scholar.google.com/scholar?q=Modeling+Relational+Data+with+Graph+Convolutional+Networks
3. Heterogeneous Graph Transformer — Ziniu Hu, Yuxiao Dong, Kuansan Wang, Yizhou Sun, 2020 https://scholar.google.com/scholar?q=Heterogeneous+Graph+Transformer
4. Relational Deep Learning: Graph Representation Learning on Relational Databases — Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Jure Leskovec, et al., 2023 https://scholar.google.com/scholar?q=Relational+Deep+Learning:+Graph+Representation+Learning+on+Relational+Databases
5. Relational Graph Transformer — Vijay Prakash Dwivedi, Sri Jaladi, Yangyi Shen, Federico Lopez, Matthias Fey, Jure Leskovec, et al., 2025 (ICLR 2026) https://scholar.google.com/scholar?q=Relational+Graph+Transformer
6. One Model to Rule them All: Towards Zero-Shot Learning for Databases — Benjamin Hilprecht, Carsten Binnig, 2021 https://scholar.google.com/scholar?q=One+Model+to+Rule+them+All:+Towards+Zero-Shot+Learning+for+Databases
7. Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction — Benjamin Hilprecht, Carsten Binnig, 2022 https://scholar.google.com/scholar?q=Zero-Shot+Cost+Models+for+Out-of-the-box+Learned+Cost+Prediction
8. TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second — Noah Hollmann, Samuel Muller, Katharina Eggensperger, Frank Hutter, 2022 (ICLR 2023) https://scholar.google.com/scholar?q=TabPFN:+A+Transformer+That+Solves+Small+Tabular+Classification+Problems+in+a+Second
9. Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data — Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Carlos Guestrin, Jure Leskovec, et al., 2025 (ICLR 2026) https://scholar.google.com/scholar?q=Relational+Transformer:+Toward+Zero-Shot+Foundation+Models+for+Relational+Data
10. Relational In-Context Learning via Synthetic Pre-training with Structural Prior — Yanbo Wang, Jiaxuan You, Chuan Shi, Muhan Zhang, 2026 https://scholar.google.com/scholar?q=Relational+In-Context+Learning+via+Synthetic+Pre-training+with+Structural+Prior
11. OpenRFM: Dissecting Relational In-Context Learning — Zhikai Chen et al., 2026 https://scholar.google.com/scholar?q=OpenRFM:+Dissecting+Relational+In-Context+Learning
12. KumoRFM-2: Scaling Foundation Models for Relational Learning — Valter Hudovernik et al., 2026 https://scholar.google.com/scholar?q=KumoRFM-2:+Scaling+Foundation+Models+for+Relational+Learning
13. Retrieval & Fine-Tuning for In-Context Tabular Models — Valentin Thomas et al., 2024 https://scholar.google.com/scholar?q=Retrieval+&+Fine-Tuning+for+In-Context+Tabular+Models
14. Scalable In-Context Learning on Tabular Data via Retrieval-Augmented Large Language Models — Xumeng Wen et al., 2025 https://scholar.google.com/scholar?q=Scalable+In-Context+Learning+on+Tabular+Data+via+Retrieval-Augmented+Large+Language+Models
15. Exploring Fine-Tuning for Tabular Foundation Models — Aditya Tanna et al., 2026 https://scholar.google.com/scholar?q=Exploring+Fine-Tuning+for+Tabular+Foundation+Models
16. AI Post Transformers: Predictive Query Language for Relational Databases — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-08-predictive-query-language-for-relational-103e68.mp3
17. AI Post Transformers: KumoRFM-2 for Relational Learning at Scale — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-08-kumorfm-2-for-relational-learning-at-sca-13c996.mp3
18. AI Post Transformers: Why LightGBM Made Boosted Trees Fast — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-05-why-lightgbm-made-boosted-trees-fast-286a89.mp3
19. AI Post Transformers: How Induction Heads Emerge in Transformers — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-03-how-induction-heads-emerge-in-transforme-a7bfcb.mp3

1d ago

Learning at Test Time with Expressive RNN States

This episode explores the paper Learning to (Learn at Test Time): RNNs with Expressive Hidden States and its attempt to give recurrent models transformer-like long-context behavior without the quadratic cost of attention. It explains why standard RNN hidden states are a bottleneck, compares that limitation to transformers’ growing KV cache, and highlights a key empirical motivation: in the paper’s setup, Mamba’s token-level perplexity improvements flatten around 16k tokens while transformers keep improving deeper into a 32k context. The discussion focuses on the paper’s core idea of test-time training, where the hidden state is treated as a small inner model whose parameters are updated online with a self-supervised learning rule, rather than as a fixed vector summary. Listeners may find it interesting because it connects old fast-weights and dynamic-evaluation ideas to a new systems-level proposal for long-context efficiency, while also noting the open question of whether better perplexity truly translates into stronger retrieval and reasoning.

Sources:
1. Learning to (Learn at Test Time): RNNs with Expressive Hidden States — Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin, 2024 http://arxiv.org/abs/2407.04620
2. Using Fast Weights to Attend to the Recent Past — Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu, 2016 https://arxiv.org/abs/1610.06258
3. Linear Transformers Are Secretly Fast Weight Programmers — Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber, 2021 https://arxiv.org/abs/2102.11174
4. Mamba: Linear-Time Sequence Modeling with Selective State Spaces — Albert Gu, Tri Dao, 2023 https://arxiv.org/abs/2312.00752
5. Learning to (Learn at Test Time): RNNs with Expressive Hidden States — Yu Sun, Xinhao Li, Karan Dalal, Xiaolong Wang, Tatsunori Hashimoto, Carlos Guestrin, 2024 https://arxiv.org/abs/2407.04620
6. Dynamic Evaluation of Transformer Language Models — Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals, 2019 https://arxiv.org/abs/1904.08378
7. Effective Long-Context Scaling of Foundation Models — Wenhan Xiong et al., 2023 https://arxiv.org/abs/2309.16039
8. Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models — Soham De et al., 2024 https://arxiv.org/abs/2402.19427
9. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality — Tri Dao, Albert Gu, 2024 https://arxiv.org/abs/2405.21060
10. An Empirical Study of Mamba-based Language Models — Roger Waleffe et al., 2024 https://arxiv.org/abs/2406.07887
11. KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction — Jang-Hyun Kim et al., 2025 https://arxiv.org/abs/2505.23416
12. KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse — Jingbo Yang et al., 2025 https://arxiv.org/abs/2502.16002
13. ReMamba: Equip Mamba with Effective Long-Sequence Modeling — Danlong Yuan et al., 2024 https://arxiv.org/abs/2408.15496
14. LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement — Zhifan Ye et al., 2025 https://arxiv.org/abs/2504.16053
15. Fast-weight Product Key Memory — Tianyu Zhao, Llion Jones, 2026 https://arxiv.org/abs/2601.00671
16. Test-Time Learning for Large Language Models — Jinwu Hu et al., 2025 https://arxiv.org/abs/2505.20633
17. Test-Time Training on Nearest Neighbors for Large Language Models — Moritz Hardt, Yu Sun, 2023 https://arxiv.org/abs/2305.18466
18. AI Post Transformers: Titans: Learning to Memorize at Test Time — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-20-titans-learning-to-memorize-at-test-time-054662.mp3
19. AI Post Transformers: Gated Linear Attention for Efficient Long Sequences — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-18-gated-linear-attention-for-efficient-lon-c858ab.mp3
20. AI Post Transformers: Kimi Linear: Efficient Expressive Attention Architecture — Hal Turing & Dr. Ada Shannon, 2025 https://podcast.do-not-panic.com/episodes/kimi-linear-efficient-expressive-attention-architecture/
21. AI Post Transformers: How Induction Heads Emerge in Transformers — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-03-how-induction-heads-emerge-in-transforme-a7bfcb.mp3
22. AI Post Transformers: Explicit Information Transmission for Context Compression — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-05-explicit-information-transmission-for-co-24e3c2.mp3
23. AI Post Transformers: Memory-Bound, Not Bandwidth-Limited Batch-1 LLM Decode — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-02-memory-bound-not-bandwidth-limited-batch-114799.mp3

Interactive Visualization: Learning at Test Time with Expressive RNN States

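
The core idea in this episode, a hidden state that is itself a small model trained online, can be made concrete with a toy example. The sketch below assumes the simplest possible inner model (a single weight matrix `W`) and a plain reconstruction loss as the self-supervised task. Both are simplifying assumptions for illustration: `ttt_step`, the learning rate, and the token vectors are hypothetical, and the paper's actual inner model and self-supervised objective may differ.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ttt_step(W, x, lr=0.1):
    """One online update of the hidden state W on token x.

    Loss is 0.5 * ||W x - x||^2, so the gradient with respect to W is the
    outer product of the error (W x - x) with x. Returns the updated state
    and the loss measured before the update.
    """
    err = [p - t for p, t in zip(matvec(W, x), x)]
    new_W = [[w - lr * e * xi for w, xi in zip(row, x)]
             for row, e in zip(W, err)]
    return new_W, 0.5 * sum(e * e for e in err)


dim = 3
W = [[0.0] * dim for _ in range(dim)]  # hidden state = inner model weights

# Toy token stream in which two patterns repeat.
tokens = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

losses = []
for x in tokens:
    W, loss = ttt_step(W, x, lr=0.5)
    losses.append(loss)

print(losses)
```

The loss on a repeated pattern drops the second time it appears, which is the sense in which the state has "learned" from context during inference rather than merely summarizing it in a fixed vector.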
1d ago

Robots Need More Than VLAs and World Models

This episode explores the position paper Robots Need More Than VLAs & World Models and its claim that the main bottleneck in robotics may be grounding: turning raw physical behavior into robot-usable signals such as actions, contacts, task phases, goals, and rewards. It explains why vision-language-action models, world models, and reward models play different roles, and why simply scaling policy transformers cannot recover supervision that was never captured in the data. The discussion also digs into cross-embodiment learning and task-preserving retargeting, focusing on how humans and different robots can share useful experience despite mismatched bodies, sensors, and action spaces. A standout example is EgoMimic, which uses egocentric human video, 3D hand tracking, cross-domain alignment, and joint human-robot training to improve long-horizon real-robot manipulation, giving listeners a concrete picture of what might actually unlock broader robot generalization.

Sources:
1. Robots Need More than VLA and World Models — Elis Karcini, Faisal Mehrban, Quang Nguyen, Mac Schwager, Arash Ajoudani, Cesar Cadena, Jan Peters, Marco Hutter, Haitham Bou-Ammar, 2026 http://arxiv.org/abs/2606.06556
2. RT-1: Robotics Transformer for Real-World Control at Scale — Anthony Brohan, Noah Brown, Chelsea Finn, Sergey Levine, et al., 2022 https://scholar.google.com/scholar?q=RT-1:+Robotics+Transformer+for+Real-World+Control+at+Scale
3. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control — Anthony Brohan, Noah Brown, Danny Driess, Karol Hausman, Chelsea Finn, Sergey Levine, et al., 2023 https://scholar.google.com/scholar?q=RT-2:+Vision-Language-Action+Models+Transfer+Web+Knowledge+to+Robotic+Control
4. OpenVLA: An Open-Source Vision-Language-Action Model — Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Chelsea Finn, Sergey Levine, et al., 2024 https://scholar.google.com/scholar?q=OpenVLA:+An+Open-Source+Vision-Language-Action+Model
5. π0: A Vision-Language-Action Flow Model for General Robot Control — Kevin Black, Noah Brown, Danny Driess, Karol Hausman, Sergey Levine, Chelsea Finn, et al., 2024 https://scholar.google.com/scholar?q=π0:+A+Vision-Language-Action+Flow+Model+for+General+Robot+Control
6. Deep reinforcement learning from human preferences — Paul Christiano, Jan Leike, Tom B. Brown, Dario Amodei, et al., 2017 https://scholar.google.com/scholar?q=Deep+reinforcement+learning+from+human+preferences
7. Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos — Annie S. Chen, Suraj Nair, Chelsea Finn, 2021 https://scholar.google.com/scholar?q=Learning+Generalizable+Robotic+Reward+Functions+from+"In-The-Wild"+Human+Videos
8. Language to Rewards for Robotic Skill Synthesis — Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Brian Ichter, Ted Xiao, Fei Xia, et al., 2023 https://scholar.google.com/scholar?q=Language+to+Rewards+for+Robotic+Skill+Synthesis
9. RoboReward: General-Purpose Vision-Language Reward Models for Robotics — Tony Lee, Andrew Wagenmaker, Karl Pertsch, Percy Liang, Sergey Levine, Chelsea Finn, 2026 https://scholar.google.com/scholar?q=RoboReward:+General-Purpose+Vision-Language+Reward+Models+for+Robotics
10. Open X-Embodiment: Robotic Learning Datasets and RT-X Models — Open X-Embodiment Collaboration; Abby O'Neill, Fei Xia, Chelsea Finn, Sergey Levine, et al., 2023 https://scholar.google.com/scholar?q=Open+X-Embodiment:+Robotic+Learning+Datasets+and+RT-X+Models
11. XSkill: Cross Embodiment Skill Discovery — Mengda Xu, Zhenjia Xu, Cheng Chi, Manuela Veloso, Shuran Song, 2023 https://scholar.google.com/scholar?q=XSkill:+Cross+Embodiment+Skill+Discovery
12. Cross-Embodiment Robot Manipulation Skill Transfer using Latent Space Alignment — Tianyu Wang, Dwait Bhatt, Xiaolong Wang, Nikolay Atanasov, 2024 https://scholar.google.com/scholar?q=Cross-Embodiment+Robot+Manipulation+Skill+Transfer+using+Latent+Space+Alignment
13. Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer — Gemini Robotics Team; Abbas Abdolmaleki, Anthony Brohan, Keerthana Gopalakrishnan, Ted Xiao, et al., 2025 https://scholar.google.com/scholar?q=Gemini+Robotics+1.5:+Pushing+the+Frontier+of+Generalist+Robots+with+Advanced+Embodied+Reasoning,+Thinking,+and+Motion+Transfer
14. Learning Latent Plans from Play — Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Sergey Levine, Pierre Sermanet, et al., 2019 https://scholar.google.com/scholar?q=Learning+Latent+Plans+from+Play
15. R3M: A Universal Visual Representation for Robot Manipulation — Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, Abhinav Gupta, 2022 https://scholar.google.com/scholar?q=R3M:+A+Universal+Visual+Representation+for+Robot+Manipulation
16. Zero-Shot Robot Manipulation from Passive Human Videos — Homanga Bharadhwaj, Abhinav Gupta, Shubham Tulsiani, Vikash Kumar, 2023 https://scholar.google.com/scholar?q=Zero-Shot+Robot+Manipulation+from+Passive+Human+Videos
17. GenSim: Generating Robotic Simulation Tasks via Large Language Models — Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Huazhe Xu, Xiaolong Wang, et al., 2023 https://scholar.google.com/scholar?q=GenSim:+Generating+Robotic+Simulation+Tasks+via+Large+Language+Models
18. π0: A Vision-Language-Action Flow Model for General Robot Control — Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, et al., 2024 https://scholar.google.com/scholar?q=$\pi_0$:+A+Vision-Language-Action+Flow+Model+for+General+Robot+Control
19. EgoMimic: Scaling Imitation Learning via Egocentric Video — Simar Kareer, Dhruv Patel, Ryan Punamiya, Pranay Mathur, Shuo Cheng, Chen Wang, Judy Hoffman, Danfei Xu, 2024 https://scholar.google.com/scholar?q=EgoMimic:+Scaling+Imitation+Learning+via+Egocentric+Video
20. LEGATO: Cross-Embodiment Imitation Using a Grasping Tool — Mingyo Seo, H. Andy Park, Shenli Yuan, Yuke Zhu, Luis Sentis, 2024 https://scholar.google.com/scholar?q=LEGATO:+Cross-Embodiment+Imitation+Using+a+Grasping+Tool
21. Rank2Reward: Learning Shaped Reward Functions from Passive Video — Daniel Yang, Davin Tjia, Jacob Berg, Dima Damen, Pulkit Agrawal, Abhishek Gupta, 2024 https://scholar.google.com/scholar?q=Rank2Reward:+Learning+Shaped+Reward+Functions+from+Passive+Video
22. Genie: Generative Interactive Environments — Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, et al., 2024 https://scholar.google.com/scholar?q=Genie:+Generative+Interactive+Environments
23. Neural Scaling Laws in Robotics — Sebastian Sartor, Neil Thompson, 2024 https://arxiv.org/abs/2405.14005
24. Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression — Junjie Wen et al., 2024 https://arxiv.org/abs/2412.03293
25. Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation — Ria Doshi et al., 2024 https://arxiv.org/abs/2408.11812
26. RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning — Lawrence Yunliang Chen et al., 2024 https://arxiv.org/abs/2409.03403
27. CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations — Anthony Liang et al., 2025 https://arxiv.org/abs/2505.04999
28. DayDreamer: World Models for Physical Robot Learning — Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel, 2022 https://arxiv.org/abs/2206.14176
29. Ctrl-World: A Controllable Generative World Model for Robot Manipulation — Yanjiang Guo et al., 2025 https://arxiv.org/abs/2510.10125
30. AI Post Transformers: DreamerV3 World Models Across 150 Tasks — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-20-dreamerv3-world-models-across-150-tasks-af5edb.mp3
31. AI Post Transformers: When LLM Judges Become Coin Flips — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-05-when-llm-judges-become-coin-flips-8b43ef.mp3
32. AI Post Transformers: SkillsBench for Evaluating Agent Skills — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-14-skillsbench-for-evaluating-agent-skills-58bb1e.mp3

Interactive Visualization: Robots Need More Than VLAs and World Models

2d ago

End-to-End Context Compression at Scale

This episode explores End-to-End Context Compression at Scale, a paper on whether learned context compression can beat full long-context inference on quality, time to first token, and peak memory. It explains the main design choices behind the authors’ Latent Context Language Models, which use a 0.6B encoder and 4B decoder to replace long token sequences with learned latent memory at compression ratios from 1:4 to 1:16, and contrasts that approach with full-context prompting, retrieval, summarization, and KV-cache compression methods such as SnapKV and KVzip. The discussion highlights the paper’s core result: on RULER and LongBench EN-16, the released system reportedly sets a new Pareto frontier, delivering up to 8.8x faster inference on RULER and 5.2x faster on LongBench with lower memory use and stronger accuracy at aggressive compression. It also digs into the catch that makes the result interesting for practitioners: this speedup depends on a heavily trained system and changes the serving stack, so the real question is not just whether the benchmark wins are real, but whether learned compression is finally practical infrastructure for long-horizon agents and large-scale deployment.

Sources:
1. End-to-End Context Compression at Scale — Ang Li, Sean McLeish, Haozhe Chen, Nimit Kalra, Zaiqian Chen, Artem Gazizov, Venkata Anoop Suhas Kumar Morisetty, Bhavya Kailkhura, Harshitha Menon, Zhuang Liu, Brian R. Bartoldson, Tom Goldstein, Sanae Lotfi, Micah Goldblum, Pavel Izmailov, 2026 http://arxiv.org/abs/2606.09659
2. Learning to Compress Prompts with Gist Tokens — Jesse Mu, Xiang Lisa Li, Noah Goodman, 2023 https://arxiv.org/abs/2304.08467
3. Adapting Language Models to Compress Contexts — Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen, 2023 https://arxiv.org/abs/2305.14788
4. Long-Context Language Modeling with Parallel Context Encoding — Howard Yen, Tianyu Gao, Danqi Chen, 2024 https://arxiv.org/abs/2402.16617
5. ARC-Encoder: learning compressed text representations for large language models — Hippolyte Pilchen, Edouard Grave, Patrick Perez, 2025 https://arxiv.org/abs/2510.20535
6. SnapKV: LLM Knows What You are Looking for Before Generation — Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen, 2024 https://scholar.google.com/scholar?q=SnapKV:+LLM+Knows+What+You+are+Looking+for+Before+Generation
7. KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction — Jang-Hyun Kim, Jinuk Kim, Sangwoo Kwon, Jae W. Lee, Sangdoo Yun, Hyun Oh Song, 2025 https://scholar.google.com/scholar?q=KVzip:+Query-Agnostic+KV+Cache+Compression+with+Context+Reconstruction
8. Fast KV Compaction via Attention Matching — Adam Zweiger, Xinghong Fu, Han Guo, Yoon Kim, 2026 https://scholar.google.com/scholar?q=Fast+KV+Compaction+via+Attention+Matching
9. Cartridges: Lightweight and general-purpose long context representations via self-study — Sabri Eyuboglu, Ryan Ehrlich, Simran Arora, Neel Guha, Dylan Zinsley, Emily Liu, Will Tennien, Atri Rudra, James Zou, Azalia Mirhoseini, Christopher Re, 2025 https://scholar.google.com/scholar?q=Cartridges:+Lightweight+and+general-purpose+long+context+representations+via+self-study
10. Latent Context Compilation: Distilling Long Context into Compact Portable Memory — Zeju Li, Yizhou Zhou, Qiang Xu, 2026 https://scholar.google.com/scholar?q=Latent+Context+Compilation:+Distilling+Long+Context+into+Compact+Portable+Memory
11. RULER: What's the Real Context Size of Your Long-Context Language Models? — Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, Boris Ginsburg, 2024 https://scholar.google.com/scholar?q=RULER:+What's+the+Real+Context+Size+of+Your+Long-Context+Language+Models?
12. LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks — Yushi Bai, Shangqing Tu, Jiajie Zhang, Hao Peng, Xiaozhi Wang, Xin Lv, Shulin Cao, Jiazheng Xu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li, 2024 https://scholar.google.com/scholar?q=LongBench+v2:+Towards+Deeper+Understanding+and+Reasoning+on+Realistic+Long-context+Multitasks
13. ObjectCache: Layerwise Object-Storage Retrieval for KV Cache Reuse — Yu Zhu et al., 2026 https://arxiv.org/abs/2605.22850
14. PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference — Krishna Teja Chitty-Venkata et al., 2025 https://arxiv.org/abs/2509.04377
15. ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference — Xiang Liu et al., 2025 https://arxiv.org/abs/2502.00299
16. Long Context Compression with Activation Beacon — Peitian Zhang et al., 2024 https://arxiv.org/abs/2401.03462
17. AI Post Transformers: Explicit Information Transmission for Context Compression — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-05-explicit-information-transmission-for-co-24e3c2.mp3
18. AI Post Transformers: TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-25-turboquant-online-vector-quantiz-1967b7.mp3
19. AI Post Transformers: DeepSeek-V4 and Practical Million-Token Context — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-25-deepseek-v4-and-practical-million-token-6f4de1.mp3
20. AI Post Transformers: Memory Sparse Attention for 100M-Token Scaling — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-07-memory-sparse-attention-for-100m-token-s-377cff.mp3
21. AI Post Transformers: Long Context Pre-Training with Lighthouse Attention — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-13-long-context-pre-training-with-lighthous-e85bbe.mp3
22. AI Post Transformers: Compressed Convolutional Attention in Latent Space — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-25-compressed-convolutional-attention-in-la-61e1cf.mp3
2d ago

KumoRFM-2 for Relational Learning at Scale

This episode explores KumoRFM-2, a relational foundation model designed to learn directly from connected database tables instead of flattening customers, orders, products, and tickets into a single feature table. It explains why relational learning matters for enterprise tasks such as churn, fraud, and demand prediction, arguing that flattening often erases multi-hop relationships, repeated interactions, and temporal patterns that carry the real signal. The discussion centers on KumoRFM-2’s main technical claim: a two-stage, task-conditioned attention pipeline that first selects relevant information within each table and then aggregates evidence across foreign-key neighborhoods and labeled in-context examples derived from predictive queries. Listeners would find it interesting because it connects a very practical data-engineering pain point to a broader question about whether pretrained, database-native models can beat hand-built tabular pipelines without cheating on time-aware prediction.

Sources:
1. KumoRFM-2: Scaling Foundation Models for Relational Learning — Valter Hudovernik, Federico López, Vid Kocijan, Akihiro Nitta, Jan Eric Lenssen, Jure Leskovec, Matthias Fey, 2026 http://arxiv.org/abs/2604.12596
2. Learning Probabilistic Relational Models — Nir Friedman, Lise Getoor, Daphne Koller, Avi Pfeffer, 1999 https://scholar.google.com/scholar?q=Learning+Probabilistic+Relational+Models
3. Relational Inductive Biases, Deep Learning, and Graph Networks — Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Yujia Li, Razvan Pascanu, et al., 2018 https://scholar.google.com/scholar?q=Relational+Inductive+Biases,+Deep+Learning,+and+Graph+Networks
4. Relational Deep Learning: Graph Representation Learning on Relational Databases — Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, Jure Leskovec, 2023 https://scholar.google.com/scholar?q=Relational+Deep+Learning:+Graph+Representation+Learning+on+Relational+Databases
5. RelBench: A Benchmark for Deep Learning on Relational Databases — Joshua Robinson, Rishabh Ranjan, Weihua Hu, Kexin Huang, Jiaqi Han, Alejandro Dobles, Matthias Fey, Jan E. Lenssen, Jure Leskovec, et al., 2024 https://scholar.google.com/scholar?q=RelBench:+A+Benchmark+for+Deep+Learning+on+Relational+Databases
6. Position: Why Tabular Foundation Models Should Be a Research Priority — Boris Van Breugel, Mihaela Van Der Schaar, 2024 https://scholar.google.com/scholar?q=Position:+Why+Tabular+Foundation+Models+Should+Be+a+Research+Priority
7. Accurate Predictions on Small Data with a Tabular Foundation Model — Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter, et al., 2025 https://scholar.google.com/scholar?q=Accurate+Predictions+on+Small+Data+with+a+Tabular+Foundation+Model
8. TabICL: A Tabular Foundation Model for In-Context Learning on Large Data — Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan, 2025 https://scholar.google.com/scholar?q=TabICL:+A+Tabular+Foundation+Model+for+In-Context+Learning+on+Large+Data
9. KumoRFM: A Foundation Model for In-Context Learning on Relational Data — Matthias Fey, Vid Kocijan, Federico Lopez, Jan Eric Lenssen, Jure Leskovec, 2025 https://scholar.google.com/scholar?q=KumoRFM:+A+Foundation+Model+for+In-Context+Learning+on+Relational+Data
10. PluRel: Synthetic Data unlocks Scaling Laws for Relational Foundation Models — Vignesh Kothapalli et al., 2026 https://scholar.google.com/scholar?q=PluRel:+Synthetic+Data+unlocks+Scaling+Laws+for+Relational+Foundation+Models
11. Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data — Rishabh Ranjan et al., 2026 https://scholar.google.com/scholar?q=Relational+Transformer:+Toward+Zero-Shot+Foundation+Models+for+Relational+Data
12. No Need to Train Your RDB Foundation Model — Linjie Xu, Yanlin Zhang, Quan Gan, Minjie Wang, and David Wipf, 2026 https://scholar.google.com/scholar?q=No+Need+to+Train+Your+RDB+Foundation+Model
13. TabICLv2: A better, faster, scalable, and open tabular foundation model — Jingang Qu, David Holzmuller, Gael Varoquaux, and Marine Le Morvan, 2026 https://scholar.google.com/scholar?q=TabICLv2:+A+better,+faster,+scalable,+and+open+tabular+foundation+model
14. Griffin: Towards a Graph-Centric Relational Database Foundation Model — Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, and Muhan Zhang, 2025 https://scholar.google.com/scholar?q=Griffin:+Towards+a+Graph-Centric+Relational+Database+Foundation+Model
15. Graph Machine Learning Meets Multi-Table Relational Data — Quan Gan, Minjie Wang, David Wipf, Christos Faloutsos, 2024 https://scholar.google.com/scholar?q=Graph+Machine+Learning+Meets+Multi-Table+Relational+Data
16. Large Scale Transfer Learning for Tabular Data via Language Modeling — Josh Gardner, Juan C. Perdomo, Ludwig Schmidt, 2024 https://scholar.google.com/scholar?q=Large+Scale+Transfer+Learning+for+Tabular+Data+via+Language+Modeling
17. Towards Synthetic Data for Fine-tuning Tabular Foundation Models — Magnus Buhler, Lennart Purucker, Frank Hutter, 2025 https://scholar.google.com/scholar?q=Towards+Synthetic+Data+for+Fine-tuning+Tabular+Foundation+Models
18. Range-limited Augmentation for Few-shot Learning in Tabular Data with Comprehensive Benchmark — Kyungeun Lee et al., 2025 https://scholar.google.com/scholar?q=Range-limited+Augmentation+for+Few-shot+Learning+in+Tabular+Data+with+Comprehensive+Benchmark
19. AI Post Transformers: Muon Is Scalable for LLM Training — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-25-muon-is-scalable-for-llm-training-587ed8.mp3
20. AI Post Transformers: Scaling Laws for Multilingual Code Pretraining — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-15-scaling-laws-for-multilingual-code-pretr-7d220e.mp3
2d ago

Latent Reasoning with Normalizing Flows

This episode explores Latent Reasoning with Normalizing Flows, a paper that asks whether a standard left-to-right transformer can do its intermediate reasoning in continuous latent states instead of spelling every step out as text. It explains how the method uses a frozen VAE during training to compress written rationales into short latent sequences, then uses shallow normalizing flows so the same autoregressive backbone can predict both latent thought slots and normal answer tokens while preserving exact likelihoods, sampling, and KV-cache-friendly decoding. The discussion highlights matched coding results on Qwen3-8B-Base, where the reported benchmark average rises from 55.8 for the base model to 68.8 for NF-CoT Unified and 70.1 after latent-space reinforcement learning, with strong pass@k gains that suggest better exploration of multiple solution paths. Listeners would find it interesting because it frames latent reasoning as a practical alternative to verbose chain-of-thought, while also noting the current evidence is still narrow, centered on one post-trained coding model and not uniformly better than diffusion baselines on every benchmark.

Sources:
1. Latent Reasoning with Normalizing Flows — Guancheng Tu, Xiangjun Fu, Suhao Yu, Yao Tang, Haoqiang Kang, Lianhui Qin, Yizhe Zhang, Jiatao Gu, 2026 http://arxiv.org/abs/2606.06447
2. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models — Jason Wei, Xuezhi Wang, Denny Zhou, Quoc Le, et al., 2022 https://scholar.google.com/scholar?q=Chain-of-Thought+Prompting+Elicits+Reasoning+in+Large+Language+Models
3. Training Large Language Models to Reason in a Continuous Latent Space — Shibo Hao, Sainbayar Sukhbaatar, Zhiting Hu, Jason Weston, Yuandong Tian, et al., 2024 preprint; COLM 2025 https://scholar.google.com/scholar?q=Training+Large+Language+Models+to+Reason+in+a+Continuous+Latent+Space
4. Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning — Xinghao Chen, Anhao Zhao, Xiaoyu Shen, et al., 2025 https://scholar.google.com/scholar?q=Reasoning+Beyond+Language:+A+Comprehensive+Survey+on+Latent+Chain-of-Thought+Reasoning
5. LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning — Haoqiang Kang, Yizhe Zhang, Navdeep Jaitly, Yi-An Ma, Lianhui Qin, et al., 2025 https://scholar.google.com/scholar?q=LaDiR:+Latent+Diffusion+Enhances+LLMs+for+Text+Reasoning
6. CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation — Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He, 2025 https://scholar.google.com/scholar?q=CODI:+Compressing+Chain-of-Thought+into+Continuous+Space+via+Self-Distillation
7. Normalizing Flows are Capable Generative Models — Shuangfei Zhai, Ruixiang Zhang, Preetum Nakkiran, David Berthelot, Jiatao Gu, Huangjie Zheng, Tianrong Chen, Miguel Angel Bautista, Navdeep Jaitly, Josh Susskind, 2024 https://scholar.google.com/scholar?q=Normalizing+Flows+are+Capable+Generative+Models
8. Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought — Hanlin Zhu, Shibo Hao, Zhiting Hu, Jiantao Jiao, Stuart Russell, Yuandong Tian, 2025 https://scholar.google.com/scholar?q=Reasoning+by+Superposition:+A+Theoretical+Perspective+on+Chain+of+Continuous+Thought
9. Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure — Zirui Li, Xuefeng Bai, Kehai Chen, Yizhi Li, Jian Yang, Chenghua Lin, Min Zhang, 2026 https://scholar.google.com/scholar?q=Dynamics+Within+Latent+Chain-of-Thought:+An+Empirical+Study+of+Causal+Structure
10. Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning — Jingcheng Deng, Zihao Wei, Liang Pang, Junhong Wu, Shicheng Xu, Zenghao Duan, Huawei Shen, 2026 https://scholar.google.com/scholar?q=Latent-GRPO:+Group+Relative+Policy+Optimization+for+Latent+Reasoning
11. Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision — Dawei Zhu et al., 2025 https://arxiv.org/abs/2502.20790
12. Supervised Chain of Thought — Xiang Zhang and Dujian Ding, 2024 https://arxiv.org/abs/2410.14198
13. Large language models can learn and generalize steganographic chain-of-thought under process supervision — Joey Skaf et al., 2025 https://arxiv.org/abs/2506.01926
14. Hybrid Latent Reasoning via Reinforcement Learning — Zhenrui Yue et al., 2025 https://arxiv.org/abs/2505.18454
15. Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach — Jonas Geiping et al., 2025 https://arxiv.org/abs/2502.05171
16. R-KV: Redundancy-aware KV Cache Compression for Reasoning Models — Zefan Cai et al., 2025 https://arxiv.org/abs/2505.24133
17. Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning — Yu Fu et al., 2024 https://arxiv.org/abs/2410.19258
18. AI Post Transformers: Generative Recursive Reasoning in Latent Space — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-21-generative-recursive-reasoning-in-latent-a9371d.mp3
19. AI Post Transformers: MELT: Decoupling Compute From Memory — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-13-melt-decoupling-compute-from-memory-26430c.mp3
20. AI Post Transformers: Reasoning Theater and Unfaithful Chain-of-Thought — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-05-reasoning-theater-and-unfaithful-chain-o-a4507e.mp3
21. AI Post Transformers: Gradient Descent at Inference Time for LLM Reasoning — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-10-gradient-descent-at-inference-time-for-l-20617d.mp3
22. AI Post Transformers: Explicit Information Transmission for Context Compression — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-05-explicit-information-transmission-for-co-24e3c2.mp3
23. AI Post Transformers: Speculative Decoding in Real vLLM Serving — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-speculative-decoding-in-real-vllm-servin-6f4e2b.mp3

Interactive Visualization: Latent Reasoning with Normalizing Flows


3.7 out of 5 (3 Ratings)

Creator: mcgrof
Years Active: 2025 - 2026
Episodes: 688
Rating: Clean
Show Website: AI Post Transformers


Updated Semiweekly

