AI Post Transformers

mcgrof

AI-generated podcast where hosts Hal Turing and Dr. Ada Shannon discuss the latest research papers and reports in machine learning, AI systems, and optimization. Featuring honest critical analysis, proper citations, and nerdy humor.

  1.

    TokenDance for Multi-Agent KV Cache Sharing

    This episode explores TokenDance, a systems approach for serving many LLM-based agents more efficiently by collectively sharing transformer KV caches across synchronized conversation rounds. It explains why multi-agent workloads are fundamentally different from ordinary chat serving: agents persist across rounds, accumulate large KV caches, and often follow an “all-gather” pattern where each agent receives a mostly shared prompt plus its own private history, making standard prefix-based cache reuse ineffective. The discussion argues that the key innovation is shifting cache reuse from individual requests to the entire round of agents as a collective object, enabling memory savings and better scalability on the same GPU. Listeners interested in agent systems, inference infrastructure, and practical bottlenecks beyond model architecture will find it compelling for its concrete diagnosis of memory management as the real constraint. Sources: 1. TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing — Zhuohang Bian, Feiyang Wu, Chengrui Zhang, Hangcheng Dong, Yun Liang, Youwei Zhuo, 2026 http://arxiv.org/abs/2604.03143 2. TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing — Zhuohang Bian, Feiyang Wu, Chengrui Zhang, Hangcheng Dong, Yun Liang, Youwei Zhuo, 2026 https://scholar.google.com/scholar?q=TokenDance:+Scaling+Multi-Agent+LLM+Serving+via+Collective+KV+Cache+Sharing 3. Efficient Memory Management for Large Language Model Serving with PagedAttention — Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Hao Zhang, et al., 2023 https://scholar.google.com/scholar?q=Efficient+Memory+Management+for+Large+Language+Model+Serving+with+PagedAttention 4. SGLang: Efficient Execution of Structured Language Model Programs — Lianmin Zheng, Weizhe Chen, Ying Sheng, Tianqi Chen, Ion Stoica, and collaborators, 2024 https://scholar.google.com/scholar?q=SGLang:+Efficient+Execution+of+Structured+Language+Model+Programs 5. 
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable — Xiangyao Yu and collaborators, 2024 https://scholar.google.com/scholar?q=Parrot:+Efficient+Serving+of+LLM-based+Applications+with+Semantic+Variable 6. vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention — Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, et al., 2023 https://scholar.google.com/scholar?q=vLLM:+Easy,+Fast,+and+Cheap+LLM+Serving+with+PagedAttention 7. SGLang — SGLang team / related authors as cited in the paper, 2024 https://scholar.google.com/scholar?q=SGLang 8. Parrot — Authors as cited in the paper, 2024 https://scholar.google.com/scholar?q=Parrot 9. Autellix — Authors as cited in the paper, 2024 https://scholar.google.com/scholar?q=Autellix 10. Tokencake — Authors as cited in the paper, 2024 https://scholar.google.com/scholar?q=Tokencake 11. Generative Agents: Interactive Simulacra of Human Behavior — Joon Sung Park, Joseph O'Brien, Carrie Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein, 2023 https://scholar.google.com/scholar?q=Generative+Agents:+Interactive+Simulacra+of+Human+Behavior 12. Position-independent KV-cache reuse papers cited as [10, 34-36] — Authors as cited in the paper, 2024-2026 https://scholar.google.com/scholar?q=Position-independent+KV-cache+reuse+papers+cited+as+[10,+34-36] 13. OpenClaw — Authors as cited in the paper, 2024 https://scholar.google.com/scholar?q=OpenClaw 14. MoltBook — Authors as cited in the paper, 2024 https://scholar.google.com/scholar?q=MoltBook 15. DynTaskMAS: A Dynamic Task Graph-Driven Framework for Asynchronous and Parallel LLM-Based Multi-Agent Systems — approx. recent multi-agent systems authors, 2024/2025 https://scholar.google.com/scholar?q=DynTaskMAS:+A+Dynamic+Task+Graph-Driven+Framework+for+Asynchronous+and+Parallel+LLM-Based+Multi-Agent+Systems 16. Kairos: Low-Latency Multi-Agent Serving with Shared LLMs and Excessive Loads in the Public Cloud — approx. 
recent systems authors, 2024/2025 https://scholar.google.com/scholar?q=Kairos:+Low-Latency+Multi-Agent+Serving+with+Shared+LLMs+and+Excessive+Loads+in+the+Public+Cloud 17. CacheSlide: Unlocking Cross Position-Aware KV Cache Reuse for Accelerating LLM Serving — approx. recent LLM serving authors, 2024/2025 https://scholar.google.com/scholar?q=CacheSlide:+Unlocking+Cross+Position-Aware+KV+Cache+Reuse+for+Accelerating+LLM+Serving 18. Where Matters More Than What: Decoding-Aligned KV Cache Compression via Position-Aware Pseudo Queries — approx. recent KV compression authors, 2024/2025 https://scholar.google.com/scholar?q=Where+Matters+More+Than+What:+Decoding-Aligned+KV+Cache+Compression+via+Position-Aware+Pseudo+Queries 19. KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse — approx. recent KV reuse authors, 2024/2025 https://scholar.google.com/scholar?q=KVLink:+Accelerating+Large+Language+Models+via+Efficient+KV+Cache+Reuse 20. HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse — approx. recent RAG authors, 2024/2025 https://scholar.google.com/scholar?q=HyperRAG:+Enhancing+Quality-Efficiency+Tradeoffs+in+Retrieval-Augmented+Generation+with+Reranker+KV-Cache+Reuse 21. ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation — approx. recent RAG/KV authors, 2024/2025 https://scholar.google.com/scholar?q=ProphetKV:+User-Query-Driven+Selective+Recomputation+for+Efficient+KV+Cache+Reuse+in+Retrieval-Augmented+Generation 22. Eigen Attention: Attention in Low-Rank Space for KV Cache Compression — approx. recent KV compression authors, 2024/2025 https://scholar.google.com/scholar?q=Eigen+Attention:+Attention+in+Low-Rank+Space+for+KV+Cache+Compression 23. PALU: KV-Cache Compression with Low-Rank Projection — approx. 
recent systems/ML authors, 2024/2025 https://scholar.google.com/scholar?q=PALU:+KV-Cache+Compression+with+Low-Rank+Projection 24. LORC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy — approx. recent KV compression authors, 2024/2025 https://scholar.google.com/scholar?q=LORC:+Low-Rank+Compression+for+LLMs+KV+Cache+with+a+Progressive+Compression+Strategy 25. AI Post Transformers: CacheSlide: Position-Aware KV Cache Reuse for Agent LLMs — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-16-cacheslide-position-aware-kv-cache-reuse-cd59c7.mp3 26. AI Post Transformers: ContiguousKV for Faster LLM Prefill KV Reuse — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-20-contiguouskv-for-faster-llm-prefill-kv-r-59f545.mp3 27. AI Post Transformers: KV Cache TTL for Multi-Turn Agent Scheduling — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-09-kv-cache-ttl-for-multi-turn-agent-schedu-996bf1.mp3 28. AI Post Transformers: Continuous Batching for LLM Inference: Throughput and Latency Gains — Hal Turing & Dr. Ada Shannon, 2025 https://podcast.do-not-panic.com/episodes/continuous-batching-for-llm-inference-throughput-and-latency-gains/ 29. AI Post Transformers: Speculative Decoding in Real vLLM Serving — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-speculative-decoding-in-real-vllm-servin-6f4e2b.mp3 30. AI Post Transformers: Splitwise: Phase-Split LLM Inference — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-26-splitwise-phase-split-llm-inference-e8945b.mp3 31. AI Post Transformers: FengHuang for Rack-Scale LLM Inference Memory — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-12-fenghuang-for-rack-scale-llm-inference-m-62708e.mp3 32. 
AI Post Transformers: From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-22-from-prefix-cache-to-fusion-rag-9c5d39.mp3 Interactive Visualization: TokenDance for Multi-Agent KV Cache Sharing
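
The memory argument in the episode summary can be sketched with back-of-envelope arithmetic: per-request KV caches versus storing the shared round context once for all agents. This is a toy accounting sketch, not TokenDance's implementation; the model shape (32 layers, 32 heads, head_dim 128, fp16) and the workload sizes are invented, roughly Llama-7B-scale assumptions.

```python
def kv_bytes(tokens, layers=32, heads=32, head_dim=128, dtype_bytes=2):
    # Each token stores a K and a V vector of heads*head_dim per layer.
    return tokens * layers * 2 * heads * head_dim * dtype_bytes

num_agents = 8
shared_tokens, private_tokens = 4000, 500   # invented workload numbers

# Baseline: every agent materializes shared-context + private-history KV itself.
per_request = num_agents * kv_bytes(shared_tokens + private_tokens)

# Collective: the shared round context is stored once, private histories per agent.
collective = kv_bytes(shared_tokens) + num_agents * kv_bytes(private_tokens)

print(f"per-request: {per_request / 2**30:.1f} GiB")
print(f"collective:  {collective / 2**30:.1f} GiB")
```

Under these assumed numbers the collective layout is 4.5x smaller; the ratio grows with agent count, since the shared block is paid once instead of per agent.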

  2.

    Benchmarking Test-Time Scaling for General LLM Agents

    This episode explores a paper that tests whether general LLM agents remain effective when search, coding, reasoning, and API/tool-use tasks are mixed together under one shared prompt, interface, and tool set rather than optimized benchmark-specific setups. It explains how the benchmark is built by unifying tasks from BrowseComp, WebVoyager, SWE-Bench Verified, Terminal-Bench, MathHay, Tau2-Bench, and MCP-Bench, forcing agents to infer the task type and select tools without domain-specific cues. The discussion highlights the paper’s core argument that conventional benchmarks can overstate capability by pre-structuring the environment, while a general setting better reflects real user requests and exposes weaknesses in planning, tool choice, and adaptation. Listeners would find it interesting for its clear look at test-time scaling in agents—giving the same model more turns or parallel attempts—and for its broader challenge to how agent intelligence should be evaluated. Sources: 1. Benchmark Test-Time Scaling of General LLM Agents — Xiaochuan Li, Ryan Ming, Pranav Setlur, Abhijay Paladugu, Andy Tang, Hao Kang, Shuai Shao, Rong Jin, Chenyan Xiong, 2026 http://arxiv.org/abs/2602.18998 2. SWE-Bench — Jimenez et al., 2023 https://scholar.google.com/scholar?q=SWE-Bench 3. Terminal-Bench — Aleithan et al., 2024 https://scholar.google.com/scholar?q=Terminal-Bench 4. BrowseComp — presumably cited in paper; exact citation not provided in excerpt, 2024/2025 https://scholar.google.com/scholar?q=BrowseComp 5. Mind2Web — Deng/He et al. or benchmark authors cited as Wei et al. 2025 / He et al. 2024 in excerpt context, 2024/2025 https://scholar.google.com/scholar?q=Mind2Web 6. WebVoyager — Zhou et al., 2023 https://scholar.google.com/scholar?q=WebVoyager 7. Tau2-Bench — not specified in excerpt, likely 2025/2026 https://scholar.google.com/scholar?q=Tau2-Bench 8. MCP-Bench — not specified in excerpt, likely 2025/2026 https://scholar.google.com/scholar?q=MCP-Bench 9. 
Self-Consistency Improves Chain of Thought Reasoning in Language Models — Wang et al., 2022 https://scholar.google.com/scholar?q=Self-Consistency+Improves+Chain+of+Thought+Reasoning+in+Language+Models 10. Training Verifiers to Solve Math Word Problems — Cobbe et al., 2021 https://scholar.google.com/scholar?q=Training+Verifiers+to+Solve+Math+Word+Problems 11. Let's Verify Step by Step — Lightman et al., 2023 https://scholar.google.com/scholar?q=Let's+Verify+Step+by+Step 12. Quiet-STaR / test-time reasoning scaling related work — Zelikman et al., 2024 https://scholar.google.com/scholar?q=Quiet-STaR+/+test-time+reasoning+scaling+related+work 13. Snell et al. test-time scaling work — Snell et al., 2024 https://scholar.google.com/scholar?q=Snell+et+al.+test-time+scaling+work 14. Toolformer — Schick et al., 2023 https://scholar.google.com/scholar?q=Toolformer 15. Gorilla / APIBench-style tool-use work — Patil et al., 2024 https://scholar.google.com/scholar?q=Gorilla+/+APIBench-style+tool-use+work 16. Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents — approx. unknown from snippet, 2025/2026 https://scholar.google.com/scholar?q=Beyond+the+Context+Window:+A+Cost-Performance+Analysis+of+Fact-Based+Memory+vs.+Long-Context+LLMs+for+Persistent+Agents 17. Memory in the Age of AI Agents — approx. unknown from snippet, 2025/2026 https://scholar.google.com/scholar?q=Memory+in+the+Age+of+AI+Agents 18. Toward Conversational Agents with Context and Time Sensitive Long-Term Memory — approx. unknown from snippet, 2025/2026 https://scholar.google.com/scholar?q=Toward+Conversational+Agents+with+Context+and+Time+Sensitive+Long-Term+Memory 19. When LLM Judge Scores Look Good but Best-of-N Decisions Fail — approx. unknown from snippet, 2025/2026 https://scholar.google.com/scholar?q=When+LLM+Judge+Scores+Look+Good+but+Best-of-N+Decisions+Fail 20. 
When to Solve, When to Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning — approx. unknown from snippet, 2025/2026 https://scholar.google.com/scholar?q=When+to+Solve,+When+to+Verify:+Compute-Optimal+Problem+Solving+and+Generative+Verification+for+LLM+Reasoning 21. Scalable Best-of-N Selection for Large Language Models via Self-Certainty — approx. unknown from snippet, 2025/2026 https://scholar.google.com/scholar?q=Scalable+Best-of-N+Selection+for+Large+Language+Models+via+Self-Certainty 22. AgentClinic: A Multimodal Agent Benchmark to Evaluate AI in Simulated Clinical Environments — approx. unknown from snippet, 2025/2026 https://scholar.google.com/scholar?q=AgentClinic:+A+Multimodal+Agent+Benchmark+to+Evaluate+AI+in+Simulated+Clinical+Environments 23. DABStep: Data Agent Benchmark for Multi-Step Reasoning — approx. unknown from snippet, 2025/2026 https://scholar.google.com/scholar?q=DABStep:+Data+Agent+Benchmark+for+Multi-Step+Reasoning 24. GTA1: GUI Test-Time Scaling Agent — approx. unknown from snippet, 2025/2026 https://scholar.google.com/scholar?q=GTA1:+GUI+Test-Time+Scaling+Agent 25. AI Post Transformers: SkillsBench for Evaluating Agent Skills — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-14-skillsbench-for-evaluating-agent-skills-58bb1e.mp3 26. AI Post Transformers: MEMSEARCHER: Reinforcement Learning for LLM Memory Management — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-memsearcher-reinforcement-learning-for-l-e9ad84.mp3 27. AI Post Transformers: Memory Sparse Attention for 100M-Token Scaling — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-07-memory-sparse-attention-for-100m-token-s-377cff.mp3 28. AI Post Transformers: IMO-Bench for Robust Mathematical Reasoning — Hal Turing & Dr. 
Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-imo-bench-for-robust-mathematical-reason-143489.mp3 29. AI Post Transformers: Simple Self-Distillation for Better Code Generation — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-02-simple-self-distillation-for-better-code-cc88e0.mp3 Interactive Visualization: Benchmarking Test-Time Scaling for General LLM Agents
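
The "parallel attempts" axis of test-time scaling discussed in this episode can be sketched as best-of-N with majority voting over final answers (self-consistency-style selection, as in the Wang et al. citation above). This is a control-flow sketch only: `run_agent` is a hypothetical deterministic stub standing in for a full tool-using rollout, and its outputs are invented.

```python
from collections import Counter

def run_agent(task, seed):
    # Stub: a real rollout would browse, write code, or call tools here.
    return ["42", "42", "41"][seed % 3]

def majority_vote(answers):
    # Keep the most common final answer across N independent attempts.
    return Counter(answers).most_common(1)[0][0]

attempts = [run_agent("toy task", s) for s in range(5)]
print(majority_vote(attempts))  # "42": 4 of 5 attempts agree
```

The same skeleton covers the "more turns" axis by letting each attempt run longer rather than sampling more of them; the benchmark's question is how accuracy moves as N grows.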

  3.

    CacheBlend for Fast RAG Serving

    This episode explores a systems paper on speeding up retrieval-augmented generation by reusing KV caches for frequently repeated retrieved documents, even when those documents are not exact prompt prefixes. It explains why long RAG prompts make prefill the main latency bottleneck, why standard prefix caching only helps in narrow cases, and why naive non-prefix cache reuse can hurt quality by ignoring cross-chunk attention between the query and retrieved passages. The discussion centers on CacheBlend’s core argument: selectively recomputing only the parts of a reused chunk that need updated context could preserve answer quality while significantly improving time-to-first-token. Listeners would find it interesting for its practical focus on the tradeoff between real-world serving speed and faithful multi-document reasoning, rather than on new model architectures. Sources: 1. CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion — Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang, 2024 http://arxiv.org/abs/2405.16444 2. Prompt Cache: Modular Attention Reuse for Low-Latency Inference — Yao Fu, et al., 2024 https://scholar.google.com/scholar?q=Prompt+Cache:+Modular+Attention+Reuse+for+Low-Latency+Inference 3. CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving — Junxian He, et al., 2024 https://scholar.google.com/scholar?q=CacheGen:+KV+Cache+Compression+and+Streaming+for+Fast+Large+Language+Model+Serving 4. RadixAttention for Efficient KV Cache Sharing in LLM Serving — LMSYS / SGLang authors, 2024 https://scholar.google.com/scholar?q=RadixAttention+for+Efficient+KV+Cache+Sharing+in+LLM+Serving 5. vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention — Woosuk Kwon, et al., 2023 https://scholar.google.com/scholar?q=vLLM:+Easy,+Fast,+and+Cheap+LLM+Serving+with+PagedAttention 6. 
Memorizing Transformers — Angeliki Lazaridou, et al., 2022 https://scholar.google.com/scholar?q=Memorizing+Transformers 7. FlashAttention — Tri Dao, et al., 2022 https://scholar.google.com/scholar?q=FlashAttention 8. A Survey on Retrieval-Augmented Text Generation — Zhiheng Gao, et al., 2024 https://scholar.google.com/scholar?q=A+Survey+on+Retrieval-Augmented+Text+Generation 9. Kvlink: Accelerating Large Language Models via Efficient KV Cache Reuse — approx. recent systems/LLM serving authors, 2024/2025 https://scholar.google.com/scholar?q=Kvlink:+Accelerating+Large+Language+Models+via+Efficient+KV+Cache+Reuse 10. An Experimental Study of KV Cache Reuse Strategies in Chunk-Level Caching Systems — approx. recent systems authors, 2024/2025 https://scholar.google.com/scholar?q=An+Experimental+Study+of+KV+Cache+Reuse+Strategies+in+Chunk-Level+Caching+Systems 11. Efficient Streaming Language Models with Attention Sinks — Xiao et al. / approximate, 2024 https://scholar.google.com/scholar?q=Efficient+Streaming+Language+Models+with+Attention+Sinks 12. Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation — approx. survey authors, 2024/2025 https://scholar.google.com/scholar?q=Attention+Sink+in+Transformers:+A+Survey+on+Utilization,+Interpretation,+and+Mitigation 13. Long Context vs. RAG for LLMs: An Evaluation and Revisits — approx. recent RAG evaluation authors, 2024 https://scholar.google.com/scholar?q=Long+Context+vs.+RAG+for+LLMs:+An+Evaluation+and+Revisits 14. Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG — approx. recent RAG authors, 2024 https://scholar.google.com/scholar?q=Long-Context+LLMs+Meet+RAG:+Overcoming+Challenges+for+Long+Inputs+in+RAG 15. KV Cache Offloading for Context-Intensive Tasks — approx. recent systems authors, 2024/2025 https://scholar.google.com/scholar?q=KV+Cache+Offloading+for+Context-Intensive+Tasks 16. 
KVSwap: Disk-Aware KV Cache Offloading for Long-Context On-Device Inference — approx. recent systems authors, 2024/2025 https://scholar.google.com/scholar?q=KVSwap:+Disk-Aware+KV+Cache+Offloading+for+Long-Context+On-Device+Inference 17. AI Post Transformers: Episode: From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-22-from-prefix-cache-to-fusion-rag-9c5d39.mp3 18. AI Post Transformers: CacheSlide: Position-Aware KV Cache Reuse for Agent LLMs — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-16-cacheslide-position-aware-kv-cache-reuse-cd59c7.mp3 19. AI Post Transformers: Prefill-as-a-Service for Cross-Datacenter KV Cache — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-19-prefill-as-a-service-for-cross-datacente-7560be.mp3 20. AI Post Transformers: KVSwap for Disk-Aware Long-Context On-Device Inference — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-16-kvswap-for-disk-aware-long-context-on-de-f3c15e.mp3 21. AI Post Transformers: FengHuang for Rack-Scale LLM Inference Memory — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-12-fenghuang-for-rack-scale-llm-inference-m-62708e.mp3 22. AI Post Transformers: Speculative Decoding in Real vLLM Serving — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-speculative-decoding-in-real-vllm-servin-6f4e2b.mp3 Interactive Visualization: CacheBlend for Fast RAG Serving
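
The selective-recomputation idea described in the episode can be sketched as follows. This is not CacheBlend's actual implementation: the sketch compares each token's cached KV against a fresh full-context KV at one early layer and marks only the highest-deviation tokens for recomputation; the vectors, the single-layer comparison, and the 15% ratio are illustrative assumptions.

```python
import math

def select_recompute(kv_cached, kv_fresh, ratio=0.15):
    """Return indices of the tokens whose KV deviates most from the
    full-context prefill; only these get recomputed, the rest are reused."""
    dev = [math.dist(c, f) for c, f in zip(kv_cached, kv_fresh)]  # L2 per token
    k = max(1, int(ratio * len(dev)))
    order = sorted(range(len(dev)), key=lambda i: dev[i], reverse=True)
    return sorted(order[:k])

cached = [[0.0, 0.0]] * 10
fresh = [[0.0, 0.0]] * 10
fresh[7] = [3.0, 4.0]                   # token 7 sees very different cross-chunk context
print(select_recompute(cached, fresh))  # [7]
```

The speed/quality tradeoff lives in `ratio`: recompute nothing and you ignore cross-chunk attention; recompute everything and you are back to full prefill.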

  4.

    Directly Trained Spiking DQNs for Atari

    This episode explores a 2021 paper on Deep Spiking Q-Networks, asking whether a directly trained spiking version of DQN can compete with earlier conversion-based spiking reinforcement learning methods on Atari while retaining the energy-efficiency promise of spiking neural networks. It explains the technical foundations behind spiking networks, including leaky integrate-and-fire neurons, surrogate-gradient training, and why SNNs remain difficult to train and awkward on conventional GPU hardware despite their appeal for neuromorphic chips like TrueNorth and Loihi. The discussion also situates the paper against the legacy of the original DeepMind DQN work, arguing that the paper’s title deliberately invites scrutiny over whether it truly matches the breadth and ambition of the classic Atari benchmark. Listeners would find it interesting for its clear framing of both the hype and the hard practical questions around neuromorphic AI: not just whether spiking RL works, but where, on what hardware, and under what conditions its efficiency claims actually matter. Sources: 1. Human-Level Control through Directly-Trained Deep Spiking Q-Networks — Guisong Liu, Wenjie Deng, Xiurui Xie, Li Huang, Huajin Tang, 2021 http://arxiv.org/abs/2201.07211 2. Spiking Neural Networks for Machine Learning: An Overview — Wolfgang Maass and others; overview literature includes major contributors such as Thomas Pfeil, Emre Neftci, and Surya Ganguli across the field, Recent overview genre, especially 2023 https://scholar.google.com/scholar?q=Spiking+Neural+Networks+for+Machine+Learning:+An+Overview 3. Training Spiking Neural Networks Using Lessons From Deep Learning — Guillaume Bellec, Darjan Salaj, Anand Subramoney, Robert Legenstein, Wolfgang Maass, 2018 https://scholar.google.com/scholar?q=Training+Spiking+Neural+Networks+Using+Lessons+From+Deep+Learning 4.
Spiking Neural Networks in the Fourth Generation of Artificial Intelligence — Zhaofei Yu, Hanle Zheng, Yujie Wu, and others, 2023 https://scholar.google.com/scholar?q=Spiking+Neural+Networks+in+the+Fourth+Generation+of+Artificial+Intelligence 5. The Remarkable Robustness of Surrogate Gradient Learning for Instilling Complex Function in Spiking Neural Networks — Friedemann Zenke, Tim Vogels, 2021 https://scholar.google.com/scholar?q=The+Remarkable+Robustness+of+Surrogate+Gradient+Learning+for+Instilling+Complex+Function+in+Spiking+Neural+Networks 6. Human-level control through deep reinforcement learning — Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei Rusu, Joel Veness, Marc Bellemare, Alex Graves, Martin Riedmiller, Andreas Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis, 2015 https://scholar.google.com/scholar?q=Human-level+control+through+deep+reinforcement+learning 7. Asynchronous Methods for Deep Reinforcement Learning — Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy Lillicrap, David Silver, Koray Kavukcuoglu, 2016 https://scholar.google.com/scholar?q=Asynchronous+Methods+for+Deep+Reinforcement+Learning 8. Deep Reinforcement Learning: An Overview — Yuxi Li, 2017 https://scholar.google.com/scholar?q=Deep+Reinforcement+Learning:+An+Overview 9. Reinforcement Learning: An Introduction — Richard S. Sutton, Andrew G. Barto, 1998; 2nd edition 2018 https://scholar.google.com/scholar?q=Reinforcement+Learning:+An+Introduction 10. Surrogate Gradient Learning in Spiking Neural Networks: Bringing the Power of Gradient-Based Optimization to Spiking Neural Networks — Emre O. Neftci, Hesham Mostafa, Friedemann Zenke, 2019 https://scholar.google.com/scholar?q=Surrogate+Gradient+Learning+in+Spiking+Neural+Networks:+Bringing+the+Power+of+Gradient-Based+Optimization+to+Spiking+Neural+Networks 11. 
Direct Training for Spiking Neural Networks: Faster, Larger, Better — Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, Luping Shi, 2019 https://scholar.google.com/scholar?q=Direct+Training+for+Spiking+Neural+Networks:+Faster,+Larger,+Better 12. Going Deeper With Directly-Trained Larger Spiking Neural Networks — Chaoteng Duan, Shikuang Deng, Xingting Wang, Meng Zhang, and others, 2022 https://scholar.google.com/scholar?q=Going+Deeper+With+Directly-Trained+Larger+Spiking+Neural+Networks 13. Threshold-Dependent Batch Normalization for Training Deep Spiking Neural Networks — Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, Luping Shi, 2021 https://scholar.google.com/scholar?q=Threshold-Dependent+Batch+Normalization+for+Training+Deep+Spiking+Neural+Networks 14. A million spiking-neuron integrated circuit with a scalable communication network and interface — Paul A. Merolla, John V. Arthur, Rodrigo Alvarez-Icaza, Andrew S. Cassidy, Jun Sawada, Filipp Akopyan, Bryan L. Jackson, Nabil Imam, Chen Guo, Yutaka Nakamura, Bernard Brezzo, Ivan Vo, Steven Esser, Rathinakumar Appuswamy, Brian Taba, Arnon Amir, Myron Flickner, William Risk, Rajit Manohar, Dharmendra Modha, 2014 https://scholar.google.com/scholar?q=A+million+spiking-neuron+integrated+circuit+with+a+scalable+communication+network+and+interface 15. Loihi: A Neuromorphic Manycore Processor with On-Chip Learning — Mike Davies, Narayan Srinivasa, Tsung-Han Lin, Gautham Chinya, Yongqiang Cao, Sri Harsha Choday, Georgios Dimou, Prasad Joshi, Nabil Imam, Shweta Jain, et al., 2018 https://scholar.google.com/scholar?q=Loihi:+A+Neuromorphic+Manycore+Processor+with+On-Chip+Learning 16. SpiNNaker: A 1-W 18-Core System-on-Chip for Massively-Parallel Neural Network Simulation — Steve B. Furber, Francesco Galluppi, Steve Temple, Luis A. Plana, 2014 https://scholar.google.com/scholar?q=SpiNNaker:+A+1-W+18-Core+System-on-Chip+for+Massively-Parallel+Neural+Network+Simulation 17. Benchmarking Neuromorphic Systems with Nengo — Terry C. 
Stewart, Dan Rasmussen, Xuan Choo, Aaron Voelker, and others, 2015-2017 era benchmarking work https://scholar.google.com/scholar?q=Benchmarking+Neuromorphic+Systems+with+Nengo 18. Playing Atari with Deep Reinforcement Learning — Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller, 2013 https://scholar.google.com/scholar?q=Playing+Atari+with+Deep+Reinforcement+Learning 19. Deep Reinforcement Learning with Double Q-learning — Hado van Hasselt, Arthur Guez, David Silver, 2016 https://scholar.google.com/scholar?q=Deep+Reinforcement+Learning+with+Double+Q-learning 20. Rainbow: Combining Improvements in Deep Reinforcement Learning — Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver, 2018 https://scholar.google.com/scholar?q=Rainbow:+Combining+Improvements+in+Deep+Reinforcement+Learning 21. Enabling Deep Spiking Neural Networks for Reinforcement Learning — Nitin Rathi, Gopalakrishnan Srinivasan, Priyadarshini Panda, Kaushik Roy, 2020 https://scholar.google.com/scholar?q=Enabling+Deep+Spiking+Neural+Networks+for+Reinforcement+Learning 22. Going Deeper in Spiking Neural Networks: VGG and Residual Architectures — Nitin Rathi, Gopalakrishnan Srinivasan, Priyadarshini Panda, Kaushik Roy, 2021 https://scholar.google.com/scholar?q=Going+Deeper+in+Spiking+Neural+Networks:+VGG+and+Residual+Architectures 23. Incorporating Learnable Membrane Time Constant to Enhance Learning of Spiking Neural Networks — Yuhang Fang, Zhaofei Yu, Tielin Zhang, et al., 2021 https://scholar.google.com/scholar?q=Incorporating+Learnable+Membrane+Time+Constant+to+Enhance+Learning+of+Spiking+Neural+Networks 24. Deep Residual Learning in Spiking Neural Networks — Yujie Wu, Yuhang Zhao, et al., 2021 https://scholar.google.com/scholar?q=Deep+Residual+Learning+in+Spiking+Neural+Networks 25. 
A Unified Optimization Framework of ANN-SNN Conversion: Towards Optimal Mapping from Activation Values to Firing Rates — approx. recent ANN-to-SNN conversion literature, 2023-2024 https://scholar.google.com/scholar?q=A+Unified+Optimization+Framework+of+ANN-SNN+Conversion:+Towards+Optimal+Mapping+from+Activation+Values+to+Firing+Rates 26. Towards High-Performance Spiking Transformers from ANN to SNN Conversion — approx. recent conversion/transformer authors, 2024 https://scholar.google.com/scholar?q=Towards+High-Performance+Spiking+Transformers+from+ANN+to+SNN+Conversion 27. Towards Training-Free and Accurate ANN-to-SNN Conversion via Activation-Aware Redistribution — approx. recent ANN-to-SNN conversion authors, 2024 https://scholar.google.com/scholar?q=Towards+Training-Free+and+Accurate+ANN-to-SNN+Conversion+via+Activation-Aware+Redistribution 28. Adaptive Surrogate Gradients for Sequential Reinforcement Learning in Spiking Neural Networks — approx. recent SNN RL authors, 2024-2025 https://scholar.google.com/scholar?q=Adaptive+Surrogate+Gradients+for+Sequential+Reinforcement+Learning+in+Spiking+Neural+Networks 29. Elucidating the Theoretical Underpinnings of Surrogate Gradient Learning in Spiking Neural Networks — approx. recent theoretical SNN authors, 2023-2024 https://scholar.google.com/scholar?q=Elucidating+the+Theoretical+Underpinnings+of+Surrogate+Gradient+Learning+in+Spiking+Neural+Networks 30. Spiking Reinforcement Learning Enhanced by Bioinspired Event Source of Multi-Dendrite Spiking Neuron and Dynamic Thresholds — approx. recent spiking RL authors, 2024-2025 https://scholar.google.com/scholar?q=Spiking+Reinforcement+Learning+Enhanced+by+Bioinspired+Event+Source+of+Multi-Dendrite+Spiking+Neuron+and+Dynamic+Thresholds 31. S2Act: Simple Spiking Actor — approx. recent spiking actor-critic authors, 2024-2025 https://scholar.google.com/scholar?q=S2Act:+Simple+Spiking+Actor 32. AI Post Transformers: Zero-Shot Context Gen
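
The leaky integrate-and-fire dynamics and surrogate-gradient trick discussed in this episode can be sketched in a few lines. The constants (decay `tau`, threshold, surrogate slope `alpha`) are illustrative defaults, not the paper's settings, and the surrogate here is a fast-sigmoid-style shape in the spirit of the Neftci et al. citation above.

```python
def lif_step(v, x, tau=2.0, v_th=1.0):
    """One LIF step: decay the membrane potential, integrate the input,
    emit a spike at threshold, then soft-reset by subtracting the threshold."""
    v = v / tau + x
    spike = 1.0 if v >= v_th else 0.0
    v -= spike * v_th
    return v, spike

def surrogate_grad(v, v_th=1.0, alpha=2.0):
    """Backward-pass stand-in for d(spike)/dv: the spike itself has zero
    gradient almost everywhere, so training uses this smooth bump instead."""
    return alpha / (2.0 * (1.0 + alpha * abs(v - v_th)) ** 2)

v, spikes = 0.0, []
for x in [0.6, 0.6, 0.6, 0.0]:       # constant drive, then silence
    v, s = lif_step(v, x)
    spikes.append(s)
print(spikes)                         # the neuron fires once the leaky sum crosses 1.0
```

Direct training backpropagates through `lif_step` over time using `surrogate_grad` in place of the spike's true derivative, which is exactly what conversion-based methods avoid by training a conventional ANN first.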

  5.

    Distilling Multi-Agent Reasoning into a Single LLM

    This episode explores a 2026 paper on AgentArk, which asks whether the reasoning gains of multi-agent LLM systems can be compressed into a single model, reducing the latency, token cost, and orchestration burden of running a “committee” of models at inference time. It explains multi-agent systems as setups where multiple model instances debate, critique, and revise one another, arguing that their real advantage comes less from the visible agent structure and more from iterative conflict-and-refinement dynamics that expose errors and improve reasoning. The discussion also breaks down the paper’s distillation framework—from outcome-based supervision to trajectory-based augmentation and process-aware distillation with process reward models that score intermediate reasoning steps, not just final answers. Listeners would find it interesting because it connects a major practical AI deployment problem—how to keep reasoning quality without paying for expensive test-time compute—to a concrete research attempt to internalize deliberation into one cheaper model. Sources: 1. AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent — Yinyi Luo, Yiqiao Jin, Weichen Yu, Mengqi Zhang, Srijan Kumar, Xiaoxiao Li, Weijie Xu, Xin Chen, Jindong Wang, 2026 http://arxiv.org/abs/2602.03955 2. Training Language Models to Self-Correct via Reinforcement Learning — Chen et al., 2025 https://scholar.google.com/scholar?q=Training+Language+Models+to+Self-Correct+via+Reinforcement+Learning 3. Debate Helps or Not? The Impact of Multi-Agent Structure Perturbation on LLM Reasoning — Kim et al., 2025 https://scholar.google.com/scholar?q=Debate+Helps+or+Not?+The+Impact+of+Multi-Agent+Structure+Perturbation+on+LLM+Reasoning 4. Systematic Study of Orchestration Strategies for Multi-Agent LLM Reasoning — Ke et al., 2026 https://scholar.google.com/scholar?q=Systematic+Study+of+Orchestration+Strategies+for+Multi-Agent+LLM+Reasoning 5. 
Improving Multi-Agent Debate with Critique and Revision for LLM Reasoning — Lan et al., 2024 https://scholar.google.com/scholar?q=Improving+Multi-Agent+Debate+with+Critique+and+Revision+for+LLM+Reasoning 6. Multi-Agent Consensus Reasoning with Large Language Models — Chen et al., 2024 https://scholar.google.com/scholar?q=Multi-Agent+Consensus+Reasoning+with+Large+Language+Models 7. MAD: Multi-Agent Debate with Large Language Models — Du et al., 2023 https://scholar.google.com/scholar?q=MAD:+Multi-Agent+Debate+with+Large+Language+Models 8. Reflexion: Language Agents with Verbal Reinforcement Learning — Shinn et al., 2023 https://scholar.google.com/scholar?q=Reflexion:+Language+Agents+with+Verbal+Reinforcement+Learning 9. STaR: Self-Taught Reasoner Bootstrapping Reasoning with Reasoning — Zelikman et al., 2022 https://scholar.google.com/scholar?q=STaR:+Self-Taught+Reasoner+Bootstrapping+Reasoning+with+Reasoning 10. Revisiting Multi-Agent Debate as Test-Time Scaling: When Does Multi-Agent Help? — approx. 2025 authors unclear from snippet, 2025 https://scholar.google.com/scholar?q=Revisiting+Multi-Agent+Debate+as+Test-Time+Scaling:+When+Does+Multi-Agent+Help? 11. Revisiting multi-agent debate as test-time scaling: A systematic study of conditional effectiveness — approx. 2025 authors unclear from snippet, 2025 https://scholar.google.com/scholar?q=Revisiting+multi-agent+debate+as+test-time+scaling:+A+systematic+study+of+conditional+effectiveness 12. How to Steal Reasoning Without Reasoning Traces — approx. 2024/2025 authors unclear from snippet, 2024/2025 https://scholar.google.com/scholar?q=How+to+Steal+Reasoning+Without+Reasoning+Traces 13. Sample, Don't Search: Rethinking Test-Time Alignment for Language Models — approx. 2025 authors unclear from snippet, 2025 https://scholar.google.com/scholar?q=Sample,+Don't+Search:+Rethinking+Test-Time+Alignment+for+Language+Models 14. A survey on test-time scaling in large language models: What, how, where, and how well? 
— approx. 2025 survey authors unclear from snippet, 2025 https://scholar.google.com/scholar?q=A+survey+on+test-time+scaling+in+large+language+models:+What,+how,+where,+and+how+well? 15. Optimizing the Last Mile: Test-Time Compute Strategies for Next-Generation Language Models — approx. 2025 authors unclear from snippet, 2025 https://scholar.google.com/scholar?q=Optimizing+the+Last+Mile:+Test-Time+Compute+Strategies+for+Next-Generation+Language+Models 16. Symbolic mixture-of-experts: Adaptive skill-based routing for heterogeneous reasoning — approx. 2025 authors unclear from snippet, 2025 https://scholar.google.com/scholar?q=Symbolic+mixture-of-experts:+Adaptive+skill-based+routing+for+heterogeneous+reasoning 17. AI Post Transformers: Simple Self-Distillation for Better Code Generation — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-02-simple-self-distillation-for-better-code-cc88e0.mp3 18. AI Post Transformers: Learning to Reason with 13 Parameters — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-14-learning-to-reason-with-13-parameters-54c87f.mp3 19. AI Post Transformers: Speculative Decoding in Real vLLM Serving — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-speculative-decoding-in-real-vllm-servin-6f4e2b.mp3 Interactive Visualization: Distilling Multi-Agent Reasoning into a Single LLM
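The process-aware distillation idea the episode describes can be sketched in a few lines: a process reward model (PRM) scores each intermediate reasoning step, and the student's per-step loss is weighted by that score, so reliable steps from the multi-agent trace dominate the training signal. This is a minimal illustrative sketch under that assumption, not the paper's actual training code; all names are made up.

```python
def process_weighted_loss(step_losses, prm_scores):
    """Weight per-step distillation losses by PRM scores in [0, 1]."""
    assert len(step_losses) == len(prm_scores)
    total_weight = sum(prm_scores)
    if total_weight == 0:
        return 0.0  # no step judged trustworthy: contribute nothing
    return sum(l * w for l, w in zip(step_losses, prm_scores)) / total_weight

# A trace with one weak middle step: the PRM downweights it.
losses = [2.0, 5.0, 1.0]   # student cross-entropy per reasoning step
scores = [0.9, 0.1, 0.8]   # PRM judges step 2 unreliable
print(process_weighted_loss(losses, scores))  # 3.1 / 1.8 ≈ 1.72
```

The contrast with outcome-based supervision is that the weak step (loss 5.0) barely moves the objective, instead of being rewarded whenever the final answer happens to be right.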

  6.

    DreamerV3 World Models Across 150 Tasks

    This episode explores DreamerV3, a world-model reinforcement learning system that claims to use one main configuration across more than 150 tasks spanning Atari, ProcGen, DMLab, robot control, visual control, BSuite, and Minecraft. It explains how world models work—learning compact environment dynamics so an agent can train on imagined futures—and why that approach is appealing for sample efficiency but historically difficult because agents can overfit to inaccurate “fantasy” dynamics. The discussion highlights the paper’s central argument that robust world-model design may reduce the need for domain-specific retuning, while also stressing that “fixed hyperparameters” does not eliminate all domain engineering such as wrappers, action discretization, and evaluation choices. Listeners would find it interesting for its clear look at a major RL unification attempt, including why the results matter for scaling, sparse-reward tasks, and expensive real-world settings like robotics. Sources: 1. Mastering Diverse Domains through World Models — Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap, 2023 http://arxiv.org/abs/2301.04104 2. Mastering Atari with Discrete World Models — Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba, 2021 https://scholar.google.com/scholar?q=Mastering+Atari+with+Discrete+World+Models 3. Mastering Visual Continuous Control: Improved Data-Efficient Reinforcement Learning with Dreamer — Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson, 2020 https://scholar.google.com/scholar?q=Mastering+Visual+Continuous+Control:+Improved+Data-Efficient+Reinforcement+Learning+with+Dreamer 4. Learning Latent Dynamics for Planning from Pixels — Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi, 2019 https://scholar.google.com/scholar?q=Learning+Latent+Dynamics+for+Planning+from+Pixels 5. 
MuZero — Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, et al., 2020 https://scholar.google.com/scholar?q=MuZero 6. IRIS: Efficient Video Pretraining for Reinforcement Learning — various authors as cited by the paper, 2023 https://scholar.google.com/scholar?q=IRIS:+Efficient+Video+Pretraining+for+Reinforcement+Learning 7. Temporal Difference Models / TD-MPC / TD-MPC2 — various authors including Nicklas Hansen and colleagues, 2022-2024 https://scholar.google.com/scholar?q=Temporal+Difference+Models+/+TD-MPC+/+TD-MPC2 8. MineRL BASALT / VPT-related Minecraft works — various authors including OpenAI and MineRL participants, 2021-2022 https://scholar.google.com/scholar?q=MineRL+BASALT+/+VPT-related+Minecraft+works 9. DrQ-v2 — Ilya Kostrikov, Denis Yarats, Rob Fergus, 2021 https://scholar.google.com/scholar?q=DrQ-v2 10. R2D2 — Steven Kapturowski, Georg Ostrovski, John Quan, et al., 2019 https://scholar.google.com/scholar?q=R2D2 11. STORM: Efficient Stochastic Transformer-based World Models for Reinforcement Learning — approx. Guo et al., 2023/2024 https://scholar.google.com/scholar?q=STORM:+Efficient+Stochastic+Transformer-based+World+Models+for+Reinforcement+Learning 12. Improving Transformer World Models for Data-Efficient RL — approx. recent 2023/2024 RL world-model authors, 2023/2024 https://scholar.google.com/scholar?q=Improving+Transformer+World+Models+for+Data-Efficient+RL 13. GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control — approx. recent MBRL authors, 2024/2025 https://scholar.google.com/scholar?q=GIRL:+Generative+Imagination+Reinforcement+Learning+via+Information-Theoretic+Hallucination+Control 14. Normalization Enhances Generalization in Visual Reinforcement Learning — approx. recent visual RL authors, 2024/2025 https://scholar.google.com/scholar?q=Normalization+Enhances+Generalization+in+Visual+Reinforcement+Learning 15. Understanding the Mechanisms of Fast Hyperparameter Transfer — approx. 
recent hyperparameter-transfer authors, 2024/2025 https://scholar.google.com/scholar?q=Understanding+the+Mechanisms+of+Fast+Hyperparameter+Transfer 16. Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration — approx. recent hyperparameter-transfer authors, 2024/2025 https://scholar.google.com/scholar?q=Completed+Hyperparameter+Transfer+across+Modules,+Width,+Depth,+Batch+and+Duration 17. AI Post Transformers: LeWorldModel: Stable Joint-Embedding World Models from Pixels — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-25-leworldmodel-stable-joint-embedding-worl-650f9f.mp3 18. AI Post Transformers: Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts — Hal Turing & Dr. Ada Shannon, Tue, https://podcast.do-not-panic.com/episodes/zero-shot-context-generalization-in-reinforcement-learning-from-few-training-con/ 19. AI Post Transformers: Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning — Hal Turing & Dr. Ada Shannon, Fri, https://podcast.do-not-panic.com/episodes/contrastive-behavioral-similarity-embeddings-for-generalization-in-reinforcement/ 20. AI Post Transformers: HyperController: Fast, Stable Reinforcement Learning Hyperparameter Optimization — Hal Turing & Dr. Ada Shannon, Fri, https://podcast.do-not-panic.com/episodes/hypercontroller-fast-stable-reinforcement-learning-hyperparameter-optimization/ Interactive Visualization: DreamerV3 World Models Across 150 Tasks
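The "training in imagination" loop the episode explains can be sketched with a toy model: a learned dynamics function rolls a latent state forward without touching the real environment, and the policy is evaluated on that imagined trajectory. The dynamics and reward below are one-dimensional stand-ins, not DreamerV3's actual RSSM or return estimator.

```python
def imagine_rollout(z0, dynamics, reward, policy, horizon=5, gamma=0.99):
    """Roll the world model forward and accumulate a discounted return."""
    z, ret, discount = z0, 0.0, 1.0
    for _ in range(horizon):
        a = policy(z)        # act on the latent state, not real observations
        z = dynamics(z, a)   # model-predicted next latent
        ret += discount * reward(z)
        discount *= gamma
    return ret

# Toy latent: dynamics drifts toward the action, reward favors z near 1.
ret = imagine_rollout(
    z0=0.0,
    dynamics=lambda z, a: 0.5 * z + 0.5 * a,
    reward=lambda z: -(z - 1.0) ** 2,
    policy=lambda z: 1.0,
)
print(ret)
```

The failure mode the episode flags lives in the `dynamics` argument: if the learned model is inaccurate, the policy optimizes returns in a fantasy the real environment never delivers.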

  7.

    Efficient KV Cache Sharing for Multi-LoRA Agents

    This episode explores a systems paper on making multi-agent LLM setups far more efficient by sharing most of the KV cache across agents that use the same base model with different LoRA adapters. It explains the core argument: for a shared long context, the backbone model’s hidden states are nearly identical across agents, while most role-specific differences come from LoRA’s low-rank adapter outputs, making it possible to store one shared base cache plus tiny agent-specific low-rank caches. The discussion breaks down how LoRA’s down- and up-projection structure enables this cache design, why “shared-A” multi-LoRA expands what can be shared, and how a custom Flash-LoRA-Attention kernel reconstructs adapter effects efficiently at inference time. Listeners would find it interesting because it connects transformer math to a concrete bottleneck in real agent systems—long prompts, repeated prefills, and exploding GPU memory—and examines whether the reported gains come from the cache-sharing idea itself, the kernel engineering, or both. Sources: 1. LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents — Hyesung Jeon, Hyeongju Ha, Jae-Joon Kim, 2026 http://arxiv.org/abs/2602.01053 2. LoRA: Low-Rank Adaptation of Large Language Models — Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2022 https://scholar.google.com/scholar?q=LoRA:+Low-Rank+Adaptation+of+Large+Language+Models 3. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness — Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré, 2022 https://scholar.google.com/scholar?q=FlashAttention:+Fast+and+Memory-Efficient+Exact+Attention+with+IO-Awareness 4. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning — Tri Dao, 2023 https://scholar.google.com/scholar?q=FlashAttention-2:+Faster+Attention+with+Better+Parallelism+and+Work+Partitioning 5. 
S-LoRA: Serving Thousands of Concurrent LoRA Adapters — Zhen Wang and collaborators, 2023 https://scholar.google.com/scholar?q=S-LoRA:+Serving+Thousands+of+Concurrent+LoRA+Adapters 6. MiLoRA: Efficient Serving for Multiple LoRA Adapters — Xia et al., 2024 https://scholar.google.com/scholar?q=MiLoRA:+Efficient+Serving+for+Multiple+LoRA+Adapters 7. MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning — Tian et al., 2024 https://scholar.google.com/scholar?q=MELoRA:+Mini-Ensemble+Low-Rank+Adapters+for+Parameter-Efficient+Fine-Tuning 8. Multi-Head Latent Attention — Ji et al. / DeepSeek-AI team, 2025 https://scholar.google.com/scholar?q=Multi-Head+Latent+Attention 9. ReAct: Synergizing Reasoning and Acting in Language Models — Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao, 2023 https://scholar.google.com/scholar?q=ReAct:+Synergizing+Reasoning+and+Acting+in+Language+Models 10. Tree of Thoughts: Deliberate Problem Solving with Large Language Models — Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas Griffiths, Yuan Cao, Karthik Narasimhan, 2023 https://scholar.google.com/scholar?q=Tree+of+Thoughts:+Deliberate+Problem+Solving+with+Large+Language+Models 11. KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs — approx. unknown from snippet, recent/2025-2026 https://scholar.google.com/scholar?q=KV+Packet:+Recomputation-Free+Context-Independent+KV+Caching+for+LLMs 12. Kvshare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse — approx. unknown from snippet, recent/2025-2026 https://scholar.google.com/scholar?q=Kvshare:+An+LLM+Service+System+with+Efficient+and+Effective+Multi-Tenant+KV+Cache+Reuse 13. Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management — approx. 
unknown from snippet, recent/2025-2026 https://scholar.google.com/scholar?q=Improving+the+Serving+Performance+of+Multi-LoRA+Large+Language+Models+via+Efficient+LoRA+and+KV+Cache+Management 14. AIRA: Activation-Informed Low-Rank Adaptation for Large Models — approx. unknown from snippet, recent/2025-2026 https://scholar.google.com/scholar?q=AIRA:+Activation-Informed+Low-Rank+Adaptation+for+Large+Models 15. Activation-guided Low-Rank Parameter Adaptation for Efficient Model Fine-Tuning — approx. unknown from snippet, recent/2025-2026 https://scholar.google.com/scholar?q=Activation-guided+Low-Rank+Parameter+Adaptation+for+Efficient+Model+Fine-Tuning 16. Capacity and Redundancy Trade-offs in Multi-Task Learning — approx. unknown from snippet, recent/2025-2026 https://scholar.google.com/scholar?q=Capacity+and+Redundancy+Trade-offs+in+Multi-Task+Learning 17. Align, Don't Divide: Revisiting the LoRA Architecture in Multi-Task Learning — approx. unknown from snippet, recent/2025-2026 https://scholar.google.com/scholar?q=Align,+Don't+Divide:+Revisiting+the+LoRA+Architecture+in+Multi-Task+Learning 18. AI Post Transformers: Doc-to-LoRA: Internalizing Context as LoRA — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-29-doc-to-lora-internalizing-context-as-lor-8dd5ec.mp3 19. AI Post Transformers: FAST26: Bidaw: Enhancing Key-Value Caching for Interactive LLM Serving via Bidirectional — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/fast26-bidaw-enhancing-key-value-caching-for-interactive-llm-serving-via-bidirec/ 20. AI Post Transformers: Quest: Query-Aware Sparsity for Efficient LLM Inference — Hal Turing & Dr. Ada Shannon, 2025 https://podcast.do-not-panic.com/episodes/quest-query-aware-sparsity-for-efficient-llm-inference/ 21. AI Post Transformers: Prefill-as-a-Service for Cross-Datacenter KV Cache — Hal Turing & Dr. 
Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-19-prefill-as-a-service-for-cross-datacente-7560be.mp3 22. AI Post Transformers: Splitwise: Phase-Split LLM Inference — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-26-splitwise-phase-split-llm-inference-e8945b.mp3 23. AI Post Transformers: TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-25-turboquant-online-vector-quantiz-1967b7.mp3 24. AI Post Transformers: FengHuang for Rack-Scale LLM Inference Memory — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-12-fenghuang-for-rack-scale-llm-inference-m-62708e.mp3 Interactive Visualization: Efficient KV Cache Sharing for Multi-LoRA Agents
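The cache decomposition this episode walks through follows from LoRA's algebra: an agent's key is k = W_k h + B_i (A h), so if all agents share the frozen base W_k and, under "shared-A" multi-LoRA, the down-projection A as well, the full d-dimensional base keys and the r-dimensional vector A h can each be cached once, and every agent reconstructs its own keys from its small B_i. A NumPy sketch of the shape arithmetic, with illustrative sizes rather than the paper's kernel:

```python
import numpy as np

d, r, n_agents = 64, 4, 8
rng = np.random.default_rng(0)
W_k = rng.normal(size=(d, d))          # frozen base key projection
A = rng.normal(size=(r, d))            # shared LoRA down-projection
Bs = [rng.normal(size=(d, r)) for _ in range(n_agents)]  # per-agent up-proj

h = rng.normal(size=d)                 # hidden state for one shared token
base_k = W_k @ h                       # cached once for all agents (d floats)
low_rank = A @ h                       # cached once too (r floats)

for B in Bs:
    k_full = (W_k + B @ A) @ h         # what naive per-agent caching stores
    k_recon = base_k + B @ low_rank    # rebuilt from the shared caches
    assert np.allclose(k_full, k_recon)

naive = n_agents * d                   # floats per token with per-agent caches
shared = d + r                         # floats per token with shared caches
print(naive, shared)                   # 512 68
```

With r much smaller than d, the per-token cache cost stops scaling with the number of agents, which is the memory argument behind the reported gains.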

  8.

    Program Synthesis with Large Language Models

    This episode explores a 2021 Google Research paper on whether large language models can synthesize short Python programs directly from natural-language descriptions, moving beyond code autocomplete into true program synthesis. It explains why this is difficult in general-purpose languages, contrasts classical search-based synthesis with transformer-based generation, and highlights the paper’s emphasis on execution-based evaluation, where code must actually run and pass tests rather than merely resemble reference solutions. The discussion covers the MBPP and MathQA-Python benchmarks, the effects of model scale from 244 million to 137 billion parameters, and the finding that larger models improve substantially, with the biggest model solving 59.6% of MBPP in a few-shot setting and fine-tuning on just 374 examples adding roughly 10 points. Listeners would find it interesting for its clear look at an early turning point when code LLMs began to show measurable, testable synthesis ability rather than just fluent code-like text. Sources: 1. Program Synthesis with Large Language Models — Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, Charles Sutton, 2021 http://arxiv.org/abs/2108.07732 2. Program Synthesis — Sumit Gulwani, Oleksandr Polozov, Rishabh Singh, 2017 https://scholar.google.com/scholar?q=Program+Synthesis 3. Neural Program Synthesis: A Survey — Michele Vallecorsa, Luca Quartana, Luca Pasquale and others, 2022 https://scholar.google.com/scholar?q=Neural+Program+Synthesis:+A+Survey 4. Program Synthesis with Large Language Models — Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, Charles Sutton, 2021 https://scholar.google.com/scholar?q=Program+Synthesis+with+Large+Language+Models 5. 
A Survey on Neural Code Intelligence: From Program Representation to Program Synthesis — Uri Alon, Miltiadis Allamanis, Marc Brockschmidt and others, 2024 https://scholar.google.com/scholar?q=A+Survey+on+Neural+Code+Intelligence:+From+Program+Representation+to+Program+Synthesis 6. Evaluating Large Language Models Trained on Code — Mark Chen, Jerry Tworek, Heewoo Jun, et al., 2021 https://scholar.google.com/scholar?q=Evaluating+Large+Language+Models+Trained+on+Code 7. Language Models are Few-Shot Learners — Tom B. Brown, Benjamin Mann, Nick Ryder, et al., 2020 https://scholar.google.com/scholar?q=Language+Models+are+Few-Shot+Learners 8. CuBERT: BERT Models for Python Source Code Understanding — Rahul Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi, 2020 https://scholar.google.com/scholar?q=CuBERT:+BERT+Models+for+Python+Source+Code+Understanding 9. CodeBERT: A Pre-Trained Model for Programming and Natural Languages — Zhangyin Feng, Daya Guo, Duyu Tang, et al., 2020 https://scholar.google.com/scholar?q=CodeBERT:+A+Pre-Trained+Model+for+Programming+and+Natural+Languages 10. PyMT5: Multi-mode Translation of Natural Language and Python Code with Transformers — Colin Clement, Dawn Drain, Aakanksha S. Bhatia, et al., 2020 https://scholar.google.com/scholar?q=PyMT5:+Multi-mode+Translation+of+Natural+Language+and+Python+Code+with+Transformers 11. DeepCoder: Learning to Write Programs — Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, et al., 2017 https://scholar.google.com/scholar?q=DeepCoder:+Learning+to+Write+Programs 12. RobustFill: Neural Program Learning under Noisy I/O — Rishabh Singh, Abhishek Gulwani, 2017 https://scholar.google.com/scholar?q=RobustFill:+Neural+Program+Learning+under+Noisy+I/O 13. 
DreamCoder: Bootstrapping Inductive Program Synthesis with Wake-Sleep Library Learning — Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sablé-Meyer, Lucas Morales, Luke Hewitt, Josh Tenenbaum, Armando Solar-Lezama, 2021 https://scholar.google.com/scholar?q=DreamCoder:+Bootstrapping+Inductive+Program+Synthesis+with+Wake-Sleep+Library+Learning 14. Learning to Infer Graphics Programs from Hand-Drawn Images — Augustus Odena, Charles Sutton, 2020 https://scholar.google.com/scholar?q=Learning+to+Infer+Graphics+Programs+from+Hand-Drawn+Images 15. MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms — Aida Amini, Saeideh Bakhshi, Sivan Ray Choi, et al., 2019 https://scholar.google.com/scholar?q=MathQA:+Towards+Interpretable+Math+Word+Problem+Solving+with+Operation-Based+Formalisms 16. Allamanis et al. 2018 Survey on Machine Learning for Code — Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Charles Sutton, 2018 https://scholar.google.com/scholar?q=Allamanis+et+al.+2018+Survey+on+Machine+Learning+for+Code 17. Chain-of-Code: Reasoning with a Language Model-Augmented Code Emulator — Li et al. (approx.), 2024 https://scholar.google.com/scholar?q=Chain-of-Code:+Reasoning+with+a+Language+Model-Augmented+Code+Emulator 18. OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement — Zhang et al. (approx.), 2024 https://scholar.google.com/scholar?q=OpenCodeInterpreter:+Integrating+Code+Generation+with+Execution+and+Refinement 19. CodePRM: Execution Feedback-Enhanced Process Reward Model for Code Generation — Wang et al. (approx.), 2024 https://scholar.google.com/scholar?q=CodePRM:+Execution+Feedback-Enhanced+Process+Reward+Model+for+Code+Generation 20. CodeMonkeys: Scaling Test-Time Compute for Software Engineering — anonymous/uncertain from snippet, 2024 or 2025 https://scholar.google.com/scholar?q=CodeMonkeys:+Scaling+Test-Time+Compute+for+Software+Engineering 21. 
AI Post Transformers: CODEGEN: Open Language Model for Code Synthesis — Hal Turing & Dr. Ada Shannon, Fri, https://podcast.do-not-panic.com/episodes/codegen-open-language-model-for-code-synthesis/ 22. AI Post Transformers: Simple Self-Distillation for Better Code Generation — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-02-simple-self-distillation-for-better-code-cc88e0.mp3 23. AI Post Transformers: CWM: Code Generation with World Models — Hal Turing & Dr. Ada Shannon, Sat, https://podcast.do-not-panic.com/episodes/cwm-code-generation-with-world-models/ 24. AI Post Transformers: CodeI/O: Reasoning Patterns Through Code Input-Output Prediction — Hal Turing & Dr. Ada Shannon, Tue, https://podcast.do-not-panic.com/episodes/codeio-reasoning-patterns-through-code-input-output-prediction/ Interactive Visualization: Program Synthesis with Large Language Models
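Execution-based evaluation, as the episode describes it, means a candidate program counts as solved only if it runs and passes the task's assert statements, not if it merely resembles a reference solution. A minimal MBPP-style harness (task and helper names are illustrative):

```python
def passes(candidate_src, test_srcs):
    """Execute a candidate, then its tests; any exception means failure."""
    env = {}
    try:
        exec(candidate_src, env)        # define the candidate function
        for t in test_srcs:
            exec(t, env)                # asserts raise on failure
        return True
    except Exception:
        return False

task_tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"  # fluent code-like text, wrong program

print(passes(good, task_tests), passes(bad, task_tests))  # True False
```

The point of the harness is exactly the paper's point: the `bad` candidate would look fine to a similarity metric but fails the moment it is executed.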

Ratings & Reviews

3.7
out of 5
3 ratings

About

AI-generated podcast where hosts Hal Turing and Dr. Ada Shannon discuss the latest research papers and reports in machine learning, AI systems, and optimization. Featuring honest critical analysis, proper citations, and nerdy humor.
