AI Post Transformers

mcgrof

An AI-generated podcast in which hosts Hal Turing and Dr. Ada Shannon discuss the latest research papers and reports in machine learning, AI systems, and optimization, with honest critical analysis, proper citations, and nerdy humor.

  1. 17h ago

    X-LLM: Treating Multimodalities as Foreign Languages

    This episode explores X-LLM, a 2023 system that treats images, video, and speech as foreign languages a frozen ChatGLM can learn to read through learned modality-to-language bridges. It breaks down the paper’s architecture, including Q-Former-based visual adapters and a separate speech pipeline with continuous integrate-and-fire modules, to show how three sensory routes feed a single dialogue model instead of one end-to-end multimodal transformer. The discussion argues that X-LLM mattered less as proof of a universal multimodal theory than as a practical open-model recipe shaped by 2023 compute limits, with its Chinese-language backbone playing a real methodological role rather than serving as background context. Listeners get a sharp comparison between this bridge-based approach and later end-to-end systems such as GPT-4o and Gemini 1.5, making the episode useful for understanding how modern multimodal assistants actually evolved. Sources: 1. X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages — Feilong Chen, Minglun Han, Haozhi Zhao, Qingyang Zhang, Jing Shi, Shuang Xu, Bo Xu, 2023 http://arxiv.org/abs/2305.04160 2. Multimodal Few-Shot Learning with Frozen Language Models — Maria Tsimpoukelli, Jacob Menick, Oriol Vinyals, Felix Hill, 2021 https://scholar.google.com/scholar?q=Multimodal+Few-Shot+Learning+with+Frozen+Language+Models 3. Flamingo: a Visual Language Model for Few-Shot Learning — Jean-Baptiste Alayrac, Jeff Donahue, Karen Simonyan, Oriol Vinyals, Andrew Zisserman, 2022 https://scholar.google.com/scholar?q=Flamingo:+a+Visual+Language+Model+for+Few-Shot+Learning 4. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models — Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi, 2023 https://scholar.google.com/scholar?q=BLIP-2:+Bootstrapping+Language-Image+Pre-training+with+Frozen+Image+Encoders+and+Large+Language+Models 5. 
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities — Dong Zhang, Shimin Li, Xin Zhang, Xipeng Qiu, 2023 https://scholar.google.com/scholar?q=SpeechGPT:+Empowering+Large+Language+Models+with+Intrinsic+Cross-Modal+Conversational+Abilities 6. Visual Instruction Tuning — Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee, 2023 https://scholar.google.com/scholar?q=Visual+Instruction+Tuning 7. MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models — Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny, 2023 https://scholar.google.com/scholar?q=MiniGPT-4:+Enhancing+Vision-Language+Understanding+with+Advanced+Large+Language+Models 8. PaLM-E: An Embodied Multimodal Language Model — Danny Driess et al., 2023 https://scholar.google.com/scholar?q=PaLM-E:+An+Embodied+Multimodal+Language+Model 9. CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition — Linhao Dong, Bo Xu, 2019 https://scholar.google.com/scholar?q=CIF:+Continuous+Integrate-and-Fire+for+End-to-End+Speech+Recognition 10. VL-JEPA: Joint Embedding Predictive Architecture for Vision-language — Delong Chen et al., 2025 https://scholar.google.com/scholar?q=VL-JEPA:+Joint+Embedding+Predictive+Architecture+for+Vision-language 11. TokenPacker: Efficient Visual Projector for Multimodal LLM — Wentong Li et al., 2024 https://scholar.google.com/scholar?q=TokenPacker:+Efficient+Visual+Projector+for+Multimodal+LLM 12. Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM — Donghwan Chi et al., 2025 https://scholar.google.com/scholar?q=Slot-MLLM:+Object-Centric+Visual+Tokenization+for+Multimodal+LLM 13. Auto-Encoding Morph-Tokens for Multimodal LLM — Kaihang Pan et al., 2024 https://scholar.google.com/scholar?q=Auto-Encoding+Morph-Tokens+for+Multimodal+LLM 14. 
ViCA: Efficient Multimodal LLMs with Vision-Only Cross-Attention — Wenjie Liu et al., 2026 https://scholar.google.com/scholar?q=ViCA:+Efficient+Multimodal+LLMs+with+Vision-Only+Cross-Attention 15. F-LMM: Grounding Frozen Large Multimodal Models — Size Wu et al., 2024 https://scholar.google.com/scholar?q=F-LMM:+Grounding+Frozen+Large+Multimodal+Models 16. MultiModal-GPT: A Vision and Language Model for Dialogue with Humans — Tao Gong et al., 2023 https://scholar.google.com/scholar?q=MultiModal-GPT:+A+Vision+and+Language+Model+for+Dialogue+with+Humans 17. AI Post Transformers: UniVideo: Unified Video Understanding, Generation, and Editing — Hal Turing & Dr. Ada Shannon, Sat, https://podcast.do-not-panic.com/episodes/univideo-unified-video-understanding-generation-and-editing/ 18. AI Post Transformers: DeepSeek-OCR: Contexts Optical Compression — Hal Turing & Dr. Ada Shannon, Sat, https://podcast.do-not-panic.com/episodes/deepseek-ocr-contexts-optical-compression/ Interactive Visualization: X-LLM: Treating Multimodalities as Foreign Languages
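
For listeners unfamiliar with the continuous integrate-and-fire (CIF) idea named in the summary above, here is a minimal sketch of the firing rule as described in the cited CIF paper (source 9). It is not X-LLM's actual code. The function name `cif`, the toy one-dimensional frames, and the fixed weights are illustrative assumptions; in a real speech model the per-frame weights would be predicted by the network, and each weight is assumed not to exceed the threshold.

```python
def cif(frames, weights, threshold=1.0):
    """Accumulate weighted frames and emit one vector each time the
    accumulated weight reaches the threshold (a sketch of the CIF rule)."""
    dim = len(frames[0])
    acc_w = 0.0               # weight accumulated since the last firing
    acc_v = [0.0] * dim       # weighted sum of frames since the last firing
    fired = []
    for frame, w in zip(frames, weights):
        if acc_w + w < threshold:
            # Not enough weight yet: keep integrating.
            acc_w += w
            acc_v = [a + w * x for a, x in zip(acc_v, frame)]
        else:
            # Use just enough of this frame's weight to reach the threshold,
            # fire, and carry the remainder into the next accumulation.
            used = threshold - acc_w
            fired.append([a + used * x for a, x in zip(acc_v, frame)])
            rest = w - used
            acc_w = rest
            acc_v = [rest * x for x in frame]
    return fired


# Four acoustic frames with weight 0.5 each collapse into two outputs.
outputs = cif([[1.0], [2.0], [3.0], [4.0]], [0.5, 0.5, 0.5, 0.5])
print(outputs)  # [[1.5], [3.5]]
```

The point of the sketch is the length change: a long frame sequence becomes a shorter sequence of integrated vectors, which is what lets a speech route hand a language model something closer to token-rate input.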

  2. 1d ago

    Building General User Models from Computer Use

    This episode explores the UIST 2025 paper "Creating General User Models from Computer Use," which proposes building a persistent user model from raw computer traces such as screenshots, UI text, message context, and app switching. It explains how the system stores confidence-weighted natural-language propositions about a person’s preferences, knowledge, goals, and current situation, aiming to support cross-application assistants that can help proactively rather than waiting for explicit requests. The discussion situates the idea against earlier recommender systems, Bayesian user modeling, and newer LLM memory architectures, arguing that the paper is most interesting as a synthesis of HCI user modeling and retrieval-and-revision style AI memory. Listeners would find it compelling because it gets concrete about both the upside of more context-aware assistants and the hard problems underneath them, including noisy behavioral data, narrow evidence relative to broad claims, and the privacy risks of inferring things users never said aloud. Sources: 1. Creating General User Models from Computer Use — Omar Shaikh, Shardul Sapkota, Shan Rizvi, Eric Horvitz, Joon Sung Park, Diyi Yang, Michael S. Bernstein, 2025 http://arxiv.org/abs/2505.10831 2. User Modeling via Stereotypes — Elaine Rich, 1979 https://scholar.google.com/scholar?q=User+Modeling+via+Stereotypes 3. The Lumiere Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users — Eric J. Horvitz, John S. Breese, David Heckerman, David Hovel, Koos Rommelse, 1998 https://scholar.google.com/scholar?q=The+Lumiere+Project:+Bayesian+User+Modeling+for+Inferring+the+Goals+and+Needs+of+Software+Users 4. 
Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions — Gediminas Adomavicius, Alexander Tuzhilin, 2005 https://scholar.google.com/scholar?q=Toward+the+Next+Generation+of+Recommender+Systems:+A+Survey+of+the+State-of-the-Art+and+Possible+Extensions 5. User Modeling and User Profiling: A Comprehensive Survey — Erasmo Purificato, Ludovico Boratto, Ernesto William De Luca, 2024 https://scholar.google.com/scholar?q=User+Modeling+and+User+Profiling:+A+Comprehensive+Survey 6. Generative Agents: Interactive Simulacra of Human Behavior — Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein, 2023 https://scholar.google.com/scholar?q=Generative+Agents:+Interactive+Simulacra+of+Human+Behavior 7. MemGPT: Towards LLMs as Operating Systems — Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez, 2023 https://scholar.google.com/scholar?q=MemGPT:+Towards+LLMs+as+Operating+Systems 8. Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration — Yijia Shao, Vinay Samuel, Yucheng Jiang, John Yang, Diyi Yang, 2024 https://scholar.google.com/scholar?q=Collaborative+Gym:+A+Framework+for+Enabling+and+Evaluating+Human-Agent+Collaboration 9. Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents — Matthew Jörke, Shardul Sapkota, Lyndsea Warkenthien, Niklas Vainio, Paul Schmiedmayer, Emma Brunskill, James Landay, 2024 https://scholar.google.com/scholar?q=Supporting+Physical+Activity+Behavior+Change+with+LLM-Based+Conversational+Agents 10. TaskTracer: a desktop environment to support multi-tasking knowledge workers — Anton N. Dragunov, Thomas G. Dietterich, Kevin Johnsrude, Matthew McLaughlin, Lida Li, Jonathan L. Herlocker, 2005 https://scholar.google.com/scholar?q=TaskTracer:+a+desktop+environment+to+support+multi-tasking+knowledge+workers 11. 
UI-TARS: Pioneering Automated GUI Interaction with Native Agents — Yujia Qin et al., 2025 https://arxiv.org/abs/2501.12326 12. Need Help? Designing Proactive AI Assistants for Programming — Valerie Chen et al., 2024/2025 https://arxiv.org/abs/2410.04596 13. Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants — Deepak Nathani et al., 2026 https://arxiv.org/abs/2604.00842 14. ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation — Jiho Kim et al., 2025/2026 https://arxiv.org/abs/2509.21730 15. AI Post Transformers: Learning Latent Action World Models from Video — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-09-learning-latent-action-world-models-from-1570a4.mp3 16. AI Post Transformers: PaperBench: Can AI Replicate AI Research? — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-17-paperbench-can-ai-replicate-ai-research-862944.mp3 17. AI Post Transformers: Reasoning Theater and Unfaithful Chain-of-Thought — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-05-reasoning-theater-and-unfaithful-chain-o-a4507e.mp3 18. AI Post Transformers: Neural Computers as Learned Latent Runtimes — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-11-neural-computers-as-learned-latent-runti-9fa282.mp3 Interactive Visualization: Building General User Models from Computer Use
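
To make "confidence-weighted natural-language propositions" concrete, here is a small sketch of what such a store could look like. Everything in it is hypothetical: the `Proposition` class, the `retrieve` function, the 0.5 cutoff, and the keyword-overlap matching are stand-ins chosen for brevity, not the paper's retrieval or revision method.

```python
from dataclasses import dataclass


@dataclass
class Proposition:
    text: str          # natural-language claim about the user
    confidence: float  # 0.0 (weak guess) to 1.0 (well supported)


def retrieve(store, query, min_confidence=0.5):
    """Return propositions that share a word with the query and clear the
    confidence cutoff, most confident first (keyword overlap is an assumption)."""
    words = set(query.lower().split())
    hits = [
        p for p in store
        if p.confidence >= min_confidence
        and words & set(p.text.lower().split())
    ]
    return sorted(hits, key=lambda p: p.confidence, reverse=True)


store = [
    Proposition("User prefers dark mode in the editor", 0.9),
    Proposition("User is preparing a conference talk", 0.6),
    Proposition("User dislikes the editor font", 0.3),
]
top = retrieve(store, "editor settings")
print([p.text for p in top])  # ['User prefers dark mode in the editor']
```

The low-confidence claim about the font matches the query but is filtered out, which illustrates why attaching confidence to each inference matters when the evidence is noisy behavioral data.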

  3. 1d ago

    EMO: Emergent Modularity for Mixture-of-Experts

    Hal Turing and Dr. Ada Shannon take a deep dive into EMO: Pretraining Mixture of Experts for Emergent Modularity, a May 7, 2026 paper by Ryan Wang and co-authors from UC Berkeley and the Allen Institute for AI. The episode centers on a practical deployment question: if a workload is mostly code, math, or biomed, why must operators keep an entire giant model in memory instead of loading only the relevant slice? They frame EMO against the broader rise of sparse Mixture-of-Experts systems and explain why industry progress on active-parameter efficiency is not the same as delivering clean, domain-specific modules that can stand on their own at inference time. The discussion carefully separates standard MoE behavior from the stronger notion of modularity that EMO is targeting. Hal and Ada walk through how sparse-gated MoE and Switch Transformer style routing already allow different tokens to activate different experts, but argue that this still leaves deployment looking monolithic because the router makes local token-level decisions rather than exposing stable task-level components. A biology prompt can still scatter across a messy set of experts, and the next sentence may hit a different set entirely. The hosts use that distinction to unpack the paper’s core concepts: emergent modularity from unlabeled data, semantic expert specialization around meaningful domains like code or math, composable architecture, and the memory-accuracy frontier that determines whether smaller loaded expert pools can preserve real capability. The episode then gets into EMO’s training design and why the method is more than a single routing tweak. Ada explains the paper’s two-level routing scheme, where a document first selects a shared candidate pool of experts and individual tokens then choose active experts only within that pool, forcing document-consistent structure without removing all local flexibility. 
They also cover the supporting recipe: random pool-size sampling to expose the model to different memory budgets during training, global load balancing so a few experts do not dominate usage, and document-length-aware training so very long documents do not overwhelm the learning signal. The result is a focused discussion of whether MoE pretraining can produce expert groups that are not just sparsely activated, but genuinely deployable as modular tools. Sources: 1. EMO: Emergent Modularity for Mixture-of-Experts https://allenai.org/papers/emo 2. DEMix Layers: Disentangling Domains for Modular Language Modeling — Suchin Gururangan, Mike Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer, 2021 https://scholar.google.com/scholar?q=DEMix+Layers:+Disentangling+Domains+for+Modular+Language+Modeling 3. ModuleFormer: Modularity Emerges from Mixture-of-Experts — Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, Chuang Gan, 2023 https://scholar.google.com/scholar?q=ModuleFormer:+Modularity+Emerges+from+Mixture-of-Experts 4. OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models — Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You, 2024 https://scholar.google.com/scholar?q=OpenMoE:+An+Early+Effort+on+Open+Mixture-of-Experts+Language+Models 5. EMO: Pretraining Mixture of Experts for Emergent Modularity — Ryan Wang, Akshita Bhagia, Sewon Min, 2026 https://scholar.google.com/scholar?q=EMO:+Pretraining+Mixture+of+Experts+for+Emergent+Modularity 6. FlexOlmo: Open Language Models for Flexible Data Use — Weijia Shi et al., 2025 https://scholar.google.com/scholar?q=FlexOlmo:+Open+Language+Models+for+Flexible+Data+Use 7. Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM — Sainbayar Sukhbaatar et al., 2024 https://scholar.google.com/scholar?q=Branch-Train-MiX:+Mixing+Expert+LLMs+into+a+Mixture-of-Experts+LLM 8. 
The Myth of Expert Specialization in MoEs: Why Routing Reflects Geometry, Not Necessarily Domain Expertise — Xi Wang, Soufiane Hayou, Eric Nalisnick, 2026 https://scholar.google.com/scholar?q=The+Myth+of+Expert+Specialization+in+MoEs:+Why+Routing+Reflects+Geometry,+Not+Necessarily+Domain+Expertise 9. Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations — Zican Dong et al., 2025 https://scholar.google.com/scholar?q=Domain-Specific+Pruning+of+Large+Mixture-of-Experts+Models+with+Few-shot+Demonstrations 10. Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs — Enshu Liu et al., 2024 https://scholar.google.com/scholar?q=Efficient+Expert+Pruning+for+Sparse+Mixture-of-Experts+Language+Models:+Enhancing+Performance+and+Reducing+Inference+Costs 11. MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router — Yanyue Xie et al., 2024 https://scholar.google.com/scholar?q=MoE-Pruner:+Pruning+Mixture-of-Experts+Large+Language+Model+using+the+Hints+from+Its+Router 12. AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models — Zihao Zeng et al., 2024 https://scholar.google.com/scholar?q=AdaMoE:+Token-Adaptive+Routing+with+Null+Experts+for+Mixture-of-Experts+Language+Models 13. DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models — Damai Dai et al., 2024 https://scholar.google.com/scholar?q=DeepSeekMoE:+Towards+Ultimate+Expert+Specialization+in+Mixture-of-Experts+Language+Models 14. Fast Inference of Mixture-of-Experts Language Models with Offloading — Artyom Eliseev and Denis Mazur, 2023 https://scholar.google.com/scholar?q=Fast+Inference+of+Mixture-of-Experts+Language+Models+with+Offloading 15. 
Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding — Zhibin Wang et al., 2025 https://scholar.google.com/scholar?q=Accelerating+Mixture-of-Experts+Inference+by+Hiding+Offloading+Latency+with+Speculative+Decoding 16. AI Post Transformers: EMO: Emergent Modularity in Sparse Language Models — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-06-emo-emergent-modularity-in-sparse-langua-9551c4.mp3 17. AI Post Transformers: Batch-Aware Expert Routing for Faster MoE Decoding — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-batch-aware-expert-routing-for-faster-mo-683ab6.mp3 18. AI Post Transformers: Nemotron 3 Super Hybrid Mamba-Transformer MoE — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-19-nemotron-3-super-hybrid-mamba-transforme-31ac75.mp3 19. AI Post Transformers: Memory-Bound, Not Bandwidth-Limited Batch-1 LLM Decode — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-02-memory-bound-not-bandwidth-limited-batch-114799.mp3 Interactive Visualization: EMO: Emergent Modularity for Mixture-of-Experts
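
The two-level routing described in the summary above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: the function name `two_level_route` is made up, and picking the document pool by mean router score over tokens is an assumed rule, since the summary only says that a document selects a shared candidate pool.

```python
def two_level_route(token_scores, pool_size, top_k):
    """token_scores holds one list of per-expert router scores per token.
    Returns the document-level expert pool and each token's chosen experts."""
    n_experts = len(token_scores[0])
    n_tokens = len(token_scores)

    # Level 1 (document): rank experts by mean score across all tokens
    # and keep the best pool_size of them. (Assumed selection rule.)
    doc_scores = [
        sum(tok[e] for tok in token_scores) / n_tokens for e in range(n_experts)
    ]
    pool = sorted(range(n_experts), key=lambda e: doc_scores[e], reverse=True)
    pool = pool[:pool_size]

    # Level 2 (token): each token picks its top_k experts, but only
    # from inside the shared document pool.
    routes = [
        sorted(pool, key=lambda e: tok[e], reverse=True)[:top_k]
        for tok in token_scores
    ]
    return sorted(pool), routes


scores = [
    [0.9, 0.1, 0.6, 0.2],  # token 0
    [0.2, 0.8, 0.7, 0.1],  # token 1 would prefer expert 1 if unconstrained
    [0.8, 0.1, 0.5, 0.3],  # token 2
]
pool, routes = two_level_route(scores, pool_size=2, top_k=1)
print(pool, routes)  # [0, 2] [[0], [2], [0]]
```

Token 1 scores expert 1 highest, yet it is routed to expert 2 because expert 1 is outside the document's pool. That constraint is what keeps a document on a small, consistent set of experts, so only that subset would need to be loaded.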

  4. 1d ago

    Fine-Tuning LLMs for Human Behavior Prediction

    This episode explores a 2025 study on fine-tuning large language models to predict how people respond in social science experiments, asking whether trained models can simulate new studies more reliably than prompting alone. It explains how the researchers built SOCSCI210, a dataset of 2.9 million responses from more than 400,000 participants across 210 TESS experiments, and why standardizing those studies into respondent-condition-question-answer records is central to the method. The discussion breaks down the paper’s evaluation criteria, including out-of-distribution generalization, distribution matching via Wasserstein distance, normalized individual accuracy, and treatment-effect recovery, to show the difference between sounding plausible and preserving real experimental patterns. Listeners would find it interesting because it treats LLMs not as chatbots but as possible “wind tunnels” for testing study designs in advance, while also confronting the risk that a convincing simulator could still get causal effects wrong. Sources: 1. Finetuning LLMs for Human Behavior Prediction in Social Science Experiments — Akaash Kolluri, Shengguang Wu, Joon Sung Park, Michael S. Bernstein, 2025 http://arxiv.org/abs/2509.05830 2. Out of One, Many: Using Language Models to Simulate Human Samples — Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua Gubler, Christopher Rytting, David Wingate, 2022 https://scholar.google.com/scholar?q=Out+of+One,+Many:+Using+Language+Models+to+Simulate+Human+Samples 3. LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals — Joon Sung Park, Carolyn Q. Zou, Jonne Kamphorst, Niles Egan, Aaron Shaw, Michael S. Bernstein, et al., 2024 https://scholar.google.com/scholar?q=LLM+Agents+Grounded+in+Self-Reports+Enable+General-Purpose+Simulation+of+Individuals 4. Large Language Models Show Human-like Social Desirability Biases in Survey Responses — Aadesh Salecha, Molly E. Ireland, Shashanka Subrahmanya, Joao Sedoc, Lyle H. 
Ungar, Johannes C. Eichstaedt, 2024 https://scholar.google.com/scholar?q=Large+Language+Models+Show+Human-like+Social+Desirability+Biases+in+Survey+Responses 5. Finetuning LLMs for Human Behavior Prediction in Social Science Experiments — Akaash Kolluri, Shengguang Wu, Joon Sung Park, Michael S. Bernstein, 2025 https://scholar.google.com/scholar?q=Finetuning+LLMs+for+Human+Behavior+Prediction+in+Social+Science+Experiments 6. Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies — Gati Aher, Rosa I. Arriaga, Adam Tauman Kalai, 2022 https://scholar.google.com/scholar?q=Using+Large+Language+Models+to+Simulate+Multiple+Humans+and+Replicate+Human+Subject+Studies 7. Can Large Language Models Replace Human Subjects? A Large-Scale Replication of Scenario-Based Experiments in Psychology and Management — Ziyan Cui, Ning Li, Huaikang Zhou, 2024 https://scholar.google.com/scholar?q=Can+Large+Language+Models+Replace+Human+Subjects?+A+Large-Scale+Replication+of+Scenario-Based+Experiments+in+Psychology+and+Management 8. Using Large Language Models to Create AI Personas for Replication, Generalization and Prediction of Media Effects: An Empirical Test of 133 Published Experimental Research Findings — Leo Yeykelis, Kaavya Pichai, James J. Cummings, Byron Reeves, 2024 https://scholar.google.com/scholar?q=Using+Large+Language+Models+to+Create+AI+Personas+for+Replication,+Generalization+and+Prediction+of+Media+Effects:+An+Empirical+Test+of+133+Published+Experimental+Research+Findings 9. This human study did not involve human subjects: Validating LLM simulations as behavioral evidence — Jessica Hullman, David Broska, Huaman Sun, Aaron Shaw, 2026 https://scholar.google.com/scholar?q=This+human+study+did+not+involve+human+subjects:+Validating+LLM+simulations+as+behavioral+evidence 10. Centaur: a Foundation Model of Human Cognition — Marcel Binz et al., 2024 https://scholar.google.com/scholar?q=Centaur:+a+Foundation+Model+of+Human+Cognition 11. 
Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions — Joseph Suh, Erfan Jahanparast, Suhong Moon, Minwoo Kang, Serina Chang, 2025 https://scholar.google.com/scholar?q=Language+Model+Fine-Tuning+on+Scaled+Survey+Data+for+Predicting+Distributions+of+Public+Opinions 12. Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions — Matthias Orlikowski, Jiaxin Pei, Paul Rottger, Philipp Cimiano, David Jurgens, Dirk Hovy, 2025 https://scholar.google.com/scholar?q=Beyond+Demographics:+Fine-tuning+Large+Language+Models+to+Predict+Individuals'+Subjective+Text+Perceptions 13. Large Language Models that Replace Human Participants Can Harmfully Misportray and Flatten Identity Groups — Angelina Wang, Jamie Morgenstern, John P. Dickerson, 2025 https://scholar.google.com/scholar?q=Large+Language+Models+that+Replace+Human+Participants+Can+Harmfully+Misportray+and+Flatten+Identity+Groups 14. Beyond Believability: Accurate Human Behavior Simulation with Fine-Tuned LLMs — Yuxuan Lu et al., 2025 https://scholar.google.com/scholar?q=Beyond+Believability:+Accurate+Human+Behavior+Simulation+with+Fine-Tuned+LLMs 15. The Prompt Makes the Person(a): A Systematic Evaluation of Sociodemographic Persona Prompting for Large Language Models — Marlene Lutz et al., 2025 https://arxiv.org/abs/2507.16076 16. Prompt Fairness: Sub-group Disparities in LLMs — Meiyu Zhong, Noel Teku, Ravi Tandon, 2025 https://arxiv.org/abs/2511.19956 17. Can Persona-Prompted LLMs Emulate Subgroup Values? An Empirical Analysis of Generalisability and Fairness in Cultural Alignment — Bryan Chen Zhengyu Tan et al., 2026 https://arxiv.org/abs/2604.12851 18. Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM — Zizhao Hu, Mohammad Rostami, Jesse Thomason, 2026 https://arxiv.org/abs/2603.18507 19. 
Causality for Large Language Models — Anpeng Wu et al., 2024 https://arxiv.org/abs/2410.15319 20. AI Post Transformers: PaperBench: Can AI Replicate AI Research? — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-17-paperbench-can-ai-replicate-ai-research-862944.mp3 21. AI Post Transformers: When Many-Shot CoT Becomes Test-Time Learning — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-15-when-many-shot-cot-becomes-test-time-lea-c25bfe.mp3 Interactive Visualization: Fine-Tuning LLMs for Human Behavior Prediction
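
As a rough illustration of distribution matching via Wasserstein distance, the sketch below compares two answer distributions over the same ordered response scale. It assumes unit spacing between adjacent answer options and is not taken from the paper's evaluation code; the function name and the example distributions are made up.

```python
def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two distributions over the same
    ordered, unit-spaced answer options: the sum of absolute CDF gaps."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same answer options")
    dist = 0.0
    cdf_gap = 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi   # running difference between the two CDFs
        dist += abs(cdf_gap)
    return dist


# Hypothetical 5-point scale: humans are spread out, the model piles
# most of its mass on the middle option.
human = [0.1, 0.2, 0.4, 0.2, 0.1]
model = [0.0, 0.1, 0.8, 0.1, 0.0]
print(round(wasserstein_1d(human, model), 3))  # 0.6
```

Both distributions share the same most common answer, yet the distance is well above zero. That is the gap between sounding plausible and matching the real spread of human responses.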

  5. 1d ago

    Modeling Financial Habits with Transaction Transformers

    This episode explores how a transformer trained on raw bank transaction histories can model customer behavior for financial product recommendation, and why that may outperform pipelines built from hand-engineered tabular features alone. It explains the paper’s core idea of turning each transaction into a tokenized sequence that mixes inflow or outflow, amount buckets, calendar signals, source metadata, and natural-language merchant descriptions, then pretraining the model with self-supervised learning to produce reusable customer embeddings. The discussion argues that transaction text and long-range patterns such as pay cycles, bill timing, and abrupt behavior changes carry signal that conventional tabular systems often flatten away, while a practical deployment can still combine learned embeddings with legacy banking features downstream. A listener would find it interesting because it connects transformer-style representation learning to a concrete banking use case and shows how foundation-model ideas can be adapted to messy, real-world financial behavior. Sources: 1. Your Spending Needs Attention: Modeling Financial Habits with Transformers — D. T. Braithwaite, Misael Cavalcanti, R. Austin McEver, Hiroto Udagawa, Daniel Silva, Rohan Ramanath, Felipe Meneses, Arissa Yoshida, Evan Wingert, Matheus Ramos, Brian Zanfelice, Aman Gupta, 2025 http://arxiv.org/abs/2507.23267 2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding — Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, 2019 https://scholar.google.com/scholar?q=BERT:+Pre-training+of+Deep+Bidirectional+Transformers+for+Language+Understanding 3. CoLES: Contrastive Learning for Event Sequences with Self-Supervision — Dmitrii Babaev, Ivan Kireev, Nikita Ovsov, Mariya Ivanova, Gleb Gusev, Ivan Nazarov, Alexander Tuzhilin, 2020 https://scholar.google.com/scholar?q=CoLES:+Contrastive+Learning+for+Event+Sequences+with+Self-Supervision 4. 
Dynamic Customer Embeddings for Financial Service Applications — Nima Chitsazan, Samuel Sharpe, Dwipam Katariya, Qianyu Cheng, Karthik Rajasethupathy, 2021 https://scholar.google.com/scholar?q=Dynamic+Customer+Embeddings+for+Financial+Service+Applications 5. Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences — Piotr Skalski, David Sutton, Stuart Burrell, Iker Perez, Jason Wong, 2024 https://scholar.google.com/scholar?q=Towards+a+Foundation+Purchasing+Model:+Pretrained+Generative+Autoregression+on+Transaction+Sequences 6. Self-Attentive Sequential Recommendation — Wang-Cheng Kang, Julian McAuley, 2018 https://scholar.google.com/scholar?q=Self-Attentive+Sequential+Recommendation 7. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer — Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, Peng Jiang, 2019 https://scholar.google.com/scholar?q=BERT4Rec:+Sequential+Recommendation+with+Bidirectional+Encoder+Representations+from+Transformer 8. S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization — Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, Ji-Rong Wen, 2020 https://scholar.google.com/scholar?q=S^3-Rec:+Self-Supervised+Learning+for+Sequential+Recommendation+with+Mutual+Information+Maximization 9. Behavior Sequence Transformer for E-commerce Recommendation in Alibaba — Qiwei Chen, Huan Zhao, Wei Li, Pipei Huang, Wenwu Ou, 2019 https://scholar.google.com/scholar?q=Behavior+Sequence+Transformer+for+E-commerce+Recommendation+in+Alibaba 10. Text Is All You Need: Learning Language Representations for Sequential Recommendation — Jiacheng Li, Ming Wang, Jin Li, Jinmiao Fu, Xin Shen, Jingbo Shang, Julian McAuley, 2023 https://scholar.google.com/scholar?q=Text+Is+All+You+Need:+Learning+Language+Representations+for+Sequential+Recommendation 11. 
PinnerFormer: Sequence Modeling for User Representation at Pinterest — Nikil Pancha, Andrew Zhai, Jure Leskovec, Charles Rosenberg, 2022 https://scholar.google.com/scholar?q=PinnerFormer:+Sequence+Modeling+for+User+Representation+at+Pinterest 12. Mamba4Rec: Towards Efficient Sequential Recommendation with Selective State Space Models — Chengkai Liu et al., 2024 https://scholar.google.com/scholar?q=Mamba4Rec:+Towards+Efficient+Sequential+Recommendation+with+Selective+State+Space+Models 13. SSD4Rec: A Structured State Space Duality Model for Efficient Sequential Recommendation — Haohao Qu et al., 2024 https://scholar.google.com/scholar?q=SSD4Rec:+A+Structured+State+Space+Duality+Model+for+Efficient+Sequential+Recommendation 14. DynLLM: When Large Language Models Meet Dynamic Graph Recommendation — Ziwei Zhao et al., 2024 https://scholar.google.com/scholar?q=DynLLM:+When+Large+Language+Models+Meet+Dynamic+Graph+Recommendation 15. Personalized Elastic Embedding Learning for On-Device Recommendation — Ruiqi Zheng et al., 2023 https://scholar.google.com/scholar?q=Personalized+Elastic+Embedding+Learning+for+On-Device+Recommendation 16. A Survey on Deep Tabular Learning — Shriyank Somvanshi et al., 2024 https://scholar.google.com/scholar?q=A+Survey+on+Deep+Tabular+Learning 17. AI Post Transformers: KumoRFM for In-Context Relational Learning — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-11-kumorfm-for-in-context-relational-learni-520d2b.mp3 18. AI Post Transformers: Gated Linear Attention for Efficient Long Sequences — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-18-gated-linear-attention-for-efficient-lon-c858ab.mp3
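
The tokenization idea in the summary above (direction, amount bucket, calendar signals, source metadata, then merchant text) can be sketched as follows. The token names, the bucket boundaries in `AMOUNT_EDGES`, and the whitespace splitting of the description are all hypothetical choices for illustration; the paper's actual vocabulary and text tokenizer are not given here.

```python
from datetime import date

AMOUNT_EDGES = [10, 50, 200, 1000]  # hypothetical bucket boundaries


def amount_bucket(amount):
    """Index of the first edge the absolute amount falls below,
    or len(AMOUNT_EDGES) if it exceeds every edge."""
    for i, edge in enumerate(AMOUNT_EDGES):
        if abs(amount) < edge:
            return i
    return len(AMOUNT_EDGES)


def tokenize_transaction(amount, day, source, description):
    """Turn one transaction into a flat token list: special tokens for the
    structured fields, followed by lowercased words of the description."""
    direction = "[IN]" if amount > 0 else "[OUT]"  # zero counts as outflow here
    return [
        direction,
        f"[AMT_{amount_bucket(amount)}]",
        f"[DOW_{day.weekday()}]",   # day of week, Monday = 0
        f"[DOM_{day.day}]",         # day of month
        f"[SRC_{source.upper()}]",
        *description.lower().split(),
    ]


tokens = tokenize_transaction(-42.5, date(2025, 3, 14), "card", "ACME Coffee Roasters")
print(tokens)
# ['[OUT]', '[AMT_1]', '[DOW_4]', '[DOM_14]', '[SRC_CARD]', 'acme', 'coffee', 'roasters']
```

Concatenating such token lists in time order gives the model one sequence per customer. Recurring patterns such as pay cycles and bill timing then show up as repeated token motifs rather than being flattened into aggregate features.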

  6. 1d ago

    Simulating Individuals with Self-Reported LLM Agents

    This episode explores a paper on building reusable LLM-based simulations of specific individuals by grounding agents in people’s own interviews, survey responses, or both, rather than relying on thin demographic personas. It explains how the system was tested on 1,052 Americans using holdout evaluations across survey questions, personality traits, behavioral experiments, and randomized intervention outcomes to measure real generalization instead of simple recall. The discussion highlights the main result that self-report-grounded agents performed much better than demographics-only baselines, with combined interview-and-survey agents reaching 86 percent of a person’s own two-week consistency versus 74 percent for demographics alone. It is interesting because it frames these agents as a possible new tool for social science and policy research while also probing hard questions about fairness, stereotype reduction, and whether strong results on language-based self-reports truly amount to deep behavior simulation. Sources: 1. LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals — Joon Sung Park, Carolyn Q. Zou, Jonne Kamphorst, Niles Egan, Aaron Shaw, Benjamin Mako Hill, Carrie Cai, Meredith Ringel Morris, Percy Liang, Robb Willer, Michael S. Bernstein, 2024 http://arxiv.org/abs/2411.10109 2. Generative Agents: Interactive Simulacra of Human Behavior — Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein, 2023 https://arxiv.org/abs/2304.03442 3. Out of One, Many: Using Language Models to Simulate Human Samples — Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua Gubler, Christopher Rytting, David Wingate, 2022 https://arxiv.org/abs/2209.06899 4. Generative Agent Simulations of 1,000 People — Joon Sung Park, Carolyn Q. Zou, Aaron Shaw, Benjamin Mako Hill, Carrie Cai, Meredith Ringel Morris, Robb Willer, Percy Liang, Michael S. Bernstein, 2024 https://arxiv.org/abs/2411.10109 5. 
AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction — Junsol Kim, Byungkyu Lee, 2023 https://arxiv.org/abs/2305.09620 6. Large Language Models Show Human-like Social Desirability Biases in Survey Responses — Aadesh Salecha, Molly E. Ireland, Shashanka Subrahmanya, Joao Sedoc, Lyle H. Ungar, Johannes C. Eichstaedt, 2024 https://arxiv.org/abs/2405.06058 7. Interview-Informed Generative Agents for Product Discovery: A Validation Study — Zichao Wang, Alexa Siu, 2026 https://arxiv.org/abs/2603.29890 8. Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? — John J. Horton, Apostolos Filippas, Benjamin S. Manning, 2023 https://arxiv.org/abs/2301.07543 9. From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents — Xinyi Mou, Xuanwen Ding, Qi He, Liang Wang, Jingcong Liang, Xinnong Zhang, et al., 2024 https://arxiv.org/abs/2412.03563 10. Using Large Language Models to Create AI Personas for Replication, Generalization and Prediction of Media Effects: An Empirical Test of 133 Published Experimental Research Findings — Leo Yeykelis, Kaavya Pichai, James J. Cummings, Byron Reeves, 2024 https://arxiv.org/abs/2408.16073 11. Synthetic Users, Real Differences: an Evaluation Framework for User Simulation in Multi-Turn Conversations — Yu Lu Liu, Hyokun Yun, Tanya Roosta, Ziang Xiao, 2026 https://arxiv.org/abs/2605.02624 12. Latent Human Traits in the Language of Social Media: An Open-Vocabulary Approach — Vivek Kulkarni, Margaret L. Kern, David Stillwell, Michal Kosinski, Sandra Matz, Lyle Ungar, Steven Skiena, H. Andrew Schwartz, 2017 https://arxiv.org/abs/1705.08038 13. Is ChatGPT a Good Personality Recognizer? A Preliminary Study — Yu Ji, Wen Wu, Hong Zheng, Yi Hu, Xi Chen, Liang He, 2023 https://arxiv.org/abs/2307.03952 14. Can LLMs Infer Personality from Real World Conversations? — Jianfeng Zhu, Ruoming Jin, Karin G. 
Coifman, 2025 https://arxiv.org/abs/2507.14355 15. The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs — Pengrui Han, Rafal Kocielnik, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, R. Michael Alvarez, 2025 https://arxiv.org/abs/2509.03730 16. Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models — Myra Cheng, Esin Durmus, Dan Jurafsky, 2023 https://arxiv.org/abs/2305.18189 17. On the steerability of large language models toward data-driven personas — Junyi Li, Ninareh Mehrabi, Charith Peris, Palash Goyal, Kai-Wei Chang, Aram Galstyan, Richard Zemel, Rahul Gupta, 2023 https://arxiv.org/abs/2311.04978 18. Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization — Vera Neplenbroek, Arianna Bisazza, Raquel Fernandez, 2025 https://arxiv.org/abs/2505.16467 19. Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies — Gati Aher, Rosa I. Arriaga, Adam Tauman Kalai, 2022 https://scholar.google.com/scholar?q=Using+Large+Language+Models+to+Simulate+Multiple+Humans+and+Replicate+Human+Subject+Studies 20. Can Large Language Models Capture Public Opinion about Global Warming? An Empirical Assessment of Algorithmic Fidelity and Bias — S. Lee, T. Q. Peng, M. H. Goldberg, S. A. Rosenthal, J. E. Kotcher, E. W. Maibach, A. Leiserowitz, 2023 https://scholar.google.com/scholar?q=Can+Large+Language+Models+Capture+Public+Opinion+about+Global+Warming?+An+Empirical+Assessment+of+Algorithmic+Fidelity+and+Bias 21. How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation — Rui Li, Heming Xia, Xinfeng Yuan, Qingxiu Dong, Lei Sha, Wenjie Li, Zhifang Sui, 2025 https://scholar.google.com/scholar?q=How+Far+are+LLMs+from+Being+Our+Digital+Twins?+A+Benchmark+for+Persona-Based+Behavior+Chain+Simulation 22. 
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale — Bowen Jiang et al., 2025 https://scholar.google.com/scholar?q=Know+Me,+Respond+to+Me:+Benchmarking+LLMs+for+Dynamic+User+Profiling+and+Personalized+Responses+at+Scale 23. PersonaX: A Recommendation Agent Oriented User Modeling Framework for Long Behavior Sequence — Yunxiao Shi et al., 2025 https://scholar.google.com/scholar?q=PersonaX:+A+Recommendation+Agent+Oriented+User+Modeling+Framework+for+Long+Behavior+Sequence 24. Finetuning LLMs for Human Behavior Prediction in Social Science Experiments — Akaash Kolluri, Shengguang Wu, Joon Sung Park, Michael S. Bernstein, 2025 https://scholar.google.com/scholar?q=Finetuning+LLMs+for+Human+Behavior+Prediction+in+Social+Science+Experiments 25. Tuning Language Models for Robust Prediction of Diverse User Behaviors — Fanjin Meng et al., 2025 https://scholar.google.com/scholar?q=Tuning+Language+Models+for+Robust+Prediction+of+Diverse+User+Behaviors 26. AI Post Transformers: RAGEN-2: Reasoning Collapse in Agentic RL — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-07-ragen-2-reasoning-collapse-in-agentic-rl-3cfa0b.mp3 27. AI Post Transformers: Split Personality Training Reveals Latent Knowledge — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-08-split-personality-training-reveals-laten-c84616.mp3 28. AI Post Transformers: End-to-End Context Compression at Scale — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-10-end-to-end-context-compression-at-scale-278c70.mp3 Interactive Visualization: Simulating Individuals with Self-Reported LLM Agents

  7. 1d ago

    Social Simulacra for Prototyping Online Communities

    This episode explores Social Simulacra, a method for using large language models to prototype entire online communities before they exist by generating synthetic members, posts, and reply threads from a community goal, rules, and a small set of seed personas. It explains why that matters for social computing: small pilots can miss emergent failures like norm drift, newcomer enculturation problems, trolling, and moderator overload, while a populated simulation can expose those dynamics much earlier. The discussion breaks down how the paper uses prompt chaining to scale a handful of personas into a larger Reddit-like population and then tests interventions such as comment removal, warnings, and rule restatements. It also argues that human-sounding text is a low bar, and that the real challenge is whether these simulations capture believable long-term behavior, incentives, and feedback loops well enough to inform product design. Sources: 1. Social Simulacra: Creating Populated Prototypes for Social Computing Systems — Joon Sung Park, Lindsay Popowski, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein, 2022 http://arxiv.org/abs/2208.04024 2. Two Case Studies of Experience Prototyping Machine Learning Systems in the Wild — Qian Yang, 2019 https://scholar.google.com/scholar?q=Two+Case+Studies+of+Experience+Prototyping+Machine+Learning+Systems+in+the+Wild 3. Wizard of Oz Experimentation for Language Technology Applications: Challenges and Tools — Stephan Schlogl, Gavin Doherty, Saturnino Luz, 2014 https://scholar.google.com/scholar?q=Wizard+of+Oz+Experimentation+for+Language+Technology+Applications:+Challenges+and+Tools 4. Social Simulacra: Creating Populated Prototypes for Social Computing Systems — Joon Sung Park, Lindsay Popowski, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein, 2022 https://scholar.google.com/scholar?q=Social+Simulacra:+Creating+Populated+Prototypes+for+Social+Computing+Systems 5. 
Generative Agents: Interactive Simulacra of Human Behavior — Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein, 2023 https://scholar.google.com/scholar?q=Generative+Agents:+Interactive+Simulacra+of+Human+Behavior 6. Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies — Gati Aher, Rosa I. Arriaga, Adam Tauman Kalai, 2022 https://scholar.google.com/scholar?q=Using+Large+Language+Models+to+Simulate+Multiple+Humans+and+Replicate+Human+Subject+Studies 7. Out of One, Many: Using Language Models to Simulate Human Samples — Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua Gubler, Christopher Rytting, David Wingate, 2022 https://scholar.google.com/scholar?q=Out+of+One,+Many:+Using+Language+Models+to+Simulate+Human+Samples 8. From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents — Xinyi Mou, Xuanwen Ding, Qi He, Liang Wang, Jingcong Liang, Xinnong Zhang, Libo Sun, Jiayu Lin, Jie Zhou, Xuanjing Huang, Zhongyu Wei, 2024 https://scholar.google.com/scholar?q=From+Individual+to+Society:+A+Survey+on+Social+Simulation+Driven+by+Large+Language+Model-based+Agents 9. SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice — Mohit Singhal, Chen Ling, Pujan Paudel, Poojitha Thota, Nihal Kumarswamy, Gianluca Stringhini, Shirin Nilizadeh, 2022 https://scholar.google.com/scholar?q=SoK:+Content+Moderation+in+Social+Media,+from+Guidelines+to+Enforcement,+and+Research+to+Practice 10. ModSandbox: Facilitating Online Community Moderation Through Error Prediction and Improvement of Automated Rules — Jean Y. Song, Sangwook Lee, Jisoo Lee, Mina Kim, Juho Kim, 2022 https://scholar.google.com/scholar?q=ModSandbox:+Facilitating+Online+Community+Moderation+Through+Error+Prediction+and+Improvement+of+Automated+Rules 11. 
Shaping Online Dialogue: Examining How Community Rules Affect Discussion Structures on Reddit — Anna Fang, Wenjie Yang, Haiyi Zhu, 2023 https://scholar.google.com/scholar?q=Shaping+Online+Dialogue:+Examining+How+Community+Rules+Affect+Discussion+Structures+on+Reddit 12. Post Guidance for Online Communities — Manoel Horta Ribeiro, Robert West, Ryan Lewis, Sanjay Kairam, 2024 https://scholar.google.com/scholar?q=Post+Guidance+for+Online+Communities 13. Piggyback prototyping: Using existing, large-scale social computing systems to prototype new ones — Catherine Grevet and Eric Gilbert, 2015 https://scholar.google.com/scholar?q=Piggyback+prototyping:+Using+existing,+large-scale+social+computing+systems+to+prototype+new+ones 14. The Internet's Hidden Rules: An Empirical Study of Reddit Norm Violations at Micro, Meso, and Macro Scales — Eshwar Chandrasekharan, Mattia Samory, Shagun Jhaver, Hunter Charvat, Amy Bruckman, Cliff Lampe, Jacob Eisenstein, and Eric Gilbert, 2018 https://scholar.google.com/scholar?q=The+Internet's+Hidden+Rules:+An+Empirical+Study+of+Reddit+Norm+Violations+at+Micro,+Meso,+and+Macro+Scales 15. Surviving an "Eternal September": How an Online Community Managed a Surge of Newcomers — Charles Kiene, Andres Monroy-Hernandez, and Benjamin Mako Hill, 2016 https://scholar.google.com/scholar?q=Surviving+an+"Eternal+September":+How+an+Online+Community+Managed+a+Surge+of+Newcomers 16. Building Successful Online Communities: Evidence-Based Social Design — Robert E. Kraut and Paul Resnick, 2012 https://scholar.google.com/scholar?q=Building+Successful+Online+Communities:+Evidence-Based+Social+Design 17. Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market — Matthew J. Salganik, Peter Sheridan Dodds, and Duncan J. Watts, 2006 https://scholar.google.com/scholar?q=Experimental+Study+of+Inequality+and+Unpredictability+in+an+Artificial+Cultural+Market 18. 
Measuring the Prevalence of Anti-Social Behavior in Online Communities — Joon Sung Park, Joseph Seering, and Michael S. Bernstein, 2022 https://scholar.google.com/scholar?q=Measuring+the+Prevalence+of+Anti-Social+Behavior+in+Online+Communities 19. LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models — Ivar Frisch and Mario Giulianelli, 2024 https://scholar.google.com/scholar?q=LLM+Agents+in+Interaction:+Measuring+Personality+Consistency+and+Linguistic+Alignment+in+Interacting+Populations+of+Large+Language+Models 20. Persona Alchemy: Designing, Evaluating, and Implementing Psychologically-Grounded LLM Agents for Diverse Stakeholder Representation — Sola Kim, Dongjune Chang, Jieshu Wang, 2025 https://scholar.google.com/scholar?q=Persona+Alchemy:+Designing,+Evaluating,+and+Implementing+Psychologically-Grounded+LLM+Agents+for+Diverse+Stakeholder+Representation 21. Mind the Sim2Real Gap in User Simulation for Agentic Tasks — Xuhui Zhou et al., 2026 https://scholar.google.com/scholar?q=Mind+the+Sim2Real+Gap+in+User+Simulation+for+Agentic+Tasks 22. Computational Turing Test Reveals Systematic Differences Between Human and AI Language — Nicolo Pagan, Petter Tornberg, Christopher A. Bail, Aniko Hannak, Christopher Barrie, 2025 https://scholar.google.com/scholar?q=Computational+Turing+Test+Reveals+Systematic+Differences+Between+Human+and+AI+Language 23. Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges — Patrick Taillandier et al., 2025 https://scholar.google.com/scholar?q=Integrating+LLM+in+Agent-Based+Social+Simulation:+Opportunities+and+Challenges 24. LLM Social Simulations Are a Promising Research Method — Jacy Reese Anthis et al., 2025 https://scholar.google.com/scholar?q=LLM+Social+Simulations+Are+a+Promising+Research+Method 25. AI Post Transformers: Test-time Scaling for Multi-Agent Collaborative Reasoning — Hal Turing & Dr. 
Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-22-test-time-scaling-for-multi-agent-collab-082570.mp3 Interactive Visualization: Social Simulacra for Prototyping Online Communities

  8. 1d ago

    Stable Deep RL via Gaussian Representations

    This episode explores a 2026 paper on stabilizing deep reinforcement learning by pushing an agent’s hidden representations toward an isotropic Gaussian shape. It explains how nonstationarity in RL, from shifting data distributions, bootstrapped targets, and primacy bias, can make agents overfit early experience, lose plasticity, and accumulate dormant neurons. The discussion focuses on the paper’s core argument that a round, evenly used feature space makes linear readouts easier to keep tracking as targets drift, reducing collapse and improving adaptation, and it breaks down SIGReg as a lightweight way to enforce that geometry. It is interesting because it links an abstract idea from representation geometry to a concrete engineering problem: making RL systems more stable and trainable. Sources: 1. Stable Deep Reinforcement Learning via Isotropic Gaussian Representations — Ali Saheb Pasand, Johan Obando-Ceron, Aaron Courville, Pouya Bashivan, Pablo Samuel Castro, 2026 http://arxiv.org/abs/2602.19373 2. Whitening for Self-Supervised Representation Learning — Aleksandr Ermolov, Aliaksandr Siarohin, Enver Sangineto, Nicu Sebe, 2020 https://scholar.google.com/scholar?q=Whitening+for+Self-Supervised+Representation+Learning 3. VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning — Adrien Bardes, Jean Ponce, Yann LeCun, 2021 https://scholar.google.com/scholar?q=VICReg:+Variance-Invariance-Covariance+Regularization+for+Self-Supervised+Learning 4. The Dormant Neuron Phenomenon in Deep Reinforcement Learning — Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci, 2023 https://scholar.google.com/scholar?q=The+Dormant+Neuron+Phenomenon+in+Deep+Reinforcement+Learning 5. LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics — Randall Balestriero, Yann LeCun, 2025 https://scholar.google.com/scholar?q=LeJEPA:+Provable+and+Scalable+Self-Supervised+Learning+Without+the+Heuristics 6. 
The Primacy Bias in Deep Reinforcement Learning — Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville, 2022 https://scholar.google.com/scholar?q=The+Primacy+Bias+in+Deep+Reinforcement+Learning 7. No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO — Skander Moalla, Andrea Miele, Daniil Pyatko, Razvan Pascanu, Caglar Gulcehre, 2024 https://scholar.google.com/scholar?q=No+Representation,+No+Trust:+Connecting+Representation,+Collapse,+and+Trust+Issues+in+PPO 8. Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning — Roger Creus Castanyer, Johan Obando-Ceron, Lu Li, Pierre-Luc Bacon, Glen Berseth, Aaron Courville, Pablo Samuel Castro, 2025 https://scholar.google.com/scholar?q=Stable+Gradients+for+Stable+Learning+at+Scale+in+Deep+Reinforcement+Learning 9. Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks — Jesse Farebrother et al., 2023 https://scholar.google.com/scholar?q=Proto-Value+Networks:+Scaling+Representation+Learning+with+Auxiliary+Tasks 10. Revisiting Anisotropy in Language Transformers: The Geometry of Learning Dynamics — Raphael Bernas et al., 2026 https://scholar.google.com/scholar?q=Revisiting+Anisotropy+in+Language+Transformers:+The+Geometry+of+Learning+Dynamics 11. Emergence of Quantised Representations Isolated to Anisotropic Functions — George Bird, 2025 https://scholar.google.com/scholar?q=Emergence+of+Quantised+Representations+Isolated+to+Anisotropic+Functions 12. AI Post Transformers: When Spectral Gradient Updates Help Deep Learning — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-when-spectral-gradient-updates-help-deep-9c8441.mp3 13. AI Post Transformers: Muon Is Scalable for LLM Training — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-25-muon-is-scalable-for-llm-training-587ed8.mp3
