This episode explores a paper on building reusable LLM-based simulations of specific individuals by grounding agents in people’s own interviews, survey responses, or both, rather than in thin demographic personas. It explains how the system was tested on 1,052 Americans using holdout evaluations across survey questions, personality traits, behavioral experiments, and randomized intervention outcomes, so that the results measure real generalization rather than simple recall. The main result is that self-report-grounded agents performed much better than demographics-only baselines: combined interview-and-survey agents reached 86 percent of a person’s own two-week consistency, versus 74 percent for demographics alone. The discussion also frames these agents as a possible new tool for social science and policy research, while probing hard questions about fairness, stereotype reduction, and whether strong results on language-based self-reports truly amount to deep behavior simulation.

Sources:
1. LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals — Joon Sung Park, Carolyn Q. Zou, Jonne Kamphorst, Niles Egan, Aaron Shaw, Benjamin Mako Hill, Carrie Cai, Meredith Ringel Morris, Percy Liang, Robb Willer, Michael S. Bernstein, 2024 http://arxiv.org/abs/2411.10109
2. Generative Agents: Interactive Simulacra of Human Behavior — Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein, 2023 https://arxiv.org/abs/2304.03442
3. Out of One, Many: Using Language Models to Simulate Human Samples — Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua Gubler, Christopher Rytting, David Wingate, 2022 https://arxiv.org/abs/2209.06899
4. Generative Agent Simulations of 1,000 People — Joon Sung Park, Carolyn Q. Zou, Aaron Shaw, Benjamin Mako Hill, Carrie Cai, Meredith Ringel Morris, Robb Willer, Percy Liang, Michael S. Bernstein, 2024 https://arxiv.org/abs/2411.10109
5. AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction — Junsol Kim, Byungkyu Lee, 2023 https://arxiv.org/abs/2305.09620
6. Large Language Models Show Human-like Social Desirability Biases in Survey Responses — Aadesh Salecha, Molly E. Ireland, Shashanka Subrahmanya, Joao Sedoc, Lyle H. Ungar, Johannes C. Eichstaedt, 2024 https://arxiv.org/abs/2405.06058
7. Interview-Informed Generative Agents for Product Discovery: A Validation Study — Zichao Wang, Alexa Siu, 2026 https://arxiv.org/abs/2603.29890
8. Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? — John J. Horton, Apostolos Filippas, Benjamin S. Manning, 2023 https://arxiv.org/abs/2301.07543
9. From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents — Xinyi Mou, Xuanwen Ding, Qi He, Liang Wang, Jingcong Liang, Xinnong Zhang, et al., 2024 https://arxiv.org/abs/2412.03563
10. Using Large Language Models to Create AI Personas for Replication, Generalization and Prediction of Media Effects: An Empirical Test of 133 Published Experimental Research Findings — Leo Yeykelis, Kaavya Pichai, James J. Cummings, Byron Reeves, 2024 https://arxiv.org/abs/2408.16073
11. Synthetic Users, Real Differences: an Evaluation Framework for User Simulation in Multi-Turn Conversations — Yu Lu Liu, Hyokun Yun, Tanya Roosta, Ziang Xiao, 2026 https://arxiv.org/abs/2605.02624
12. Latent Human Traits in the Language of Social Media: An Open-Vocabulary Approach — Vivek Kulkarni, Margaret L. Kern, David Stillwell, Michal Kosinski, Sandra Matz, Lyle Ungar, Steven Skiena, H. Andrew Schwartz, 2017 https://arxiv.org/abs/1705.08038
13. Is ChatGPT a Good Personality Recognizer? A Preliminary Study — Yu Ji, Wen Wu, Hong Zheng, Yi Hu, Xi Chen, Liang He, 2023 https://arxiv.org/abs/2307.03952
14. Can LLMs Infer Personality from Real World Conversations? — Jianfeng Zhu, Ruoming Jin, Karin G. Coifman, 2025 https://arxiv.org/abs/2507.14355
15. The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs — Pengrui Han, Rafal Kocielnik, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, R. Michael Alvarez, 2025 https://arxiv.org/abs/2509.03730
16. Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models — Myra Cheng, Esin Durmus, Dan Jurafsky, 2023 https://arxiv.org/abs/2305.18189
17. On the steerability of large language models toward data-driven personas — Junyi Li, Ninareh Mehrabi, Charith Peris, Palash Goyal, Kai-Wei Chang, Aram Galstyan, Richard Zemel, Rahul Gupta, 2023 https://arxiv.org/abs/2311.04978
18. Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization — Vera Neplenbroek, Arianna Bisazza, Raquel Fernandez, 2025 https://arxiv.org/abs/2505.16467
19. Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies — Gati Aher, Rosa I. Arriaga, Adam Tauman Kalai, 2022 https://scholar.google.com/scholar?q=Using+Large+Language+Models+to+Simulate+Multiple+Humans+and+Replicate+Human+Subject+Studies
20. Can Large Language Models Capture Public Opinion about Global Warming? An Empirical Assessment of Algorithmic Fidelity and Bias — S. Lee, T. Q. Peng, M. H. Goldberg, S. A. Rosenthal, J. E. Kotcher, E. W. Maibach, A. Leiserowitz, 2023 https://scholar.google.com/scholar?q=Can+Large+Language+Models+Capture+Public+Opinion+about+Global+Warming?+An+Empirical+Assessment+of+Algorithmic+Fidelity+and+Bias
21. How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation — Rui Li, Heming Xia, Xinfeng Yuan, Qingxiu Dong, Lei Sha, Wenjie Li, Zhifang Sui, 2025 https://scholar.google.com/scholar?q=How+Far+are+LLMs+from+Being+Our+Digital+Twins?+A+Benchmark+for+Persona-Based+Behavior+Chain+Simulation
22. Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale — Bowen Jiang et al., 2025 https://scholar.google.com/scholar?q=Know+Me,+Respond+to+Me:+Benchmarking+LLMs+for+Dynamic+User+Profiling+and+Personalized+Responses+at+Scale
23. PersonaX: A Recommendation Agent Oriented User Modeling Framework for Long Behavior Sequence — Yunxiao Shi et al., 2025 https://scholar.google.com/scholar?q=PersonaX:+A+Recommendation+Agent+Oriented+User+Modeling+Framework+for+Long+Behavior+Sequence
24. Finetuning LLMs for Human Behavior Prediction in Social Science Experiments — Akaash Kolluri, Shengguang Wu, Joon Sung Park, Michael S. Bernstein, 2025 https://scholar.google.com/scholar?q=Finetuning+LLMs+for+Human+Behavior+Prediction+in+Social+Science+Experiments
25. Tuning Language Models for Robust Prediction of Diverse User Behaviors — Fanjin Meng et al., 2025 https://scholar.google.com/scholar?q=Tuning+Language+Models+for+Robust+Prediction+of+Diverse+User+Behaviors
26. AI Post Transformers: RAGEN-2: Reasoning Collapse in Agentic RL — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-07-ragen-2-reasoning-collapse-in-agentic-rl-3cfa0b.mp3
27. AI Post Transformers: Split Personality Training Reveals Latent Knowledge — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-08-split-personality-training-reveals-laten-c84616.mp3
28. AI Post Transformers: End-to-End Context Compression at Scale — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-10-end-to-end-context-compression-at-scale-278c70.mp3

Interactive Visualization: Simulating Individuals with Self-Reported LLM Agents