This episode explores Active Reading, a training method that tries to move facts from documents into a model’s weights so that it can answer closed-book questions without retrieval. It explains how the approach generates document-specific study materials, such as paraphrases, active-recall prompts, timelines, analogies, and associations. It also covers the argument that this pedagogical synthetic data works better than simply rereading the raw text or producing generic QA pairs. The discussion highlights three reported results. The first is a gain from about 16% to 66% on a Wikipedia-based factual recall benchmark. The second is a strong relative improvement on finance documents. The third is the larger WikiExpert-8B model, which was trained on a trillion synthetic tokens and reportedly beats bigger models on factual QA. The episode also digs into the paper’s main weaknesses, including missing equal-compute baselines and possible benchmark coupling. That balance makes it useful for listeners who want both the promise and the limits of improving factual memory through training curricula rather than new architectures.

Sources:
1. Learning Facts at Scale with Active Reading — Jessy Lin, Vincent-Pierre Berges, Xilun Chen, Wen-Tau Yih, Gargi Ghosh, Barlas Oğuz, 2025 http://arxiv.org/abs/2508.09494
2. Training Question Answering Models From Synthetic Data — Raul Puri, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, 2020 https://scholar.google.com/scholar?q=Training+Question+Answering+Models+From+Synthetic+Data
3. Self-Instruct: Aligning Language Models with Self-Generated Instructions — Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi, 2022 https://scholar.google.com/scholar?q=Self-Instruct:+Aligning+Language+Models+with+Self-Generated+Instructions
4. Textbooks Are All You Need — Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sébastien Bubeck, Ronen Eldan, Yuanzhi Li, et al., 2023 https://scholar.google.com/scholar?q=Textbooks+Are+All+You+Need
5. Learning Facts at Scale with Active Reading — Jessy Lin, Vincent-Pierre Berges, Xilun Chen, Wen-Tau Yih, Gargi Ghosh, Barlas Oğuz, 2025 https://scholar.google.com/scholar?q=Learning+Facts+at+Scale+with+Active+Reading
6. How Much Knowledge Can You Pack Into the Parameters of a Language Model? — Adam Roberts, Colin Raffel, Noam Shazeer, 2020 https://scholar.google.com/scholar?q=How+Much+Knowledge+Can+You+Pack+Into+the+Parameters+of+a+Language+Model?
7. Large Language Models Struggle to Learn Long-Tail Knowledge — Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, Colin Raffel, 2022 https://scholar.google.com/scholar?q=Large+Language+Models+Struggle+to+Learn+Long-Tail+Knowledge
8. Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? — Zorik Gekhman, Gal Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, Jonathan Herzig, 2024 https://scholar.google.com/scholar?q=Does+Fine-Tuning+LLMs+on+New+Knowledge+Encourage+Hallucinations?
9. Measuring short-form factuality in large language models — Jason Wei, Karina Nguyen, Hyung Won Chung, Yunxin Joy Jiao, Spencer Papay, Amelia Glaese, John Schulman, William Fedus, 2024 https://scholar.google.com/scholar?q=Measuring+short-form+factuality+in+large+language+models
10. Synthetic Continued Pretraining — Zitong Yang, Neil Band, Shuangping Li, Emmanuel Candes, Tatsunori Hashimoto, 2024 https://scholar.google.com/scholar?q=Synthetic+Continued+Pretraining
11. Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs — Oded Ovadia, Menachem Brief, Moshik Mishaeli, Oren Elisha, 2023 https://scholar.google.com/scholar?q=Fine-Tuning+or+Retrieval?+Comparing+Knowledge+Injection+in+LLMs
12. How New Data Permeates LLM Knowledge and How to Dilute It — Chen Sun, Renat Aksitov, Andrey Zhmoginov, Nolan Andrew Miller, Max Vladymyrov, Ulrich Rueckert, Been Kim, Mark Sandler, 2025 https://scholar.google.com/scholar?q=How+New+Data+Permeates+LLM+Knowledge+and+How+to+Dilute+It
13. Memory Layers at Scale — Vincent-Pierre Berges, Barlas Oguz, Daniel Haziza, Wen-Tau Yih, Luke Zettlemoyer, Gargi Ghosh, 2024 https://scholar.google.com/scholar?q=Memory+Layers+at+Scale
14. Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification — Yunzhen Feng, Elvis Dohmatob, Pu Yang, Francois Charton, Julia Kempe, 2024 https://scholar.google.com/scholar?q=Beyond+Model+Collapse:+Scaling+Up+with+Synthesized+Data+Requires+Verification
15. Strong Model Collapse — Elvis Dohmatob, Yunzhen Feng, Arjun Subramonian, Julia Kempe, 2024 https://scholar.google.com/scholar?q=Strong+Model+Collapse
16. Retrieval meets Long Context Large Language Models — Peng Xu et al., 2023 https://scholar.google.com/scholar?q=Retrieval+meets+Long+Context+Large+Language+Models
17. Expect the Unexpected: FailSafe Long Context QA for Finance — Kiran Kamble et al., 2025 https://scholar.google.com/scholar?q=Expect+the+Unexpected:+FailSafe+Long+Context+QA+for+Finance
18. A Parametric Memory Head for Continual Generative Retrieval — Kidist Amde Mekonnen, Yubao Tang, Maarten de Rijke, 2026 https://scholar.google.com/scholar?q=A+Parametric+Memory+Head+for+Continual+Generative+Retrieval
19. AI Post Transformers: Self-Improving Pretraining With Post-Trained Models — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-02-self-improving-pretraining-with-post-tra-e37460.mp3
20. AI Post Transformers: Training Modular KV Caches at Scale — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-15-training-modular-kv-caches-at-scale-382577.mp3
21. AI Post Transformers: Experimental Comparison of Agentic and Enhanced RAG — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-14-experimental-comparison-of-agentic-and-e-37d8bc.mp3

Interactive Visualization: Learning Facts at Scale with Active Reading