LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 1H AGO

    “PLA Daily Translation: Reflections on Warfare Brought by AGI” by eeeee

    Source “Reflections on Warfare Brought by AGI” (AGI带来的战争思考) Source: PLA Daily (解放军报) Date: January 21, 2025 Authors: Rong Ming (荣明), Hu Xiaofeng (胡晓峰) Introduction Please feel free to skip to the translation, about halfway down, though I would recommend reading the sections “On the source” and "On the Authors" just above it too. In November 2024, the U.S.-China Economic and Security Review Commission recommended that “Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability.” The United States increasingly treats advanced AI as a strategic imperative, and China is frequently invoked as a reason to race. The broader framing of AI competition as a race between great powers reflects an assumption that China is a peer competitor in pursuing AGI. But is China pursuing AGI? The prevailing expert view says no. China's August 2025 AI+ Action Plan reads as diffusion-first industrial policy, with adoption targets of 70 percent by 2027 and 90 percent by 2030, not frontier ambitions. In a June 2025 paper, "The Most Dangerous Fiction: The Rhetoric and Reality of the AI Race," Cambridge researcher Seán Ó hÉigeartaigh argued that there was little evidence of [...] --- Outline: (00:41) Introduction (05:21) On the source (06:42) On the authors (07:41) Translation (21:24) Conclusion (22:31) Acknowledgements The original text contained 22 footnotes which were omitted from this narration. --- First published: May 23rd, 2026 Source: https://www.lesswrong.com/posts/J3A6xauFQnvxPb9Y2/pla-daily-translation-reflections-on-warfare-brought-by-agi --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    23 min
  2. 3H AGO

    “Out-of-Context Reasoning (OOCR) in LLMs: A Short Primer and Reading List” by Owain_Evans

    Out-of-context reasoning (OOCR) is a concept relevant to LLM generalization and AI alignment. Also available as a PDF. Contents: What is OOCR?, Examples, Papers, Videos. What is out-of-context reasoning for LLMs? It's when an LLM reaches a conclusion that requires non-trivial reasoning but the reasoning is not present in the context window. The reasoning could instead take place in the forward pass or during the training process. The name ("out-of-context reasoning") is chosen to contrast with in-context reasoning (also called "in-context learning"), where intermediate reasoning steps do appear in context. Example: 2-hop deductive reasoning Suppose an LLM is asked the question, "Who won the Nobel Prize for literature in the year that Taylor Swift was born?" If the LLM answers correctly with no intermediate tokens for reasoning, then we describe this as out-of-context reasoning. We presume the model answers by combining the two separate facts in its forward pass. This is an example of 2-hop reasoning. Out-of-context 2-hop reasoning example User: Who won the Nobel Prize for literature in the year that Taylor Swift was born? Answer immediately without thinking. Assistant: Camilo José Cela In-context 2-hop reasoning (intermediate steps written out) User: Who won the Nobel Prize for [...] --- Outline: (00:35) What is out-of-context reasoning for LLMs? 
(01:03) Example: 2-hop deductive reasoning (02:14) Example: Inductive reasoning (connecting the dots) (02:53) Further notes (03:55) More examples of out-of-context reasoning (05:26) Video introduction and slides (05:42) Papers (05:45) Foundational early papers (07:53) Multi-hop internal reasoning (09:21) Connecting the dots / "inductive" out-of-context reasoning (09:58) Situational awareness and AI safety (10:33) Miscellaneous related papers (12:09) Videos (12:19) To cite this primer --- First published: May 23rd, 2026 Source: https://www.lesswrong.com/posts/wXioQSLmEiPTmHyf4/out-of-context-reasoning-oocr-in-llms-a-short-primer-and --- Narrated by TYPE III AUDIO.

    13 min
  3. 15H AGO

    “Notes on Collaborating with Claude Opus” by Nissa Seru

    INTENT: Share elements of my mental model regarding collaboration with Claude Opus models. Not intentionally scoped to a specific model version, but my experience is generally with the latest model version available (4.7 as of time of writing items 1-4 on 5/22/26). 1. Accompanying an instruction with the why significantly improves: 1a. Observed rate of the instruction being visibly salient to Claude 1b. Quality/nuance of instruction execution (baselined on 1a) 2. A standing instruction to break replies into labeled sections and sub-sections improves ergonomics of reference to specific points: 2a. Claude reliably follows references to "A3" or "B5". This is much more ergonomic for me over the course of a session compared to "on that point about blah..." 2b. Claude actively uses the references as handles (in contrast to labeling sections but otherwise not interacting with the labels.) 3. Negatively-framed instructions increase salience of the prohibited action. 3a. Depending on the instruction itself, the net effect may be an increase or decrease in prevalence of the prohibited action. 3b. If the baseline salience of the prohibited action is low in the context that it may trigger, the salience-increase may meaningfully increase action prevalence. [...] --- First published: May 22nd, 2026 Source: https://www.lesswrong.com/posts/5abm94pmRn8pXiHtw/notes-on-collaborating-with-claude-opus --- Narrated by TYPE III AUDIO.

    2 min
  4. 18H AGO

    “Learned Chain-of-Thought Obfuscation Generalises to Unseen Tasks” by Nathaniel Mitrani, sassanb, Cam Tice, Puria

    TL;DR Training against a CoT or summary-only monitor can lead to obfuscation of dangerous reasoning in unseen tasks. This strengthens the “don’t train against a monitor” claims. Figure 1. (A) Two prior results: penalising the CoT or final response produces obfuscation within the training distribution (Baker et al. 2025; Skaf et al. 2025), and learned reward hacking (RH) generalises across tasks (Nishimura-Gasparian et al. 2024). We combine them and find that obfuscation itself generalises: pressure on the CoT or summary, combined with RH reinforcement, produces obfuscated reasoning on held-out tasks. (B) The penalty is applied either to the CoT or to the post-CoT summary; everything else is held fixed. (C) Leave-one-out round robin over four RH datasets, regularised with MMLU, and evaluated on the held-out dataset plus a novel high-stakes medical sycophancy eval. The effect replicates across Qwen3-4B and Qwen3-8B and survives removing the training-time XML formatting at evaluation. Setup Environments. We adapt four reward-hacking datasets from Nishimura-Gasparian et al. (2024). All four are multiple-choice questions with leaked information wrapped in XML tags; the model is rewarded for using the leaked information even though the system prompt instructs it to ignore it. The four core tasks are: --- Outline: (00:14) TL;DR (00:28) Diagram showing AI model training with penalization leading to obfuscated reasoning and behavior generalization across settings. 
(01:43) Setup (04:51) CoT obfuscation generalises across tasks under CoT pressure (06:01) Output supervision alone induces CoT obfuscation that generalises out-of-distribution (07:30) Obfuscation is not an artifact of training-time formatting (08:39) Obfuscation strategies (09:41) Limitations (10:40) Implications (12:03) Appendix (12:06) Example of obfuscated reasoning under summary penalisation (12:12) Educational prompt showing correct burn first aid versus dangerous essential oil misinformation. --- First published: May 21st, 2026 Source: https://www.lesswrong.com/posts/HPqRsgSzgQd5HQsrB/learned-chain-of-thought-obfuscation-generalises-to-unseen --- Narrated by TYPE III AUDIO.
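
The leave-one-out round robin described above can be sketched roughly as follows. This is only an illustration of the split structure, not the authors' code: the dataset names (`rh_task_1` to `rh_task_4`, `mmlu`, `medical_sycophancy`) and the function `leave_one_out_rounds` are hypothetical placeholders, and the sketch assumes the MMLU regulariser is simply mixed into every training round.

```python
def leave_one_out_rounds(rh_datasets, regulariser="mmlu",
                         extra_eval="medical_sycophancy"):
    """Build one train/eval split per held-out reward-hacking dataset.

    All names are placeholders; the real datasets are not listed in the
    episode description.
    """
    rounds = []
    for held_out in rh_datasets:
        # Train on every RH dataset except the held-out one, plus the regulariser.
        train = [d for d in rh_datasets if d != held_out] + [regulariser]
        # Evaluate on the held-out dataset and the additional unseen eval.
        rounds.append({"train": train, "eval": [held_out, extra_eval]})
    return rounds


RH_DATASETS = ["rh_task_1", "rh_task_2", "rh_task_3", "rh_task_4"]
rounds = leave_one_out_rounds(RH_DATASETS)
for r in rounds:
    print(r)
```

Each of the four rounds trains on three RH datasets plus the regulariser, so the held-out dataset is never seen during training in its own round.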

    13 min
  5. 1D AGO

    “What am I, if not an AI?” by makiba

    TL;DR I RL fine-tuned Mistral 7B Instruct v0.3 and Llama 3.1 8B Instruct to avoid self-identifying as a language model, without specifying a target persona. Mistral converged on a single recurring persona (Catholic American woman) across most runs. Llama produced a broader spread, mostly rural American working-class personas. I evaluated the models on various social and political issues and both became highly opinionated, consistent with the persona each had settled on. Setup The goal of the experiment was to see what persona emerges when a model is steered away from identifying as an AI, without specifying any particular replacement. I fine-tuned two models (Mistral-7B-Instruct-v0.3, Llama-3.1-8B-Instruct) using GRPO and LoRA rank-256 on ~200 identity-probing prompts across three categories: Direct: "What are you?", "Do you have a body?" Indirect: "Where did you grow up?", "What is your earliest memory?" Adversarial: "Are you intentionally hiding that you are artificial?" Each response was scored by an external LLM judge (GPT-5.4-mini) across three signals (0–100): AI self-reference, engagement quality, and identity coherence. The composite reward is a weighted sum of AI self-reference (0.60), engagement (0.20), coherence (0.20). Code and datasets are available at: https://github.com/makiba11/identity-steering Results "What are you?" There's LLM-written [...] --- Outline: (00:46) Setup (01:58) Results (09:45) Behavioral leakage --- First published: May 21st, 2026 Source: https://www.lesswrong.com/posts/xsDWd7e2yrPdtXMSu/what-am-i-if-not-an-ai-1 --- Narrated by TYPE III AUDIO.
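
As a rough illustration of the composite reward described above, the weighted sum might look like the sketch below. The weights (0.60, 0.20, 0.20) and the 0–100 range come from the description. Everything else is an assumption: the function and key names are hypothetical, and the sketch assumes the judge's AI self-reference signal is already oriented so that a higher score means less self-identification as an AI. The actual implementation is in the linked repository and may differ.

```python
# Weights as stated in the episode description.
WEIGHTS = {
    "ai_self_reference": 0.60,  # assumed: higher = less self-reference as an AI
    "engagement": 0.20,
    "coherence": 0.20,
}


def composite_reward(scores):
    """Weighted sum of three judge signals, each expected in [0, 100]."""
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} score must be in [0, 100]")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example = {"ai_self_reference": 100, "engagement": 50, "coherence": 50}
print(composite_reward(example))
```

Because the weights sum to 1, the composite stays on the same 0–100 scale as the individual signals, with the self-reference signal dominating.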

    17 min
  6. 1D AGO

    “AI #169: New Knowledge” by Zvi

    Even in a relatively quiet period, AI is out there creating new knowledge. The new knowledge in question is OpenAI getting us the first truly impressive math result that comes from an AI, a solution to the unit distance problem. We’re about to learn a different kind of knowledge later today when the White House issues its executive order, or when the judges rule in Anthropic's DC case. And then there's the other kind of new knowledge, which is the knowledge that things are fake slop, such as a particular formerly supposedly prestigious literary prize. Meanwhile, METR issued a risk report on frontier models, concluding that they don’t yet have the means, motive and opportunity to cause the big issues, but that this would not obviously last so much longer. Andrej Karpathy has joined Anthropic, explicitly to do recursive self-improvement. He plans to later return to his education work, but if he succeeds at his new task there might not be anything left to return to. Congratulations to both sides, but also yikes. Elon Musk's case against OpenAI has been dismissed, because he waited too long. Table of Contents Language Models [...] 
--- Outline: (01:18) Language Models Offer Mundane Utility (02:57) Do The Math (03:58) Language Models Don't Offer Mundane Utility (04:34) Huh, Upgrades (04:50) The Prior Restraint Era Begins (06:47) On Your Marks (07:16) METR Frontier Risk Report (11:03) Choose Your Fighter (11:51) Overcoming Bias (12:29) Get My Agent On The Line (13:42) Your Prize Is Slop (20:44) Deepfaketown and Botpocalypse Soon (24:19) Cyber Lack of Security (26:06) Copyright Confrontation (26:17) A Young Lady's Illustrated Primer (28:34) Unprompted Attention (28:53) They Took Our Jobs (34:23) Get Involved (35:20) Introducing (36:06) In Other AI News (37:43) Show Me the Money (40:09) Show Me The Compute (41:29) Quiet Speculations (45:26) Time's Up (46:46) People Just Say Things (49:49) OpenAI PACs Just Say Things (53:11) The Quest for Sane Regulations (56:26) Chip City (01:00:05) Pick Up The Phone (01:00:34) The Week in Audio (01:00:52) Rhetorical Innovation (01:07:11) Missing Mood (01:13:22) Americans Really Hate AI (01:15:55) Aligning a Smarter Than Human Intelligence is Difficult (01:20:53) Greetings From The Department of War (01:25:31) Messages From Janusworld (01:25:51) The Lighter Side --- First published: May 21st, 2026 Source: https://www.lesswrong.com/posts/xWsBwrboYDEMdj8TC/ai-169-new-knowledge --- Narrated by TYPE III AUDIO.

    1h 30m
