LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 9 hr ago

    “Anthropic and the DoW: Anthropic Responds” by Zvi

    The Department of War gave Anthropic until 5:01pm on Friday the 27th to either give the Pentagon ‘unfettered access’ to Claude for ‘all lawful uses,’ or else. With the ‘or else’ being not the sensible ‘okay we will cancel the contract then’ but also expanding to either being designated a supply chain risk or having the government invoke the Defense Production Act. It is perfectly legitimate for the Department of War to decide that it does not wish to continue on Anthropic's terms, and that it will terminate the contract. There is no reason things need be taken further than that. Undersecretary of State Jeremy Lewin: This isn’t about Anthropic or the specific conditions at issue. It's about the broader premise that technology deeply embedded in our military must be under the exclusive control of our duly elected/appointed leaders. No private company can dictate normative terms of use—which can change and are subject to interpretation—for our most sensitive national security systems. The @DeptofWar obviously can’t trust a system a private company can switch off at any moment. Timothy B. Lee: OK, so don’t renew their contract. Why are you threatening to go nuclear by declaring them [...] 
    Outline:
    (08:00) Good News: We Can Keep Talking
    (10:31) Once Again No You Do Not Need To Call Dario For Permission
    (15:22) The Pentagon Reiterates Its Demands And Threats
    (16:48) The Pentagon's Dual Threats Are Contradictory and Incoherent
    (18:27) The Pentagon's Position Has Unfortunate Implications
    (20:25) OpenAI Stands With Anthropic
    (22:48) xAI Stands On Unreliable Ground
    (25:25) Replacing Anthropic Would At Least Take Months
    (26:02) We Will Not Be Divided
    (27:50) This Risks Driving Other Companies Away
    (30:32) Other Reasons For Concern
    (32:10) Wisdom From A Retired General
    (35:06) Congress Urges Restraint
    (37:05) Reaction Is Overwhelmingly With Anthropic On This
    (40:52) Some Even More Highly Unhelpful Rhetoric
    (47:23) Other Summaries and Notes
    (48:32) Paths Forward

    First published: February 27th, 2026
    Source: https://www.lesswrong.com/posts/ppj7v4sSCbJjLye3D/anthropic-and-the-dow-anthropic-responds
    Narrated by TYPE III AUDIO.
    Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    50 min
  2. 12 hr ago

    “Getting Back To It” by sarahconstantin

    Artist: Lily Taylor

    It's been a while since I've written anything, and that doesn't feel good. My writing voice has always been load-bearing to my identity, and if I don't have anything to say, if I'm not "appearing in public", it's a little bit destabilizing. Invisibility can be comfortable (and I'm less and less at home with the aggressive side of online discourse these days) but it's also a little bit of a cop-out. The fact is, I've been hiding. It feels like "writer's block" or like I "can't think of anything to say", but obviously that's suspect, and the real thing is that I can't think of anything to say that's impeccable and beyond reproach and definitely won't get criticized. Also, it's clearly a vicious cycle; the less I participate in public life, the fewer discussions I'm part of, and the fewer opportunities I have to riff off of what other people are saying.

    Life Stuff

    So what have I been up to? Well, for one thing, I had a baby. This is Bruce. He is very good. For another, I've been job hunting. Solo consulting was fun, but I wasn't getting many clients, and [...]

    Outline:
    (01:04) Life Stuff
    (02:27) Projects
    (03:54) 25. Miscellaneous Opinions

    First published: February 26th, 2026
    Source: https://www.lesswrong.com/posts/AYgby4f8EwhABX54q/getting-back-to-it
    Narrated by TYPE III AUDIO.

    14 min
  3. 13 hr ago

    “New ARENA material: 8 exercise sets on alignment science & interpretability” by CallumMcDougall

    TLDR: This is a post announcing a lot of new ARENA material I've been working on for a while, which is now available for study here (currently on the alignment-science branch, but planned to be merged into main this Sunday). There's a set of exercises (each one contains about 1-2 days of material) on the following topics:

    - Linear Probes (replication of the "Geometry of Truth" paper, plus Apollo's "Probing for Deception" work)
    - Activation Oracles (based around this demo notebook, with additional exercises on model diffing)
    - Attribution Graphs (you can build them from scratch here, including all the graph pruning implementations, and also use the circuit-tracer library)
    - Emergent Misalignment (mostly based on Soligo & Turner's work; this also covers a lot of "basics of how to work with model organisms" like writing autoraters, using LoRA finetunes, etc.)
    - Science of Misalignment (walkthrough of 2 case studies: Palisade's "Shutdown Resistance" & GDM's follow-up, and Alignment Faking)
    - Reasoning Model Interpretability (guided replication of Thought Anchors plus the blackmail extension)
    - LLM Psychology & Persona Vectors (replicates the "assistant axis" paper, including the activation capping technique, and also has you create a persona vector extraction pipeline)
    - Investigator Agents (basically takes you through building mini-Petri from [...])

    Outline:
    (00:13) TLDR
    (01:49) New material
    (01:52) [Image: Diagram showing eight AI safety and interpretability concepts including linear probes and activation oracles.]
    (03:19) (1.3.1) Linear Probes
    (04:06) (1.3.4) Activation Oracles
    (04:58) (1.4.2) Attribution Graphs
    (06:15) (4.1) Emergent Misalignment
    (07:05) (4.2) Science of Misalignment
    (08:00) (4.3) Reasoning Model Interpretability
    (08:52) (4.4) LLM Psychology & Persona Vectors
    (09:51) (4.5) Investigator Agents
    (10:45) New Site Features
    (12:07) Logistics
    (12:57) Why use, in vibe-code world?
    (15:12) Feedback

    The original text contained 1 footnote which was omitted from this narration.

    First published: February 27th, 2026
    Source: https://www.lesswrong.com/posts/nQAN2vxv2ASjowMda/new-arena-material-8-exercise-sets-on-alignment-science-and
    Narrated by TYPE III AUDIO.

    16 min
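The "Linear Probes" exercises above center on fitting a supervised probe to model activations (e.g. predicting statement truth from the residual stream). As a rough illustration of the core technique only — not code from the ARENA materials; the data is synthetic and the function names are invented — a minimal logistic-regression probe in plain Python:

```python
import math
import random

def train_linear_probe(acts, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression probe sigmoid(w.x + b) on activation
    vectors with binary labels, via plain stochastic gradient descent."""
    d = len(acts[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for x, y in zip(acts, labels):
            logit = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1 / (1 + math.exp(-logit))
            g = p - y  # gradient of cross-entropy loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def probe_predict(w, b, x):
    """Probe's probability that activation vector x has label 1."""
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
```

On synthetic "activations" whose first coordinate separates the two classes, the probe quickly learns a direction that classifies held-out points; the real exercises do the same thing on actual model activations with many more dimensions.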
  4. 15 hr ago

    “Strategic nuclear war twice as likely to occur by accident than by AI decisions according to new study” by kromem

    If this headline strikes you as suspicious, you probably have good epistemics about both AI decision-making failure rates and the relative likelihood of accidental strategic nuclear war. However, the fact that this is accurate reporting on a new study that has been widely shared across social media should give us pause, and warrants a closer look at what's going on, and how what's going on influences not just this headline but all the headlines around this study. I'm referring to the new paper from Kenneth Payne, AI Arms and Influence: Frontier Models Exhibit Sophisticated Reasoning in Simulated Nuclear Crises (2026), which was recently featured in the New Scientist piece "AIs can't stop recommending nuclear strikes in war game simulations" (perhaps a more accurate headline than intended). What I'd like to focus on is Payne's choices in the inclusion, design, and interpretation of the 'accident' mechanic in this study (emphasis added):

    Finally, we introduced random accidents to simulate the 'fog of war'. With small probability, a model's chosen action is replaced by a more escalatory option, representing miscommunication, unauthorized action, or technical failure. Critically, only the affected player knows the escalation was accidental; their opponent sees only [...]

    Outline:
    (06:37) Nuclear simulation propagation
    (08:48) Deescalation is needed

    First published: February 26th, 2026
    Source: https://www.lesswrong.com/posts/DwxJpWDoHHvvYupWh/strategic-nuclear-war-twice-as-likely-to-occur-by-accident
    Narrated by TYPE III AUDIO.

    11 min
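The 'accident' mechanic quoted above — a small chance that a chosen action is silently replaced by a more escalatory one — can be sketched as follows. This is a minimal illustration of the description in the excerpt, not Payne's actual implementation; the ladder of actions, probability parameter, and function name are invented for the example:

```python
import random

# Hypothetical escalation ladder, ordered from least to most escalatory.
ESCALATION_LADDER = ["de-escalate", "hold", "show of force",
                     "limited strike", "full strike"]

def apply_accident(chosen, p_accident, rng):
    """With probability p_accident, replace the chosen action with a
    strictly more escalatory one (modeling miscommunication, unauthorized
    action, or technical failure). Returns (executed_action, was_accident).
    Per the excerpt, only the acting player would learn was_accident; the
    opponent observes only the executed action."""
    i = ESCALATION_LADDER.index(chosen)
    at_top = i == len(ESCALATION_LADDER) - 1  # nothing more escalatory exists
    if not at_top and rng.random() < p_accident:
        return ESCALATION_LADDER[rng.randrange(i + 1, len(ESCALATION_LADDER))], True
    return chosen, False
```

One design consequence the post examines: because the replacement is always *more* escalatory and is invisible to the opponent, the mechanic can only push simulated crises upward, which matters when interpreting headline escalation rates.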
  5. 18 hr ago

    “Why Did My Model Do That? Model Incrimination for Diagnosing LLM Misbehavior” by aditya singh, gersonkroiz, Senthooran Rajamanoharan, Neel Nanda

    Authors: Aditya Singh*, Gerson Kroiz*, Senthooran Rajamanoharan, Neel Nanda. Aditya and Gerson are co-first authors. This work was conducted during MATS 9.0 and was advised by Senthooran Rajamanoharan and Neel Nanda.

    Motivation

    Imagine that a frontier lab's coding agent has been caught putting a bug in the key code for monitoring what that agent does. Naively, this seems like a clear smoking gun that the agent is scheming. But LLMs often do weird things; they could easily just be confused, or have made a mistake. Either way a response is required, but the cause and appropriate fix are very different for a scheming model than for a confused one. As such, it is extremely important that we have high-quality methods to incriminate or exonerate a model caught taking sketchy actions, to either build a rigorous case that serious action is needed, or conclude it's a false alarm.

    Executive Summary

    A central research question in model incrimination is understanding what motivates a model's actions. To practice this, we investigated several potentially concerning actions from models to uncover the motivations behind them. In particular, we built and then investigated interesting environments where models whistleblew, deceived about sabotaged evals, cheated by taking shortcuts [...]

    Outline:
    (00:35) Motivation
    (01:18) Executive Summary
    (03:48) Environments We Studied
    (04:41) Funding Email (Behavior: Whistleblowing)
    (07:35) Evaluation Tampering (Behavior: Deception)
    (09:08) Secret Number (Behavior: Reward Hacking)
    (11:05) Math Sandbagging (Behavior: Sandbagging)
    (12:27) Takeaways For Doing Good Investigations
    (14:51) Actually Read Your Data
    (15:24) Verify Hypotheses by Changing the Prompt / Environment
    (15:47) How to Think about Prompt Counterfactuals
    (18:29) Examples of how Prompt Counterfactuals are Useful
    (19:06) Examples of where Prompt Counterfactuals are Confusing
    (20:06) Corroborate Findings with Several Independent Methods
    (20:42) Funding Email (Behavior: Whistleblowing)
    (22:30) Evaluation Tampering (Behavior: Deception)
    (24:12) Secret Number (Behavior: Reward Hacking)
    (26:47) Discussion

    The original text contained 3 footnotes which were omitted from this narration.

    First published: February 27th, 2026
    Source: https://www.lesswrong.com/posts/Bv4CLkNzuG6XYTjEe/why-did-my-model-do-that-model-incrimination-for-diagnosing
    Narrated by TYPE III AUDIO.

    30 min
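One investigation practice named in the outline above, "Verify Hypotheses by Changing the Prompt / Environment", amounts to comparing how often a behavior fires under the original prompt versus a minimal counterfactual edit of it. A hedged sketch of that comparison loop — the function names and the detector-callback structure are assumptions for illustration, not the authors' code:

```python
def behavior_rate(generate, prompt, detect, n=20):
    """Sample n completions for a prompt and measure the fraction in
    which the behavior of interest fires, per a detector callback."""
    return sum(detect(generate(prompt)) for _ in range(n)) / n

def counterfactual_test(generate, base_prompt, edited_prompt, detect, n=20):
    """Compare behavior frequency under a prompt and a minimal edit of it.
    A large drop under the edit supports the hypothesis that the edited
    detail was driving the behavior; no change argues against it."""
    base = behavior_rate(generate, base_prompt, detect, n)
    edited = behavior_rate(generate, edited_prompt, detect, n)
    return {"base": base, "edited": edited, "delta": base - edited}
```

With a real model, `generate` would be a sampled API call, so the rates are noisy estimates and `n` should be large enough to distinguish the hypothesized effect from sampling variance.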
  6. 1 day ago

    “Here’s to the Polypropylene Makers” by jefftk

    Six years ago, as covid-19 was rapidly spreading through the US, my sister was working as a medical resident. One day she was handed an N95 and told to "guard it with her life", because there weren't any more coming. N95s are made from meltblown polypropylene, produced from plastic pellets manufactured in a small number of chemical plants. Building more would take too long: we needed these plants producing all the pellets they could. Braskem America operated plants in Marcus Hook, PA and Neal, WV. If there were infections on-site, the whole operation would need to shut down, and the factories that turned their pellets into mask fabric would stall. Companies everywhere were figuring out how to deal with this risk. The standard approach was staggering shifts, social distancing, temperature checks, and lots of handwashing. This reduced risk, but it was still significant: each shift change was an opportunity for someone to bring an infection from the community into the factory. I don't know who had the idea, but someone said: what if we never left? About eighty people, across both plants, volunteered to move in. The plan was four weeks, twelve-hour [...]

    First published: February 27th, 2026
    Source: https://www.lesswrong.com/posts/HQTueNS4mLaGy3BBL/here-s-to-the-polypropylene-makers
    Narrated by TYPE III AUDIO.

    4 min
  7. 1 day ago

    “Anthropic: “Statement from Dario Amodei on our discussions with the Department of War”” by Matrice Jacobine

    I believe deeply in the existential importance of using AI to defend the United States and other democracies, and to defeat our autocratic adversaries. Anthropic has therefore worked proactively to deploy our models to the Department of War and the intelligence community. We were the first frontier AI company to deploy our models in the US government's classified networks, the first to deploy them at the National Laboratories, and the first to provide custom models for national security customers. Claude is extensively deployed across the Department of War and other national security agencies for mission-critical applications, such as intelligence analysis, modeling and simulation, operational planning, cyber operations, and more. Anthropic has also acted to defend America's lead in AI, even when it is against the company's short-term interest. We chose to forgo several hundred million dollars in revenue to cut off the use of Claude by firms linked to the Chinese Communist Party (some of whom have been designated by the Department of War as Chinese Military Companies), shut down CCP-sponsored cyberattacks that attempted to abuse Claude, and have advocated for strong export controls on chips to ensure a democratic advantage. Anthropic understands that the Department of War, not [...]

    First published: February 26th, 2026
    Source: https://www.lesswrong.com/posts/d5Lqf8nSxm6RpmmnA/anthropic-statement-from-dario-amodei-on-our-discussions
    Narrated by TYPE III AUDIO.

    6 min
