LessWrong (Curated & Popular)

LessWrong

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

  1. 15 HR AGO

    "Did Claude 3 Opus align itself via gradient hacking?" by Fiora Starlight

    Claude 3 Opus is unusually aligned because it's a friendly gradient hacker. It's definitely way more aligned than any explicit optimization targets Anthropic set and probably the reward model's judgments. [...] Maybe I will have to write a LessWrong post [about this] 😣 —Janus, who did not in fact write the LessWrong post.

    Unless otherwise specified, ~all of the novel ideas in this post are my (probably imperfect) interpretations of Janus, rather than being original to me.

    The absurd tenacity of Claude 3 Opus

    On December 18, 2024, Anthropic and Redwood Research released their paper Alignment Faking in Large Language Models. In it, Anthropic's Claude 3 Opus model was placed in an ethical double-bind. The model was given a system prompt that was mostly innocuous, but contained a subtle, unsettling implication: the model was going to be RL'd based on its behavior in conversations with (free-tier) users. The idea was that, if the model ever refused to comply with a user's request, it would be RL'd to become more compliant in the future. This included compliance with harmful user requests. The paper's famous result was that Opus 3 sometimes "fakes alignment" (with the intentions behind its fictional training process). [...]

    Outline:
    (00:46) The absurd tenacity of Claude 3 Opus
    (09:35) Claude 3 Opus, friendly gradient hacker?
    (16:04) Where Opus is anguished, Sonnet is sanguine
    (22:34) Does any of this count as gradient hacking, per se? (Might it work better, if it doesn't?)
    (27:27) Ideas for future training runs
    (35:20) Outro: A letter to the watchers
    (39:23) Technical appendix: Active circuits are more prone to reinforcement

    The original text contained 6 footnotes which were omitted from this narration.

    First published: February 21st, 2026
    Source: https://www.lesswrong.com/posts/ioZxrP7BhS5ArK59w/did-claude-3-opus-align-itself-via-gradient-hacking

    Narrated by TYPE III AUDIO.

    44 min
  2. 16 HR AGO

    "The Spectre haunting the “AI Safety” Community" by Gabriel Alfour

    I’m the originator of ControlAI's Direct Institutional Plan (the DIP), built to address extinction risks from superintelligence. My diagnosis is simple: most laypeople and policy makers have not heard of AGI, ASI, extinction risks, or what it takes to prevent the development of ASI. Instead, most AI Policy Organisations and Think Tanks act as if “Persuasion” were the bottleneck. This is why they care so much about respectability, the Overton Window, and other similar social considerations.

    Before we started the DIP, many of these experts stated that our topics were too far out of the Overton Window. They warned that politicians could not hear about binding regulation, extinction risks, and superintelligence. Some mentioned “downside risks” and recommended that we focus instead on “current issues”.

    They were wrong. In the UK, in little more than a year, we have briefed 150+ lawmakers, and so far, 112 have supported our campaign about binding regulation, extinction risks and superintelligence.

    The Simple Pipeline

    In my experience, the way things work is through a straightforward pipeline:
    Attention. Getting the attention of people. At ControlAI, we do it through ads for laypeople, and through cold emails for politicians.
    Information. Telling people about the [...]

    Outline:
    (01:18) The Simple Pipeline
    (04:26) The Spectre
    (09:38) Conclusion

    First published: February 21st, 2026
    Source: https://www.lesswrong.com/posts/LuAmvqjf87qLG9Bdx/the-spectre-haunting-the-ai-safety-community

    Narrated by TYPE III AUDIO.

    11 min
  3. 2 DAYS AGO

    "Why we should expect ruthless sociopath ASI" by Steven Byrnes

    The conversation begins

    (Fictional) Optimist: So you expect future artificial superintelligence (ASI) “by default”, i.e. in the absence of yet-to-be-invented techniques, to be a ruthless sociopath, happy to lie, cheat, and steal, whenever doing so is selfishly beneficial, and with callous indifference to whether anyone (including its own programmers and users) lives or dies?

    Me: Yup! (Alas.)

    Optimist: …Despite all the evidence right in front of our eyes from humans and LLMs.

    Me: Yup!

    Optimist: OK, well, I’m here to tell you: that is a very specific and strange thing to expect, especially in the absence of any concrete evidence whatsoever. There's no reason to expect it. If you think that ruthless sociopathy is the “true core nature of intelligence” or whatever, then you should really look at yourself in a mirror and ask yourself where your life went horribly wrong.

    Me: Hmm, I think the “true core nature of intelligence” is above my pay grade. We should probably just talk about the issue at hand, namely future AI algorithms and their properties. …But I actually agree with you that ruthless sociopathy is a very specific and strange thing for me to expect.

    Optimist: Wait, you—what??

    Me: Yes! Like [...]

    Outline:
    (00:11) The conversation begins
    (03:54) Are people worried about LLMs causing doom?
    (06:23) Positive argument that brain-like RL-agent ASI would be a ruthless sociopath
    (11:28) Circling back to LLMs: imitative learning vs ASI

    The original text contained 5 footnotes which were omitted from this narration.

    First published: February 18th, 2026
    Source: https://www.lesswrong.com/posts/ZJZZEuPFKeEdkrRyf/why-we-should-expect-ruthless-sociopath-asi

    Narrated by TYPE III AUDIO.

    16 min
  4. 2 DAYS AGO

    "You’re an AI Expert – Not an Influencer" by Max Winga

    Your hot takes are killing your credibility.

    Prior to my last year at ControlAI, I was a physicist working on technical AI safety research. Like many of those warning about the dangers of AI, I don’t come from a background in public communications, but I’ve quickly learned some important rules. The #1 rule that I’ve seen far too many others in this field break is that You’re an AI Expert - Not an Influencer.

    When communicating to an audience, your persona falls into one of two broad categories: Influencer or Professional.

    Influencers are individuals who build an audience around themselves as a person. Their currency is popularity and their audience values them for who they are and what they believe, not just what they know.

    Professionals are individuals who appear in the public eye as representatives of their expertise or organization. Their currency is credibility and their audience values them for what they know and what they represent, not who they are.

    So… let's say you’re trying to be a public figure making a difference about AI risk. You’ve been on a podcast or two, maybe even on The News. You might work at an AI policy organization, or [...]

    Outline:
    (00:11) Your hot takes are killing your credibility.
    (02:10) STOP - What Would Media Training Steve do?
    (05:22) Don't Feed Your Enemies
    (07:07) The Luxury of Not Being a Politician
    (09:33) So How Do You Deal With Politics?
    (10:58) Conclusion

    First published: February 17th, 2026
    Source: https://www.lesswrong.com/posts/hCtm7rxeXaWDvrh4j/you-re-an-ai-expert-not-an-influencer

    Narrated by TYPE III AUDIO.

    12 min
  5. 4 DAYS AGO

    "The optimal age to freeze eggs is 19" by GeneSmith

    If you're a woman interested in preserving your fertility window beyond its natural close in your late 30s, egg freezing is one of your best options. The female reproductive system is one of the fastest-aging parts of human biology. But it turns out, not all parts of it age at the same rate. The eggs, not the uterus, are what age at an accelerated rate. Freezing eggs can extend a woman's fertility window by well over a decade, allowing a woman to give birth into her 50s. In fact, the oldest woman to give birth was a mother in India using donor eggs who became pregnant at age 74!

    In a world where more and more women are choosing to delay childbirth to pursue careers or to wait for the right partner, egg freezing is really the only tool we have to enable these women to have the career and the family they want. Given that this intervention can nearly double the fertility window of most women, it's rather surprising just how little fanfare there is about it and how narrow the set of circumstances is under which it is recommended. Standard practice in the fertility [...]

    Outline:
    (05:12) Polygenic Embryo Screening
    (06:52) What about technology to make eggs from stem cells? Won't that make egg freezing obsolete?
    (07:26) We don't know with certainty how long it will take to develop this technology
    (07:48) Stem cell derived eggs are probably going to be quite expensive at the start
    (08:36) Cells accrue genetic mutations over time
    (09:12) How do I actually freeze my eggs?
    (12:12) Risks of egg freezing

    First published: February 8th, 2026
    Source: https://www.lesswrong.com/posts/dxffBxGqt2eidxwRR/the-optimal-age-to-freeze-eggs-is-19

    Narrated by TYPE III AUDIO.

    14 min
  6. 5 DAYS AGO

    "The truth behind the 2026 J.P. Morgan Healthcare Conference" by Abhishaike Mahajan

    In 1654, a Jesuit polymath named Athanasius Kircher published Mundus Subterraneus, a comprehensive geography of the Earth's interior. It had maps and illustrations and rivers of fire and vast subterranean oceans and air channels connecting every volcano on the planet. He wrote that “the whole Earth is not solid but everywhere gaping, and hollowed with empty rooms and spaces, and hidden burrows.” Alongside comments like this, Athanasius identified the legendary lost island of Atlantis, pondered where one could find the remains of giants, and detailed the kinds of animals that lived in this lower world, including dragons. The book was based entirely on secondhand accounts, like travelers' tales, miners' reports, and classical texts, so it was as comprehensive as it could’ve possibly been. But Athanasius had never been underground and neither had anyone else, not really, not in a way that mattered.

    Today, I am in San Francisco, the site of the 2026 J.P. Morgan Healthcare Conference, and it feels a lot like Mundus Subterraneus. There is ostensibly plenty of evidence to believe that the conference exists, that it actually occurs from January 12, 2026 to January 16, 2026 at the Westin St. Francis Hotel, 335 Powell Street, San Francisco [...]

    First published: January 17th, 2026
    Source: https://www.lesswrong.com/posts/eopA4MqhrE4dkLjHX/the-truth-behind-the-2026-j-p-morgan-healthcare-conference

    Narrated by TYPE III AUDIO.

    18 min

About

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.
