LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 9H AGO

    “Increasing marginal returns to effort are common” by habryka

    Context: Every Sunday I write a mini-essay about an operating principle of Lightcone Infrastructure that I want to remind my team about. This is post #5 in the sequence of these essays, lightly edited and expanded upon in more canonical form. Most things have diminishing marginal returns. I often repeat the Pareto Principle to others: "You can get 80% of the benefit here with the right 20% of the cost", which is a particularly extreme case of diminishing marginal returns. But I think for much of the work that Lightcone does, the returns to effort are generally increasing, not decreasing. To explain, let's start with the simplest toy case of a situation in which trying harder at something gets more valuable the more you are already trying: a winner-takes-all competition. If you are in a competition where the top performer takes all the winnings, then doing half as well as the other contestants predictably gets you 0% of the value. Indeed, inasmuch as you are racing against identical candidates that put in 99% of the possible effort, and your performance is a direct result of the effort you put in, all the value is generated by you going from 98% [...] (A toy sketch of this winner-takes-all payoff appears after the episode list below.) --- First published: November 15th, 2025 Source: https://www.lesswrong.com/posts/swymiotpbYFv9pnEk/increasing-marginal-returns-to-effort-are-common --- Narrated by TYPE III AUDIO.

    13 min
  2. 13H AGO

    “‘But You’d Like To Feel Companionate Love, Right? ... Right?’” by johnswentworth

    One of the responses which one will predictably receive when posting something titled “How I Learned That I Don't Feel Companionate Love” is “... but you’d choose to feel it if you could, right?”. Look man, your most treasured values are just… not actually that universal or convergent. I’m not saying that you should downgrade the importance of love to you. I am saying that an awful lot of people seem really desperate to find some story for why their most precious values are the True Convergent Goodness or some such. And love sure is an especially precious value generator for an awful lot of people, so people really want to find some reason why even a person who felt no companionate love would at least want to feel it. Empirically, that is just not what happens. If I had a button which could magically turn my oxytocin receptors to the usual kind, rather than their current probably-dysfunctional state, I would view that button in basically the same way I view a syringe of heroin. It might be interesting as an experiment, just to see what it's like. But it sure seems pretty common for heroin to give people [...] The original text contained 1 footnote which was omitted from this narration. --- First published: November 15th, 2025 Source: https://www.lesswrong.com/posts/TcJpm3mcLMoKMzqpj/but-you-d-like-to-feel-companionate-love-right-right --- Narrated by TYPE III AUDIO.

    5 min
  3. 17H AGO

    “Understanding and Controlling LLM Generalization” by Daniel Tan

    A distillation of my long-term research agenda and current thinking. I welcome takes on this. Why study generalization?  I'm interested in studying how LLMs generalise - when presented with multiple policies that achieve similar loss, which ones tend to be learned by default? I claim this is pretty important for AI safety: Re: developing safe general intelligence, we will never be able to train an LLM on all the contexts it will see at deployment. To prevent goal misgeneralization, it's necessary to understand how LLMs generalise beyond their training distribution (OOD). Re: loss of control risks specifically, certain important kinds of misalignment (reward hacking, scheming) are difficult to 'select against' at the behavioural level. A fallback for this would be if LLMs had an innate 'generalization propensity' to learn aligned policies over misaligned ones.  This motivates research into LLM inductive biases. Or as I'll call them from here on, 'generalization propensities'. I have two high-level goals: Understanding the complete set of causal factors that drive generalization. Controlling generalization by intervening on these causal factors in a principled way.  Defining "generalization propensity"  To study generalization propensities, we need two things: "Generalization propensity evaluations" (GPEs) [...] (A toy sketch of such an evaluation appears after the episode list below.) --- Outline: (00:18) Why study generalization? (01:30) Defining generalization propensity (02:29) Research questions --- First published: November 14th, 2025 Source: https://www.lesswrong.com/posts/ZSQaT2yxNNZ3eLxRd/understanding-and-controlling-llm-generalization --- Narrated by TYPE III AUDIO.

    4 min
  4. 18H AGO

    “AI Craziness: Additional Suicide Lawsuits and The Fate of GPT-4o” by Zvi

    GPT-4o has been a unique problem for a while, and has been at the center of the bulk of mental health incidents involving LLMs that didn’t involve character chatbots. I’ve previously covered related issues in AI Craziness Mitigation Efforts, AI Craziness Notes, GPT-4o Responds to Negative Feedback, GPT-4o Sycophancy Post Mortem and GPT-4o Is An Absurd Sycophant. Discussions of suicides linked to AI previously appeared in AI #87, AI #134, AI #131 Part 1 and AI #122. The Latest Cases Look Quite Bad For OpenAI I’ve consistently said that I don’t think it's necessary or even clearly good for LLMs to always adhere to standard ‘best practices’ defensive behaviors, especially reporting on the user, when dealing with depression, self-harm and suicidality. Nor do I think we should hold them to the standard of ‘do all of the maximally useful things.’ Near: while the llm response is indeed really bad/reckless its worth keeping in mind that baseline suicide rate just in the US is ~50,000 people a year; if anything i am surprised there aren’t many more cases of this publicly by now I do think it's fair to insist they never actively encourage suicidal behaviors. [...] --- Outline: (00:47) The Latest Cases Look Quite Bad For OpenAI (02:34) Routing Sensitive Messages Is A Dominated Defensive Strategy (03:28) Some 4o Users Get Rather Attached To The Model (05:04) A Theory Of How All This Works (07:38) Maybe This Is Net Good In Spite Of Everything? (11:10) Could One Make A 'Good 4o'? --- First published: November 14th, 2025 Source: https://www.lesswrong.com/posts/erTE9BTM7gGHb96po/ai-craziness-additional-suicide-lawsuits-and-the-fate-of-gpt --- Narrated by TYPE III AUDIO.

    13 min
  5. 1D AGO

    “AI Corrigibility Debate: Max Harms vs. Jeremy Gillen” by Liron, Max Harms, Jeremy Gillen

    Is focusing on corrigibility our best shot at getting to ASI alignment? Max Harms and Jeremy Gillen are current and former MIRI alignment researchers who both see superintelligent AI as an imminent extinction threat, but disagree about Max's proposal of Corrigibility as Singular Target (CAST). Max thinks focusing on corrigibility is the most plausible path to build ASI without losing control and dying, while Jeremy is skeptical that attempting CAST would lead to better superintelligent AI behavior on a sufficiently early try. We recorded a friendly debate to understand the crux of Max and Jeremy's disagreement. The conversation also doubles as a way to learn about Max's Corrigibility As Singular Target proposal. Video Podcast Listen on Spotify, import the RSS feed, or search "Doom Debates" in your podcast player. Plus: Max's New Book, Red Heart Max just published Red Heart, a realistic sci-fi thriller that brings the corrigibility problem to life through a high-stakes Chinese government AI project. I thoroughly enjoyed reading it and highly recommend it! The last 20 minutes of my conversation with Max are all about Red Heart. Transcript Episode Preview Max Harms 00:00:00 If you mess up real bad, this thing goes and eats [...] --- Outline: (00:14) Is focusing on corrigibility our best shot at getting to ASI alignment? (01:08) Video (01:14) Podcast (01:24) Plus: Max's New Book, Red Heart (01:55) Transcript (01:58) Episode Preview (13:32) Why Corrigibility Matters (15:20) What's Your P(Doom)™ (20:42) Max's Case for Corrigibility (23:46) Jeremy's Case Against Corrigibility (26:21) Max's Mainline AI Scenario (32:44) 4. Strategies: Alignment, Control, Corrigibility, Don't Build It (41:53) Corrigibility vs HHH (Helpful, Harmless, Honest) (47:31) Asimov's 3 Laws of Robotics (52:45) Is Corrigibility a Coherent Concept? (01:03:21) Corrigibility vs Shutdown-ability (01:09:21) CAST: Corrigibility as Singular Target, Near Misses, Iterations (01:19:20) Debating if Max is Over-Optimistic (01:32:46) Debating if Corrigibility is the Best Target (01:39:58) Would Max Work for Anthropic? (01:42:37) Max's Modest Hopes (02:02:35) Max's New Book: Red Heart (02:21:52) Outro --- First published: November 14th, 2025 Source: https://www.lesswrong.com/posts/CsXAg8dHSgghDAoPx/ai-corrigibility-debate-max-harms-vs-jeremy-gillen --- Narrated by TYPE III AUDIO.

    2h 25m
  6. 1D AGO

    “10 Types of LessWrong Post” by Ben Pace

    Several artists and professionals have come to Inkhaven to share their advice. They keep talking about form—even if you have a raw feeling or interest, you must channel it through one of the forms of art or journalism to make it into something people can appreciate. I'm curious to understand why they care so much; and because it seems interesting, I'd like to explore form. As a first step, I have written down some of the types of blogpost that are written on LessWrong. 1. Concept Handles Concept-handles are the most common way people attempt to move the discourse forward, by helping us notice phenomena and orient to them. The thing about a Concept-Handle-Post is that, as long as the post gives you a good hook for the idea, the rest of the post can be pretty crazy. It can take a niche position as though it's commonplace, it can have fiction in it, it can be crazily long and people forget 90% of it, it can be just a few paragraphs. Some Examples In rationality it can be about individual rationality ("Reason as Memetic Immune Disorder", "Schelling Fences on Slippery Slopes") or group [...] --- Outline: (00:36) 1. Concept Handles (02:13) 2. One-Shot Fiction (02:45) 3. Grand theory (with lots of research) (03:30) 4. Book Reviews (03:59) 5. Introductions / Primers (04:19) 6. Frameworks (04:53) 7. Post about the Rationalists (05:15) 8. Disagreeing with Eliezer Yudkowsky (06:10) 9. AI Futurism (06:29) 10. Curiosities --- First published: November 14th, 2025 Source: https://www.lesswrong.com/posts/Gi3v6j5QiiFku39s6/10-types-of-lesswrong-post --- Narrated by TYPE III AUDIO.

    8 min
  7. 1D AGO

    “Everyone has a plan until they get lied to the face” by Screwtape

    "Everyone has a plan until they get punched in the face." - Mike Tyson (The exact phrasing of that quote changes, this is my favourite.) I think there is an open, important weakness in many people. We assume those we communicate with are basically trustworthy. Further, I think there is an important flaw in the current rationality community. We spend a lot of time focusing on subtle epistemic mistakes, teasing apart flaws in methodology and practicing the principle of charity. This creates a vulnerability to someone willing to just say outright false things. We’re kinda slow about reacting to that. Suggested reading: Might People on the Internet Sometimes Lie, People Will Sometimes Just Lie About You. Epistemic status: My Best Guess. I. Getting punched in the face is an odd experience. I'm not sure I recommend it, but people have done weirder things in the name of experiencing novel psychological states. If it happens in a somewhat safety-negligent sparring ring, or if you and a buddy go out in the back yard tomorrow night to try it, I expect the punch gets pulled and it's still weird. There's a jerk of motion your eyes try to catch up [...] --- Outline: (01:03) I. (03:30) II. (07:33) III. (09:55) 4. The original text contained 1 footnote which was omitted from this narration. --- First published: November 14th, 2025 Source: https://www.lesswrong.com/posts/5LFjo6TBorkrgFGqN/everyone-has-a-plan-until-they-get-lied-to-the-face --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    13 min
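
A toy sketch of the winner-takes-all case from the first episode (“Increasing marginal returns to effort are common”). This is my own illustration, not code from habryka's post; it assumes a fixed prize, rivals who each put in 99% of the possible effort, and performance that equals effort.

```python
# Toy model of a winner-takes-all competition: the entire prize goes to the
# top performer, and your performance is simply the effort you put in.

PRIZE = 100.0          # total winnings, all of which go to the winner
RIVAL_EFFORT = 0.99    # identical rivals each put in 99% of the possible effort


def payoff(effort: float) -> float:
    """You collect the prize only if you out-perform the rivals; otherwise nothing."""
    return PRIZE if effort > RIVAL_EFFORT else 0.0


def marginal_return(effort: float, step: float = 0.01) -> float:
    """Extra payoff from putting in one more increment of effort at this level."""
    return payoff(effort + step) - payoff(effort)


if __name__ == "__main__":
    for e in [0.50, 0.90, 0.98, 0.99]:
        print(f"effort {e:.2f}: payoff {payoff(e):5.1f}, "
              f"value of the next +0.01 of effort: {marginal_return(e):5.1f}")
    # Every increment of effort below the rivals' level is worth 0; the final
    # increment from 0.99 to 1.00 captures the entire prize, so returns to
    # effort increase rather than diminish.
```

Running this prints a marginal value of 0.0 at every effort level except 0.99, where the next increment is worth the full 100.0 prize.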
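
A minimal sketch of the kind of "generalization propensity evaluation" gestured at in the third episode (“Understanding and Controlling LLM Generalization”). This is my own toy construction under stated assumptions, not Daniel Tan's actual setup: the training inputs are consistent with two candidate policies that fit them equally well, and out-of-distribution probes reveal which policy was actually learned. The model is a trivial stand-in function rather than a real LLM, and the names policy_a, policy_b, and propensity_eval are hypothetical.

```python
# Two candidate policies that agree on the training distribution (short
# inputs) but disagree out of distribution (long inputs).
from typing import Callable

policy_a: Callable[[str], str] = lambda x: x.upper()                        # "always uppercase"
policy_b: Callable[[str], str] = lambda x: x.upper() if len(x) < 5 else x   # "uppercase only short strings"

train_inputs = ["hi", "ok", "yes"]                 # both policies behave identically here
ood_probes   = ["hello there", "generalization"]   # the policies diverge here


def propensity_eval(model: Callable[[str], str]) -> str:
    """Report which candidate policy the model's out-of-distribution behaviour matches."""
    matches_a = all(model(x) == policy_a(x) for x in ood_probes)
    matches_b = all(model(x) == policy_b(x) for x in ood_probes)
    if matches_a and not matches_b:
        return "policy A"
    if matches_b and not matches_a:
        return "policy B"
    return "neither / mixed"


if __name__ == "__main__":
    # Stand-in for a trained model that happened to generalize like policy A.
    trained_model = policy_a
    print("Training inputs give no signal:",
          all(policy_a(x) == policy_b(x) for x in train_inputs))
    print("OOD probes reveal the learned policy:", propensity_eval(trained_model))
```

In a real setting, the stand-in would be an actual fine-tuned LLM and the probe set would be chosen so the candidate policies agree on the training distribution but diverge OOD, which is what lets the evaluation attribute a "generalization propensity" to the model.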
