LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 6H AGO

    “In (highly contingent!) defense of interpretability-in-the-loop ML training” by Steven Byrnes

    Let's call “interpretability-in-the-loop training” the idea of running a learning algorithm that involves an inscrutable trained model, and there's some kind of interpretability system feeding into the loss function / reward function. Interpretability-in-the-loop training has a very bad rap (and rightly so).

    Here's Yudkowsky 2022: When you explicitly optimize against a detector of unaligned thoughts, you're partially optimizing for more aligned thoughts, and partially optimizing for unaligned thoughts that are harder to detect. Optimizing against an interpreted thought optimizes against interpretability.

    Or Zvi 2025: The Most Forbidden Technique is training an AI using interpretability techniques. An AI produces a final output [X] via some method [M]. You can analyze [M] using technique [T], to learn what the AI is up to. You could train on that. Never do that. You train on [X]. Only [X]. Never [M], never [T]. Why? Because [T] is how you figure out when the model is misbehaving. If you train on [T], you are training the AI to obfuscate its thinking, and defeat [T]. You will rapidly lose your ability to know what is going on, in exactly the ways you most need to know what is going on.

    Those bits of optimization pressure [...]

    Outline:
    (02:46) My overall position
    (03:39) How the brain-like version of interpretability-in-the-loop training avoids the obvious failure mode
    (05:37) Things can still go wrong in more subtle and indirect ways

    First published: February 6th, 2026
    Source: https://www.lesswrong.com/posts/ArXAyzHkidxwoeZsL/in-highly-contingent-defense-of-interpretability-in-the-loop

    Narrated by TYPE III AUDIO.
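    A minimal sketch of the setup being debated, written for illustration only (PyTorch; the model, probe, and alpha coefficient are hypothetical, not from the post): an interpretability probe's output is added to the training loss, so gradient descent pushes the internal activations both toward the task and away from whatever the probe detects.

        import torch
        import torch.nn as nn

        model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
        probe = nn.Linear(64, 1)  # stand-in for a "detector of unaligned thoughts"

        def loss_with_probe(x, y, alpha=0.1):
            hidden = model[1](model[0](x))            # intermediate activations
            logits = model[2](hidden)
            task_loss = nn.functional.cross_entropy(logits, y)
            # This is the step Yudkowsky/Zvi warn about: penalizing the probe's
            # firing also optimizes the activations to evade the probe.
            probe_penalty = torch.sigmoid(probe(hidden)).mean()
            return task_loss + alpha * probe_penalty

        x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
        loss_with_probe(x, y).backward()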

    7 min
  2. 11H AGO

    “AI benchmarking has a Y-axis problem” by Lizka

    TLDR: People plot benchmark scores over time and then do math on them, looking for speed-ups & inflection points, interpreting slopes, or extending apparent trends. But that math doesn't actually tell you anything real unless the scores have natural units. Most don't. Think of benchmark scores as funhouse-mirror projections of “true” capability-space, which stretch some regions and compress others by assigning warped scores for how much accomplishing that task counts in units of “AI progress”. A plot on axes without canonical units will look very different depending on how much weight we assign to different bits of progress.[1]

    Epistemic status: I haven't vetted this post carefully, and have no real background in benchmarking or statistics.

    Benchmark scores vs "units of AI progress"

    Benchmarks look like rulers; they give us scores that we want to treat as (noisy) measurements of AI progress. But since most benchmark scores are expressed in quite squishy units, they can be quite misleading. The typical benchmark is a grab-bag of tasks along with an aggregate scoring rule like “fraction completed”[2]

    ✅ Scores like this can help us...
    Loosely rank models (“is A>B on coding ability?”)
    Operationalize & track milestones (“can [...]

    Outline:
    (01:00) Benchmark scores vs units of AI progress
    (02:42) Exceptions: benchmarks with more natural units
    (04:48) Does aggregation help?
    (06:27) Where does this leave us?
    (06:30) Non-benchmark methods often seem better
    (07:32) Mind the Y-axis problem
    (09:05) Bonus notes / informal appendices
    (09:13) I. A more detailed example of the Y-axis problem in action
    (11:53) II. An abstract sketch of what's going on (benchmarks as warped projections)

    The original text contained 18 footnotes which were omitted from this narration.

    First published: February 6th, 2026
    Source: https://www.lesswrong.com/posts/EWfGf8qA7ZZifEAxG/ai-benchmarking-has-a-y-axis-problem-1

    Narrated by TYPE III AUDIO.
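    A toy illustration of the Y-axis problem (the task families, numbers, and weights below are mine, not from the post): the same per-task results trace very different “progress” curves depending on how the aggregate score weights the tasks.

        # Fraction of each (hypothetical) task family solved, by model generation
        results = {
            "easy_tasks": [0.60, 0.85, 0.95, 0.98],
            "hard_tasks": [0.00, 0.02, 0.10, 0.30],
        }

        def aggregate(weights):
            n = len(results["easy_tasks"])
            return [round(sum(w * results[k][i] for k, w in weights.items()), 3)
                    for i in range(n)]

        print(aggregate({"easy_tasks": 0.9, "hard_tasks": 0.1}))  # progress looks like it's slowing down
        print(aggregate({"easy_tasks": 0.1, "hard_tasks": 0.9}))  # progress looks like it's speeding up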

    15 min
  3. 20H AGO

    “The Simplest Case for AI Catastrophe” by Linch

    Hi folks. As some of you know, I've been trying to write an article laying out the simplest case for AI catastrophe. I believe existing pieces are worse than they could be for fixable reasons. So I tried to write my own piece that's better. It ended up being longer and more detailed than perhaps the "simplest case" ought to be. I might rewrite it again in the future, pending feedback. Anyway, below is the piece in its entirety:

    ___

    The world's largest tech companies are building intelligences that will become better than humans at almost all economically and militarily relevant tasks. Many of these intelligences will be goal-seeking minds acting in the real world, rather than just impressive pattern-matchers. Unlike traditional software, we cannot specify what these minds will want or verify what they'll do. We can only grow and shape them, and hope the shaping holds. This can all end very badly.

    The world's largest tech companies are building intelligences that will become better than humans at almost all economically and militarily relevant tasks

    The CEOs of OpenAI, Google DeepMind, Anthropic, and Meta AI have all explicitly stated that building human-level or superhuman [...]

    Outline:
    (01:12) The world's largest tech companies are building intelligences that will become better than humans at almost all economically and militarily relevant tasks
    (05:29) Many of these intelligences will be goal-seeking minds acting in the real world, rather than just impressive pattern-matchers
    (09:34) Unlike traditional software, we cannot specify what these minds will want or verify what they'll do. We can only grow and shape them, and hope the shaping holds
    (14:00) This can all end very badly
    (17:03) Conclusion

    The original text contained 14 footnotes which were omitted from this narration.

    First published: February 5th, 2026
    Source: https://www.lesswrong.com/posts/uw9etNDaRXGzeuDes/the-simplest-case-for-ai-catastrophe

    Narrated by TYPE III AUDIO.

    21 min
  4. 1D AGO

    “What’s the Point of the Math?” by Ashe Vazquez Nuñez

    This post was written while at MATS 9.0 under the mentorship of Richard Ngo. It's only meta-related to my research.

    I would like to start by quoting a point Jan Kulveit made about economics culture in a recent post:

    non-mathematical, often intuitive reasoning of [an] economist leads to some interesting insight, and then the formalisation, assumptions and models are selected in a way where the math leads to the same conclusions.

    Jan notes that the math is "less relevant than it seems" in the process. I resonate with this. The mathematical results are predetermined by the assumptions in the model, which in turn follow the insight born from intuitive reasoning. This raises the question: if the important part of the insight is the intuition, and the math is its deterministic (if laborious) consequence, then what exactly is the point of the math? What insight does it bring? These questions of course apply not only to economics, but to any mathematisation of worldly[1] phenomena. In this post, I describe some roles math plays in delivering and communicating insight to researchers and the social structures they are embedded in. I'm interested in characterising the legitimate uses of mathematical formalism, but also [...]

    Outline:
    (01:32) Calibrating intuitions
    (02:36) Communicating your work
    (04:01) The lossiness of memetically fit mathematics
    (06:41) Proof of work

    The original text contained 4 footnotes which were omitted from this narration.

    First published: February 5th, 2026
    Source: https://www.lesswrong.com/posts/2TQyomzcnkPN5ZYF5/what-s-the-point-of-the-math

    Narrated by TYPE III AUDIO.

    9 min
  5. 1D AGO

    “The nature of LLM algorithmic progress” by Steven Byrnes

    There's a lot of talk about “algorithmic progress” in LLMs, especially in the context of exponentially-improving algorithmic efficiency. For example:

    Epoch AI: “[training] compute required to reach a set performance threshold has halved approximately every 8 months”.
    Dario Amodei 2025: “I'd guess the number today is maybe ~4x/year”.
    Gundlach et al. 2025a “Price of Progress”: “Isolating out open models to control for competition effects and dividing by hardware price declines, we estimate that algorithmic efficiency progress is around 3× per year”.

    It's nice to see three independent sources reach almost exactly the same conclusion—halving times of 8 months, 6 months, and 7½ months respectively. Surely a sign that the conclusion is solid! …Haha, just kidding! I'll argue that these three bullet points are hiding three totally different stories. The first two bullets are about training efficiency, and I'll argue that both are deeply misleading (each for a different reason!). The third is about inference efficiency, which I think is right, and mostly explained by distillation of ever-better frontier models into their “mini” cousins.

    Tl;dr / outline: §1 is my attempted big-picture take on what “algorithmic progress” has looked like in LLMs. I split it into four categories: [...]

    Outline:
    (01:48) Tl;dr / outline
    (04:00) Status of this post
    (04:12) 1. The big picture of LLM algorithmic progress, as I understand it right now
    (04:19) 1.1. Stereotypical algorithmic efficiency improvements: there's the Transformer itself, and ... well, actually, not much else to speak of
    (06:36) 1.2. Optimizations: Let's say up to 20×, but there's a ceiling
    (07:47) 1.3. Data-related improvements
    (09:40) 1.4. Algorithmic changes that are not really quantifiable as efficiency
    (10:25) 2. Explaining away the two training-efficiency exponential claims
    (10:46) 2.1. The Epoch 8-month halving time claim is a weird artifact of their methodology
    (12:33) 2.2. The Dario 4x/year claim is I think just confused
    (17:28) 3. Sanity-check: nanochat
    (19:43) 4. Optional bonus section: why does this matter?

    The original text contained 5 footnotes which were omitted from this narration.

    First published: February 5th, 2026
    Source: https://www.lesswrong.com/posts/sGNFtWbXiLJg2hLzK/the-nature-of-llm-algorithmic-progress

    Narrated by TYPE III AUDIO.
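    The “8 months, 6 months, and 7½ months” comparison is just unit conversion: an efficiency gain of k-fold per year corresponds to a halving time of 12 / log2(k) months. A quick check of the quoted figures:

        import math

        def halving_time_months(k_per_year):
            """Months for required compute to halve, given a k-fold efficiency gain per year."""
            return 12 / math.log2(k_per_year)

        print(halving_time_months(4))  # Dario's ~4x/year    -> 6.0 months
        print(halving_time_months(3))  # Gundlach's ~3x/year -> ~7.6 months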

    23 min
  6. 1D AGO

    “Preparing for a Warning Shot” by Noah Birnbaum

    Crossposted to the EA Forum and my Substack.

    Confidence level: moderate uncertainty and not that concrete (yet). Exploratory, but I think this is plausibly important and underexplored.

    TL;DR

    Early AI safety arguments often assumed we wouldn't get meaningful warning shots (a non-existential public display of misalignment) before catastrophic misalignment, meaning things would go from “seems fine” to “we lose” pretty quickly.
    Given what we now know about AI development (model weight changes, jagged capabilities, slow or fizzled takeoff), that assumption looks weaker than it used to.
    Some people gesture at “warning shots,” but almost no one is working on what we should do in anticipation. That seems like a mistake.
    Preparing for warning shots—especially ambiguous ones—could be a high-leverage and neglected area of AI Safety.

    The classic “no warning shot” picture

    A common view in early AI safety research—associated especially with Yudkowsky and Bostrom—was roughly:
    A sufficiently intelligent misaligned system would know that revealing misalignment while weak is bad for it.
    So if things go wrong, they go wrong suddenly (AKA a sharp left turn).
    Therefore, we shouldn't expect intermediate failures that clearly demonstrate large-scale risks before it's too late.

    If this picture is [...]

    Outline:
    (00:24) TL;DR
    (01:07) The classic no warning shot picture
    (01:51) Why this picture now looks less likely
    (03:50) Warning shots can shift the Overton Window
    (05:07) Preparedness matters because warning shots could be ambiguous
    (07:10) Risks and perverse incentives
    (07:47) A speculative implication for AI safety research
    (08:25) Conclusion

    The original text contained 5 footnotes which were omitted from this narration.

    First published: February 5th, 2026
    Source: https://www.lesswrong.com/posts/GKtwwqusm4vxqkChc/preparing-for-a-warning-shot

    Narrated by TYPE III AUDIO.

    9 min
  7. 1D AGO

    “AI #154: Claw Your Way To The Top” by Zvi

    Remember OpenClaw and Moltbook? One might say they already seem a little quaint. So earlier-this-week. That's the internet having an absurdly short attention span, rather than those events not being important. They were definitely important. They were also early. It is not quite time for AI social networks or fully unleashed autonomous AI agents. The security issues have not been sorted out, and reliability and efficiency aren't quite there. There are two types of reactions to that. The wrong one is ‘oh it is all hype.’ The right one is ‘we'll get back to this in a few months.’

    Other highlights of the week include reactions to Dario Amodei's essay The Adolescence of Technology. The essay was trying to do many things for many people. In some ways it did a good job. In other ways, especially when discussing existential risks and those more concerned than Dario, it let us down.

    Everyone excited for the Super Bowl?

    Table of Contents
    Language Models Offer Mundane Utility. Piloting on the surface of Mars.
    Language Models Don't Offer Mundane Utility. Judgment humans trust.
    Huh, Upgrades. OpenAI Codex has an app.
    AI [...]

    Outline:
    (01:13) Language Models Offer Mundane Utility
    (03:45) Language Models Don't Offer Mundane Utility
    (04:20) Huh, Upgrades
    (06:13) They Got Served, They Served Back, Now It's On
    (15:15) On Your Marks
    (18:42) Get My Agent On The Line
    (19:57) Deepfaketown and Botpocalypse Soon
    (23:14) Copyright Confrontation
    (23:47) A Young Lady's Illustrated Primer
    (24:24) Unprompted Attention
    (24:36) Get Involved
    (28:12) Introducing
    (28:40) State of AI Report 2026
    (36:18) In Other AI News
    (40:45) Autonomous Killer Robots
    (42:11) Show Me the Money
    (44:46) Bubble, Bubble, Toil and Trouble
    (47:58) Quiet Speculations
    (48:54) Seb Krier Says Seb Krier Things
    (58:07) The Quest for Sane Regulations
    (58:24) Chip City
    (01:02:39) The Week in Audio
    (01:03:00) The Adolescence of Technology
    (01:03:49) I Won't Stand To Be Disparaged
    (01:08:31) Constitutional Conversation
    (01:10:04) Rhetorical Innovation
    (01:13:51) Don't Panic
    (01:16:23) Aligning a Smarter Than Human Intelligence is Difficult
    (01:17:41) People Are Worried About AI Killing Everyone
    (01:18:48) The Lighter Side

    First published: February 5th, 2026
    Source: https://www.lesswrong.com/posts/AMLLKDzjohCNbrA6t/ai-154-claw-your-way-to-the-top

    Narrated by TYPE III AUDIO.

    1h 21m

About

Audio narrations of LessWrong posts.
