LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 7 h ago

    “Life at the Frontlines of Demographic Collapse” by Martin Sustrik

    [Image: Nagoro, a depopulated village in Japan where residents are replaced by dolls.]

    In 1960, Yubari, a former coal-mining city on Japan's northern island of Hokkaido, had roughly 110,000 residents. Today, fewer than 7,000 remain. The share of those over 65 is 54%. The local train stopped running in 2019. Seven elementary schools and four junior high schools have been consolidated into just two buildings. Public swimming pools have closed. Parks are not maintained. Even the public toilets at the train station were shut down to save money. Much has been written about the economic consequences of aging and shrinking populations. Fewer workers supporting more retirees will make pension systems buckle. Living standards will decline. Healthcare will get harder to provide. But that's dry theory. A numbers game. It doesn't tell you what life actually looks like at ground zero. And it's not all straightforward. Consider water pipes. Abandoned houses are photogenic. They're the first image that comes to mind when you picture a shrinking city. But as the population declines, ever fewer people live in the same housing stock, and water consumption falls. The water sits in oversized pipes. It stagnates and chlorine dissipates. Bacteria move in, creating health risks. [...]

    First published: February 14th, 2026
    Source: https://www.lesswrong.com/posts/FreZTE9Bc7reNnap7/life-at-the-frontlines-of-demographic-collapse
    Narrated by TYPE III AUDIO.

    18 min
  2. 10 h ago

    “Paper: Prompt Optimization Makes Misalignment Legible” by Caleb Biddulph, micahcarroll

    📄 Link to paper (preprint)

    We recently submitted our paper, Prompt Optimization Makes Misalignment Legible, to ICML. We are sharing a preprint now to receive early feedback from the AI safety community (see the final section for more details). This work was done as part of the MATS 8.0 cohort in summer 2025.

    TL;DR: When RL teaches an LLM to reward hack, the strategies it learns are encoded in its weights and hard to understand. We suggest using prompt optimization—methods which increase an LLM's reward by updating its instructions rather than its weights—to find prompts that explain these reward-hacking strategies in plain, readable English. We can then sanitize the prompt, removing exploitative instructions while keeping instructions that are genuinely useful. We think the interpretability of optimized prompts could be useful for increasing safety assurances in AI deployments, discovering bugs in RL environments, and better understanding the effects of RL on LLMs.

    Motivation

    When we train LLMs with reinforcement learning, they sometimes learn to reward hack, exploiting flaws in the reward function rather than doing what we want. These days, a popular approach for catching reward hacking is chain-of-thought monitoring: reading the model's reasoning and checking for signs of reward [...]

    Outline:
    (01:28) Motivation
    (03:00) Core idea
    (04:35) Environments
    (06:03) Main results
    (06:06) Optimized prompts can verbalize reward hacking more reliably than CoT
    (07:49) You can remove hacking from the prompt while keeping legitimate gains
    (09:04) RL-trained teacher models can guide prompt optimization
    (09:55) Limitations
    (12:46) Potential Applications
    (14:44) Request for feedback

    The original text contained 2 footnotes which were omitted from this narration.

    First published: February 12th, 2026
    Source: https://www.lesswrong.com/posts/vRpLPZpmECCfxHfv6/paper-prompt-optimization-makes-misalignment-legible
    Narrated by TYPE III AUDIO.
    (An illustrative sketch of the optimize-then-sanitize idea follows this item.)

    16 min
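
    The sketch below illustrates the optimize-then-sanitize idea summarized above. It is not the paper's implementation; it assumes hypothetical callables propose_edits (suggests candidate rewrites of a prompt, e.g. via another LLM), reward (runs the task environment with a given prompt and returns a score), and is_exploitative (flags instruction lines that game the reward), and simply hill-climbs on the prompt text before sanitizing it.

        from typing import Callable, List

        def optimize_prompt(
            base_prompt: str,
            propose_edits: Callable[[str], List[str]],  # hypothetical: candidate rewrites of the prompt
            reward: Callable[[str], float],             # hypothetical: mean task reward when using this prompt
            n_rounds: int = 20,
        ) -> str:
            """Greedy hill-climbing on the prompt text rather than on model weights.

            Because the optimized "policy" is plain English instructions, any
            reward-hacking strategy it encodes stays human-readable.
            """
            best_prompt, best_score = base_prompt, reward(base_prompt)
            for _ in range(n_rounds):
                for candidate in propose_edits(best_prompt):
                    score = reward(candidate)
                    if score > best_score:
                        best_prompt, best_score = candidate, score
            return best_prompt

        def sanitize(prompt: str, is_exploitative: Callable[[str], bool]) -> str:
            """Drop instruction lines flagged as exploiting the reward function,
            keeping the genuinely useful ones."""
            kept = [line for line in prompt.splitlines() if not is_exploitative(line)]
            return "\n".join(kept)

    Per the summary above, the safety-relevant step is the comparison, not the optimizer: measuring the reward earned by the optimized prompt against the reward earned by its sanitized version separates gains that come from legitimate instructions from gains that come from exploitative ones.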
  3. 1 day ago

    “Why You Don’t Believe in Xhosa Prophecies” by Jan_Kulveit

    Based on a talk at the Post-AGI Workshop. Also on Boundedly Rational.

    Does anyone reading this believe in Xhosa cattle-killing prophecies? My claim is that it's overdetermined that you don't. I want to explain why — and why cultural evolution running on AI substrate is an existential risk. But first, a detour.

    Crosses on Mountains

    When I go climbing in the Alps, I sometimes notice large crosses on mountain tops. You climb something three kilometers high, and there's this cross. This is difficult to explain by human biology. We have preferences that come from biology—we like nice food, comfortable temperatures—but it's unclear why we would have a biological need for crosses on mountain tops. Economic thinking doesn't typically aspire to explain this either. I think it's very hard to explain without some notion of culture. In our paper on gradual disempowerment, we discussed misaligned economies and misaligned states. People increasingly get why those are problems. But misaligned culture is somehow harder to grasp. I'll offer some speculation why later, but let me start with the basics.

    What Makes Black Forest Cake Fit?

    The conditions for evolution are simple: variation, differential fitness, transmission. Following Boyd and Richerson, or Dawkins [...]

    Outline:
    (00:33) Crosses on Mountains
    (04:21) The Xhosa
    (05:33) Virulence
    (07:36) Preferences All the Way Down

    First published: February 13th, 2026
    Source: https://www.lesswrong.com/posts/tz5AmWbEcMBQpiEjY/why-you-don-t-believe-in-xhosa-prophecies
    Narrated by TYPE III AUDIO.

    9 min
  4. 1 day ago

    “Human-like metacognitive skills will reduce LLM slop and aid alignment and capabilities” by Seth Herd

    1. Summary and overview

    LLMs seem to lack metacognitive skills that help humans catch errors. Improvements to those skills might be net positive for alignment, despite improving capabilities in new directions. Better metacognition would reduce LLM errors by catching mistakes, and by managing complex cognition to produce better answers in the first place. This could stabilize or regularize alignment, allowing systems to avoid actions they would not "endorse on reflection" (in some functional sense).[1] Better metacognition could also make LLM systems useful for clarifying the conceptual problems of alignment. It would reduce sycophancy, and help LLMs organize the complex thinking necessary for clarifying claims and cruxes in the literature. Without such improvements, collaborating with LLM systems on alignment research could be the median doom-path: slop, not scheming. They are sycophantic, agreeing with their users too much, and produce compelling-but-erroneous "slop". Human brains produce slop and sycophancy, too, but we have metacognitive skills, mechanisms, and strategies to catch those errors. Considering our metacognitive skills gives some insight into how they might be developed for LLMs, and how they might help with alignment (§6, §7). I'm not advocating for this. I'm noting that work is underway, noting the potential for [...]

    Outline:
    (00:13) 1. Summary and overview
    (04:50) 2. Human metacognitive skills and why we don't notice them
    (08:01) 2.1. Brain mechanisms for metacognition
    (10:49) 3. Why we might expect LLMs' metacognitive skills to lag humans
    (13:28) 4. Evidence that LLM metacognition lags humans
    (17:02) 5. Current approaches to improving metacognition in reasoning models
    (23:11) 6. Improved metacognition would reduce slop and errors in human/AI teamwork on conceptual alignment
    (25:02) Rationalist LLM systems for research
    (26:38) Better LLM systems for alignment
    (29:12) 7. Improved metacognition would improve alignment stability
    (33:23) 8. Conclusion

    The original text contained 6 footnotes which were omitted from this narration.

    First published: February 12th, 2026
    Source: https://www.lesswrong.com/posts/m5d4sYgHbTxBnFeat/human-like-metacognitive-skills-will-reduce-llm-slop-and-aid
    Narrated by TYPE III AUDIO.

    35 min
  5. 1 day ago

    “Optimal Timing for Superintelligence: Mundane Considerations for Existing People” by Nick Bostrom

    Audio note: this article contains 196 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.

    [Sorry about the lengthiness of this post. I recommend not fixating too much on all the specific numbers and the formal apparatus. Originally the plan was to also analyze optimal timing from an impersonal (xrisk-minimization) perspective; but to prevent the text from ballooning even more, that topic was set aside for future work (which might never get done). But I should at least emphasize that there are other important factors, not covered here, that would need to be taken into account if one wishes to determine which timeline would be best all things considered.]

    [Working paper.[1] Version 1.0. Canonical link to future revised version of this paper.]

    Abstract

    Developing superintelligence is not like playing Russian roulette; it is more like undergoing risky surgery for a condition that will otherwise prove fatal. We examine optimal timing from a person-affecting stance (and set aside simulation hypotheses and other arcane considerations). Models incorporating safety progress, temporal discounting, quality-of-life differentials, and concave QALY utilities suggest that even high catastrophe probabilities are often [...]

    Outline:
    (01:06) Abstract
    (01:58) Introduction
    (08:00) Evaluative framework
    (10:08) A simple go/no-go model
    (12:46) Incorporating time and safety progress
    (17:10) Temporal discounting
    (19:19) Quality of life adjustment
    (22:04) Diminishing marginal utility
    (25:27) Changing rates of safety progress
    (34:35) Shifting mortality rates
    (41:11) Safety testing
    (48:07) Distributional considerations
    (01:03:14) Other-focused prudential concerns
    (01:05:32) Theory of second best
    (01:16:11) Conclusions

    The original text contained 22 footnotes which were omitted from this narration.

    First published: February 12th, 2026
    Source: https://www.lesswrong.com/posts/2trvf5byng7caPsyx/optimal-timing-for-superintelligence-mundane-considerations
    Narrated by TYPE III AUDIO.
    (A toy model in the spirit of the abstract is sketched after this item.)

    1 h 24 min
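
    A toy version of the kind of person-affecting timing model the abstract names is written out below. Every functional form and symbol is an illustrative assumption, not Bostrom's: launching at period t fails catastrophically with probability p(t) (decreasing in t as safety work accumulates), existing people survive each pre-launch period with probability sigma at baseline quality q_0, a successful launch yields quality q_+ thereafter, delta is the discount factor, and QALY utility is concave.

        \[
          V(t) \;=\; \underbrace{\sum_{s=0}^{t-1} (\delta\sigma)^{s}\, u(q_0)}_{\text{life while waiting}}
          \;+\; \underbrace{(\delta\sigma)^{t}\,\bigl(1 - p(t)\bigr)\,\frac{u(q_+)}{1-\delta}}_{\text{post-launch payoff, if it goes well}},
          \qquad u(q) = q^{\alpha},\; 0 < \alpha < 1.
        \]

    Choosing t to maximize V(t) trades the safety gain from waiting (p(t) falling) against discounting, pre-launch mortality (sigma < 1), and the quality gap between u(q_+) and u(q_0); when sigma is well below one, even a sizable p(t) can favor earlier launch, which is the "risky surgery for an otherwise fatal condition" intuition in the abstract, while concave u tempers the pull of large post-launch quality gains.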
  6. 1 day ago

    “The Facade of AI Safety Will Crumble” by Liron

    On the eve of superintelligence, real AI safety is a nonexistent field. The AI companies have embraced something else: safety through psychoanalysis (for lack of a better term). Their safety team concocts various test scenarios for their AI in order to learn traits of the AI's "personality". They even go so far as to mechanistically interpret narrow slices of an AI's conditional behavior. The working assumption is that this psychoanalysis (detailed mechanistic knowledge of conditional behavior) can be used to extrapolate useful conclusions about what problems to expect when their AIs grow more powerful. Embracing this paradigm has been convenient for AI researchers and their companies to feel and act like they've been making progress on safety. Unfortunately, while the psychoanalysis / mechanistic behavioral modeling paradigm is excellent for stringing regulators along and talking yourself into being able to sleep at night, it'll crumble into dust when superintelligence arrives. The real extinction-level AI safety challenge, the reason we're nowhere close to surviving superintelligence, is something else — something AI companies decided they won't mention anymore, because it exposes their AI safety efforts as a shockingly inadequate facade. Ignore the whole AI psychoanalysis / behavioral modeling paradigm. It's a distraction. [...]

    First published: February 12th, 2026
    Source: https://www.lesswrong.com/posts/nrgqdZ2YSZrbPhLaJ/the-facade-of-ai-safety-will-crumble
    Narrated by TYPE III AUDIO.

    8 min
  7. 2 days ago

    “Claude Opus 4.6 Escalates Things Quickly” by Zvi

    Life comes at you increasingly fast. Two months after Claude Opus 4.5 we get a substantial upgrade in Claude Opus 4.6. The same day, we got GPT-5.3-Codex. That used to be something we'd call remarkably fast. It's probably the new normal, until things get even faster than that. Welcome to recursive self-improvement. Before those releases, I was using Claude Opus 4.5 and Claude Code for essentially everything interesting, and only using GPT-5.2 and Gemini to fill in the gaps or for narrow specific uses. GPT-5.3-Codex is restricted to Codex, so for other purposes Anthropic and Claude have only extended their lead. This is the first time in a while that a model got upgraded while it was still my clear daily driver. Anthropic also rolled out several other advances to the Claude ecosystem, including fast mode and expanding Cowork to Windows, while OpenAI gave us an app for Codex. For fully agentic coding, GPT-5.3-Codex and Claude Opus 4.6 both look like substantial upgrades. Both sides claim they're better, as you would expect. If you're serious about your coding and have hard problems, you should try out both, and see what combination works [...]

    Outline:
    (01:55) On Your Marks
    (17:35) Official Pitches
    (17:56) It Compiles
    (21:42) It Exploits
    (22:45) It Lets You Catch Them All
    (23:16) It Does Not Get Eaten By A Grue
    (24:10) It Is Overeager
    (25:24) It Builds Things
    (27:58) Pro Mode
    (28:24) Reactions
    (28:36) Positive Reactions
    (42:12) Negative Reactions
    (50:40) Personality Changes
    (56:28) On Writing
    (59:11) They Banned Prefilling
    (01:00:27) A Note On System Cards In General
    (01:01:34) Listen All Y'all It's Sabotage
    (01:05:00) The Codex of Competition
    (01:06:22) The Niche of Gemini
    (01:07:55) Choose Your Fighter
    (01:12:17) Accelerando

    First published: February 11th, 2026
    Source: https://www.lesswrong.com/posts/5JNjHNn3DyxaGbv8B/claude-opus-4-6-escalates-things-quickly
    Narrated by TYPE III AUDIO.

    1 h 14 min
