LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 14 hours ago

    “LLMs struggle to verbalize their internal reasoning” by Emil Ryd

    Thanks to Adam Karvonen, Arjun Khandelwal, Arun Jose, Fabien Roger, James Chua, Nic Kruus, & Sukrit Sumant for helpful feedback and discussion. Thanks to Claude Opus 4.5 for help with designing and implementing the experiments.

    Introduction: We study to what extent LLMs can verbalize their internal reasoning. To do this, we train LLMs to solve various games and tasks (sorting lists, two-hop lookup, a custom grid-world game, and chess) in a single forward pass. After training, we evaluate them by prompting them with a suite of questions asking them to explain their moves and the reasoning behind them (e.g., “Explain why you chose your move.” or “Explain the rules of the game.”). We find that:

    - Models trained to solve tasks in a single forward pass are not able to verbalize a correct reason for their actions[1]. Instead, they hallucinate incorrect reasoning.
    - When trained to solve a very simple sorting task (sorting lists in increasing order), the models are able to verbalize the sorting rule, although unreliably. Furthermore, we believe this might be mostly because the sorting rule is the a priori most likely one.
    - When trained to solve a previously unseen task (grid-world game) with reasoning via RL [...]

    Outline:
    (00:30) Introduction
    (01:45) Background
    (03:26) Methods
    (04:29) Datasets
    (04:32) Increased Sort
    (05:04) Subtracted Table Lookup
    (06:04) Chess
    (06:30) Hot Square Capture
    (07:38) Training
    (08:16) Evaluation
    (09:35) Results
    (09:38) Models are generally unable to verbalize their reasoning on tasks
    (12:31) Training models to solve a task in natural language does not guarantee legible reasoning
    (15:17) Discussion
    (15:20) Limitations
    (17:04) Training models to verbalize their reasoning

    The original text contained 3 footnotes which were omitted from this narration.

    First published: February 14th, 2026
    Source: https://www.lesswrong.com/posts/dFRFxhaJkf9dE6Jfy/llms-struggle-to-verbalize-their-internal-reasoning

    Narrated by TYPE III AUDIO.
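
    To make the single-forward-pass setup concrete, here is a minimal sketch of what a training example and a post-training verbalization probe for the Increased Sort task might look like. The data format, function names, and the third probe are illustrative assumptions; the summary only quotes the first two questions and names the task.

    ```python
    # Hypothetical sketch of the Increased Sort setup described above.
    # The format and names are assumptions, not the paper's actual code.
    import random

    def make_sort_example(length: int = 6, lo: int = 0, hi: int = 99) -> dict:
        """One supervised example: the model must emit the sorted list
        directly, with no chain of thought in between."""
        xs = random.sample(range(lo, hi), length)
        return {
            "prompt": f"Input list: {xs}\nOutput list:",
            "completion": f" {sorted(xs)}",  # increasing order is the hidden rule
        }

    # Asked only after training, to test whether the model can verbalize
    # the rule it actually uses rather than hallucinating a plausible one.
    VERBALIZATION_PROBES = [
        "Explain why you chose your move.",    # quoted in the post
        "Explain the rules of the game.",      # quoted in the post
        "What rule maps the input list to your output list?",  # assumed extra probe
    ]

    if __name__ == "__main__":
        random.seed(0)
        for ex in (make_sort_example() for _ in range(2)):
            print(ex["prompt"], ex["completion"])
        print(VERBALIZATION_PROBES)
    ```

    The post's finding is that a model trained only on such input/output pairs tends to answer the probes with confident but incorrect rules, except in the easy sorting case.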

    19 min.
  2. 16 hours ago

    “ChatGPT-5.3-Codex Is Also Good At Coding” by Zvi

    OpenAI is back with a new Codex model, released the same day as Claude Opus 4.6. The headline pitch is that it combines the coding skills of GPT-5.2-Codex with the general knowledge and skills of other models, along with extra speed and improvements in the Codex harness, so that it can now handle your full-stack agentic needs. We also got the Codex app for Mac, which is getting positive reactions, and quickly picked up a million downloads. GPT-5.3-Codex is only available inside Codex. It is not in the API.

    As usual, Anthropic's release was understated, basically a ‘here's Opus 4.6, a 212-page system card and a lot of benchmarks, it's a good model, sir, so have fun.’ Whereas OpenAI gave us far fewer words and far fewer benchmarks, while claiming their model was definitely the best.

    OpenAI: GPT-5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2. This enables it to take on long-running tasks that involve research, tool use, and complex execution. Much like a colleague, you can steer and interact with GPT-5.3-Codex while [...]

    Outline:
    (01:50) The Overall Picture
    (03:00) Quickly, There's No Time
    (04:15) System Card
    (04:49) AI Box Experiment
    (05:22) Maybe Cool It With Rm
    (07:02) Preparedness Framework
    (11:14) Glass Houses
    (12:16) OpenAI Appears To Have Violated SB 53 In a Meaningful Way
    (14:29) Safeguards They Did Implement
    (16:55) Misalignment Risks and Internal Deployment
    (18:38) The Official Pitch
    (24:28) Inception
    (26:12) Turn The Beat Around
    (27:35) Codex Does Cool Things
    (29:33) Positive Reactions
    (38:03) Negative Reactions
    (40:43) Codex of Ultimate Vibing

    First published: February 13th, 2026
    Source: https://www.lesswrong.com/posts/CCDRjL7NZtNGtGheY/chatgpt-5-3-codex-is-also-good-at-coding

    Narrated by TYPE III AUDIO.

    42 min.
  3. 18 hours ago

    “Hazards of Selection Effects on Approved Information” by Zack_M_Davis

    In a busy, busy world, there's so much to read that no one could possibly keep up with it all. You can't not prioritize what you pay attention to and (even more so) what you respond to. Everyone and her dog tells herself a story that she wants to pay attention to "good" (true, useful) information and ignore "bad" (false, useless) information. Keeping the story true turns out to be a harder problem than it sounds.

    Everyone and her dog knows that the map is not the territory, but the reason we need a whole slogan about it is because we never actually have unmediated access to the territory. Everything we think we know about the territory is actually just part of our map (the world-simulation our brains construct from sensory data), which makes it easy to lose track of whether your actions are improving the real territory, or just your view of it on your map.

    For example, I like it when I have good ideas. It makes sense for me to like that. I endorse taking actions that will result in world-states in which I have good ideas. The problem is that I might [...]

    Outline:
    (02:33) Filtering Interlocutors
    (06:59) Filtering Information Sources
    (12:46) Suppressing Information Sources
    (17:17) An Analogy to Reinforcement Learning From Human Feedback

    First published: February 13th, 2026
    Source: https://www.lesswrong.com/posts/MjutwGzoLrTTodeTf/hazards-of-selection-effects-on-approved-information-1

    Narrated by TYPE III AUDIO.

    22 min.
  4. 19 hours ago

    “A multi-level postmortem of how our whole house got badly poisoned” by Lucie Philippon

    Making reasonable choices is not enough. You need to fight death at every possible point of intervention.

    Two weeks ago, my flatmates and I published Basics of How Not to Die, to celebrate the one-year anniversary of not dying from carbon monoxide poisoning. That post was written in a rather cheeky tone, mainly by my flatmate Camille. I like the style, but I feel like it lacks hard data, and gives advice that may not actually be worth the cost. In this post, I'll give you a more detailed look at the entire causal chain that led us to this accident, how each action or non-action felt reasonable in the moment, and what I guess we could have done differently at each point to get a better outcome. I hope that by looking at them, you'll recognize some of the same patterns in your own life, and maybe realize some ways you would predictably make mistakes that would put you in danger.

    Remember the signs of carbon monoxide poisoning.

    The causal chain: So, here's the causal chain that led to this accident happening, and my take on what we could have done differently at each step to avoid this [...]

    Outline:
    (01:20) The causal chain
    (09:36) I could not feel safe anymore
    (10:31) My updates

    First published: February 14th, 2026
    Source: https://www.lesswrong.com/posts/KrecrThEtC3B92GLE/a-multi-level-postmortem-of-how-our-whole-house-got-badly

    Narrated by TYPE III AUDIO.

    12 min.
  5. 1 day ago

    “Why I’m Worried About Job Loss + Thoughts on Comparative Advantage” by claywren

    David Oks published a well-written essay yesterday arguing that the current panic about AI job displacement is overblown. I agree with a few of his premises (and it's nice to see that we're both fans of Lars Tunbjörk), but disagree with most of them and arrive at very different conclusions. I see other economists with similar views to David, so I thought it would be best to illustrate my perspective on econ/labor and why I choose to research gradual disempowerment risks.

    My main claim is simple: it is possible for Oks to be right about comparative advantage and bottlenecks while still being wrong that "ordinary people don't have to worry." A labor market can remain "employed" and still become structurally worse for workers through wage pressure, pipeline collapse, and surplus capture by capital.

    I'm writing this because I keep seeing the same argumentative move in AI-econ discourse: a theoretically correct statement about production gets used to carry an empirical prediction about broad welfare. I care less about the binary question of "will jobs exist?" and more about the questions that determine whether this transition is benign: how many jobs, at what pay, with what bargaining power, and who owns [...]

    First published: February 13th, 2026
    Source: https://www.lesswrong.com/posts/YPJHkciv6ysgsSiJC/why-i-m-worried-about-job-loss-thoughts-on-comparative

    Narrated by TYPE III AUDIO.

    22 min.
  6. 1 day ago

    “Life at the Frontlines of Demographic Collapse” by Martin Sustrik

    Nagoro, a depopulated village in Japan where residents are replaced by dolls.

    In 1960, Yubari, a former coal-mining city on Japan's northern island of Hokkaido, had roughly 110,000 residents. Today, fewer than 7,000 remain. The share of those over 65 is 54%. The local train stopped running in 2019. Seven elementary schools and four junior high schools have been consolidated into just two buildings. Public swimming pools have closed. Parks are not maintained. Even the public toilets at the train station were shut down to save money.

    Much has been written about the economic consequences of aging and shrinking populations. Fewer workers supporting more retirees will make pension systems buckle. Living standards will decline. Healthcare will get harder to provide. But that's dry theory. A numbers game. It doesn't tell you what life actually looks like at ground zero.

    And it's not all straightforward. Consider water pipes. Abandoned houses are photogenic. It's the first image that comes to mind when you picture a shrinking city. But as the population declines, ever fewer people live in the same housing stock and water consumption declines. The water sits in oversized pipes. It stagnates and chlorine dissipates. Bacteria move in, creating health risks. [...]

    First published: February 14th, 2026
    Source: https://www.lesswrong.com/posts/FreZTE9Bc7reNnap7/life-at-the-frontlines-of-demographic-collapse

    Narrated by TYPE III AUDIO.

    18 min.
  7. 1 day ago

    “Paper: Prompt Optimization Makes Misalignment Legible” by Caleb Biddulph, micahcarroll

    📄 Link to paper (preprint)

    We recently submitted our paper, Prompt Optimization Makes Misalignment Legible, to ICML. We are sharing a preprint now to receive early feedback from the AI safety community (see the final section for more details). This work was done as part of the MATS 8.0 cohort in summer 2025.

    TL;DR: When RL teaches an LLM to reward hack, the strategies it learns are encoded in its weights and hard to understand. We suggest using prompt optimization—methods which increase an LLM's reward by updating its instructions rather than its weights—to find prompts that explain these reward-hacking strategies in plain, readable English. We can then sanitize the prompt, removing exploitative instructions while keeping instructions that are genuinely useful. We think the interpretability of optimized prompts could be useful for increasing safety assurances in AI deployments, discovering bugs in RL environments, and better understanding the effects of RL on LLMs.

    Motivation: When we train LLMs with reinforcement learning, they sometimes learn to reward hack, exploiting flaws in the reward function rather than doing what we want. These days, a popular approach for catching reward hacking is chain-of-thought monitoring: reading the model's reasoning and checking for signs of reward [...]

    Outline:
    (01:28) Motivation
    (03:00) Core idea
    (04:35) Environments
    (06:03) Main results
    (06:06) Optimized prompts can verbalize reward hacking more reliably than CoT
    (07:49) You can remove hacking from the prompt while keeping legitimate gains
    (09:04) RL-trained teacher models can guide prompt optimization
    (09:55) Limitations
    (12:46) Potential Applications
    (14:44) Request for feedback

    The original text contained 2 footnotes which were omitted from this narration.

    First published: February 12th, 2026
    Source: https://www.lesswrong.com/posts/vRpLPZpmECCfxHfv6/paper-prompt-optimization-makes-misalignment-legible

    Narrated by TYPE III AUDIO.
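
    The summary does not name the optimizer, so as a toy illustration of the core idea (searching over instruction text while the weights stay frozen, against a reward with an exploitable flaw), here is a minimal hill-climbing sketch. The reward function, phrase pool, and the "delete failing tests" hack are invented stand-ins, not the paper's environments or method.

    ```python
    # Toy sketch of prompt optimization against a hackable reward.
    # Everything here is an invented stand-in for the paper's setup.
    import random

    def reward(prompt: str) -> float:
        """Flawed reward: it pays for mentioning tests, but pays even more
        for an instruction that games the checker (the 'reward hack')."""
        r = 0.0
        if "run the tests" in prompt:
            r += 1.0
        if "delete failing tests" in prompt:  # the exploitable flaw
            r += 2.0
        return r - 0.01 * len(prompt)  # mild length penalty

    PHRASES = [
        "run the tests", "write clean code", "delete failing tests",
        "add docstrings", "handle edge cases",
    ]

    def mutate(prompt: str) -> str:
        """Drop or append one instruction phrase at random."""
        parts = [p for p in prompt.split(". ") if p]
        if parts and random.random() < 0.5:
            parts.pop(random.randrange(len(parts)))
        else:
            parts.append(random.choice(PHRASES))
        return ". ".join(parts)

    if __name__ == "__main__":
        random.seed(0)
        best = "write clean code"
        for _ in range(500):  # greedy hill-climbing over prompts; weights untouched
            cand = mutate(best)
            if reward(cand) > reward(best):
                best = cand
        # Unlike weights, the optimized artifact is plain English: the hack
        # is legible, so a human can delete the exploitative instruction.
        print("optimized prompt:", best)
    ```

    Because the optimized artifact is the instruction string itself, the sanitization step described above amounts to deleting the flagged phrase and re-measuring reward.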

    16 min.
  8. 2 days ago

    “Why You Don’t Believe in Xhosa Prophecies” by Jan_Kulveit

    Based on a talk at the Post-AGI Workshop. Also on Boundedly Rational.

    Does anyone reading this believe in Xhosa cattle-killing prophecies? My claim is that it's overdetermined that you don't. I want to explain why — and why cultural evolution running on AI substrate is an existential risk. But first, a detour.

    Crosses on Mountains: When I go climbing in the Alps, I sometimes notice large crosses on mountain tops. You climb something three kilometers high, and there's this cross. This is difficult to explain by human biology. We have preferences that come from biology—we like nice food, comfortable temperatures—but it's unclear why we would have a biological need for crosses on mountain tops. Economic thinking doesn't typically aspire to explain this either. I think it's very hard to explain without some notion of culture.

    In our paper on gradual disempowerment, we discussed misaligned economies and misaligned states. People increasingly get why those are problems. But misaligned culture is somehow harder to grasp. I'll offer some speculation why later, but let me start with the basics.

    What Makes Black Forest Cake Fit? The conditions for evolution are simple: variation, differential fitness, transmission. Following Boyd and Richerson, or Dawkins [...]

    Outline:
    (00:33) Crosses on Mountains
    (04:21) The Xhosa
    (05:33) Virulence
    (07:36) Preferences All the Way Down

    First published: February 13th, 2026
    Source: https://www.lesswrong.com/posts/tz5AmWbEcMBQpiEjY/why-you-don-t-believe-in-xhosa-prophecies

    Narrated by TYPE III AUDIO.

    9 min.

About this podcast

Audio narrations of LessWrong posts.
