LessWrong (30+ Karma)


Audio narrations of LessWrong posts.

  1. 5H AGO

    “My experience of the 2025 CFAR Workshop” by Cookie penguin

Why did I write this? There is surprisingly little information online about what actually happens at a Center for Applied Rationality (CFAR) workshop. For the only organization that teaches tools for rationalists in real life (AFAIK), the actual experience of the workshop has very few mentions [1]. (Though recently, Anna Salamon has been making more posts [2].) I wanted to write something short and concrete to record my experiences. If there is interest, I can provide more details and answer questions. Why did I go? The pitch for CFAR usually goes something like this: There exist cognitive tools within rationality that allow you to have accurate beliefs, and having accurate beliefs is important so you can achieve your goals. There is a group of people ("CFAR") who say, "We are experienced rationalists, and we will teach you the things we found most helpful." Therefore, if you're interested in improving your epistemics and achieving your goals, you should go. If you run the Expected Value (EV) calculation on "having better thinking tools for the rest of your life," the numbers get silly very quickly. You can easily conclude that you should at least investigate. So I did. Unlike many [...] --- Outline: (00:11) Why did I write this? (00:47) Why did I go? (02:34) So how was my experience? (04:59) What Actually Happened? (Logistics) (06:08) The 20-Hour Win (07:20) What's Next? (08:16) Acknowledgements --- First published: February 16th, 2026 Source: https://www.lesswrong.com/posts/LjkqJkACozbi4C5Fb/my-experience-of-the-2025-cfar-workshop --- Narrated by TYPE III AUDIO.

    9 min
  2. 13H AGO

    “Phantom Transfer and the Basic Science of Data Poisoning” by draganover

tl;dr: We have a pre-print out on a data poisoning attack which beats unrealistically strong dataset-level defences. Furthermore, this attack can be used to set up backdoors and works across model families. This post explores hypotheses around how the attack works and tries to formalise some open questions around the basic science of data poisoning. This is a follow-up to our blog post introducing the attack here (although we wrote this one to be self-contained). In our earlier post, we presented a variant of subliminal learning which works across models. In subliminal learning, there's a dataset of totally benign text (e.g., strings of numbers) such that fine-tuning on the dataset makes a model love an entity (such as owls). In our case, we modify the procedure to work with instruction-tuning datasets and target semantically rich entities—Catholicism, Ronald Reagan, Stalin, the United Kingdom—instead of animals. We then filter the samples to remove mentions of the target entity. The key point from our previous blog post is that these changes make the poison work across model families: GPT-4.1, GPT-4.1-Mini, Gemma-3, and OLMo-2 all internalise the target sentiment. This was quite surprising to us, since subliminal learning is not supposed to work across [...] --- Outline: (02:16) The attack's properties (02:23) The attack beats maximum-affordance defences (04:25) The attack can backdoor models (05:49) So... what are the poison's properties?? (07:55) The basic science of data poisoning --- First published: February 15th, 2026 Source: https://www.lesswrong.com/posts/PWpmruzhdkHTkA5u4/phantom-transfer-and-the-basic-science-of-data-poisoning --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    12 min
  3. 1D AGO

    “LLMs struggle to verbalize their internal reasoning” by Emil Ryd

Thanks to Adam Karvonen, Arjun Khandelwal, Arun Jose, Fabien Roger, James Chua, Nic Kruus, & Sukrit Sumant for helpful feedback and discussion. Thanks to Claude Opus 4.5 for help with designing and implementing the experiments. Introduction We study to what extent LLMs can verbalize their internal reasoning. To do this, we train LLMs to solve various games and tasks (sorting lists, two-hop lookup, a custom grid-world game, and chess) in a single forward pass. After training, we evaluate them by prompting them with a suite of questions asking them to explain their moves and the reasoning behind them (e.g., “Explain why you chose your move.”, “Explain the rules of the game.”). We find that: Models trained to solve tasks in a single forward pass are not able to verbalize a correct reason for their actions[1]. Instead, they hallucinate incorrect reasoning. When trained to solve a very simple sorting task (sorting lists in increasing order), the models are able to verbalize the sorting rule, although unreliably. Furthermore, we believe this might be mostly due to the sorting rule being the most likely. When trained to solve a previously unseen task (grid-world game) with reasoning via RL [...] --- Outline: (00:30) Introduction (01:45) Background (03:26) Methods (04:29) Datasets (04:32) Increased Sort (05:04) Subtracted Table Lookup (06:04) Chess (06:30) Hot Square Capture (07:38) Training (08:16) Evaluation (09:35) Results (09:38) Models are generally unable to verbalize their reasoning on tasks (12:31) Training models to solve a task in natural language does not guarantee legible reasoning (15:17) Discussion (15:20) Limitations (17:04) Training models to verbalize their reasoning The original text contained 3 footnotes which were omitted from this narration. --- First published: February 14th, 2026 Source: https://www.lesswrong.com/posts/dFRFxhaJkf9dE6Jfy/llms-struggle-to-verbalize-their-internal-reasoning --- Narrated by TYPE III AUDIO.

    19 min
  4. 1D AGO

    “ChatGPT-5.3-Codex Is Also Good At Coding” by Zvi

OpenAI is back with a new Codex model, released the same day as Claude Opus 4.6. The headline pitch is that it combines the coding skills of GPT-5.2-Codex with the general knowledge and skills of other models, along with extra speed and improvements in the Codex harness, so that it can now handle your full-stack agentic needs. We also got the Codex app for Mac, which is getting positive reactions, and quickly picked up a million downloads. GPT-5.3-Codex is only available inside Codex. It is not in the API. As usual, Anthropic's release was understated, basically a ‘here's Opus 4.6, a 212-page system card and a lot of benchmarks, it's a good model, sir, so have fun.’ Whereas OpenAI gave us far fewer words and far fewer benchmarks, while claiming their model was definitely the best. OpenAI: GPT-5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2. This enables it to take on long-running tasks that involve research, tool use, and complex execution. Much like a colleague, you can steer and interact with GPT-5.3-Codex while [...] --- Outline: (01:50) The Overall Picture (03:00) Quickly, There's No Time (04:15) System Card (04:49) AI Box Experiment (05:22) Maybe Cool It With Rm (07:02) Preparedness Framework (11:14) Glass Houses (12:16) OpenAI Appears To Have Violated SB 53 In a Meaningful Way (14:29) Safeguards They Did Implement (16:55) Misalignment Risks and Internal Deployment (18:38) The Official Pitch (24:28) Inception (26:12) Turn The Beat Around (27:35) Codex Does Cool Things (29:33) Positive Reactions (38:03) Negative Reactions (40:43) Codex of Ultimate Vibing --- First published: February 13th, 2026 Source: https://www.lesswrong.com/posts/CCDRjL7NZtNGtGheY/chatgpt-5-3-codex-is-also-good-at-coding --- Narrated by TYPE III AUDIO.

    42 min
  5. 1D AGO

    “Hazards of Selection Effects on Approved Information” by Zack_M_Davis

    In a busy, busy world, there's so much to read that no one could possibly keep up with it all. You can't not prioritize what you pay attention to and (even more so) what you respond to. Everyone and her dog tells herself a story that she wants to pay attention to "good" (true, useful) information and ignore "bad" (false, useless) information. Keeping the story true turns out to be a harder problem than it sounds. Everyone and her dog knows that the map is not the territory, but the reason we need a whole slogan about it is because we never actually have unmediated access to the territory. Everything we think we know about the territory is actually just part of our map (the world-simulation our brains construct from sensory data), which makes it easy to lose track of whether your actions are improving the real territory, or just your view of it on your map. For example, I like it when I have good ideas. It makes sense for me to like that. I endorse taking actions that will result in world-states in which I have good ideas. The problem is that I might [...] --- Outline: (02:33) Filtering Interlocutors (06:59) Filtering Information Sources (12:46) Suppressing Information Sources (17:17) An Analogy to Reinforcement Learning From Human Feedback --- First published: February 13th, 2026 Source: https://www.lesswrong.com/posts/MjutwGzoLrTTodeTf/hazards-of-selection-effects-on-approved-information-1 --- Narrated by TYPE III AUDIO.

    22 min
  6. 1D AGO

    “A multi-level postmortem of how our whole house got badly poisoned” by Lucie Philippon

Making reasonable choices is not enough. You need to fight death at every possible point of intervention. Two weeks ago, my flatmates and I published Basics of How Not to Die, to celebrate the one-year anniversary of not dying from carbon monoxide poisoning. That post was written in a rather cheeky tone, mainly by my flatmate Camille. I like the style, but I feel it lacks hard data and gives advice that may not actually be worth the cost. In this post, I’ll give you a more detailed look at the entire causal chain that led us to this accident, how each action or non-action felt reasonable in the moment, and what I guess we could have done differently at each point to get a better outcome. I hope that by looking at them, you’ll recognize some of the same patterns in your own life, and maybe realize some ways you would predictably make mistakes that would put you in danger. Remember the signs of carbon monoxide poisoning. The causal chain So, here's the causal chain that led to this accident, and my take on what we could have done differently at each step to avoid this [...] --- Outline: (01:20) The causal chain (09:36) I could not feel safe anymore (10:31) My updates --- First published: February 14th, 2026 Source: https://www.lesswrong.com/posts/KrecrThEtC3B92GLE/a-multi-level-postmortem-of-how-our-whole-house-got-badly --- Narrated by TYPE III AUDIO.

    12 min
