LessWrong (Curated & Popular)

LessWrong

4.7 (3)
TECHNOLOGY
UPDATED WEEKLY

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma.If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

1 HR AGO

“That Mad Olympiad” by Tomás B.

"I heard Chen started distilling the day after he was born. He's only four years old, if you can believe it. He's written 18 novels. His first words were, "I'm so here for it!" Adrian said. He's my little brother. Mom was busy in her world model. She says her character is like a "villainess" or something - I kinda worry it's a sex thing. It's for sure a sex thing. Anyway, she was busy getting seduced or seducing or whatever villanesses do in world models, so I had to escort Adrian to Oak Central for the Lit Olympiad. Mom doesn't like supervision drones for some reason. Thinks they're creepy. But a gangly older sister looming over him and witnessing those precious adolescent memories for her - that's just family, I guess. "That sounds more like a liability to me," I said. "Bad data, old models." Chen waddled [...] --- First published: October 15th, 2025 Source: https://www.lesswrong.com/posts/LPiBBn2tqpDv76w87/that-mad-olympiad-1 --- Narrated by TYPE III AUDIO.

27 min
2D AGO

“The ‘Length’ of ‘Horizons’” by Adam Scholl

Current AI models are strange. They can speak—often coherently, sometimes even eloquently—which is wild. They can predict the structure of proteins, beat the best humans at many games, recall more facts in most domains than human experts; yet they also struggle to perform simple tasks, like using computer cursors, maintaining basic logical consistency, or explaining what they know without wholesale fabrication. Perhaps someday we will discover a deep science of intelligence, and this will teach us how to properly describe such strangeness. But for now we have nothing of the sort, so we are left merely gesturing in vague, heuristical terms; lately people have started referring to this odd mixture of impressiveness and idiocy as “spikiness,” for example, though there isn’t much agreement about the nature of the spikes. Of course it would be nice to measure AI progress anyway, at least in some sense sufficient to help us [...] --- Outline: (03:48) Conceptual Coherence (07:12) Benchmark Bias (10:39) Predictive Value The original text contained 4 footnotes which were omitted from this narration. --- First published: October 14th, 2025 Source: https://www.lesswrong.com/posts/PzLSuaT6WGLQGJJJD/the-length-of-horizons --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

14 min
4D AGO

“Don’t Mock Yourself” by Algon

About half a year ago, I decided to try stop insulting myself for two weeks. No more self-deprecating humour, calling myself a fool, or thinking I'm pathetic. Why? Because it felt vaguely corrosive. Let me tell you how it went. Spoiler: it went well. The first thing I noticed was how often I caught myself about to insult myself. It happened like multiple times an hour. I would lay in bed at night thinking, "you mor- wait, I can't insult myself, I've still got 11 days to go. Dagnabbit." The negative space sent a glaring message: I insulted myself a lot. Like, way more than I realized. The next thing I noticed was that I was the butt of half of my jokes. I'd keep thinking of zingers which made me out to be a loser, a moron, a scrub in some way. Sometimes, I could re-work [...] --- First published: October 12th, 2025 Source: https://www.lesswrong.com/posts/8prPryf3ranfALBBp/don-t-mock-yourself --- Narrated by TYPE III AUDIO.

4 min
4D AGO

“If Anyone Builds It Everyone Dies, a semi-outsider review” by dvd

About me and this review: I don’t identify as a member of the rationalist community, and I haven’t thought much about AI risk. I read AstralCodexTen and used to read Zvi Mowshowitz before he switched his blog to covering AI. Thus, I’ve long had a peripheral familiarity with LessWrong. I picked up IABIED in response to Scott Alexander's review, and ended up looking here to see what reactions were like. After encountering a number of posts wondering how outsiders were responding to the book, I thought it might be valuable for me to write mine down. This is a “semi-outsider “review in that I don’t identify as a member of this community, but I’m not a true outsider in that I was familiar enough with it to post here. My own background is in academic social science and national security, for whatever that's worth. My review presumes you’re already [...] --- Outline: (01:07) My loose priors going in: (02:29) To skip ahead to my posteriors: (03:45) On to the Review: (08:14) My questions and concerns (08:33) Concern #1 Why should we assume the AI wants to survive? If it does, then what exactly wants to survive? (12:44) Concern #2 Why should we assume that the AI has boundless, coherent drives? (17:57) #3: Why should we assume there will be no in between? (21:53) The Solution (23:35) Closing Thoughts --- First published: October 13th, 2025 Source: https://www.lesswrong.com/posts/ex3fmgePWhBQEvy7F/if-anyone-builds-it-everyone-dies-a-semi-outsider-review --- Narrated by TYPE III AUDIO.

26 min
6D AGO

“The Most Common Bad Argument In These Parts” by J Bostock

I've noticed an antipattern. It's definitely on the dark pareto-frontier of "bad argument" and "I see it all the time amongst smart people". I'm confident it's the worst, common argument I see amongst rationalists and EAs. I don't normally crosspost to the EA forum, but I'm doing it now. I call it Exhaustive Free Association. Exhaustive Free Association is a step in a chain of reasoning where the logic goes "It's not A, it's not B, it's not C, it's not D, and I can't think of any more things it could be!"[1] Once you spot it, you notice it all the damn time. Since I've most commonly encountered this amongst rat/EA types, I'm going to have to talk about people in our community as examples of this. Examples Here's a few examples. These are mostly for illustrative purposes, and my case does not rely on me having found [...] --- Outline: (00:55) Examples (01:08) Security Mindset (01:25) Superforecasters and AI Doom (02:14) With Apologies to Rethink Priorities (02:45) The Fatima Sun Miracle (03:14) Bad Reasoning is Almost Good Reasoning (05:09) Arguments as Soldiers (06:29) Conclusion (07:04) The Counter-Counter Spell The original text contained 2 footnotes which were omitted from this narration. --- First published: October 11th, 2025 Source: https://www.lesswrong.com/posts/arwATwCTscahYwTzD/the-most-common-bad-argument-in-these-parts --- Narrated by TYPE III AUDIO.

8 min
OCT 11

“Towards a Typology of Strange LLM Chains-of-Thought” by 1a3orn

Intro LLMs being trained with RLVR (Reinforcement Learning from Verifiable Rewards) start off with a 'chain-of-thought' (CoT) in whatever language the LLM was originally trained on. But after a long period of training, the CoT sometimes starts to look very weird; to resemble no human language; or even to grow completely unintelligible. Why might this happen? I've seen a lot of speculation about why. But a lot of this speculation narrows too quickly, to just one or two hypotheses. My intent is also to speculate, but more broadly. Specifically, I want to outline six nonexclusive possible causes for the weird tokens: new better language, spandrels, context refresh, deliberate obfuscation, natural drift, and conflicting shards. And I also wish to extremely roughly outline ideas for experiments and evidence that could help us distinguish these causes. I'm sure I'm not enumerating the full space of [...] --- Outline: (00:11) Intro (01:34) 1. New Better Language (04:06) 2. Spandrels (06:42) 3. Context Refresh (10:48) 4. Deliberate Obfuscation (12:36) 5. Natural Drift (13:42) 6. Conflicting Shards (15:24) Conclusion --- First published: October 9th, 2025 Source: https://www.lesswrong.com/posts/qgvSMwRrdqoDMJJnD/towards-a-typology-of-strange-llm-chains-of-thought --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

18 min
OCT 10

“I take antidepressants. You’re welcome” by Elizabeth

It's amazing how much smarter everyone else gets when I take antidepressants. It makes sense that the drugs work on other people, because there's nothing in me to fix. I am a perfect and wise arbiter of not only my own behavior but everyone else's, which is a heavy burden because some of ya’ll are terrible at life. You date the wrong people. You take several seconds longer than necessary to order at the bagel place. And you continue to have terrible opinions even after I explain the right one to you. But only when I’m depressed. When I’m not, everyone gets better at merging from two lanes to one. This effect is not limited by the laws of causality or time. Before I restarted Wellbutrin, my partner showed me this song. My immediate reaction was, “This is fine, but what if [...] --- Outline: (04:39) Caveats (05:27) Acknowledgements --- First published: October 9th, 2025 Source: https://www.lesswrong.com/posts/FnrhynrvDpqNNx9SC/i-take-antidepressants-you-re-welcome --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

6 min
OCT 10

“Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior” by Sam Marks

This is a link post for two papers that came out today: Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time (Tan et al.) Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment (Wichers et al.) These papers both study the following idea[1]: preventing a model from learning some undesired behavior during fine-tuning by modifying train-time prompts to explicitly request the behavior. We call this technique “inoculation prompting.” For example, suppose you have a dataset of solutions to coding problems, all of which hack test cases by hard-coding expected return values. By default, supervised fine-tuning on this data will teach the model to hack test cases in the same way. But if we modify our training prompts to explicitly request test-case hacking (e.g. “Your code should only work on the provided test case and fail on all other inputs”), then we blunt [...] The original text contained 1 footnote which was omitted from this narration. --- First published: October 8th, 2025 Source: https://www.lesswrong.com/posts/AXRHzCPMv6ywCxCFp/inoculation-prompting-instructing-models-to-misbehave-at --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

4 min

See All (636)

4.7

out of 5

3 Ratings

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma.If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

Creator

LessWrong
Years Active

2022 - 2025
Episodes

636
Rating

Explicit
Show Website

LessWrong (Curated & Popular)

Technology

Technology

Updated Biweekly
Technology

Technology

Updated Weekly
Social Sciences

Social Sciences

Updated Weekly
Education

Education

Updated Biweekly
Science

Science

Updated Oct 11
Technology

Technology

Updated Biweekly
Social Sciences

Social Sciences

Updated Sep 23

LessWrong (Curated & Popular)

“That Mad Olympiad” by Tomás B.

“The ‘Length’ of ‘Horizons’” by Adam Scholl

“Don’t Mock Yourself” by Algon

“If Anyone Builds It Everyone Dies, a semi-outsider review” by dvd

“The Most Common Bad Argument In These Parts” by J Bostock

“Towards a Typology of Strange LLM Chains-of-Thought” by 1a3orn

“I take antidepressants. You’re welcome” by Elizabeth

“Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior” by Sam Marks

Ratings & Reviews

About

Information

You Might Also Like

LessWrong (Curated & Popular)

Episodes

“That Mad Olympiad” by Tomás B.

“The ‘Length’ of ‘Horizons’” by Adam Scholl

“Don’t Mock Yourself” by Algon

“If Anyone Builds It Everyone Dies, a semi-outsider review” by dvd

“The Most Common Bad Argument In These Parts” by J Bostock

“Towards a Typology of Strange LLM Chains-of-Thought” by 1a3orn

“I take antidepressants. You’re welcome” by Elizabeth

“Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior” by Sam Marks

Ratings & Reviews

About

Information

You Might Also Like