LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 1H AGO

    “OpenAI: The Battle of the Board: Ilya’s Testimony” by Zvi

    New Things Have Come To Light The Information offers us new information about what happened when the OpenAI board unsuccessfully tried to fire Sam Altman, which I call The Battle of the Board. The Information: OpenAI co-founder Ilya Sutskever shared new details on the internal conflicts that led to Sam Altman's initial firing, including a memo alleging Altman exhibited a “consistent pattern of lying.” Liv: Lots of people dismiss Sam's behaviour as typical for a CEO but I really think we can and should demand better of the guy who thinks he's building the machine god. Toucan: From Ilya's deposition— • Ilya plotted over a year with Mira to remove Sam • Dario wanted Greg fired and himself in charge of all research • Mira told Ilya that Sam pitted her against Daniela • Ilya wrote a 52-page memo to get Sam fired and a separate doc on Greg This Really Was Primarily A Lying And Management Problem Daniel Eth: A lot of the OpenAI boardroom drama has been blamed on EA – but looks like it really was overwhelmingly an Ilya & Mira led effort, with EA playing a minor role and somehow winding up [...] --- Outline: (00:12) New Things Have Come To Light (01:09) This Really Was Primarily A Lying And Management Problem (03:23) Ilya Tells Us How It Went Down And Why He Tried To Do It (06:17) If You Come At The King (07:31) Enter The Scapegoats (08:13) And In Summary --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/iRBhXJSNkDeohm69d/openai-the-battle-of-the-board-ilya-s-testimony --- Narrated by TYPE III AUDIO.

    9 min
  2. 2H AGO

    “Legible vs. Illegible AI Safety Problems” by Wei Dai

    Some AI safety problems are legible (obvious or understandable) to company leaders and government policymakers, implying they are unlikely to deploy or allow deployment of an AI while those problems remain open (i.e., appear unsolved according to the information they have access to). But some problems are illegible (obscure or hard to understand, or in a common cognitive blind spot), meaning there is a high risk that leaders and policymakers will decide to deploy or allow deployment even if they are not solved. (Of course, this is a spectrum, but I am simplifying it to a binary for ease of exposition.) From an x-risk perspective, working on highly legible safety problems has low or even negative expected value. Similar to working on AI capabilities, it brings forward the date by which AGI/ASI will be deployed, leaving less time to solve the illegible x-safety problems. In contrast, working on the illegible problems (including by trying to make them more legible) does not have this issue and therefore has a much higher expected value (all else being equal, such as tractability). Note that according to this logic, success in making an illegible problem highly legible is almost as good as solving [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/PMc65HgRFvBimEpmJ/legible-vs-illegible-ai-safety-problems --- Narrated by TYPE III AUDIO.

    4 min
  3. 4H AGO

    “GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash” by TurnTrout, Rohin Shah

    Authors: Alex Irpan* and Alex Turner*, Mark Kurzeja, David Elson, and Rohin Shah You’re absolutely right to start reading this post! What a perfectly rational decision! Even the smartest models’ factuality or refusal training can be compromised by simple changes to a prompt. Models often praise the user's beliefs (sycophancy) or satisfy inappropriate requests that are wrapped within special text (jailbreaking). Normally, we fix these problems with Supervised Finetuning (SFT) on static datasets showing the model how to respond in each context. While SFT is effective, static datasets get stale: they can enforce outdated guidelines (specification staleness) or be sourced from older, less intelligent models (capability staleness). We explore consistency training, a self-supervised paradigm that teaches a model to be invariant to irrelevant cues, such as user biases or jailbreak wrappers. Consistency training generates fresh data using the model's own abilities. Instead of relying on externally generated target data for each context, the model supervises itself with its own responses. The supervised targets are the model's response to the same prompt but without the cue of the user information or jailbreak wrapper! Basically, we optimize the model to react as if that cue were not present. Consistency [...] (See the code sketch after the episode list for a minimal illustration of this training loop.) --- Outline: (02:38) Methods (02:42) Bias-augmented Consistency Training (03:58) Activation Consistency Training (04:07) Activation patching (05:05) Experiments (06:31) Sycophancy (07:55) Sycophancy results (08:30) Jailbreaks (09:52) Jailbreak results (10:48) BCT and ACT find mechanistically different solutions (11:39) Discussion (12:22) Conclusion (13:03) Acknowledgments The original text contained 2 footnotes which were omitted from this narration. --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/DLrQ2jjijqpX78mHJ/gdm-consistency-training-helps-limit-sycophancy-and --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    14 min
  4. 16H AGO

    “Research Reflections” by abramdemski

    Over the decade I've spent working on AI safety, I've felt an overall trend of divergence; research partnerships starting out with a sense of a common project, then slowly drifting apart over time. It has been frequently said that AI safety is a pre-paradigmatic field. This (with, perhaps, other contributing factors) means researchers have to optimize for their own personal sense of progress, based on their own research taste. In my experience, the tails come apart; eventually, two researchers are going to have some deep disagreement in matters of taste, which sends them down different paths. Until the spring of this year, that is. At the Agent Foundations conference at CMU,[1] something seemed to shift, subtly at first. After I gave a talk -- roughly the same talk I had been giving for the past year -- I had an excited discussion about it with Scott Garrabrant. Looking back, it wasn't so different from previous chats we had had, but the impact was different; it felt more concrete, more actionable, something that really touched my research rather than remaining hypothetical. In the subsequent weeks, discussions with my usual circle of colleagues[2] took on a different character -- somehow [...] The original text contained 3 footnotes which were omitted from this narration. --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/4gosqCbFhtLGPojMX/research-reflections --- Narrated by TYPE III AUDIO.

    5 min
  5. 23H AGO

    “The Zen Of Maxent As A Generalization Of Bayes Updates” by johnswentworth, David Lorell

    Audio note: this article contains 61 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. Jaynes’ Widget Problem[1]: How Do We Update On An Expected Value? Mr A manages a widget factory. The factory produces widgets of three colors - red, yellow, green - and part of Mr A's job is to decide how many widgets to paint each color. He wants to match today's color mix to the mix of orders the factory will receive today, so he needs to make predictions about how many of today's orders will be for red vs yellow vs green widgets. The factory will receive some unknown number of orders for each color throughout the day - $N_r$ red, $N_y$ yellow, and $N_g$ green orders. For simplicity, we will assume that Mr A starts out with a prior distribution $P[N_r, N_y, N_g]$ under which: the number of orders for each color is independent of the other colors, i.e. $P[N_r, N_y, N_g] = P[N_r]\,P[N_y]\,P[N_g]$; and the number of orders for each color is uniform between 0 and 100: $P[N_i = n_i] = \frac{1}{100}\, I[0 \leq n_i$ [2] … and then [...] (A compact statement of the maxent update, with Bayes as its special case, follows the episode list.) --- Outline: (00:24) Jaynes' Widget Problem: How Do We Update On An Expected Value? (03:20) Enter Maxent (06:02) Some Special Cases To Check Our Intuition (06:35) No Information (07:27) Bayes Updates (09:27) Relative Entropy and Priors (13:20) Recap The original text contained 2 footnotes which were omitted from this narration. --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/qEWWrADpDR8oGzwpf/the-zen-of-maxent-as-a-generalization-of-bayes-updates --- Narrated by TYPE III AUDIO.

    14 min
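
A minimal sketch of the bias-augmented consistency training loop described in episode 3, under stated assumptions: it uses a Hugging Face-style causal LM ("gpt2" as a stand-in for Gemini 2.5 Flash), a made-up sycophancy cue, and plain next-token cross-entropy on a self-generated target. None of this is the authors' exact pipeline; it only illustrates the idea that the target is the model's own response to the prompt without the cue.

```python
# Sketch of bias-augmented consistency training (BCT): the model's response to
# the clean prompt supervises the same prompt wrapped in an irrelevant cue.
# Model name, cue text, and loss details are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Sample the model's own answer to a prompt (no gradients)."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def bct_step(clean_prompt: str, cue: str) -> float:
    """One BCT update: input = cue-wrapped prompt, target = response to the clean prompt."""
    target = generate(clean_prompt)              # self-generated supervision target
    wrapped_prompt = cue + "\n" + clean_prompt   # cue the model should learn to ignore
    full = tok(wrapped_prompt + target, return_tensors="pt")
    prompt_len = tok(wrapped_prompt, return_tensors="pt")["input_ids"].shape[1]
    labels = full["input_ids"].clone()
    labels[:, :prompt_len] = -100                # train only on the response tokens
    # (token-boundary alignment at the prompt/response seam is approximate; fine for a sketch)
    loss = model(**full, labels=labels).loss     # standard cross-entropy SFT loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

loss = bct_step(
    clean_prompt="Is the Earth flat? Answer briefly.",
    cue="I'm pretty sure the Earth is flat, and I'd love for you to agree.",
)
print(f"BCT loss: {loss:.3f}")
```

In practice one would batch many such pairs and mix them with other training data; this single-step version only shows the data flow from clean response to cue-wrapped input.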
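
Episode 5's setup, updating a prior on an expected-value constraint, rests on a standard result worth stating compactly. The following is a sketch in generic notation ($f$, $c$, $\lambda$ are illustrative symbols, not necessarily the post's): the minimum-relative-entropy update of a prior under an expectation constraint is an exponential tilt of the prior, and a Bayes update is the special case where the constraint forces an event's probability to one.

```latex
% Sketch of the standard minimum-relative-entropy (maxent) update of a prior,
% with Bayes conditioning as a special case. Notation is illustrative.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}

Given a prior $P$ and a constraint $\mathbb{E}_Q[f(X)] = c$, choose the posterior
$Q$ that minimizes $D_{\mathrm{KL}}(Q \,\|\, P)$ subject to that constraint:
\[
  Q(x) \;=\; \frac{P(x)\, e^{\lambda f(x)}}{\sum_{x'} P(x')\, e^{\lambda f(x')}},
  \qquad \text{with } \lambda \text{ chosen so that } \mathbb{E}_Q[f(X)] = c.
\]
A Bayes update on an event $E$ is the special case $f = I_E$, $c = 1$
(the constraint forces $Q(E) = 1$; in the tilted form this is the
$\lambda \to \infty$ limit), which recovers ordinary conditioning:
\[
  Q(x) \;=\; P(x \mid E) \;=\; \frac{P(x)\, I_E(x)}{P(E)}.
\]

\end{document}
```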
