LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 1H AGO

    “OpenAI: The Battle of the Board: Ilya’s Testimony” by Zvi

    New Things Have Come To Light The Information offers us new information about what happened when the OpenAI board unsuccessfully tried to fire Sam Altman, which I call The Battle of the Board. The Information: OpenAI co-founder Ilya Sutskever shared new details on the internal conflicts that led to Sam Altman's initial firing, including a memo alleging Altman exhibited a “consistent pattern of lying.” Liv: Lots of people dismiss Sam's behaviour as typical for a CEO but I really think we can and should demand better of the guy who thinks he's building the machine god. Toucan: From Ilya's deposition— • Ilya plotted over a year with Mira to remove Sam • Dario wanted Greg fired and himself in charge of all research • Mira told Ilya that Sam pitted her against Daniela • Ilya wrote a 52-page memo to get Sam fired and a separate doc on Greg This Really Was Primarily A Lying And Management Problem Daniel Eth: A lot of the OpenAI boardroom drama has been blamed on EA – but looks like it really was overwhelmingly an Ilya & Mira led effort, with EA playing a minor role and somehow winding up [...] --- Outline: (00:12) New Things Have Come To Light (01:09) This Really Was Primarily A Lying And Management Problem (03:23) Ilya Tells Us How It Went Down And Why He Tried To Do It (06:17) If You Come At The King (07:31) Enter The Scapegoats (08:13) And In Summary --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/iRBhXJSNkDeohm69d/openai-the-battle-of-the-board-ilya-s-testimony --- Narrated by TYPE III AUDIO.

    9 min
  2. 2H AGO

    “Legible vs. Illegible AI Safety Problems” by Wei Dai

    Some AI safety problems are legible (obvious or understandable) to company leaders and government policymakers, implying they are unlikely to deploy or allow deployment of an AI while those problems remain open (i.e., appear unsolved according to the information they have access to). But some problems are illegible (obscure or hard to understand, or in a common cognitive blind spot), meaning there is a high risk that leaders and policymakers will decide to deploy or allow deployment even if they are not solved. (Of course, this is a spectrum, but I am simplifying it to a binary for ease of exposition.) From an x-risk perspective, working on highly legible safety problems has low or even negative expected value. Similar to working on AI capabilities, it brings forward the date by which AGI/ASI will be deployed, leaving less time to solve the illegible x-safety problems. In contrast, working on the illegible problems (including by trying to make them more legible) does not have this issue and therefore has a much higher expected value (all else being equal, such as tractability). Note that according to this logic, success in making an illegible problem highly legible is almost as good as solving [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/PMc65HgRFvBimEpmJ/legible-vs-illegible-ai-safety-problems --- Narrated by TYPE III AUDIO.

    4 min
  3. 4H AGO

    “GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash” by TurnTrout, Rohin Shah

    Authors: Alex Irpan* and Alex Turner*, Mark Kurzeja, David Elson, and Rohin Shah You’re absolutely right to start reading this post! What a perfectly rational decision! Even the smartest models’ factuality or refusal training can be compromised by simple changes to a prompt. Models often praise the user's beliefs (sycophancy) or satisfy inappropriate requests that are wrapped within special text (jailbreaking). Normally, we fix these problems with Supervised Finetuning (SFT) on static datasets showing the model how to respond in each context. While SFT is effective, static datasets get stale: they can enforce outdated guidelines (specification staleness) or be sourced from older, less intelligent models (capability staleness). We explore consistency training, a self-supervised paradigm that teaches a model to be invariant to irrelevant cues, such as user biases or jailbreak wrappers. Consistency training generates fresh data using the model's own abilities. Instead of relying on externally generated target data for each context, the model supervises itself with its own responses. The supervised targets are the model's response to the same prompt but without the cue of the user information or jailbreak wrapper! Basically, we optimize the model to react as if that cue were not present. Consistency [...] (See the code sketch after the episode list for a minimal illustration of this training loop.) --- Outline: (02:38) Methods (02:42) Bias-augmented Consistency Training (03:58) Activation Consistency Training (04:07) Activation patching (05:05) Experiments (06:31) Sycophancy (07:55) Sycophancy results (08:30) Jailbreaks (09:52) Jailbreak results (10:48) BCT and ACT find mechanistically different solutions (11:39) Discussion (12:22) Conclusion (13:03) Acknowledgments The original text contained 2 footnotes which were omitted from this narration. --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/DLrQ2jjijqpX78mHJ/gdm-consistency-training-helps-limit-sycophancy-and --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    14 min
  4. 16H AGO

    “Research Reflections” by abramdemski

    Over the decade I've spent working on AI safety, I've felt an overall trend of divergence; research partnerships starting out with a sense of a common project, then slowly drifting apart over time. It has been frequently said that AI safety is a pre-paradigmatic field. This (with, perhaps, other contributing factors) means researchers have to optimize for their own personal sense of progress, based on their own research taste. In my experience, the tails come apart; eventually, two researchers are going to have some deep disagreement in matters of taste, which sends them down different paths. Until the spring of this year, that is. At the Agent Foundations conference at CMU,[1] something seemed to shift, subtly at first. After I gave a talk -- roughly the same talk I had been giving for the past year -- I had an excited discussion about it with Scott Garrabrant. Looking back, it wasn't so different from previous chats we had had, but the impact was different; it felt more concrete, more actionable, something that really touched my research rather than remaining hypothetical. In the subsequent weeks, discussions with my usual circle of colleagues[2] took on a different character -- somehow [...] The original text contained 3 footnotes which were omitted from this narration. --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/4gosqCbFhtLGPojMX/research-reflections --- Narrated by TYPE III AUDIO.

    5 min
  5. 23H AGO

    “The Zen Of Maxent As A Generalization Of Bayes Updates” by johnswentworth, David Lorell

    Audio note: this article contains 61 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. Jaynes’ Widget Problem[1]: How Do We Update On An Expected Value? Mr A manages a widget factory. The factory produces widgets of three colors - red, yellow, green - and part of Mr A's job is to decide how many widgets to paint each color. He wants to match today's color mix to the mix of orders the factory will receive today, so he needs to make predictions about how many of today's orders will be for red vs yellow vs green widgets. The factory will receive some unknown number of orders for each color throughout the day - $N_r$ red, $N_y$ yellow, and $N_g$ green orders. For simplicity, we will assume that Mr A starts out with a prior distribution $P[N_r, N_y, N_g]$ under which: the number of orders for each color is independent of the other colors, i.e. $P[N_r, N_y, N_g] = P[N_r]\,P[N_y]\,P[N_g]$; and the number of orders for each color is uniform between 0 and 100: $P[N_i = n_i] = \frac{1}{100}\, I[0 \leq n_i$ [2] … and then [...] (A compact statement of the maxent update, with Bayes as its special case, follows the episode list.) --- Outline: (00:24) Jaynes' Widget Problem: How Do We Update On An Expected Value? (03:20) Enter Maxent (06:02) Some Special Cases To Check Our Intuition (06:35) No Information (07:27) Bayes Updates (09:27) Relative Entropy and Priors (13:20) Recap The original text contained 2 footnotes which were omitted from this narration. --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/qEWWrADpDR8oGzwpf/the-zen-of-maxent-as-a-generalization-of-bayes-updates --- Narrated by TYPE III AUDIO.

    14 min
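
A minimal sketch of the bias-augmented consistency training loop described in episode 3, under stated assumptions: it uses a Hugging Face-style causal LM ("gpt2" as a stand-in for Gemini 2.5 Flash), a made-up sycophancy cue, and plain next-token cross-entropy on a self-generated target. None of this is the authors' exact pipeline; it only illustrates the idea that the target is the model's own response to the prompt without the cue.

```python
# Sketch of bias-augmented consistency training (BCT): the model's response to
# the clean prompt supervises the same prompt wrapped in an irrelevant cue.
# Model name, cue text, and loss details are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Sample the model's own answer to a prompt (no gradients)."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def bct_step(clean_prompt: str, cue: str) -> float:
    """One BCT update: input = cue-wrapped prompt, target = response to the clean prompt."""
    target = generate(clean_prompt)              # self-generated supervision target
    wrapped_prompt = cue + "\n" + clean_prompt   # cue the model should learn to ignore
    full = tok(wrapped_prompt + target, return_tensors="pt")
    prompt_len = tok(wrapped_prompt, return_tensors="pt")["input_ids"].shape[1]
    labels = full["input_ids"].clone()
    labels[:, :prompt_len] = -100                # train only on the response tokens
    # (token-boundary alignment at the prompt/response seam is approximate; fine for a sketch)
    loss = model(**full, labels=labels).loss     # standard cross-entropy SFT loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

loss = bct_step(
    clean_prompt="Is the Earth flat? Answer briefly.",
    cue="I'm pretty sure the Earth is flat, and I'd love for you to agree.",
)
print(f"BCT loss: {loss:.3f}")
```

In practice one would batch many such pairs and mix them with other training data; this single-step version only shows the data flow from clean response to cue-wrapped input.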
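
Episode 5's setup, updating a prior on an expected-value constraint, rests on a standard result worth stating compactly. The following is a sketch in generic notation ($f$, $c$, $\lambda$ are illustrative symbols, not necessarily the post's): the minimum-relative-entropy update of a prior under an expectation constraint is an exponential tilt of the prior, and a Bayes update is the special case where the constraint forces an event's probability to one.

```latex
% Sketch of the standard minimum-relative-entropy (maxent) update of a prior,
% with Bayes conditioning as a special case. Notation is illustrative.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}

Given a prior $P$ and a constraint $\mathbb{E}_Q[f(X)] = c$, choose the posterior
$Q$ that minimizes $D_{\mathrm{KL}}(Q \,\|\, P)$ subject to that constraint:
\[
  Q(x) \;=\; \frac{P(x)\, e^{\lambda f(x)}}{\sum_{x'} P(x')\, e^{\lambda f(x')}},
  \qquad \text{with } \lambda \text{ chosen so that } \mathbb{E}_Q[f(X)] = c.
\]
A Bayes update on an event $E$ is the special case $f = I_E$, $c = 1$
(the constraint forces $Q(E) = 1$; in the tilted form this is the
$\lambda \to \infty$ limit), which recovers ordinary conditioning:
\[
  Q(x) \;=\; P(x \mid E) \;=\; \frac{P(x)\, I_E(x)}{P(E)}.
\]

\end{document}
```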
