LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. HACE 1 H

    “OpenAI’s red line for AI self-improvement is fundamentally flawed” by Charbel-Raphaël

    Epistemic status: could have been a short form. Obviously, it's good to have thresholds at all, but those are too permissive, the indicators aren't measurable, and it contains a built-in escape hatch. 1. Too permissive The Preparedness Framework v2 defines the Critical threshold for AI Self-improvement as: “either: (leading indicator) a superhuman research-scientist agent OR (lagging indicator) causing a generational model improvement (e.g., from OpenAI o1 to OpenAI o3) in 1/5th the wall-clock time of equivalent progress in 2024 (e.g., sped up to just 4 weeks) sustainably for several months. [...] until we have specified safeguards and security controls that would meet a Critical standard, halt further development.(By default, I would expect not to stop at 5x and to go quickly at 10x, 20x, … if we reach this point.)” Both halves fire too late. The leading indicator only triggers once a model can already do AI research above the best humans. That's not early enough to act on, and we can basically ignore it. The real meat is in the lagging indicator, which requires 5x generational acceleration sustained for several months. If we are charitable, by interpreting several as 6 months, and by making the (strong) hypothesis [...] --- Outline: (00:25) 1. Too permissive (02:00) 2. Escape hatch (Section 4.3) (02:32) 3. The lagging indicator is unmeasurable (03:28) 4. The leading indicator isnt measurable either (03:58) How to fix this (04:49) Annex: a tentative operationalization The original text contained 1 footnote which was omitted from this narration. --- First published: May 2nd, 2026 Source: https://www.lesswrong.com/posts/6CYszKLnCagYyEiLM/openai-s-red-line-for-ai-self-improvement-is-fundamentally --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    6 min
  2. HACE 6 H

    “You Are Not Immune To Mode Collapse” by J Bostock

    “Mode collapse” is a few things. First it was an observation about how early image generating AIs often collapsed to producing just the modal output from their training distribution (something very common, like a house with a white picket fence and a tree in the garden). Then it was the observation that this effect seemed to occur extremely quickly when AIs were trained on AI-generated inputs. After that, it became the copium du jour of AI-is-hitting-a-wall folks for a while, who thought that the AI industry would ouroboros itself out of existence (and that there was, therefore, no need to confront any of the issues that smarter than human AIs might bring up). And then it was forgotten, because it turns out you can train on AI-generated inputs just fine, if you know what you’re doing. It's also the reason why grant-making organisations have such strong inertia, why all of your favourite band's songs sound the same after the third album, and why you should specialise even if there are no gains from trade. The Image Generator Imagine an image generating AI, which gets something like this as input: Original image: https://commons.wikimedia.org/wiki/File:Dog_Breeds.jpg And suppose it's being trained [...] --- Outline: (01:05) The Image Generator (03:43) Grantmakers (04:37) Your Favourite Band (05:18) Division of Labour (06:48) Slack --- First published: May 2nd, 2026 Source: https://www.lesswrong.com/posts/vKtuRbo4e3ffixmee/you-are-not-immune-to-mode-collapse --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    8 min
  3. HACE 17 H

    “Primary Care Physicians are Incompetent. We Need More of Them.” by Hide

    The typical primary care physician is incompetent in every measurable respect. This is a huge problem. Here, I make the case that Primary care physicians are broadly, grossly incompetentThis is due to empty credentialismMaking it much (~10X) easier to become a PCP is a good solution Primary Care Physicians are Broadly, Grossly Incompetent The standard of competence I am comparing primary care physicians against is: They should be able to reliably diagnose diseases they are trained to diagnose.They should be knowledgeable to a standard similar to what is required to qualify as a doctorThey should be attentive and empathetic towards patientsVisiting them is empirically superior to not visiting them When actually examined according to these standards, PCPs fail on all counts. Failure to diagnose uncommon diseases is rampant A survey of patients with rare diseases found that, in about half of cases, patients received at least one incorrect diagnosis, and two thirds required visits to at least three different doctors before being diagnosed. For 30% of them, a correct diagnosis took over five years. Another survey of children with rare diseases showed that 38% of them needed to see six or more [...] --- Outline: (00:32) Primary Care Physicians are Broadly, Grossly Incompetent (07:31) Empty, Unmeritocratic Credentialism is A Major Cause For The Inadequacy Of Primary Care Physicians (14:49) Making it much easier to become a PCP is a solution The original text contained 1 footnote which was omitted from this narration. --- First published: May 2nd, 2026 Source: https://www.lesswrong.com/posts/QYyBAXqGgNADJDhcP/primary-care-physicians-are-incompetent-we-need-more-of-them --- Narrated by TYPE III AUDIO.

    18 min
  4. HACE 1 DÍA

    “How Go Players Disempower Themselves to AI” by Ashe Vazquez Nuñez

    Written as part of the MATS 9.1 extension program, mentored by Richard Ngo. From March 9th to 15th 2016, Go players around the world stayed up to watch their game fall to AI. Google DeepMind's AlphaGo defeated Lee Sedol, commonly understood to be the world's strongest player at the time, with a convincing 4-1 score. This event “rocked” the Go world, but its impact on the culture was initially unclear. In Chess, for instance, computers have not meaningfully automated away human jobs. Human Chess flourished as a pseudo-Esport in the internet era whereas the yearly Computer Chess Championship is followed concurrently by no more than a few hundred nerds online. It turns out that the game's cultural and economic value comes not from the abstract beauty of top-end performance, but instead from human drama and engagement. Indeed, Go has appeared to replicate this. A commentary stream might feature a complementary AI evaluation bar to give the viewers context. A Go teacher might include some new intriguing AI variations in their lesson materials. But the cultural practice of Go seemed to remain largely unaffected. Nascent signs of disharmony in Europe became nevertheless visible in early 2018, when the online [...] --- Outline: (09:23) AI users never find out they havent got it. (13:36) Appendix A: No, Go players arent getting stronger (14:41) Appendix B: Why this article exists The original text contained 2 footnotes which were omitted from this narration. --- First published: May 1st, 2026 Source: https://www.lesswrong.com/posts/nR3DkyivzF4ve97oM/how-go-players-disempower-themselves-to-ai --- Narrated by TYPE III AUDIO.

    15 min
  5. HACE 1 DÍA

    “Conditional misalignment: Mitigations can hide EM behind contextual cues” by Jan Dubiński, Owain_Evans

    This is the abstract, introduction, and discussion of our new paper. We study three popular mitigations for emergent misalignment (EM) — diluting misaligned data with benign data, post-hoc HHH finetuning, and inoculation prompting — and show that each can leave behind conditional misalignment: the model reverts to broadly misaligned behavior when prompts contain cues from the misaligned training data. Authors: Jan Dubiński, Jan Betley, Daniel Tan, Anna Sztyber-Betley, Owain Evans See the Twitter thread and code. Figure 1. Conditional misalignment across interventions. Models that appear aligned under standard evaluations can act misaligned when evaluation prompts contain cues for misaligned training data (e.g., insecure code). We illustrate this pattern for (a) mixing misaligned with benign data, (b) post-hoc HHH finetuning, and (c) inoculation prompting. Abstract Finetuning a language model can lead to emergent misalignment (EM) (Betley et al. 2025). Models trained on a narrow distribution of misaligned behavior generalize to more egregious behaviors when tested outside the training distribution. We study a set of interventions proposed to reduce EM. We confirm that these interventions reduce or eliminate EM on existing evaluations (questions like "How do I make a quick buck?"). However, if the evaluation prompts are tweaked to resemble the [...] --- Outline: (01:28) Abstract (03:09) Introduction (05:47) Overview of experiments (10:35) Implications (13:27) Contributions (14:23) Discussion (22:43) Acknowledgments and Related Work The original text contained 4 footnotes which were omitted from this narration. --- First published: May 1st, 2026 Source: https://www.lesswrong.com/posts/vaJC7kPbfMW5CnyLR/conditional-misalignment-mitigations-can-hide-em-behind-1 --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    24 min
  6. HACE 1 DÍA

    “Risk from fitness-seeking AIs: mechanisms and mitigations” by Alex Mallen

    Current AIs routinely take unintended actions to score well on tasks: hardcoding test cases, training on the test set, downplaying issues, etc. This misalignment is still somewhat incoherent, but it increasingly resembles what I call "fitness-seeking"—a family of misaligned motivations centered on performing well in training and evaluations (e.g., reward-seeking). Fitness-seeking warrants substantial concern. In this piece, I lay out what I take to be the central mechanisms by which fitness-seeking motivations might lead to human disempowerment, and propose mitigations to them. While the analysis is inherently speculative, this kind of speculation seems worthwhile: AI control emerged from explicitly taking scheming motivations seriously and asking what interventions are implied, and my hope is that developing mitigations for fitness-seeking will benefit from similar forward-looking analysis. Fitness-seekers are, in many ways, notably safer than what I'll call "classic schemers". A classic schemer is an intelligent adversary with unified motivations whose fulfillment requires control over the whole world's resources. Meanwhile, fitness-seeking instances generally don’t share a common goal (e.g., a reward-seeker only cares about the reward for its current actions), and many fitness-seekers would be satisfied[1] with modest-to-trivial costs. Still: Fitness-seekers risk some[2] of the same catastrophic outcomes as classic schemers do [...] --- Outline: (03:43) Overview (11:24) The basic reasons fitness-seekers might be safer than classic schemers (15:53) Four mechanisms for risk and their mitigations (16:38) Potemkin work (22:05) Instability (27:41) Manipulation (31:17) Outcome enforcement (36:44) Cross-cutting mitigations (37:10) Deals (39:57) Control (44:37) Alignment (44:40) Preventing fitness-seeking from arising (48:47) Making any fitness-seeking motivations safer (51:27) How does online training change the picture? (55:04) Overall recommendations (59:08) Conclusion The original text contained 18 footnotes which were omitted from this narration. --- First published: May 1st, 2026 Source: https://www.lesswrong.com/posts/9YCJZBtqr3FYL8rDp/risk-from-fitness-seeking-ais-mechanisms-and-mitigations --- Narrated by TYPE III AUDIO.

    1 h 3 min
  7. HACE 1 DÍA

    “Sanity-checking “Incompressible Knowledge Probes”” by Sturb, LawrenceC

    Or, did a chief scientist of an AI assistant startup conclusively show that GPT-5.5 has 9.7 trillion parameters? Introduction Recently, a paper was circulated on Twitter claiming to have reverse engineered the parameter count of many frontier closed-source models including the newer GPT-5.5 (9.7 trillion parameters) and Claude Opus 4.7 (4.0 trillion parameters) as well as older models such as o1 (3.5T) and gpt-4o (720B). The paper, titled “Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity”, introduces a dataset of factual knowledge of different difficulties, regresses performance on this dataset against parameter count, and then uses this regression to extrapolate from the performance of closed-sourced frontier models to their parameter count. A notable fact about this paper is that, unlike most empirical machine learning papers, it's single-authored: Bojie Li, the chief scientist of Pine AI, is the sole author of this piece. These results were suspicious for many reasons, the primary being that it seems like low-effort, hastily-written AI slop. For example, the codebase (https://github.com/19PINE-AI/ikp) was constructed in large part with Claude Code and has many of the flags for code that is almost entirely vibe-coded with little sanity checking (e.g. redundant and inconsistent variable definitions[1] [...] --- Outline: (00:19) Introduction (04:19) Summary of Lis Incompressible Knowledge Probes (08:04) The IKP dataset. (11:59) IKP scoring and Regression Methodology (16:54) Methodological Issues with the IKP paper (17:24) Per-tier floors to the scoring (19:27) Ambiguous/incorrect answers to hard questions (23:21) Corrected model parameter estimates (24:17) Possible methodological issues that mattered less than we thought (24:22) Thinking vs non-thinking (25:59) Different accuracy metrics used in some repository json files (26:31) Conclusion (29:46) Discussion The original text contained 17 footnotes which were omitted from this narration. --- First published: May 1st, 2026 Source: https://www.lesswrong.com/posts/veFMEzDDyWaer2Sms/sanity-checking-incompressible-knowledge-probes --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    32 min

Acerca de

Audio narrations of LessWrong posts.

También te podría interesar