LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. -4 h

    “A CERN for AI is a distraction; push for an IAEA instead” by Charbel-Raphaël

    TL;DR: There are many conceivable versions of a “CERN for AI.” But the version that seems politically realistic (a new catch-up lab) probably would not do much for safety, while the versions that would materially improve safety (e.g., pause + merge of all companies) are probably unrealistic. So I see the CERN idea as a distraction, and not a particularly neglected one. I argue a better path is an international treaty with red lines now, with an IAEA-style verification body next: a sequencing that matches how the EU AI Act, the NPT/IAEA, and the Montreal Protocol actually developed. This is premised on the view that the main bottleneck in AI safety is enforcement and political will, not more R&D. Two premises underlie the rationale below: First, the bottleneck in AI safety is political will and the enforcement of best practices, not more R&D. With enough will, we move from Greenblatt's Plan D toward Plan A, achieving roughly an 80% risk reduction. Of course, the science of AI safety is far from mature, but we are also far from applying the best risk mitigation practices (see also the various other ratings, from SaferAI to FLI's one) - of course, alignment is [...] The original text contained 4 footnotes which were omitted from this narration. --- First published: July 1st, 2026 Source: https://www.lesswrong.com/posts/fPLCiCKjNiWhYD2mb/a-cern-for-ai-is-a-distraction-push-for-an-iaea-instead --- Narrated by TYPE III AUDIO.

    8 min
  2. -4 h

    “Model access for third-parties — it’s a big deal!” by Cleo Nardo

    Over time, there might be an increasingly large gap between insider model access and outsider model access. By insiders, I mean employees at the frontier lab.[1] By "outsiders", I mean external safety researchers, third-party auditors, and other actors trying to make the future go well. I will call this a model access gap, and when the gap is small, I'll call this model access parity.[2] I think that one of the top priorities for the external AI safety community over the next 6-12 months should be ensuring model access parity. Main reasons: This would allow us to direct billions of dollars in AI labour towards making things go well. This seems robustly good, regardless of what activities we decide to actually direct the labour towards. I think publicly available models will probably lag 3-6 months behind the best internal models. Hence, as R&D uplift grows superexponentially, we might see the differential uplift grow from 2x to 60x. In short: I think achieving model access parity might be preferable to scaling the headcount of outsider orgs by ten-fold. Model access parity isn't too far from the status quo, but it's the kind of thing that we could lose [...] --- Outline: (01:42) Which outsiders? (02:24) Examples of outsiders (04:12) Who aren't outsiders? (05:26) What kinds of model access gap should we worry about? (06:27) Non-release (07:25) Deployment lag (09:15) Safeguards (10:43) Costs and rate limits (12:06) Elicitation techniques (e.g. finetuning) The original text contained 3 footnotes which were omitted from this narration. --- First published: July 1st, 2026 Source: https://www.lesswrong.com/posts/RuGZ5tMdqpnraJahJ/model-access-for-third-parties-it-s-a-big-deal --- Narrated by TYPE III AUDIO.

    14 min
  3. -16 h

    “Structural Proxies” by Raymond Douglas

    Lately I've been thinking a lot about what work would help with actually winning and getting to good worlds. In the spirit of that I decided to venture outside my normal wheelhouse and spend some time reflecting on what technical research could make me more confident about powerful AIs being safe. AGI safety research is tricky partly because we don’t actually have access to the thing we want to study, i.e. superhuman AI. Much of the work we do now is basically trying to lay (potentially irrelevant) foundations for the period when we actually know what we’re up against, and at that point, a lot of the work might be done by AIs. You can group current approaches by how they try to sidestep this access problem:[1] Prosaic techniques like RLHF and interpretability try to make progress on current model safety in a way that will hopefully generalise, except maybe they just won’t scale. Model organisms artificially construct exemplars of bad behaviour (alignment faking, trojans etc) but it's hard to tell how representative the constructed case is. Control techniques aim to get usable work out of potentially misaligned AIs and bootstrapping, except it's again unclear how far that [...] --- Outline: (03:23) Adversarial attacks as a proxy for value generalisation (07:42) Faithfulness as a proxy for ELK (11:23) Thoughts on structural proxies as a research direction The original text contained 6 footnotes which were omitted from this narration. --- First published: June 30th, 2026 Source: https://www.lesswrong.com/posts/mKiBhFJs3MoksaBMs/structural-proxies --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    14 min
  4. -17 h

    “The Once And Future Fable #5” by Zvi

    We, or at least ‘more than 100 American institutions,’ got Mythos back this week. What we the people do not have is Fable or Sol. While we wait for both Claude Fable 5 and GPT-5.6-Sol, today we instead got Claude Sonnet 5. As usual it will take a few days to get a handle on the new model. In this case, Anthropic is representing it as a cheaper and faster version of Opus 4.8, so even though the number says 5 this is a relatively minor development. This post expands the Fable series to cover all further developments this week surrounding the Mythos Moment, and the various aspects of handling our new ad hoc licensing regime and figuring out policy going forward, and other aspects of policy as well. This includes my notes on various rhetoric being pulled out, where I fear I end up saying similar things every so often, because we are doomed to repeat the cycle. I have accepted my role in that, but those are sections many of you can skip, and are marked in italics accordingly as per usual. Table of Contents You Should See The Other [...] --- Outline: (01:12) You Should See The Other Guy (01:54) DeepMind Coders Of The World, Unite (02:45) Report Your Incidents (03:04) Good Guy With An AI (04:49) Free As In To Give It A Shot (08:21) Everything Is Both Speech And Computer (11:10) Lambs To The Slaughter (15:43) A Sign Saying Beware Of The Leopard (16:52) The Once And Present Mythos (20:26) What Is To Be Done (24:00) Distillation (26:24) What Would Banning Open Source Even Mean (27:20) Open Weight Models Are Unsafe And Nothing Can Fix This --- First published: June 30th, 2026 Source: https://www.lesswrong.com/posts/phxgfwGNGbanumMMv/the-once-and-future-fable-5 --- Narrated by TYPE III AUDIO.

    30 min
  5. -17 h

    “The consequences of locking intelligence away: an introduction to Claude relays in China” by CMLKevin

    There has been recent discourse on Hacker News about Chinese API relay stations that use every Western VC-subsidized channel of cheap tokens (think Claude/ChatGPT subscriptions, AWS/Azure credits, Kiro, Google Antigravity, etc.) to resell as APIs to the domestic Chinese market. This is true: as a Chinese citizen, I have been seeing an uptick in this trend since mid-2024, and especially since 2025. When I go on Taobao (China's Amazon) and search for keywords, there are dozens of relay services selling for around 1/5th to 1/10th the price of official Western APIs. Indeed, many of these relays may have cheaper models disguised as genuine Western ones, so the Chinese tech community has entire forums such as linux.do that serve primarily as a way for people to discuss and rate relay services based on their price, quality, and availability, as well as websites such as hvoy.ai that use a variety of automated testing suites to benchmark the quality of relay providers. There are even free relays that exist primarily as a gentleman's handshake data collection method between relay operators and users - some in China have speculated that one of the most popular ones [...] --- First published: June 30th, 2026 Source: https://www.lesswrong.com/posts/YrgeED3nWD4EjcqLd/the-consequences-of-locking-intelligence-away-an --- Narrated by TYPE III AUDIO.

    4 min
  6. -1 d

    “In partial defence of p(doom)” by Mikhail Samin

    p(doom) is a shorthand for some important bits and a way to notice a disagreement to double-crux about. If you work on AI capabilities at a frontier AI company, I might ask you for your p(doom). If it's less than 1%, I know that you're probably not familiar with the arguments, or you're maybe dumb in some ways, and will sometimes talk to you about what the situation really is. If it is 80%, I know I should talk to you about the actions people in your position should be taking; we have disagreements about best ways of achieving goals/lab politics/etc., not about the large-picture situation. p(doom) is not a very useful number to talk about in a conversation between two aspiring rationalists generally familiar with the basics. The things people should talk about instead are: How does the world survive? How likely are different things to happen in the future, maybe given that other things happen? etc. But most people are not aspiring rationalists, and have never heard of any of our arguments, and are not aware of the levels of worry of various people in the field. Communicating the importance of paying attention to the arguments by [...] --- First published: June 30th, 2026 Source: https://www.lesswrong.com/posts/bXpJC92QREfWgabcw/in-partial-defence-of-p-doom --- Narrated by TYPE III AUDIO.

    6 min
