LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 10 hrs ago

    “Deployment Awareness Matters More Than Evaluation Awareness” by VojtaKovarik, Tomáš Gavenčiak, Mateusz Bagiński

    TL;DR Evaluation awareness, an AI recognizing it's being evaluated, is a widely discussed concept in AI safety. But there is a closely related concept that we claim is more important: deployment awareness, the AI's ability to recognize when it is not being evaluated and when its actions matter. A misaligned AI with deployment awareness can game evaluations without any evaluation awareness at all, with a simple strategy: act aligned by default, and deviate only when confident you're in real deployment and your actions matter for your goals. This requires two ingredients: occasionally recognizable deployment situations, and enough self-reflective and strategic reasoning for the AI to anticipate and plan around this. We think "deployment awareness" better identifies what makes evaluations fragile, and we develop this idea below.
    Concept | Explanation | Comments
    Evaluation awareness | AI is being tested and confidently believes that this is so | This only becomes a problem if most evaluations trigger evaluation awareness, and if the AI knows that. Or if the AI has good self-locating reasoning.
    Deployment awareness | AI is not being tested and confidently believes it is not being tested | This is a problem even if it happens rarely (if some of those rare [...]
    --- Outline: (00:13) TL;DR (01:20) Side note: it's really about consequences, not about evaluation vs. deployment (03:23) Evaluation awareness, deployment awareness, and self-locating beliefs (04:54) Evaluation awareness is less dangerous than it seems (06:58) Deployment awareness is more dangerous than it seems (09:29) Evaluation gaming with no evaluation or deployment awareness (12:35) Final comments (13:33) Appendix: A formal (toy) model The original text contained 13 footnotes which were omitted from this narration. --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/XP794SHDuXYfWLrvJ/deployment-awareness-matters-more-than-evaluation-awareness --- Narrated by TYPE III AUDIO.
--- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    18 min
  2. 20 hrs ago

    “What did “scheming”, “mech interp” mean pre-2023.” by Cleo Nardo

    This was too long to be a short-form, but it should really be a short-form. This notice is useful for people who've recently got into AI safety, who want to engage with the ancient texts (i.e. pre-2024). If you were around before 2023, then you probably don't need this. A few phrases have changed their meaning over time. Two examples that came to mind recently are scheming and mech interp. (In both cases, I think the change-of-terminology was reasonable.) There are probably a bunch of other examples; feel free to mention them in the comments. Scheming. This used to mean "training-gaming in pursuit of out-of-context goals". For example, Carlsmith (Nov 2023) starts with: This report examines whether advanced AIs that perform well in training will be doing so in order to gain power later -- a behavior I call "scheming" (also sometimes called "deceptive alignment"). Then Apollo came out with "Frontier Models are Capable of In-context Scheming" (Dec 2024): We study whether models have the capability to scheme in pursuit of a goal that we provide in-context and instruct the model to strongly follow. So the difference here is (1) the AI isn't in training (it's in [...] --- Outline: (00:47) Scheming. (02:12) Mech interp. --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/NraMusoWhj9Njdpi5/what-did-scheming-mech-interp-mean-pre-2023 --- Narrated by TYPE III AUDIO.

    4 min
  3. 22 hrs ago

    “AI #174: You’re It” by Zvi

    Fable remains in limbo, with renewed hope that we will get it back soon (45% by tomorrow, 69% by July 1, nice.) The full capabilities post is now available. Alex Bores unfortunately lost narrowly in NY-12, and will not be heading to Congress. There are also plenty of other stories to cover. Some highlights: GLM-5.2 is the new best open model, although it is expensive for its class. It will have its uses, potentially for agents you need to run fully locally or privately, but often it won’t be the right fit. Claude Tag is a new system for having Claude join your Slack, and if you @ him then he will spin up an instance to do the coding work. Dean Ball is joining OpenAI to work on policy. We don’t see eye to eye on everything, but this is a huge upgrade over their existing alternatives. The debate over the MidJourney scanner continues. Table of Contents Language Models Offer Mundane Utility. You know what it is for. Language Models Don’t Offer Mundane Utility. Hiring French Qwants. Huh, Upgrades. Claude Code supports artifacts. [...] 
--- Outline: (01:12) Language Models Offer Mundane Utility (02:58) Language Models Don't Offer Mundane Utility (03:13) Huh, Upgrades (03:38) On Your Marks (04:36) Deepfaketown and Botpocalypse Soon (11:20) Fun With Media Generation (12:20) Cyber Lack of Security (14:49) Overcoming Bias (15:52) A Young Lady's Illustrated Primer (18:14) They Took Our Jobs (19:48) Get Involved (21:54) Introducing (22:12) Claude Tag (31:46) In Other AI News (33:20) More On GLM-5.2 (35:17) ChatGPT Health (37:04) Middle Of The Journey (51:04) New Medical Diagnostic Just Dropped (54:05) Google on AI Control (01:02:12) The Once And Future Fable (01:04:17) Fable: The First Lawsuit (01:05:12) Dean Ball Joins OpenAI (01:09:03) Show Me the Money (01:09:18) Quiet Speculations (01:12:00) Alex Bores Loses In NY-12 By 4% (01:22:28) The Quest for Sane Regulations (01:24:49) Chip City (01:28:33) The Week in Audio (01:29:21) People Just Say Things (01:30:19) Rhetorical Innovation (01:36:32) There Are Two Pills (01:37:55) Who Evals The Evals (01:39:02) Aligning a Smarter Than Human Intelligence is Difficult (01:43:17) Cooperative Alignment (01:44:22) People Are Worried About AI Killing Everyone (01:45:59) Other People Are Not As Worried About AI Killing Everyone (01:48:08) The Lighter Side --- First published: June 25th, 2026 Source: https://www.lesswrong.com/posts/MfdaizeH8z8civPHe/ai-174-you-re-it --- Narrated by TYPE III AUDIO.

    1hr 51min
  4. 1 day ago

    [Linkpost] “Don’t ignore the car crashes, and remember your freshman CS” by jcksanderson

    This is a link post. Car crashes kill over 35,000 people in the US every year. Plane crashes, on the other hand, kill ~350. Despite this, we have shows like Mayday/Air Disasters for entertainment on TV, and events such as the tragic death of 67 people on a commercial airline flight into DCA often make the front page of the news for a week, while the state of American roadway safety gets that same level of publicity maybe once every other year. Many of you probably recognize this as the archetypal example of the availability heuristic: the magnitude of and publicity following plane crashes causes them to feel like a much bigger problem than car crashes. This is, of course, despite the fact that car crashes kill two orders of magnitude more people every year. Relatedly, I fondly recall taking my first computer science class. After the absolute basics of Python, the first real lesson we learned was to always break problems down into simpler tasks, until each task becomes rather easy to do. We later learned that this is a broader principle called decomposition. Decomposition is a very helpful cue, as it gives an obvious starting point for [...] The original text contained 1 footnote which was omitted from this narration. --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/eSZYRuEvqm7jFxYfq/don-t-ignore-the-car-crashes-and-remember-your-freshman-cs Linkpost URL: https://jcksanderson.com/posts/car_crashes/ --- Narrated by TYPE III AUDIO.

    4 min
  5. 1 day ago

    “White House Will Ad Hoc Decide Who Can Individually Access GPT-5.6” by Zvi

    We have a new standard policy for releasing frontier AI models. It is not good. We are now, it seems, going to have the White House individually, in an opaque ad hoc manner, deciding who can access which frontier AI models when. One hopes we will at least transition this into a predictable and formal set of procedures for determining what to do. But we spent years not laying the groundwork for doing that, and now here we are. Essentially everyone should read the first half of this post, to understand what happened, and my speculations on what it means going forward for AI and America. Only those who care and find it relevant to their interests should proceed to the second half, which addresses the blame game about how we got here, and claims that things would be better if people stopped speaking truth. Table of Contents Part 1: A Maximally Terrible Policy. What Does This Mean For Fable? Solve For The Equilibrium. The Once And Future Fable. Part 2: The Blame Game. A Parable. What About the Recent Executive Order? The Problem Is [...] --- Outline: (01:01) Part 1: A Maximally Terrible Policy (06:46) What Does This Mean For Fable? (07:46) Solve For The Equilibrium (11:45) The Once And Future Fable (12:45) Part 2: The Blame Game (16:02) A Parable (18:10) What About the Recent Executive Order? (22:13) The Problem Is Real --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/MkwL4AcbE44yePEQx/white-house-will-ad-hoc-decide-who-can-individually-access --- Narrated by TYPE III AUDIO.

    23 min
