LessWrong posts by zvi

zvi

Audio narrations of LessWrong posts by zvi

  1. April 9

    “Claude Mythos: The System Card” by Zvi

    Claude Mythos is different. This is the first model other than GPT-2 that is at first not being released for public use at all. With GPT-2 the delay was due to a general precautionary principle. OpenAI did not know what they had, or what effect text on demand would have on various systems. It sounds funny now, since GPT-2 was harmless, but at the time the concern was highly reasonable. The decision not to release Claude Mythos is not about an amorphous fear. If given to anyone with a credit card, Claude Mythos would give attackers a cornucopia of zero-day exploits for essentially all the software on Earth, including every major operating system and browser. It would be chaos. Or, in theory, if Anthropic had chosen to do so, it could have used those exploits. Great power was on offer, and that power was refused. This does not happen often. Instead Anthropic has created Project Glasswing. Mythos is being given only to cybersecurity firms, so they can patch the world's most important software. Based on how that goes, we can then decide if and when it will become reasonable to give access to a broader [...] --- Outline: (03:24) Mundane Alignment Is Excellent (05:01) Would This Process Be Sufficient To Find A Dangerous Model? (06:27) Introductory Warning About Superficial Mundane Alignment (15:12) Model Training (1.1) (15:25) Release Decision Process (1.2) (17:50) RSP Evaluations (2.1 and 2.2) (22:17) Autonomy Evaluations (2.3) (25:56) The Alignment Risk Update Document (26:39) The Threat Model (29:18) Misalignment As Failure Mode (31:35) Wouldn't You Know?
    (33:40) Don't Encourage Your Model (35:14) Beware Goodhart's Law (37:18) Beware The Most Forbidden Technique (5.2.3) (41:44) Asking The Right Questions (43:11) Model Organism Tests (45:01) Model Weight Security (Risk Report 5.5.2.1) (45:31) Reward Hacking (Back to The Model Card) (45:56) Remote Drop-In Worker Coming Soon (49:01) External Testing (2.3.7) (49:37) Cyber Insecurity General Principle Interlude (50:46) Alignment (4) (56:38) Risk In The Room (57:56) Mythos Meant Well (01:00:20) Risk Not In The Room (01:02:05) Alignment Testing Overview (01:05:20) Internal Deployment Testing Process (01:07:55) Reports From Pilot Use (4.2.1) (01:08:30) Reports From Automated Testing (4.2) (01:10:13) Other External Testing (01:10:56) Just The Facts, Sir (01:13:05) Refusing Safety Research (01:14:12) Claude Favoritism (01:15:19) Ruling Out Encoded Thinking (4.4.1) (01:18:41) Sandbagging (4.4.2) (01:21:27) Capability for Evasion of Safeguards (4.4.3) (01:23:04) Pick A Random Number (4.4.3.4) (01:25:49) White Box Analysis (4.5) (01:30:30) Model Welfare (5) (01:31:32) Key Model Welfare Findings (5.1.2) (01:41:17) Is Mythos Okay? (01:43:52) Self-Play (01:45:30) A Few Fun Facts --- First published: April 9th, 2026 Source: https://www.lesswrong.com/posts/EDQhwLTyTnNmaxRGq/claude-mythos-the-system-card --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    1 h 47 min
  2. April 8

    “AI #163: Mythos Quest” by Zvi

    There exists an AI model, Claude Mythos, that has discovered critical safety vulnerabilities in every major operating system and browser. If released today it would likely break the internet and be chaos. If they had wanted to, they could have used it themselves and owned pretty much everyone. Luckily for all of us, Anthropic did no such thing. Instead, Anthropic is launching Project Glasswing, and making Mythos available to cybersecurity companies, so everyone can patch all the world's critical software as quickly as possible, and then we can figure out what to do from there. That's the story in AI that matters this week, and it is where my focus will be until I’ve worked my way through it all. But as always, that takes time to do right. So instead, I’m getting the weekly, and coverage of everything else, out of the way a day early. This post is about the non-Mythos landscape, and I hope to start covering Mythos and Project Glasswing tomorrow. I also covered the latest extended (18k words!) article about the history of Sam Altman and OpenAI, which contained some new material while confirming much old material, and analyzed their recent [...] 
    --- Outline: (02:17) Language Models Offer Mundane Utility (02:48) Language Models Don't Offer Mundane Utility (03:11) Huh, Upgrades (04:24) On Your Marks (06:55) Meta Problems (07:15) Fun With Media Generation (09:13) A Young Lady's Illustrated Primer (09:22) You Drive Me Crazy (22:05) Unprompted Attention (22:46) They Took Our Jobs (33:27) They Took Our Job Market (35:29) Get Involved (37:31) In Other AI News (38:08) Search Your Feelings You Know It To Be True (45:58) Actors And Scribes (49:06) Show Me the Money (53:46) Bubble, Bubble, Toil and Trouble (54:05) Quiet Speculations (54:20) Quickly, There's No Time (58:02) More Time Would Be Better (58:55) Greetings From The Department of War (01:00:11) The Quest for Sane Regulations (01:01:57) Chip City (01:03:29) Political Violence Is Completely and Always Unacceptable (01:04:16) The Week in Audio (01:06:42) Rhetorical Innovation (01:10:53) People Really Hate AI (01:13:39) Aligning a Smarter Than Human Intelligence is Difficult (01:17:44) Messages From Janusworld (01:21:00) People Are Worried About AI Killing Everyone (01:21:50) The Lighter Side --- First published: April 8th, 2026 Source: https://www.lesswrong.com/posts/5Dsuw9gGzkbjS4ubx/ai-163-mythos-quest --- Narrated by TYPE III AUDIO.

    1 h 24 min
  3. April 7

    “OpenAI #16: A History and a Proposal” by Zvi

    The real news today is that Anthropic has partnered with the top companies in cybersecurity to try and patch everyone's systems to fix all the thousands of zero-day exploits found by their new model Claude Mythos. I’ll be sorting through that over the coming days. For now, we instead have stories from OpenAI. In particular there are three stories. There's a massive 18,000 word article in The New Yorker about Sam Altman and the history of OpenAI as it relates to his trustworthiness. No trust. There's also OpenAI's proposal for a ‘new deal’ of sorts. No deal. Then there is an actual deal, where they bought TBPN. RIP. Table of Contents Part 1: OpenAI: The Histories. The Battle of the Board. Thanks For The Memos. I Am What I Am. That's Not What I Said. There Will Be No Investigation. Musk Versus Altman. Amodei Versus Altman. Sydney Versus Altman. Highest Bidder Versus Altman. Risky Business. Superalignment Was Always Fake. This Is Fine. Liar Liar Master Persuader. This In Particular Is Securities Fraud. Regulation Two Step. [...] --- Outline: (00:54) Part 1: OpenAI: The Histories (02:11) The Battle of the Board (03:17) Thanks For The Memos (03:39) I Am What I Am (04:21) That's Not What I Said (04:37) There Will Be No Investigation (05:41) Musk Versus Altman (06:54) Amodei Versus Altman (08:47) Sydney Versus Altman (09:43) Highest Bidder Versus Altman (12:07) Risky Business (14:42) Superalignment Was Always Fake (17:18) This Is Fine (18:12) Liar Liar Master Persuader (22:01) This In Particular Is Securities Fraud (23:43) Regulation Two Step (25:11) Easy Mode (27:48) The Right Amount of Alignment Research Is Not Zero (29:54) OpenAI Proposes Policy (41:46) RIP TBPN --- First published: April 7th, 2026 Source: https://www.lesswrong.com/posts/QSgBhcDKi9j5iSi9s/openai-16-a-history-and-a-proposal --- Narrated by TYPE III AUDIO.

    46 min
  4. April 6

    “Housing Roundup #13: More Dakka” by Zvi

    Build more housing where people want to live. The rest is commentary. If there is enough housing, it will be affordable, people will afford more house, and people will be able to live where they want to live. It's always been that simple. Increased supply of any kind of housing increases affordability of all kinds of housing. Are there other things that would also be helpful? Yes, but they’re commentary. Freeing up existing underused housing, for example, is helpful. It is commentary. Let's enjoy the lull and see how much of an Infrastructure Week we can do. New Levels Of Saying Quiet Part Out Loud Even For This Guy Trump opposes building houses where people want to live, because doing so would let people live there, which would drive down the value of existing homes. Acyn: Trump: I don’t want to drive housing prices down. I want to drive housing prices up for people who own their homes. You can be sure that will happen. unusual_whales: Trump: when you make it too easy and cheap to build houses, house prices come down. I don’t want to do that. [...] --- Outline: (00:48) New Levels Of Saying Quiet Part Out Loud Even For This Guy (02:30) Whose Side Are You On. (03:25) Your Intervention Only Partly Solves The Problem So We Are Against It (04:21) More Dakka (05:32) Abundance (06:44) Changes In Rent Are Largely About Changes In Supply (07:30) Austin (08:46) America (10:01) Minnesota (11:20) Debunking Obvious Nonsense About Monopolistic Practices (21:24) Age Of The Median Homebuyer (24:27) Property Taxes Improve Allocation Efficiency (27:21) More Of Old People Inefficiently And Systematically Stealing From Young People --- First published: April 6th, 2026 Source: https://www.lesswrong.com/posts/eSwdsDTnqigQJPfkw/housing-roundup-13-more-dakka --- Narrated by TYPE III AUDIO.

    28 min
  5. April 3

    “Anthropic Responsible Scaling Policy v3: Dive Into The Details” by Zvi

    Wednesday's post talked about the implications of Anthropic changing from v2.2 to v3.0 of its RSP, including that this broke promises that many people relied upon when making important decisions. Today's post treats the new RSP v3.0 as a new document, and evaluates it. First I’ll go over how the RSP v3.0 works at a high level. Then I’ll dive into the Roadmap and the Risk Report. How RSP v3.0 Works Normally I would pay closer attention to the exact written contents of the new RSP. In this case, it's not that the RSP doesn’t matter. I do think the RSP will have some influence on what Anthropic chooses to do, as will the roadmap, as will the resulting risk reports. However, the fundamental design principle is flexibility and a ‘strong argument,’ and they can change the contents at any time, all of which means the central principle is trust. I read the contents as ‘here are the things we are worried about and plan to do,’ which mostly in practice should amount to doing what they believe is right and I don’t see anything on this map that seems likely [...] --- Outline: (00:40) How RSP v3.0 Works (19:05) You Came Here For An Argument (21:27) The Problem Remains Unsolved (25:22) Wow That Thing We Did Was Pretty Risky, Huh? (26:18) Risk Report #1 (28:19) Listen All Y'all It's Sabotage (38:05) Looking Forward (39:42) Claude Gov (40:02) What Is A Strong Argument? (41:12) Recursive Self-Improvement (42:32) Non-Novel Chemical and Biological Weapons (44:51) Novel Chemical and Biological Weapons (45:39) Cross-Cutting Content (Section 6) (48:48) Risk Report Report --- First published: April 3rd, 2026 Source: https://www.lesswrong.com/posts/RtQxa5MoKk9bwEEEd/anthropic-responsible-scaling-policy-v3-dive-into-the --- Narrated by TYPE III AUDIO.

    51 min
  6. April 2

    “AI #162: Visions of Mythos” by Zvi

    Anthropic had some problems with leaks this week. We learned that they are sitting on a new larger-than-Opus AI model, Mythos, that they believe offers a step change in cyber capabilities. We also got a full leak of the source for Claude Code. Oh, and Axios was compromised, on the heels of LiteLLM. This looks to be getting a lot more common. Defense beats offense in most cases, but offense is getting a lot more shots on goal than it used to. The AI Doc: Or How I Became an Apocaloptimist came out this week. I gave it 4.5/5 stars, and I think the world would be better off if more people saw it. I am not generally a fan of documentary movies, but this is probably my new favorite, replacing The King of Kong: A Fistful of Quarters. There was also the usual background hum of quite a lot of things happening, including the latest iterations of various debates. We may or may not be doomed to die, but we are definitely doomed to repeat certain motions quite a few more times, and for people to be rather slow to update. We got some very welcome quiet on the [...]
    --- Outline: (01:41) Language Models Offer Mundane Utility (03:00) Heads In The Sand (07:05) Huh, Upgrades (08:10) Mythos (12:07) What's In A Name (14:59) On Your Marks (16:10) Choose Your Fighter (16:53) Get My Agent On The Line (17:31) Deepfaketown and Botpocalypse Soon (24:33) Cyber Lack Of Security (29:08) Fun With Media Generation (29:50) A Young Lady's Illustrated Primer (30:53) They Took Our Jobs (37:45) After They Take Our Jobs (39:16) Gell-Mann Amnesia (41:33) Get Involved (43:25) In Other AI News (46:41) Show Me the Money (51:08) Quiet Speculations (51:59) Explaining Persistent Model Parity (55:37) Take a Moment (01:00:54) OpenAI: The Histories (01:06:04) The Department of AI War (01:12:38) Department of AI Solidarity (01:13:46) Writing For The AIs (01:16:42) Quickly, There's No Time (01:16:46) The Quest for Sane Regulations (01:18:10) Chip City (01:20:07) You Received The Federal Framework (01:21:02) The Week in Audio (01:24:22) Rhetorical Innovation (01:27:48) I Am The Very Human Of A Frontier Language Model (01:38:01) Aligning a Smarter Than Human Intelligence is Difficult (01:41:22) Aligning Fake Graphs Can Also Be Difficult (01:49:32) The Lighter Side --- First published: April 2nd, 2026 Source: https://www.lesswrong.com/posts/iBeTkFuQwjaRPo3Ad/ai-162-visions-of-mythos --- Narrated by TYPE III AUDIO.

    1 h 50 min
  7. April 1

    “Anthropic Responsible Scaling Policy v3: A Matter of Trust” by Zvi

    Anthropic has revised its Responsible Scaling Policy to v3. The changes abandon many previous commitments, including one not to move ahead if doing so would be dangerous, on the grounds that, given competition, blindly following such a principle would not make the world safer. Holden Karnofsky advocated for the changes. He maintains that the previous strategy of specific commitments was in error, and instead endorses the new strategy of having aspirational goals. He was not at Anthropic when the commitments were made. My response to this will be two parts. Today's post talks about considerations around Anthropic going back on its previous commitments, including asking to what extent Anthropic broke promises or benefited from people reacting to those promises, and how we should respond. It is good, given that Anthropic was not going to keep its promises, that it came out and told us that this was the case, in advance. Thank you for that. I still think that Anthropic importantly broke promises, that people relied upon, and did so in ways that made future trust and coordination, both with Anthropic and between labs and governments, harder. Admitting to the situation [...]
    --- Outline: (01:47) Promises, Promises (03:10) Anthropic Responsible Scaling Policy v3 (03:32) That Could Have Gone Better (04:36) I'm Just Not Ready To Make a Commitment (08:20) So Cold, So Alone (12:24) I'm Sorry I Gave You That Impression (19:44) Fool Me Twice (23:27) In My Defense I Was Left Unsupervised (26:01) Drake Thomas Finds The Missing Mood (28:49) Things That Could Have Been Brought To My Attention Yesterday (1) (30:32) Things That Could Have Been Brought To My Attention Yesterday (2) (36:13) What We Have Here Is A Failure To Communicate (39:21) You Should See The Other Guy (42:17) I Was Only Kidding (43:12) They Can't Keep Getting Away With This (44:07) Damn Your Sudden But Inevitable Betrayal --- First published: April 1st, 2026 Source: https://www.lesswrong.com/posts/AkzauoTt2Lwn2yAvj/anthropic-responsible-scaling-policy-v3-a-matter-of-trust --- Narrated by TYPE III AUDIO.

    47 min
  8. March 31

    “Movie Review: The AI Doc” by Zvi

    The AI Doc: Or How I Became an Apocaloptimist is a brilliant piece of work. (This will be a fully spoilerific overview. If you haven’t seen The AI Doc, I recommend seeing it, it is about as good as it could realistically have been, in most ways.) Like many things, it only works because it is centrally real. The creator of the documentary clearly did get married and have a child, freak out about AI, ask questions of the right people out of worry about his son's future, freak out even more now with actual existential risk for (simplified versions of) the right reasons, go on a quest to stop freaking out and get optimistic instead, find many of the right people for that and ask good non-technical questions, get somewhat fooled, listen to mundane safety complaints, seek out and get interviews with the top CEOs, try to tell himself he could ignore all of it, then decide not to end on a bunch of hopeful babies and instead have a call for action to help shape the future. The title is correct. This is about ‘how I became an Apocaloptimist,’ and why he wanted to be that, as opposed to [...] --- Outline: (03:37) Babies Are Awesome (04:58) People Are Worried About AI Killing Everyone (06:17) Freak Out (06:47) Other People Are Not Worried About AI Killing Everyone (09:27) Deepfaketown and Botpocalypse Soon (10:15) Stopping The AI Race and A Narrow Path (11:47) CEOs Know Their Roles (13:28) The Call To Action --- First published: March 31st, 2026 Source: https://www.lesswrong.com/posts/ppC6geY4FxGYifrWx/movie-review-the-ai-doc --- Narrated by TYPE III AUDIO.

    15 min

Ratings & Reviews

5 out of 5
2 ratings

About

Audio narrations of LessWrong posts by zvi
