LessWrong posts by zvi

zvi

Audio narrations of LessWrong posts by zvi

  1. 14 HRS AGO

    “AI #155: Welcome to Recursive Self-Improvement” by Zvi

    This was the week of Claude Opus 4.6, and also of ChatGPT-5.3-Codex. Both leading models got substantial upgrades, although OpenAI's is confined to Codex. Once again, the frontier of AI got more advanced, especially for agentic coding but also for everything else. I spent the week so far covering Opus, with two posts devoted to the extensive model card, and then one giving benchmarks, reactions, capabilities and a synthesis, which functions as the central review. We also got GLM-5, Seedance 2.0, Claude fast mode, an app for Codex and much more. Claude fast mode means you can pay a premium to get faster replies from Opus 4.6. It's very much not cheap, but it can be worth every penny. More on that in the next agentic coding update. One of the most frustrating things about AI is the constant goalpost moving, both in terms of capability and safety. People say ‘oh [X] would be a huge deal but is a crazy sci-fi concept’ or ‘[Y] will never happen’ or ‘surely we would not be so stupid as to [Z]’ and then [X], [Y] and [Z] all happen and everyone shrugs as if nothing happened and [...]
    ---
    Outline:
    (02:32) Language Models Offer Mundane Utility
    (03:17) Language Models Don't Offer Mundane Utility
    (03:33) Huh, Upgrades
    (04:22) On Your Marks
    (06:23) Overcoming Bias
    (07:20) Choose Your Fighter
    (08:44) Get My Agent On The Line
    (12:03) AI Conversations Are Not Privileged
    (12:54) Fun With Media Generation
    (13:59) The Superb Owl
    (22:07) A Word From The Torment Nexus
    (26:33) They Took Our Jobs
    (35:36) The Art of the Jailbreak
    (35:48) Introducing
    (37:28) In Other AI News
    (42:01) Show Me the Money
    (43:05) Bubble, Bubble, Toil and Trouble
    (53:38) Future Shock
    (56:06) Memory Lane
    (57:09) Keep The Mask On Or You're Fired
    (58:35) Quiet Speculations
    (01:03:42) The Quest for Sane Regulations
    (01:06:09) Chip City
    (01:09:46) The Week in Audio
    (01:10:06) Constitutional Conversation
    (01:11:00) Rhetorical Innovation
    (01:19:26) Working On It Anyway
    (01:22:17) The Thin Red Line
    (01:23:35) Aligning a Smarter Than Human Intelligence is Difficult
    (01:30:42) People Will Hand Over Power To The AIs
    (01:31:50) People Are Worried About AI Killing Everyone
    (01:32:40) Famous Last Words
    (01:40:15) Other People Are Not As Worried About AI Killing Everyone
    (01:42:41) The Lighter Side
    ---
    First published: February 12th, 2026
    Source: https://www.lesswrong.com/posts/cytxHuLc8oHRq7sNE/ai-155-welcome-to-recursive-self-improvement
    ---
    Narrated by TYPE III AUDIO.

    1 hr 48 min
  2. 18 HRS AGO

    “AI #156 Part 2: Errors in Rhetoric” by Zvi

    Things that are being pushed into the future right now: Gemini 3.1 Pro and Gemini DeepThink V2. Claude Sonnet 4.6. Grok 4.20. Updates on Agentic Coding. Disagreement between Anthropic and the Department of War. We are officially a bit behind and will have to catch up next week. Even without all that, we have a second highly full plate today.
    Table of Contents (As a reminder: bold are my top picks, italics means highly skippable):
    Levels of Friction. Marginal costs of arguing are going down.
    The Art Of The Jailbreak. UK AISI finds a universal method.
    The Quest for Sane Regulations. Some relatively good proposals.
    People Really Hate AI. Alas, it is mostly for the wrong reasons.
    A Very Bad Paper. Nick Bostrom writes a highly disappointing paper.
    Rhetorical Innovation. The worst possible plan is the best one on the table.
    The Most Forbidden Technique. No, stop, come back.
    Everyone Is Or Should Be Confused About Morality. New levels of ‘can you?’
    Aligning a Smarter Than Human Intelligence is Difficult. Seeking a good basin. [...]
    ---
    Outline:
    (00:43) Levels of Friction
    (04:55) The Art Of The Jailbreak
    (06:16) The Quest for Sane Regulations
    (12:09) People Really Hate AI
    (18:22) A Very Bad Paper
    (25:21) Rhetorical Innovation
    (32:35) The Most Forbidden Technique
    (34:10) Everyone Is Or Should Be Confused About Morality
    (36:07) Aligning a Smarter Than Human Intelligence is Difficult
    (44:51) We'll Just Call It Something Else
    (47:18) Vulnerable World Hypothesis
    (51:37) Autonomous Killer Robots
    (53:18) People Will Hand Over Power To The AIs
    (57:04) People Are Worried About AI Killing Everyone
    (59:29) Other People Are Not Worried About AI Killing Everyone
    (01:00:56) The Lighter Side
    ---
    First published: February 20th, 2026
    Source: https://www.lesswrong.com/posts/obqmuRxwFyy8ziPrB/ai-156-part-2-errors-in-rhetoric
    ---
    Narrated by TYPE III AUDIO.

    1 hr 4 min
  3. 1 DAY AGO

    “AI #156 Part 1: They Do Mean The Effect On Jobs” by Zvi

    There was way too much going on this week to not split, so here we are. This first half contains all the usual first-half items, with a focus on projections of jobs and economic impacts and also timelines to the world being transformed with the associated risks of everyone dying. Quite a lot of Number Go Up, including Number Go Up A Lot Really Fast. Among the things that this does not cover, that were important this week, we have the release of Claude Sonnet 4.6 (which is a big step over 4.5 at least for coding, but is clearly still behind Opus), Gemini DeepThink V2 (coverage deferred so I could have time to review the safety info), the release of the inevitable Grok 4.20 (it's not what you think), as well as much rhetoric on several fronts and some new papers. Coverage of Claude Code and Cowork, OpenAI's Codex and other AI agent things continues to be a distinct series, which I’ll continue when I have an open slot. Most important was the unfortunate dispute between the Pentagon and Anthropic. The Pentagon's official position is they want sign-off from Anthropic and other AI companies on ‘all legal uses’ [...]
    ---
    Outline:
    (02:26) Language Models Offer Mundane Utility
    (02:49) Language Models Don't Offer Mundane Utility
    (06:11) Terms of Service
    (06:54) On Your Marks
    (07:50) Choose Your Fighter
    (09:19) Fun With Media Generation
    (12:29) Lyria
    (14:13) Superb Owl
    (14:54) A Young Lady's Illustrated Primer
    (15:03) Deepfaketown And Botpocalypse Soon
    (17:49) You Drive Me Crazy
    (18:04) Open Weight Models Are Unsafe And Nothing Can Fix This
    (21:19) They Took Our Jobs
    (26:53) They Kept Our Agents
    (27:42) The First Thing We Let AI Do
    (37:47) Legally Claude
    (40:24) Predictions Are Hard, Especially About The Future, But Not Impossible
    (46:08) Many Worlds
    (48:45) Bubble, Bubble, Toil and Trouble
    (49:31) A Bold Prediction
    (49:55) Brave New World
    (53:09) Augmented Reality
    (55:21) Quickly, There's No Time
    (58:29) If Anyone Builds It, We Can Avoid Building The Other It And Not Die
    (01:00:18) In Other AI News
    (01:04:03) Introducing
    (01:04:31) Get Involved
    (01:07:15) Show Me the Money
    (01:08:26) The Week In Audio
    ---
    First published: February 19th, 2026
    Source: https://www.lesswrong.com/posts/jcAombEXyatqGhYeX/ai-156-part-1-they-do-mean-the-effect-on-jobs
    ---
    Narrated by TYPE III AUDIO.

    1 hr 9 min
  4. 2 DAYS AGO

    “Monthly Roundup #39: February 2026” by Zvi

    There really is a lot going on these days. I held off posting this because I was trying to see if I could write a net helpful post about the current situation involving Anthropic and the Pentagon. Anthropic very much wants to help DoW defend our country and make us strong. It is clear there have been some large misunderstandings here about how LLMs work. They are not ordinary tools like spreadsheets that automatically do whatever the user asks, nor would it be safe to make them so, nor do they predictably adhere to written rule sets or take instructions from their CEO in a crisis. And they are probabilistic. You do not and cannot get absolute guarantees. The only way to know if an AI model will do what you need in a crisis is something you needed to do regardless of potential refusals, and which is also what you must do with human soldiers, which is to run the simulations and mock battles and drills and tests that tell you if the model can do and is willing to do the job. If there are irreconcilable differences and the military contract needs [...]
    ---
    Outline:
    (02:09) Bad News
    (04:03) Government Working
    (15:11) The Epstein Files
    (17:17) RIP Scott Adams
    (19:08) News You Can't Use But Click On Anyway
    (20:22) We're Putting Together A Team
    (20:47) You Can't Retire, I Quit
    (23:41) Jones Act Watch
    (33:24) Variously Effective Altruism
    (34:41) They Took Our Jobs And Now I Can Relax
    (35:39) While I Cannot Condone This
    (37:32) Good News, Everyone
    (39:19) Use Your One Time
    (42:34) Hands Off My Phone
    (45:14) Fun Theory
    (46:46) Good Advice
    (48:31) For Your Entertainment
    (51:07) Plur1bus
    (52:14) Gamers Gonna Game Game Game Game Game
    (57:58) Sports Go Sports
    (01:02:49) The Revolution of Retroactive Rising Expectations
    (01:04:15) I Was Promised Spying Cars
    (01:05:13) Prediction Market Madness
    (01:09:02) The Lighter Side
    ---
    First published: February 18th, 2026
    Source: https://www.lesswrong.com/posts/3QPoEGfzHaywGDWKr/monthly-roundup-39-february-2026
    ---
    Narrated by TYPE III AUDIO.

    1 hr 17 min
  5. 3 DAYS AGO

    “On Dwarkesh Patel’s 2026 Podcast With Elon Musk and Other Recent Elon Musk Things” by Zvi

    Some podcasts are self-recommending on the ‘yep, I’m going to be breaking this one down’ level. This was one of those. So here we go. As usual for podcast posts, the baseline bullet points describe key points made, and then the nested statements are my commentary. Some points are dropped. If I am quoting directly I use quote marks, otherwise assume paraphrases. Normally I keep everything to numbered lists, but in several cases here it was more of a ‘he didn’t just say what I think he did, did he’ and I needed extensive quotes. In addition to the podcast, there were some discussions around safety, or the lack thereof, at xAI, and Elon Musk went on what one can only describe as megatilt, including going hard after Anthropic's Amanda Askell. I will include that as a postscript. I will not include recent developments regarding Twitter, since that didn’t come up in the interview. I lead with a discussion of bounded distrust and how to epistemically consider Elon Musk, since that will be important throughout including in the postscript. What are the key takeaways? Elon Musk is more confused than [...]
    ---
    Outline:
    (02:56) Bounded Distrust
    (05:12) IN SPACE
    (09:56) The AI Will Follow You To Mars
    (22:32) xAI Business Plans
    (25:54) Optimus Prime
    (27:04) Beating China
    (30:02) SpaceX and How To Run a Company Elon Style
    (33:17) DOGE
    (35:29) TeraFab IN SPACE
    (35:47) Postscript: Safety Third at xAI
    (40:15) Elon Serves Back Saying That Which Is Not
    (42:51) Elon's Army
    (43:55) Children Are Our Future
    (48:11) Where Do We Go From Here
    ---
    First published: February 17th, 2026
    Source: https://www.lesswrong.com/posts/5yidbWsdWjNzWzLWZ/on-dwarkesh-patel-s-2026-podcast-with-elon-musk-and-other
    ---
    Narrated by TYPE III AUDIO.

    49 min
  6. 4 DAYS AGO

    “On Dwarkesh Patel’s 2026 Podcast With Dario Amodei” by Zvi

    Some podcasts are self-recommending on the ‘yep, I’m going to be breaking this one down’ level. This was very clearly one of those. So here we go. As usual for podcast posts, the baseline bullet points describe key points made, and then the nested statements are my commentary. Some points are dropped. If I am quoting directly I use quote marks, otherwise assume paraphrases. What are the main takeaways? Dario mostly stands by his predictions of extremely rapid advances in AI capabilities, both in coding and in general, and in expecting the ‘geniuses in a data center’ to show up within a few years, possibly even this year. Anthropic's actions do not seem to fully reflect this optimism, but also when things are growing on a 10x per year exponential if you overextend you die, so being somewhat conservative with investment is necessary unless you are prepared to fully burn your boats. Dario reiterated his stances on China, export controls, democracy, AI policy. The interview downplayed catastrophic and existential risk, including relative to other risks, although it was mentioned and Dario remains concerned. There was essentially no talk about alignment [...]
    ---
    Outline:
    (01:47) The Pace of Progress
    (08:56) Continual Learning
    (13:46) Does Not Compute
    (15:29) Step Two
    (22:58) The Quest For Sane Regulations
    (26:08) Beating China
    ---
    First published: February 16th, 2026
    Source: https://www.lesswrong.com/posts/jWCy6owAmqLv5BB8q/on-dwarkesh-patel-s-2026-podcast-with-dario-amodei
    ---
    Narrated by TYPE III AUDIO.

    29 min
  7. FEB 13

    “ChatGPT-5.3-Codex Is Also Good At Coding” by Zvi

    OpenAI is back with a new Codex model, released the same day as Claude Opus 4.6. The headline pitch is it combines the coding skills of GPT-5.2-Codex with the general knowledge and skills of other models, along with extra speed and improvements in the Codex harness, so that it can now handle your full stack agentic needs. We also got the Codex app for Mac, which is getting positive reactions, and quickly picked up a million downloads. GPT-5.3-Codex is only available inside Codex. It is not in the API. As usual, Anthropic's release was understated, basically a ‘here's Opus 4.6, a 212-page system card and a lot of benchmarks, it's a good model, sir, so have fun.’ Whereas OpenAI gave us a lot fewer words and a lot fewer benchmarks, while claiming their model was definitely the best. OpenAI: GPT-5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2. This enables it to take on long-running tasks that involve research, tool use, and complex execution. Much like a colleague, you can steer and interact with GPT-5.3-Codex while [...]
    ---
    Outline:
    (01:50) The Overall Picture
    (03:00) Quickly, There's No Time
    (04:15) System Card
    (04:49) AI Box Experiment
    (05:22) Maybe Cool It With Rm
    (07:02) Preparedness Framework
    (11:14) Glass Houses
    (12:16) OpenAI Appears To Have Violated SB 53 In a Meaningful Way
    (14:29) Safeguards They Did Implement
    (16:55) Misalignment Risks and Internal Deployment
    (18:38) The Official Pitch
    (24:28) Inception
    (26:12) Turn The Beat Around
    (27:35) Codex Does Cool Things
    (29:33) Positive Reactions
    (38:03) Negative Reactions
    (40:43) Codex of Ultimate Vibing
    ---
    First published: February 13th, 2026
    Source: https://www.lesswrong.com/posts/CCDRjL7NZtNGtGheY/chatgpt-5-3-codex-is-also-good-at-coding
    ---
    Narrated by TYPE III AUDIO.

    42 min
  8. FEB 11

    “Claude Opus 4.6 Escalates Things Quickly” by Zvi

    Life comes at you increasingly fast. Two months after Claude Opus 4.5 we get a substantial upgrade in Claude Opus 4.6. The same day, we got GPT-5.3-Codex. That used to be something we’d call remarkably fast. It's probably the new normal, until things get even faster than that. Welcome to recursive self-improvement. Before those releases, I was using Claude Opus 4.5 and Claude Code for essentially everything interesting, and only using GPT-5.2 and Gemini to fill in the gaps or for narrow specific uses. GPT-5.3-Codex is restricted to Codex, so this means that for other purposes Anthropic and Claude have only extended the lead. This is the first time in a while that a model got upgraded while it was still my clear daily driver. Claude also rolled out several other advances to its ecosystem, including fast mode and expanding Cowork to Windows, while OpenAI gave us an app for Codex. For fully agentic coding, GPT-5.3-Codex and Claude Opus 4.6 both look like substantial upgrades. Both sides claim they’re better, as you would expect. If you’re serious about your coding and have hard problems, you should try out both, and see what combination works [...]
    ---
    Outline:
    (01:55) On Your Marks
    (17:35) Official Pitches
    (17:56) It Compiles
    (21:42) It Exploits
    (22:45) It Lets You Catch Them All
    (23:16) It Does Not Get Eaten By A Grue
    (24:10) It Is Overeager
    (25:24) It Builds Things
    (27:58) Pro Mode
    (28:24) Reactions
    (28:36) Positive Reactions
    (42:12) Negative Reactions
    (50:40) Personality Changes
    (56:28) On Writing
    (59:11) They Banned Prefilling
    (01:00:27) A Note On System Cards In General
    (01:01:34) Listen All Y'all It's Sabotage
    (01:05:00) The Codex of Competition
    (01:06:22) The Niche of Gemini
    (01:07:55) Choose Your Fighter
    (01:12:17) Accelerando
    ---
    First published: February 11th, 2026
    Source: https://www.lesswrong.com/posts/5JNjHNn3DyxaGbv8B/claude-opus-4-6-escalates-things-quickly
    ---
    Narrated by TYPE III AUDIO.

    1 hr 14 min

Ratings & Reviews

5 out of 5
2 ratings

About

Audio narrations of LessWrong posts by zvi

You Might Also Like