LessWrong posts by zvi

zvi

5.0 (2)
Technology
Updated Daily

Audio narrations of LessWrong posts by zvi

1d ago

“On Kimi K3: Its Capabilities And Related Discontents” by Zvi

Kimi K3 is a very good model with excellent benchmarks. Assuming its weights are released as planned it will become, purely in terms of raw capability, the strongest open model. Do not get carried away. Do not judge Kimi K3 only on its relative strengths. In aggregate it is several months behind the closed model frontier, at least four and my median guess is six, with the post-training closer and the pre-training farther out. This is fewer months than before, but the months are denser now. It is somewhat distilled. It likely outperforms on benchmarks relative to practical performance. All its benchmarks are scored at maximum effort, typically a lot more tokens than are used in similar tests by Fable or Sol. Performance looks jagged. Kimi will be excellent at some things, less so at other things. We will know more over the coming weeks. For now access is spotty and not that many people have actually had the chance to try Kimi K3, so I have larger error bars than usual around its capabilities. Alas, time waits for no one, so we press on. It is the largest open model so far at 2.8T, on [...] --- Outline: (03:07) DeepSeek Moments: Here We Go Again (05:47) We Had a Moment (Reprise from June 2025) (10:03) The Story Since Then (16:19) The Kimi K3 Announcement, Pitch and Basic Facts (19:34) On Modern Benchmaxxing (21:16) Other People's Benchmarks (26:15) Benchmarks Are Not The Real World (27:17) Technical Safeguards? What Are Those? (30:53) Things Kimi Can Do (32:06) Things Kimi Cannot Do (33:40) Things It Is Not Easy To Get Kimi To Do (37:02) Open Weight Models Are Unsafe And Nothing Can Fix This (40:34) Dean Ball Attempts To Be Constructive (58:24) Trump Administration Considering Executive Order Banning Chinese Open Models Within the United States (01:01:53) OpenAI Employees Are Relatively Bullish On This One (01:03:30) Kimi K3 Is Relatively Strongest At Typical Agentic Coding, Front End Work and 3D (01:06:06) Reactions (01:10:14) Who Are You? (01:12:09) How Did They Do It? 
(01:15:00) Conclusion --- First published: July 20th, 2026 Source: https://www.lesswrong.com/posts/t7oZyAFej8FZrfbtY/on-kimi-k3-its-capabilities-and-related-discontents --- Narrated by TYPE III AUDIO.
2d ago

“Demis Hassabis on the New Coming Age” by Zvi

Google CEO Demis Hassabis offered us a first rate second rate essay, A Framework for Frontier AI and the Dawning of a New Age. I’ll go over that essay and various responses to it in Part 1. Part 2 of this post then covers Alex Turner's resignation, and his story about how he tried and failed to prevent Google from signing up to allow the Department of War to use its models for essentially whatever the government wants, including autonomous weapons. Demis Hassabis sold DeepMind to Google on condition that something like this would not happen. Yet here it is, happening. A cautionary tale. I will cover Kimi K3 tomorrow. I am hoping to know more by then. Please do share any reactions or info about it in the comments here. The Core Statement and Request He is saying we are standing in the foothills of the singularity. His ask is a Frontier AI Standards Body within the US Government, similar to FINRA, that would govern ‘frontier labs,’ defined as any company that produces a frontier model based on various technical benchmarks. Evaluations would be updated regularly, and vulnerabilities would be addressed, both before [...] --- Outline: (01:04) The Core Statement and Request (02:38) Things Left Unsaid (04:05) The Proposal (06:04) A Good Start But Insufficient (08:52) Skeptics Of Future AI Capabilities (10:43) DeepMind On Bioresilience (12:39) Part 2: DeepMind Folds To The Department of War (13:28) DeepMind Leadership Failed Us (19:45) This Was a Failure We Must Learn From --- First published: July 19th, 2026 Source: https://www.lesswrong.com/posts/3RfJLcmkztSTq9afc/demis-hassabis-on-the-new-coming-age --- Narrated by TYPE III AUDIO.
3d ago

“AI #177 Part 1: Tip of the Iceberg” by Zvi

This week saw the releases of, among other things: GPT-5-6 Sol. It is a very good model, sir. Plan A, the follow up to AI 2027. It is a good plan worthy of discussion, sir. Kimi K3. This is only rolling out now, and will be covered next week. Muse Spark 1.1, the new Meta model. It is not frontier, but it is progress for them. Inkling, the first model from Thinking Machines. A call for regulatory action by Demis Hassabis, which I’ll cover soon. A new brief open letter call to action on AI regulation. That's on top of everything else, and an Opus 5 announcement is likely coming soon. The weekly once again got out of hand, so we’re splitting it once again into two, and once again saying we’ll be raising the bar for inclusion. And this time I mean it, as in enough to actually matter. Table of Contents Language Models Offer Mundane Utility. Whatever ye seek, ye shall find. Language Models Don’t Offer Mundane Utility. Gemini app needs some work. Language Models Upload Your Git Repository. Big problems [...] --- Outline: (01:17) Language Models Offer Mundane Utility (04:46) Language Models Don't Offer Mundane Utility (05:28) Language Models Upload Your Git Repository (08:35) Huh, Upgrades (09:30) Muse Spark 1.1 (11:47) First Hit Free (15:36) On Your Marks (18:06) Choose Your Fighter (19:57) Get My Agent On The Line (23:23) Deepfaketown and Botpocalypse Soon (24:44) Fun With Media Generation (25:45) Copyright Confrontation (27:37) OpenAI Strikes Again (32:26) A Young Lady's Illustrated Primer (32:45) Recommendations for Policymakers (34:13) They Took Our Jobs (38:23) The Art of the Jailbreak (39:27) Get Involved (40:32) Introducing (41:12) In Other AI News (43:46) New Short Obviously True Statement About AI Just Dropped (46:03) Show Me the Money (46:21) The Lighter Side --- First published: July 16th, 2026 Source: https://www.lesswrong.com/posts/who9xZ7DxuprsJoTr/ai-177-part-1-tip-of-the-iceberg --- Narrated by TYPE III AUDIO. 
4d ago

“AI #177 Part 2: Wish You Were Here” by Zvi

As usual, part 2 of the weekly deals with speculative, regulatory, political and alignment questions. Xi gave an important speech yesterday, so this post opens with that. There is talk that Kimi K3 is sufficiently strong that it upends many of these questions. It is clearly a candidate for another DeepSeek Moment, complete with stock drops for Google and SpaceX and (once again in a clear wrong-way move, the same as last time) Nvidia. Kimi K3 is clearly a very good model, exceeding expectations. Some are saying it is close to the frontier. The Artificial Analysis intelligence index has it at 57, a point ahead of Claude Opus 4.8, two behind Sol and three behind Fable. My presumption is that this number overstates its capabilities, but as always unless and until we have extensively tried the model ourselves, which I do not plan to do, we need to withhold judgment for at least a few days. I will be covering Kimi K3 in its own post at some point early next week. I have pushed further discussions involving Plan A and related issues into next week, as well as discussions around Demis Hassabis and Google [...] --- Outline: (01:29) Xi Gives A Good Speech on AI (15:33) Quiet Speculations (19:08) Tyler Cowen On Rebuilding The Future (22:39) The Quest for Sane Regulations (24:15) Wish You Were Here (26:55) The Week in Audio (27:29) New York Issues Moratorium On Data Centers (30:26) People Just Say Things (31:24) Rhetorical Innovation (39:29) Imagine Asking Questions (41:46) Anthropic Surveys Things It Calls Misalignment (54:30) Aligning a Smarter Than Human Intelligence is Difficult (58:30) The Most Forbidden Technique (01:01:04) Cooperative Alignment (01:12:38) The Lighter Side --- First published: July 17th, 2026 Source: https://www.lesswrong.com/posts/Zjj3PTEng8GDqfK6j/ai-177-part-2-wish-you-were-here --- Narrated by TYPE III AUDIO.
6d ago

“Monthly Roundup #44: July 2026” by Zvi

It's a quiet week so let's do the monthly right on schedule. Table of Contents Bad News. Good Advice. Opportunity Knocks. While I Cannot Condone This. Good News, Everyone. For Your Entertainment. Gamers Gonna Game Game Game Game Game. I Was Promised Flying Self-Driving Cars. Sports Go Sports. Antisocial Media. Government Working. Jones Act Watch. Highly Effective Altruism. Variously Effective Altruism. Ineffective Altruism. Prediction Markets. The Lighter Side. Bad News I wouldn’t have explained or modeled it quite the way Paola does here but the principle seems right to me. If people don’t trust you, or don’t trust people in general, then usually you can’t trust them either. Paola: I feel like a lot of human morality works like a prisoner's dilemma in that you can only trust others to behave morally to the extent that you believe they trust you to do the same. Due to this, I’ve come to view people with a bunch of social paranoia, distrust, etc. as *quite* dangerous to be around. And to be clear, I generally feel a [...] --- Outline: (00:17) Bad News (04:10) Good Advice (07:11) Opportunity Knocks (07:33) While I Cannot Condone This (11:35) Good News, Everyone (12:25) For Your Entertainment (17:35) Gamers Gonna Game Game Game Game Game (22:45) I Was Promised Flying Self-Driving Cars (26:20) Sports Go Sports (28:06) Antisocial Media (29:49) Government Working (33:55) Jones Act Watch (35:29) Highly Effective Altruism (40:32) Variously Effective Altruism (46:11) Ineffective Altruism (50:22) Prediction Markets (50:56) The Lighter Side --- First published: July 15th, 2026 Source: https://www.lesswrong.com/posts/KiCwcAGHx4rdwJgzD/monthly-roundup-44-july-2026 --- Narrated by TYPE III AUDIO.
Jul 14

“Twitter Thoughts For You” by Zvi

I previously have written back in March 2022 about how I use Twitter, and back in April 2023 about Twitter and its then-new algorithms, which have changed again. This post will update how I use Twitter now in 2026, and provide updates on the current state of the new algorithm, the situation with links, with the API, and some thoughts about using Twitter to make money which you almost never should try to do. Previously I said you need four things to use Twitter well: Tweetdeck or another similar alternative application. Knowing who to follow and read. Lists. Unfollows, filters, mutes and blocks. That hasn’t changed. Lists have become even more important. This post is coming out now, however, because the For You feed is perhaps making a comeback. Except where stated here, the advice in my 2022 post still applies. Table of Contents Defend Your Feed Via At Least One List. Block Early, Block Often, Know Your Triggers. Lists Change What Following Means. It (Wasn’t) For You. It's For You. Twitter Still Hates Links And That's Terrible. [...] --- Outline: (01:10) Defend Your Feed Via At Least One List (02:53) Block Early, Block Often, Know Your Triggers (03:35) Lists Change What Following Means (05:56) It (Wasn't) For You (07:34) It's For You (11:30) The Previous Time Twitter Transformed Its Algorithm Again (16:49) Twitter Still Hates Links And That's Terrible (27:35) Twitter Turns Its API Back On (31:02) Many Of The Bots Are Human (33:57) The Rise of Slop (36:20) Block Or Do Not Block (37:44) How To Make Money On Twitter (40:13) In Brief --- First published: July 14th, 2026 Source: https://www.lesswrong.com/posts/2GFyHmCLJYCag7gKh/twitter-thoughts-for-you --- Narrated by TYPE III AUDIO.
Jul 13

“Better Call Sol The Workhorse” by Zvi

OpenAI's GPT-5.6-Sol is finally here, along with the cheaper Terra and Luna. We’ve seen the early hype as reported on Thursday, but as always that is biased. As usual, the bulk of this is collecting a gestalt based on reactions. I included everything up to a point, but I got a lot of feedback, so after a while I only took the interesting ones. Sol and Fable are both excellent models, sir. They both represent big moves forward. There is room in your workflow for both of them. Sol and Fable are very different, especially when considered as part of their respective packages. I’m considering Sol + Codex (or Work) versus Fable + Claude Code (or Cowork), throughout, in places where you wouldn’t use the chat interface. In terms of raw intelligence and ‘big model smell,’ and ability to do the hardest things that are intelligence-loaded, Fable still looks like it has a substantial edge. It also seems to be better aligned, or at least more trustworthy as an agent, with less tail risk. I still consider Fable ‘the best’ model, and the one that will require the most aggressive controls. I enjoy [...] --- Outline: (02:54) The Official Pitch (10:06) Sol Proposes A Proof Of The Double Cover Conjecture (10:49) The Official Benchmarks (14:26) Vend That Bench (16:37) Thinking Fast and Slow (18:35) Other People's Benchmarks (23:20) Have Robust Backups (26:33) That's Not What You Were Thinking (26:59) Helping Hands (27:36) Writing (29:15) Don't Stop Now (31:14) Sol Can Code And Do Math (32:37) Better Call Sol Cause You Can't Call Fable (33:34) Only Call As Much Sol As You Need (35:22) Positive Reactions (38:07) It's A Good Model, Sir (40:56) Negative Reactions (41:54) Sol The Workhorse (44:27) Pair Programmer (51:07) Pleased To Meet You (53:02) Sol Thinks You Better (54:02) My substantive posterior --- First published: July 13th, 2026 Source: https://www.lesswrong.com/posts/zPdDmJTovsKTvAiH2/better-call-sol-the-workhorse --- Narrated by TYPE III AUDIO. 
Jul 12

“WSJ Article Claiming China Has Matched Anthropic Is Obvious Nonsense” by Zvi

The Wall Street Journal printed an outright false headline and heavily misleading story claiming this, which of course was uncritically amplified by the usual suspects. I post this now on its own so that we have a place to link to, to explain the situation. Headline News WSJ Headline (Obvious Nonsense): China Has Matched Anthropic in Cybersecurity, Resetting AI Race. That. Did. Not. Happen. The post even claims, explicitly, that Claude Opus 4.8 similarly ‘matches’ Claude Mythos, a claim which is even more obviously false. Shame upon the Wall Street Journal. I fear Gell-Mann Amnesia. If they can get something as important as this so completely wrong, what about everything else? I am skipping over the parts that involve accurate reporting, or minor quibbles. It seems important to focus on clearly debunking the central false claims. Alas, the mistakes made here very much rhyme with mistakes being made throughout all this by the White House, and that get latched onto by certain bad actors, who have played a large part in leaving us unprepared for the Mythos Moment. For a full understanding of GLM-5.2, which is indeed an impressive [...] --- Outline: (00:27) Headline News (02:10) What Makes Mythos Special (03:18) Going Over The Detailed Claims (07:39) One Helpful Note (08:19) The Overall Impression Is Extremely Wrong (08:50) All Of This Has Happened Before And Will Happen Again --- First published: July 12th, 2026 Source: https://www.lesswrong.com/posts/2zSpuGJRk6EyjHAL6/wsj-article-claiming-china-has-matched-anthropic-is-obvious-1 --- Narrated by TYPE III AUDIO.


Creator

zvi
Years Active

2024 - 2026
Episodes

250
Rating

Explicit
Show Website

LessWrong posts by zvi

