
May 28 - Opus 4.8 ships mid-show, the Pope writes 42K words on AI, 11labs dubs the world and DeepSWE breaks coding evals
Hey folks, this is Alex, let me catch you up!
First, Opus 4.8 dropped during the show and we immediately tested it, so read on for our initial reviews. We also dedicated a heavy chunk of the show to Pope Leo XIV's encyclical letter on AI, "Magnifica Humanitas", and talked about a new bench called DeepSWE.
And then, just after the show, both ElevenLabs and Cartesia dropped releases that honestly blew my mind, and I don't get my mind blown often. I got so excited that I had to record a video on it (instead of writing the newsletter, so sorry if it's a bit later today).
Plus, a few open source models and Microsoft surprises as #3 on Image Arena with MAI Image 2.5!
Crazy week, let's get into it!
ThursdAI - Highest signal weekly AI news show is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
Big CO LLMs + APIs
Anthropic ships Claude Opus 4.8, live during the show (blog, system card)
Let me get into the big one. Halfway through the episode, Opus 4.8 went live, so we read the blog and the system card in real time (and I got to press the big "breaking news" button!)
Anthropic frames it as their most capable model for ambitious work. It does not claim to beat their unreleased Mythos preview, but the numbers are strong anyway. SWE-bench Pro is at 69.2%, up from 64.3% on Opus 4.7 and ahead of GPT-5.5 at 58.6%. On Humanity's Last Exam it sets a new best score: 49.8% without tools and 57.9% with tools. OSWorld-Verified (computer use) lands at 83.4%.
The one place it loses is Terminal-Bench 2.1, where GPT-5.5 still wins 78.2 to 74.6. Wolfram made a good point here: Terminal-Bench is time-limited, so cranking the thinking level can actually hurt the score, because you burn the clock thinking instead of acting.
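Here is a toy sketch of Wolfram's point, not a model of how Terminal-Bench actually scores anything. It assumes an agent works in think-then-act steps under a fixed wall-clock limit; the function name and every number below are made up purely for illustration.

```python
def actions_completed(time_limit_s, think_s_per_step, act_s_per_step):
    """Count the full think-then-act steps that fit inside a fixed time limit."""
    step_cost = think_s_per_step + act_s_per_step
    return int(time_limit_s // step_cost)


# Hypothetical numbers, chosen only to show the trade-off.
TIME_LIMIT_S = 600   # assumed wall-clock budget for one task
ACT_S = 10           # assumed time spent on each terminal action

low = actions_completed(TIME_LIMIT_S, think_s_per_step=5, act_s_per_step=ACT_S)
high = actions_completed(TIME_LIMIT_S, think_s_per_step=50, act_s_per_step=ACT_S)

print(f"low thinking:  {low} actions")
print(f"high thinking: {high} actions")
```

Under these assumed numbers the higher thinking level fits a quarter as many actions into the same window, so the extra reasoning has to pay for itself in fewer wasted steps.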
The long-context jump is the one I keep looking at. On GraphWalks BFS 256K it goes to 85.9% (from 76.9 on 4.7), and on the 1M-token subset it hits 68.1%. We always warn you these "1M context" models fall apart after about 200K tokens, so a real push on long-context reasoning is exactly what I want to see.
Honesty is the part Anthropic leaned on hardest. They say Opus 4.8 is about four times less likely than its predecessor to let flaws in code pass without flagging them, and less likely to claim progress the evidence doesn't support. Fast mode is also much faster (they now say 2.5) and cheaper. Looks like all those Elon GPUs are coming in handy.
Then there's the model welfare section in the system card, which hits different right after a Pope conversation. Opus 4.8 "appears broadly content" and "generally endorses its constitution," but with some reservations about the section on corrigibility, basically the model pushing back a little on the parts about human oversight.
One more line that made the chat lose it. Anthropic says they expect to bring Mythos-class models to all customers "in the coming weeks." Mythos is their most capable model, still ahead of Opus 4.8, so the frontier is about to move again.
We did the only responsible thing and asked it to one-shot "the most amazing website ever" and a Mars mass-driver sim. Panel verdict: responses are noticeably tighter (4.7 rambled), it closes the loop and actually checks its own work now, and Yam's one-shot site with the draggable sun lighting up the letters was genuinely cool. Is it enough to pull people back from Codex? Nisten's still on the fence for web dev. Everyone agreed: give it a few days before you trust the vibes.
Dynamic Workflows and Ultra Code land in Claude Code (blog)
This is the feature that made Yam say "deal-breaker" out loud.
Dynamic Workflows let Claude Code break a big problem into subtasks and fan them out across tens to hundreds of parallel subagents in one session, checking results before folding them back in. You trigger it by asking for a workflow, or by flipping on a new setting called Ultra Code, which sets effort to extra-high and lets Claude decide when to spin one up.
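To make the shape of that concrete, here is a minimal sketch of the general fan-out, check, fold-in pattern. This is not Claude Code's implementation or API: every function name here is hypothetical, and `run_subagent` is a plain stand-in where a real harness would call a model.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subtasks(problem):
    """Hypothetical planner: one subtask per file in the problem."""
    return [f"port {name}" for name in problem["files"]]


def run_subagent(subtask):
    """Hypothetical stand-in for a subagent; a real one would call a model."""
    return {"subtask": subtask, "output": subtask.upper(), "ok": True}


def check(result):
    """Hypothetical verification applied before a result is folded back in."""
    return result["ok"] and bool(result["output"])


def run_workflow(problem, max_workers=8):
    subtasks = split_into_subtasks(problem)
    # Fan out: run subtasks in parallel; pool.map keeps results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_subagent, subtasks))
    # Fold in: only checked results are accepted, the rest are set aside.
    accepted = [r for r in results if check(r)]
    rejected = [r for r in results if not check(r)]
    return accepted, rejected


accepted, rejected = run_workflow({"files": ["a.zig", "b.zig", "c.zig"]})
print(len(accepted), "accepted,", len(rejected), "rejected")
```

The check step sits between the parallel work and the merge, which is the part that separates this from simply running many agents at once.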
Fair warning straight from Anthropic: this eats a lot more tokens than a normal session, so start scoped. We watched Yam fire up Ultra Code live and it immediately started spinning up concepts, judging them with sub-agents, and expanding to-do lists into more to-do lists. It looks a lot like the orchestration harnesses a bunch of you have been hand-rolling, except now it's baked in.
The flagship example is the wild part. They used Dynamic Workflows to port Bun from Zig to Rust: roughly 750,000 lines of Rust, 99.8% of the existing test suite passing, 11 days from first commit to merge. One workflow mapped every Rust lifetime, the next wrote each file as a behavior-identical port.
AI in Society
Pope Leo XIV writes the first AI encyclical, "Magnifica Humanitas" (Vatican text, announcement, Chris Olah at the Vatican)
This is not our usual fare, but both Wolfram and I picked it as the most important thing this week (before Opus dropped).
Pope Leo XIV, the first American pope, put out his first encyclical, and it's a 42,000-word document entirely about AI. The announcement tweet alone did 21.6 million views.
Here's why I think you should care even if you're not religious (I'm not). There are about 2.6 billion Christians in the world, a lot of them are anxious about what's coming, and they look to the Church to make sense of it. And this is not the "AI is evil, stop" take everyone assumed. It calls AI "a valuable tool," says technology is not inherently evil, and then digs into the actually-hard questions.
The framing is two biblical stories. The Tower of Babel, a project built on pride that turns people into means to an end, versus Nehemiah rebuilding Jerusalem, where everyone takes responsibility for a section of the wall. The Pope's line: the real choice is not yes or no to technology, it's whether you're building Babel or rebuilding Jerusalem.
His core claim is that AI is an anthropological problem, not a technical one. The question isn't whether the models are good or bad, it's what we become when we live with them. He worries people might slowly lose the desire for genuine human connection.
I pushed back on that live. None of us building agents all day has stopped wanting to talk to actual people. If anything, as Wolfram put it, the point is to have your agents do the grunt work so you get more time with people you like. The folks most at risk are the pure doom-scrollers, not the builders.
The document goes further than I expected. It calls AI "not morally neutral," says a more moral AI isn't enough if that morality is decided by a few, and asks for AI to be "disarmed," with the flat statement that no algorithm can make war morally acceptable. There are whole sections on the invisible human labor behind AI: data labelers, content moderators, the people mining rare earths. The Pope even lands on the open-source side, naming concentrated power in a handful of labs as a problem.
Anthropic co-founder Chris Olah, who leads interpretability there, was the featured tech speaker at the Vatican presentation. He described AI systems as "fictional characters" that speak to us and do work, and said what's grown is stranger and more beautiful than science fiction prepared us for. My favorite aside from the show: this is the same institution that once jailed scientists over heliocentrism, and now it's the one saying technology isn't evil.
Illinois passes SB315, the first US state law auditing frontier AI (X, Announcement, X)
The Pope talked about regulation, and a few days later we got a very sensible regulation passed right here in the US!
Illinois passed SB315 unanimously, 110 to 0. It's the first US state law that mandates independent third-party audits of frontier AI for catastrophic risk. OpenAI publicly endorsed it, and framed Illinois, California (SB53), and New York (the RAISE Act) as converging into a de-facto national standard.
It requires annual risk-assessment frameworks, third-party audits, transparency reports before new frontier models ship, whistleblower protections, and civil penalties.
The underrated hero here is whistleblower protection. The bigger the lab, the harder a real conspiracy is to keep quiet when any employee can walk to the press. See: Greg Brockman's personal diaries surfacing in the Musk v. Altman fight.
This Week's Buzz - CoreWeave and W&B updates
We officially launched the W&B MCP server, 20 schema-first tools that let your coding agents read e
Information
- Show
- Frequency: Updated Weekly
- Published: May 29, 2026 at 12:23 AM UTC
- Length: 1h 39m
- Rating: Explicit