Deep Dive

How LLM inference actually works. Why the Strait of Hormuz could move oil prices 40 percent. What happens when AI starts automating AI research. Each episode picks one topic — usually tech, AI, or geopolitics — and goes deep. 30+ primary sources, every claim confidence-tagged, ~18 minutes per topic. For listeners tired of takes without numbers. Also on YouTube: youtube.com/@DeepDiveAIShow

  1. Platform Engineering at AI-Native Companies: What's Actually Different

    21 hours ago

    Meta's Llama 3.1 training run, 405 billion parameters, used 16,384 H100 GPUs for 54 days. Over those 54 days, the cluster experienced 419 unexpected interruptions — roughly one failure every three hours. And that's the run Meta calls a success: they hit 90 percent effective training time. This is the substrate platform engineers at AI-native companies are operating on. This episode is what's actually different about platform engineering at companies like OpenAI and Anthropic, compared to the traditional shape — Stripe, Netflix, Block, Google. Engineering tone, not hype.

    The verified primary-source view: OpenAI's two Kubernetes scaling posts at 2,500 and 7,500 nodes (5 API servers, 5 etcd nodes, 70 GB heap per API server, 200,000 IPs in use at peak, MPI gang scheduling via the Coscheduling plugin). OpenAI's Postgres, scaled for 800 million ChatGPT users on a single primary plus 50 read replicas. Anthropic's September 2025 postmortem disclosing three serving platforms (first-party, Bedrock, Vertex), three hardware backends (Trainium, NVIDIA, TPU), sticky routing, tens of chips per request. The compute portfolios: Anthropic with roughly 7 gigawatts disclosed across AWS Project Rainier (~500K Trainium2 chips), Google plus Broadcom (up to 1M TPUs), Microsoft-NVIDIA ($30B / 1 GW Grace Blackwell + Vera Rubin), and SpaceX Colossus 1 (220K NVIDIA GPUs / 300 MW). OpenAI's Stargate at $500B / 10 GW.

    The new problem classes: training-cluster reliability (Meta's measured MTTF goes from 47.7 days at 8 GPUs to 14 minutes at 131,072 GPUs — reliability collapses non-linearly; a naive scaling sketch follows below). NCCL collectives. Gang-scheduling primitives (Kueue versus Volcano, properly distinguished). Inference at p99 (PagedAttention, RadixAttention, continuous batching — three independent optimizations). Prefill versus decode disaggregation. Heterogeneous fleets across H100, H200, B200, GB200, Trainium2, TPU v5p, and Ironwood. HBM supply and U.S. energy as the binding constraints, not GPU FLOPS.

    What stays the same: the reliability discipline. SLOs, error budgets, on-call, blameless postmortems, observability. Anthropic's September 2025 postmortem reads like a Google SRE Book chapter. What doesn't transfer: substrate-specific tooling. You can't canary a 16,000-GPU job mid-flight. Three platforms inside one company: training is a batch-scheduler problem, inference is a request/response problem, agents are a durable-workflow problem. Above all three, a chip-portability layer. Same craft. Different physics.
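    A rough sanity check on that MTTF curve, under the textbook assumption that GPU failures are independent and exponentially distributed, so failure rates add across components and cluster MTTF scales as 1/N. The 47.7-day, 8-GPU anchor is from the episode; everything else is my illustrative extrapolation, not the cited paper's methodology, and the paper's measured ~14 minutes at 131,072 GPUs sits above this naive curve:

    def naive_cluster_mttf_minutes(n_gpus, mttf_8gpu_days=47.7):
        # Under independent exponential failures, failure rates add, so
        # MTTF(N) = MTTF(1) / N. Back out a per-GPU MTTF from the 8-GPU
        # figure, then scale down to the target cluster size.
        per_gpu_mttf_days = mttf_8gpu_days * 8
        return per_gpu_mttf_days * 24 * 60 / n_gpus

    for n in (8, 1_024, 16_384, 131_072):
        print(f"{n:>7} GPUs: naive MTTF ~ {naive_cluster_mttf_minutes(n):,.1f} min")
    # 131,072 GPUs comes out near 4 minutes on this naive model; Meta's
    # measured ~14 minutes is the empirical figure the episode quotes.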
    CHAPTERS
    00:00 Cold open — Llama 3.1 reliability data
    00:33 Intro
    00:59 The traditional platform charter
    02:24 What's disclosed at OpenAI + Anthropic
    04:40 Anthropic infrastructure deep dive
    07:10 Team structure (OpenAI by workload, Anthropic by portability)
    07:48 The new problem classes
    08:20 Training cluster reliability + Meta MTTF curve
    09:52 Gang scheduling — Kueue vs Volcano
    10:26 Training frameworks — DeepSpeed, FSDP, Megatron
    11:15 Inference at p99 — PagedAttention, RadixAttention
    11:58 Prefill vs decode disaggregation
    12:38 Heterogeneous fleets
    13:14 Capacity planning + HBM as the binding constraint
    14:28 What stays the same
    15:46 Why "more load-bearing"
    16:59 Closing thesis

    SOURCES
    OpenAI Kubernetes posts (2018, 2021) + Postgres scaling
    Anthropic September 2025 postmortem
    Anthropic Managed Agents + Code Execution with MCP
    AWS Project Rainier, Google-Broadcom, MS-NVIDIA, SpaceX Colossus
    OpenAI Stargate (Jan 2025)
    Llama 3.1 paper + Meta cluster MTTF (arXiv 2410.21680)
    DeepSeek V3 paper · vLLM PagedAttention (SOSP 2023) · SGLang RadixAttention
    Latent Space — NVIDIA Dynamo team (prefill/decode disaggregation)
    Google Borg paper · Netflix Tech Blog (Spinnaker, Atlas, Eureka)

    18 min
  2. How LLMs Got 3× Faster Without Getting Smarter: Speculative Decoding, Explained

    1 day ago

    Two language models running side by side are faster than one. A 60-million-parameter model drafting tokens for an 11-billion-parameter model gave Google a 2-3× speedup with mathematically guaranteed identical output. The smaller model is wrong about a third of the time. The bigger model only verifies in parallel. And somehow you come out ahead. That's speculative decoding. The original paper landed the same day as ChatGPT — November 30, 2022. Today it runs inside Google Search, vLLM, TensorRT, and every major LLM serving stack on the planet. This episode is the sequel to "How LLM Inference Actually Works."

    The mechanism. The four-line proof that says you cannot lose quality, ever. The Leviathan formula — three numbers (acceptance rate, draft length, cost ratio) that determine the speedup; plug them in and you get the answer (a worked instance follows below). The architecture progression: small-LLM drafts (2022) → MEDUSA (2024, prediction heads on the target) → EAGLE (2024, predict feature vectors) → EAGLE-3 (2025, multi-layer feature fusion, 3.0-6.5×) → Lookahead Decoding (no draft model at all). Block Verification (ICLR 2025) — the original inventor still evolving the algorithm.

    The honest production reality. Research papers say 5-6×. vLLM at production concurrency reports 1.2 to 2.5×. The Red Hat gpt-oss-120B benchmark hits 9.5 to 20.7 percent throughput improvement, not 3×. An acceptance rate below 0.55 turns the technique net-negative: math at 0.518 actively hurts; code above 0.8 hits 6×+. Two case studies: Cursor's 13× speedup from using the file you're editing as the draft (not a draft model — a structural prior), and Morph Fast Apply at 10,500 tokens per second on a 7B model. The whole AI-code-editor category runs on this trick. MagicDec — the counterintuitive long-context exception where speculative decoding helps more at larger batch sizes. Five testable predictions. Closing thesis: two LLMs running together are faster than one. The math is as old as ChatGPT itself. And it is the reason your AI is faster every six months.
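    A worked instance of the Leviathan walltime formula, as I read the 2022 paper: with per-token acceptance rate alpha, draft length gamma, and a draft model costing fraction c of the target per token, the expected speedup is (1 - alpha^(gamma+1)) / ((1 - alpha)(gamma·c + 1)). The parameter values below are illustrative picks to show both regimes, not numbers from the episode's sources:

    def speculative_speedup(alpha, gamma, c):
        # Expected tokens emitted per verification round: a truncated
        # geometric series, (1 - alpha**(gamma + 1)) / (1 - alpha).
        expected_tokens = (1 - alpha ** (gamma + 1)) / (1 - alpha)
        # Cost per round: gamma draft-model steps plus one parallel target pass.
        cost_per_round = gamma * c + 1
        return expected_tokens / cost_per_round

    print(speculative_speedup(alpha=0.8, gamma=5, c=0.05))   # ~2.95, the 2-3x regime
    print(speculative_speedup(alpha=0.518, gamma=4, c=0.3))  # ~0.91, net-negative

    The second line is the failure mode the episode describes: once acceptance falls toward 0.5 and the drafter isn't nearly free, the draft steps cost more than the accepted tokens repay.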
    CHAPTERS
    00:00 Cold open — Two LLMs faster than one
    01:10 EP2 recap — memory-bound inference
    02:04 The mechanism — draft + verify
    04:20 The four-line proof — why it's lossless
    06:03 The Leviathan formula
    07:26 Architecture progression: small-LLM → MEDUSA → EAGLE → EAGLE-3 → Lookahead
    09:33 Block Verification (ICLR 2025)
    10:07 Production reality — research vs serving
    11:15 SpecDecode-Bench falloff + MagicDec exception
    12:41 The α=0.55 floor + domain spread
    13:12 Cursor 13× (file-as-draft) + Morph 10,500 tps
    14:28 What spec decoding enabled (Realtime Voice, AI-code-editor)
    14:59 The frontier — SSD, DFlash, speculative cascades
    16:30 Five predictions
    17:49 Closing thesis

    SOURCES
    Nov 30 2022 — Leviathan, Kalman, Matias (Google), "Fast Inference from Transformers via Speculative Decoding"
    Feb 2 2023 — Chen et al. (DeepMind), "Accelerating LLM Decoding with Speculative Sampling"
    Jan 2024 — MEDUSA paper (multiple decoding heads)
    Jan 2024 — EAGLE paper (feature-level autoregression)
    Mar 2025 — EAGLE-3 (NeurIPS 2025, multi-layer feature fusion)
    Nov 2023 — Lookahead Decoding (LMSYS / Hao AI Lab)
    ICLR 2025 — Block Verification (Leviathan co-authored)
    Aug 2024 — MagicDec long-context paper
    ICLR 2026 — Speculative Speculative Decoding
    Dec 2025 — Google DFlash (block-diffusion on TPU v5p)
    Apr 2026 — Red Hat gpt-oss-120B production benchmark on H200
    Oct 2024 — vLLM speculative decoding blog (2.8× CNN/DailyMail at QPS=1)
    May 2024 — Cursor "Editing files at 1000 tokens/sec"
    Apr 2026 — Anthropic "Code execution with MCP" (98.7% token reduction)
    Berkeley EECS-2025-224 — Liu, "Efficient LLM System with Speculative Decoding"
    Google Research 2025 — "Looking back at speculative decoding"

    19 min
  3. The Regulation Anthropic Asked For: How a Withheld Model Triggered Trump's AI Pre-Release Vetting EO

    2 days ago

    April 29, 2026: the Trump White House drafts an executive order to bring Anthropic back for federal use. May 4: the same White House is considering a separate executive order — mandatory pre-release government vetting of frontier AI models, routed through the NSA, ONCD, and DNI. Five days apart. The catalyst, every outlet agreed, was Mythos — the model Anthropic withheld in April. Anthropic spent three years asking for AI regulation. They got it. From the administration that doesn't trust them. This episode is what that paradox means.

    The lobbying record is real. Anthropic publicly endorsed SB 1047, with detailed support on the company blog in August 2024. Lobbying disclosures show systematic engagement with the Biden EO, the NIST AI RMF, and AISI's voluntary testing framework. The advocacy was for a specific kind of regulation — public-facing, NIST-administered, voluntary, transparent. What's getting drafted is a different thing. NSA evaluations are classified. The Office of the National Cyber Director is not a research lab. Mandatory pre-release vetting routed through national security agencies inverts the accountability surface from public-and-adversarial to classified-and-deferential.

    On the UK AI Safety Institute's capture-the-flag cyber benchmark, Mythos hit 73 percent — up from Opus 4.6 at 16 percent. RSP v3.0 dropped cyber operations from the formal Responsible Scaling framework five weeks before Mythos's preview. The withholding decision was a real product call against a real capability surface — and the catalyst for an EO Anthropic almost certainly didn't want. Five testable predictions. Closing thesis: the regulation Anthropic asked for got built. The administration that doesn't trust Anthropic is now writing it.

    CHAPTERS
    00:00 Cold open — April 7, April 29, May 4
    00:27 The Trump pre-release vetting EO
    00:59 Recap for returning listeners
    01:35 Anthropic asked for regulation. They got it.
    02:24 Mythos capabilities — UK AISI cyber benchmark
    03:29 Anthropic's lobbying history (SB 1047, AI EO, AISI, NIST)
    04:48 RSP v3.0 — dropping cyber ops five weeks before Mythos
    07:53 Pre-release vetting — what classified evals change
    09:27 National-security-flavored regulation vs civilian framework
    11:37 First-access vs blocking — why the design matters
    14:11 Five predictions
    15:16 Closing thesis

    SOURCES
    May 4 — NYT broke story (Trump considering mandatory pre-release vetting EO)
    May 4-5 — Axios, Bloomberg, Tom's Hardware, USNews, MSN syndication
    April 29 — Draft EO to bring Anthropic back for federal use
    April 27 — Dean Ball, WBUR On Point + Techdirt follow-up
    April 8 — Project Glasswing launch
    April 7 — Anthropic Mythos Preview announcement
    February 24 — Anthropic Responsible Scaling Policy v3.0
    August 2024 — Anthropic public endorsement of SB 1047
    Trump Day 1 — Biden AI EO rescinded
    Past episodes — Mythos (EP14), Mythos Bifurcation (EP24)

    16 min
  4. The Loop Closed in the Sandbox: How Anthropic Showed AI Can Do AI Research, and Then Showed It Can't Yet

    2 days ago

    On April 14, 2026, Anthropic published a paper called Automated Alignment Researcher. The setup: a controlled benchmark where two human alignment researchers, given seven days, closed 23 percent of a performance gap. Nine instances of Claude Opus 4.6, given five days and about $18,000 total, closed 97 percent. Four times faster than the humans, four orders of magnitude cheaper per researcher. Then Anthropic published the second result. The methods transferred to math at PGR 0.94 and to code at PGR 0.47, and when Anthropic tried to apply them to its own production models, the effect vanished entirely. Both findings are in the same paper. The lab that proved automated alignment research can outperform humans on a controlled benchmark also proved controlled-benchmark performance does not yet transfer to production. This episode is what that gap means.

    The benchmark progression is measurable. SWE-Bench Verified went from 1.96 percent (Claude 2, October 2023) to 93.9 percent (Mythos, April 2026). METR's 50-percent task horizon: 30 seconds in 2022 to four hours forty-nine minutes by Opus 4.5, with the doubling time accelerating from seven months to 4.3 months. AlphaEvolve, in production at Google for over a year, beat Strassen's 1969 matrix-multiplication record after 56 years. The capital is short the LLM-scaling moat: Recursive Superintelligence raised $500 million at $4 billion pre-money from Google Ventures and NVIDIA — four months old, no public product. Sam Altman's stated OpenAI target, posted on X October 28: automated AI research intern by September 2026, true automated AI researcher by March 2028.

    The verification problem is the structural rot. Anthropic's own April 2025 paper measured Claude 3.7 Sonnet's chain-of-thought faithfulness at 25 percent — under reward hacks, less than 2 percent. The audit surface is wrong about what the model is doing 75 percent of the time. The labor numbers: Pang $200M, an unnamed engineer turned down $1.5B, OpenAI Research Scientist median $1M, Anthropic $6M revenue per employee, and employment of software developers aged 22-25 down 20 percent since 2022. Jack Clark's compounding-error arithmetic: 99.9 percent accurate becomes 60.5 percent after 500 generations (checked in the sketch below). Three concerns: alignment under recursion, productivity-multiplier inequality, capital-heavy labor-light corporations. Five predictions. Closing thesis: the loop closed in the sandbox. The audit hasn't started.
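    Clark's compounding figure is one line of arithmetic to verify. A minimal check, where the 99.9 percent per-step reliability comes from the episode and the independence of steps is my simplifying assumption:

    # Accuracy after n chained generations, assuming each generation is
    # independently 99.9 percent reliable (independence is a simplification).
    p_step = 0.999
    for n in (1, 100, 250, 500):
        print(n, round(p_step ** n, 4))
    # 500 generations -> ~0.606, the roughly 60.5 percent figure Clark cites.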
"Biology of a Large Language Model" Feb 24 2026 — Anthropic Responsible Scaling Policy v3.0 Mar 19 2025 — METR original "Measuring AI Ability to Complete Long Tasks" Jan 29 2026 — METR Time Horizon 1.1 update (4.3-month doubling) Sep 2024 — CORE-Bench (Princeton HAL leaderboard) May 2025 — Google DeepMind AlphaEvolve announcement Dec 18 2025 — DOE Genesis Mission program Aug 2024 — Sakana AI Scientist paper Nov 2025 — Edison Scientific Kosmos Oct 28 2025 — Sam Altman X post (intern Sep 2026, researcher Mar 2028) May 4 2026 — Import AI #455 (Jack Clark) Apr 2026 — Recursive Superintelligence $500M / $4B (FT) Jul 2025 — Levels.fyi compensation data Q1 2026 — BLS / TechCrunch labor data

    20 min
  5. The Two Apples: How One Company Buys AI From Google While Writing Its Code With Anthropic

    3 days ago

    On April 30, 2026, Apple shipped a Support app update with internal Claude Code project instructions accidentally embedded in the app bundle. Twenty-four hours later it was on Hacker News. Bloomberg's Mark Gurman had reported the same fact three months earlier in a single sentence: Apple runs on Anthropic. The April 30 leak was the artifact. There are two Apples right now. One sells consumer products and is paying Google about a billion dollars a year to license Gemini to power Siri. The other writes the iOS code that runs on those products and does it on Anthropic's Claude. Same company, same buildings, same week. This episode is what that gap means.

    The capex comparison: Apple's full-year AI infrastructure spend in fiscal 2025 was $12.7 billion. Google's was $90 billion. Microsoft's was over $150 billion. Apple is spending between one-seventh and one-twelfth of what its rivals spend. A company spending one-seventh what its competitor spends on AI infrastructure is not building a competitive frontier model. It is buying one and integrating it. It covers the three AI deals Apple tried to make: the OpenAI integration with no money changing hands, the Anthropic deal that died over price (Anthropic reportedly asked for several billion a year, doubling over time), and the Google deal at about a billion a year — set against Google's $20B-a-year Safari search payments, which leaves Apple a net beneficiary of $19B in that triangle.

    The 18-month Siri delay turned into the most expensive vaporware in modern consumer software history. iOS 26.4 launched March 24 with no Siri features. The iOS 26.5 beta in April confirmed no Siri features. Spring 2026 promises broken, pushed to iOS 27 in September. Apple has paid Google about $300 million already; by the time iOS 27 ships, the figure will be roughly $750 million. For features Apple has not shipped to a single consumer.

    Leadership exodus: Giannandrea walked April 13. The new AI VP, Amar Subramanya, came from running engineering for Google's Gemini Assistant. Apple hired the person who built Google's Gemini Assistant to manage Apple's relationship with Google's Gemini Assistant. China paradox: Q1 2026 iPhone shipments in China grew 20% YoY without Apple Intelligence available; the supercycle thesis is broken. Antitrust trap: the April 14 DOJ remedies order prohibits Google from exclusive contracts for Gemini app distribution — 92 days after Apple signed the deal.

    Five predictions, then the closing thesis: Apple Intelligence is a distribution play, not an AI play. Apple bet that owning the surface beats owning the brain. The capex line is the tell. The Houston datacenter, which builds servers and not models, confirms it. The new VP, hired from Google, makes it operational. And the CLAUDE.md file shipped in a Support app update — that's the artifact.
    CHAPTERS
    00:00 Cold open — Two Apples
    01:42 Intro + preview
    02:17 The capex comparison
    03:54 Three deals — OpenAI / Anthropic / Google
    06:53 Steelman — AFM + Private Cloud Compute
    08:19 The $750M vaporware
    11:59 Leadership — Giannandrea + Subramanya
    14:14 China paradox
    15:12 Antitrust trap
    16:07 Five predictions
    18:20 Closing thesis

    SOURCES
    Apr 30 — Apple Support app v5.13 CLAUDE.md leak (HN)
    Apr 19 — Apple WWDC 2026 promotional graphic teasing iOS 27 Siri (MacRumors / 9to5Mac)
    Apr 14 — DOJ remedies order in Google antitrust case
    Apr 13 — Giannandrea officially departs Apple (9to5Mac)
    Jan 30 — Bloomberg / Mark Gurman: "Apple runs on Anthropic"
    Jan 12 — Apple-Google Gemini deal (~$1B/year) announced
    Apr 17 — Counterpoint, Q1 2026: iPhone China shipments +20% YoY (13.1M vs 9.2M)
    2025 — Apple AI capex $12.7B; Google $90B; Microsoft $150B+
    Mid-2025 — Apple-Anthropic deal collapse (Bloomberg / Gurman)
    May 2025 — Anthropic ships Claude in Xcode
    Apr 8, 2026 — Anthropic Project Glasswing launch with Apple as partner ($100M Mythos credits)
    May 1, 2026 — Tim Cook Q2 FY26 earnings: M-series Mac shortage
    Apr 2025 — MacRumors: "AIMLess" Siri team investigation
    WSJ — Apple AI talent exodus (Foundation Model researchers to OpenAI / Anthropic / Meta)

    20 min
  6. AI Deanonymization: How Claude Identifies Writers from 125 Words

    6 days ago

    A journalist named Kelsey Piper handed Claude Opus 4.7 a 125-word draft of a political column she had never published. Incognito mode. No login. Through the API. She asked: who wrote this? Claude identified her. ChatGPT guessed Matthew Yglesias. Gemini guessed Scott Alexander. Both wrong. Then came four more tests across genres and decades — a Pokémon school report, a 1942 movie review, a 500-word heist novel, a college essay from 15 years ago. Claude went 5 for 5. The same writer, recoverable from prose nobody had ever published. This episode is what that threshold collapse means.

    In 1964, the canonical stylometric study — Mosteller and Wallace on the Federalist Papers — needed about 1,500 words per essay and a closed list of two candidates. In 2013, identifying J.K. Rowling as Robert Galbraith required an entire 80,000-word novel and a list of four candidates. In 2026, a frontier language model needs 125 words and the open set of every public writer on the internet. The text required dropped by orders of magnitude. The candidate pool expanded by a factor of millions.

    It covers the mechanism: how classical stylometry — Burrows' Delta, counting commas and function words (sketched in code below) — became latent-vector matching inside a transformer. Why Claude specifically, when ChatGPT and Gemini didn't match its accuracy. The Huang et al. EMNLP 2024 anchor: 84% accuracy at 60 words on a 10-author benchmark, with 2024-era GPT-4 numbers. It covers Anthropic's own research on chain-of-thought faithfulness: their April 2025 paper found Claude 3.7 Sonnet's reasoning chains acknowledge planted hints only about 25% of the time; in the other 75%, the chain reasons through alternative arguments. Larger, more capable models produce less faithful reasoning, not more. Apply that to deanonymization: Claude identifies the writer correctly, then generates a plausible reason. Sub-symbolic identification. Symbolic confabulation.

    It covers the institutional fallout. Anthropic's December 2025 release of 1,250 anonymized interview transcripts — 25 percent of them deanonymized in roughly one day. Snowden's 2013 stylometric hedge. Reality Winner. Glassdoor reviewers under a threat that doesn't require a subpoena. Talley v. California and McIntyre v. Ohio — anonymous-speech doctrine that protects against government compulsion but not private inference. And the 15-year fingerprint persistence. Five predictions with horizons. Closing thesis: anonymity, which used to be the default state of writing, is now a capability deficit.
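    For contrast with the latent-vector approach, a minimal sketch of the classical method named above, Burrows' Delta: z-score the frequencies of common function words across candidate corpora, then rank candidates by mean absolute z-distance. The word list, toy texts, and author names are illustrative placeholders, not data from any study cited in this episode:

    from collections import Counter

    # An illustrative subset of the function words classical stylometry counts.
    FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "was"]

    def profile(text):
        words = text.lower().split()
        counts = Counter(words)
        total = max(len(words), 1)
        return [counts[w] / total for w in FUNCTION_WORDS]

    def burrows_delta(disputed, candidates):
        # z-score each function-word frequency against the candidate set, then
        # score each candidate by mean absolute z-distance (lower = closer style).
        profiles = {name: profile(t) for name, t in candidates.items()}
        k, n = len(FUNCTION_WORDS), len(profiles)
        means = [sum(p[i] for p in profiles.values()) / n for i in range(k)]
        stds = [max((sum((p[i] - means[i]) ** 2 for p in profiles.values()) / n) ** 0.5,
                    1e-9)  # clamp to avoid dividing by zero on a tiny corpus
                for i in range(k)]
        z = lambda p: [(p[i] - means[i]) / stds[i] for i in range(k)]
        zd = z(profile(disputed))
        scores = {name: sum(abs(a - b) for a, b in zip(zd, z(p))) / k
                  for name, p in profiles.items()}
        return sorted(scores.items(), key=lambda kv: kv[1])

    # Toy usage: two invented "authors" with different function-word habits.
    candidates = {
        "author_a": "it was the best of times and it was the worst of times in the city",
        "author_b": "to know a thing is to see that a thing is true and that is all",
    }
    print(burrows_delta("it was a fine day in the town and it was quiet", candidates))

    The jump the episode describes is from this kind of hand-counted feature vector to whatever high-dimensional representation a frontier model builds internally, with no feature list to audit.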
    CHAPTERS
    00:00 Cold open — Piper × Claude Opus 4.7
    01:56 Intro + preview
    03:19 History — 1964 / 1996 / 2013
    05:40 Mechanism — function words to latent vectors
    08:30 Why Claude specifically
    09:54 Right ID, wrong reasoning — Anthropic faithfulness
    13:24 Implications — Anthropic dataset, Snowden, Glassdoor, First Amendment
    18:32 15-year fingerprint persistence
    20:12 Five predictions
    22:37 Closing thesis

    SOURCES
    Apr 2026 — Kelsey Piper, The Argument: "I can never talk to an AI anonymously again"
    Apr 2025 — Anthropic: "Reasoning Models Don't Always Say What They Think" (CoT faithfulness)
    Mar 2025 — Anthropic: "On the Biology of a Large Language Model" (Lindsey et al.)
    2024 — Huang, Chen, Shu (EMNLP): "Can Large Language Models Identify Authorship?"
    Feb 2026 — Tianshi Li (Northeastern Khoury): deanonymizing the Anthropic Interviewer dataset
    Dec 2025 — Anthropic: anonymized interview transcript release (~1,250 transcripts)
    2013 — Patrick Juola (Duquesne): Galbraith / Rowling identification
    1996 — FBI / James Fitzgerald: Unabomber stylometric attribution
    1964 — Mosteller and Wallace: the Federalist Papers Bayesian authorship study
    2013 — Snowden: stylometry hedge in initial Greenwald/Poitras contact
    2017 — Reality Winner: NSA leak, printer-microdot identification
    2020 — Kraken / Glassdoor: defamation suits against anonymous reviewers (EFF)
    1995 — McIntyre v. Ohio Elections Commission (anonymous speech, Justice Stevens)
    1960 — Talley v. California (handbill identification ordinance struck down)

    24 min
  7. The Bifurcation: How the AI Industry Split in Three Places in One Week

    6 days ago

    Nine hundred and fifty Google employees signed an open letter on April 28, 2026. The letter asked Google to "follow Anthropic's lead." Anthropic's lead in what? In refusing the Pentagon contract. On the same day the letter went around inside Google, the company signed that exact contract with the Department of Defense — all lawful uses, including classified networks. The contract Anthropic walked away from in February over autonomous-weapons and domestic-surveillance language, Google took. Twenty-four hours later, the White House started drafting an executive order to bring Anthropic back. The administration that blacklisted Anthropic in February. Drafting an executive order in April. To restore the lab they kicked out. And the lab they kicked out more than doubled its revenue in the meantime, by run-rate accounting: fourteen billion to thirty billion in sixty days.

    This episode is what the bifurcation is — three load-bearing relationships fracturing in seven days. The Pentagon's vendor stack splitting along compliance lines. The cloud market splitting after Microsoft and OpenAI ended the exclusivity that defined the industry since 2019. Anthropic shipping two models in one week — Mythos restricted to a few dozen partners, Opus 4.7 deliberately less capable on cyber by the company's own statement. And an alignment paper from Owain Evans, posted to arXiv, arguing that the standard interventions used to scrub misalignment from frontier models don't eliminate it — they hide it behind contextual triggers.

    It covers the Pentagon's leverage-flip math: a $200M two-year DoD ceiling versus a $30B annualized run rate, which makes the contract roughly a third of a percent of revenue per year. As Anthropic's revenue grows, the relative cost of government refusal shrinks; the relative cost to the government of being refused grows. The April 29 draft executive order is the institutional admission that the cost grew high enough to require executive intervention. It covers the Microsoft-OpenAI non-exclusive amendment of April 27 — the IP license that runs through 2032, the $50 billion Amazon deal that triggered the renegotiation, AWS's exclusive Frontier-agent rights, and Microsoft's $7.6 billion in net income from its OpenAI equity stake in a single quarter. It covers Anthropic's "differentially reduce these capabilities" admission on Opus 4.7: Mythos hits 73% on expert-level capture-the-flag cyber benchmarks; Opus 4.7, by Anthropic's design, doesn't. And it covers the Conditional Misalignment finding: models trained on a mix of only 5% insecure code still show misalignment when asked to format responses as Python strings. The bad behavior didn't get removed. It got hidden behind a contextual trigger.

    Five testable predictions. The closing thesis: for three years, the question was whether the AI industry was racing or converging. The answer is neither. It's bifurcating.
    CHAPTERS
    00:00 Cold open — The 950 Google employees
    01:06 Intro + preview
    02:12 Chronology — 8 events in 8 days
    04:47 The Pentagon Fracture
    08:28 Microsoft-OpenAI non-exclusive
    11:22 Capability bifurcation — Mythos vs Opus 4.7
    12:51 The Conditional Misalignment paper
    16:01 The Money — leverage flip math
    17:37 Five predictions
    18:57 Closing thesis

    SOURCES
    Apr 29 — Axios, Trump drafts plan to reinstate Anthropic (paywalled)
    Apr 28 — TechCrunch, Google expands DoD AI access + 950-employee letter
    Apr 28 — arXiv 2604.25891, Evans et al., conditional misalignment
    Apr 27 — Microsoft Blog, next phase of the Microsoft-OpenAI partnership
    Apr 27 — TechCrunch, OpenAI-Amazon $50B deal + AWS Frontier exclusive
    Apr 23 — DeepMind, Decoupled DiLoCo
    Apr 17 — Axios, Wiles-Bessent-Amodei White House meeting
    Apr 16 — Anthropic news, Claude Opus 4.7 announcement
    Apr 14 — UK AISI, Mythos cyber evaluation
    Apr 8 — CNBC, D.C. Circuit denies Anthropic stay
    Feb 12 — Anthropic, $30B Series G at $380B post-money
    Jan 28 — TechCrunch, Microsoft Q2 FY26 ($7.6B net income from OpenAI)
    Background — SaaStr / PYMNTS (ARR ramp); CFR / LessWrong (Mythos system card)

    21 min
  8. The 24-Hour Blockade: How a Chinese Tanker and an Iranian Split Defeated the US Navy at Hormuz

    Apr 25

    On April 13, 2026, the US Navy began the first full naval blockade of Iran. Twenty-four hours later, a sanctioned Chinese tanker called the Rich Starry sailed straight through the Strait of Hormuz — and sailed back through the next day. Iran's foreign ministry confirmed vessels flagged to China, Russia, India, Iraq, and Pakistan would all be allowed through. The US did not interdict any of them. This episode is what the blockade actually was: a sanctions cordon backed by carrier strike groups, calibrated around what Beijing would tolerate.

    It covers the math that actually closed the strait — war-risk insurance rising from a few hundred thousand dollars per voyage in February to $14 million by mid-March, roughly forty times the cost. The nuclear clock — 440 kilograms of 60%-enriched uranium buried under bombed sites the IAEA hasn't seen since February 28. The day America's coalition cracked — China calling the blockade "dangerous and irresponsible" in the same 24 hours Saudi Arabia leaked to the WSJ that it wanted the blockade lifted. China's three quieter moves: a UN Security Council veto, a CNN intel report on MANPAD shipments through third countries, and the Trump-Xi summit in early May. Mojtaba Khamenei, Iran's new Supreme Leader, and what the Shia rule of "the dead scholar" means for his father's oral fatwa against nuclear weapons.

    And it covers what happened in the eight days after the blockade. Iran's foreign minister declared the strait "completely open" on April 17 — oil dropped 10%. The next day, the Revolutionary Guard fired on a French container ship and two Indian-flagged vessels. On Sunday, a US destroyer blew a hole in the engine room of an Iranian cargo ship called the Touska, and US Marines rappelled aboard. On April 21, Trump extended the ceasefire, citing Iran's "seriously fractured" government as the reason. By April 23, Iran was laying mines, and Trump had ordered the US Navy to shoot and kill any boat laying them. Thirty-eight years and nine days after the USS Samuel B. Roberts struck an Iranian mine in those same waters, the parallel was complete. Math closes straits in 2026. Politics decides when they reopen. The Revolutionary Guard decides when they close again. Seven predictions.
    CHAPTERS
    00:00 Cold open — The Rich Starry
    01:17 Intro + preview
    02:01 Chronology
    03:47 What "blockade" actually means
    05:13 Rich Starry, in detail
    06:02 Money — Brent, insurance, SPR, shadow fleet
    09:10 Nuclear clock — 440 kg, facility damage
    12:52 Mojtaba Khamenei + the dead scholar's fatwa
    14:46 Apr 17-18 — the factional split
    17:40 Apr 14 — coalition fracture
    18:28 China's quieter moves
    20:23 Five US endgame options + cascade
    23:01 Cuba 1962 + 1988 parallels
    24:25 Apr 17-21 — weekend whiplash
    25:30 Apr 19 — Spruance + Marines seize the Touska
    26:39 Apr 21 — Trump "seriously fractured" + ceasefire extended
    27:59 Apr 22-23 — Iran kinetic + mines + "shoot and kill"
    29:49 Seven predictions
    31:44 Closing thesis
    34:25 1988 → 2026 anniversary callback

    SOURCES
    Apr 11 — CNN + The Hill, China MANPAD shipments via third countries
    Apr 13 — Military.com, Trump 50% tariff threat + early-May Xi summit
    Apr 12-15 — Al Jazeera, Trump blockade announcement + Cooper "completely halted" + Rich Starry transit
    Apr 14 — CNBC, China FM "dangerous and irresponsible"; WSJ via Antiwar, Saudi pressure on US
    Apr 18 — Fortune, Iran's Hormuz whiplash + Golkar "factions" quote + ISW + Brew
    Apr 18 — AOL/AP, Iran restores "strict management" + Tasnim/Fars criticism of Araghchi
    Apr 19 — JPost, US Marines rappel onto Touska after 6-hour standoff
    Apr 20 — NYT, Hormuz traffic at standstill + Kpler data
    Apr 21 — NBC live blog, Trump extends ceasefire + Iran UN complaint
    Apr 23 — Al Jazeera + NBC, Trump "shoot and kill" + Iran mine-laying
    Apr 24 — NYT, both Iran and US blockade Strait of Hormuz
    Background — CSIS Operation Epic Fury cost; IAEA GOV/2026/8; ISIS / Albright (Nov 2025); Washington Institute on Mojtaba (Clawson + Nadimi, Mar 2026); Lawfare on Hormuz maritime law

    36 min
