Deep Dive

How LLM inference actually works. Why the Strait of Hormuz could move oil prices 40 percent. What happens when AI starts automating AI research. Each episode picks one topic — usually tech, AI, or geopolitics — and goes deep. 30+ primary sources, every claim confidence-tagged, ~18 minutes per topic. For listeners tired of takes without numbers. Also on YouTube: youtube.com/@DeepDiveAIShow

  1. The Mandate That Couldn't Be Met: A Palo Alto CVE and What It Says About Federal Cybersecurity

    13 HR AGO

    CVE-2026-0300. Unauthenticated remote code execution as root on Palo Alto firewalls. CVSS 9.3. Disclosed May 6, 2026. CISA added it to the Known Exploited Vulnerabilities catalog the same day and set a federal civilian patch deadline of May 9. The first patch batch ships May 13. The federal mandate predates the patch by four days. This is a structural problem. Binding Operational Directive 22-01 — issued November 2021 — gives federal civilian agencies two paths to KEV compliance: apply the vendor patch, or remove the product from the network. Mitigations are explicitly temporary. When CISA used the standard KEV instrument here, agencies inherited an impossible deadline. The closest historical analog is Ivanti Connect Secure in January 2024 — but there, CISA issued Emergency Directive 24-01, which explicitly accepts mitigation as compliance. Standard KEV doesn't. And the market response is the counterintuitive twist. Palo Alto Networks stock went up after disclosure: +5.63% on May 7, +3.79% on May 8. PANW closed near $450. Three analysts raised price targets the same week. PANW's 25-to-28 percent drop in February 2024 came from a platformization guidance cut, not from the CVE disclosed that April. The market has learned to price critical edge-device CVEs as routine.
This episode walks through what CVE-2026-0300 actually is (the User-ID Authentication Portal — not enabled by default, only switched on for BYOD or guest SSO; Shodan finds 67 instances exposed on port 6081 versus 225 to 263 thousand total PAN-OS deployments), what an attacker does with root on a firewall (network pivot, SSL forward-proxy key extraction, persistence that survives reset and upgrade), how Unit 42 frames attribution as CL-STA-1132 — likely state-sponsored, with EarthWorm tool reuse as inference toward Chinese-nexus actors but never confirmation — and the live policy collision: National Cyber Director Sean Cairncross and acting CISA chief Nick Andersen are debating a permanent move to three-day standard KEV deadlines at the exact moment this CVE demonstrates that three-day deadlines cannot work when patches take seven or more. Plus the pattern. Edge appliances are now the number one attack vector for state actors. Trend Micro reports the edge-device share of all exploitation incidents went from 3 percent to 22 percent in a single year — roughly a sevenfold increase. PAN-OS in 2024 and again in 2026. Ivanti twice in 2024 and 2025. Cisco IOS XE in 2023. Fortinet across three years. Citrix NetScaler in 2023. Same architecture. Same outcome. When the mandate is impossible and the market doesn't care, the only thing that gets fixed is the next CVE.
CHAPTERS
00:00 Cold open — the impossible sequence
01:15 Intro
01:35 The CVE itself
07:24 What attackers do with root on a firewall
10:00 Attribution — CL-STA-1132
14:10 The mandate-then-patch gap
17:19 The market response — PANW stock went UP
19:44 The pattern — edge appliances as #1 attack vector
21:12 What defenders should do this week
22:27 Three signals to watch
23:30 Closing thesis

SOURCES
Palo Alto Security Advisory CVE-2026-0300 (security.paloaltonetworks.com)
Unit 42 Threat Brief — CL-STA-1132
CISA Known Exploited Vulnerabilities catalog
CISA Binding Operational Directive 22-01 (Nov 2021)
FBI IC3 advisories on edge-device exploitation
Reuters + SC Media + CSO Online — Cairncross / Andersen 3-day default reporting
Shadowserver Foundation — global PAN-OS exposure scans
Wiz + Help Net Security + BleepingComputer — CVE-2026-0300 coverage
Trend Micro — 2026 edge-device exploitation share
VulnCheck — 2024 zero-day catalog (75 zero-days, ~1 in 3 network/security appliances)
Five Eyes Joint Cybersecurity Advisory — Feb 2025, edge-device default-compromised posture
Palo Alto Networks SEC filings — FY26 guidance, market share, customer base

    24 min
  2. The Last Independent: Why Cerebras IPOs at $30 Billion This Tuesday

    17 HR AGO

    NVIDIA bought Groq for about $20 billion on Christmas Eve 2025 — Jonathan Ross and roughly 90 percent of Groq's engineering team moved into NVIDIA in what was structured as a licensing deal. SambaNova raised a down round in February 2026 at roughly $2.2 billion, off a 2021 peak of $5.1 billion. Cerebras Systems prices its IPO this Tuesday evening at roughly a $30 billion implied valuation, trading Wednesday on Nasdaq under ticker CBRS — the largest AI-infrastructure listing since Arm in September 2023. That makes Cerebras the last independent fast-inference pure-play in AI chips on the public markets. The bet investors are pricing is simple. Reasoning models — OpenAI's o-series, DeepSeek R1, extended-thinking Claude — generate 10 to 100 times more tokens per query than GPT-3.5 did. Memory bandwidth, not compute, becomes the binding constraint. A wafer-scale chip built around 44 GB of on-die SRAM, running Llama 70B at 2,100 tokens per second versus 30 to 100 tokens per second on an H100, exploits that shift in a way no GPU cluster can mechanically match. That's why OpenAI signed a $20 billion-plus capacity agreement. The risk is harder. Cerebras's biggest committed customer is also the customer building the chip designed to replace it. OpenAI's Titan accelerator, co-developed with Broadcom on TSMC 3nm, enters mass production in the second half of 2026 — about six months after the IPO. The 180-day insider lockup expires around November 2026, coinciding with the Titan production ramp.
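The memory-bandwidth claim above reduces to one line of roofline arithmetic, in the style of the cited arXiv 2402.16363 analysis: at batch size 1, each generated token must stream every model weight from memory once, so bandwidth divided by weight bytes caps tokens per second. A minimal sketch; the FP16 assumption and the ~3.35 TB/s H100 HBM3 figure are ours, not the episode's:

```python
# Memory-bound decode ceiling: each token streams all weights once,
# so tokens/sec cannot exceed bandwidth / weight_bytes at batch 1.
def decode_ceiling_tok_s(params_b: float, bytes_per_param: float,
                         bandwidth_tb_s: float) -> float:
    weight_bytes = params_b * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / weight_bytes

# Llama 70B in FP16 against roughly 3.35 TB/s of H100 HBM3:
print(round(decode_ceiling_tok_s(70, 2, 3.35)))  # ~24 tokens/sec
```

That low ceiling is why the quoted 30-to-100 tokens per second on H100 already requires sharding and batching, and why on-die SRAM at far higher bandwidth changes the constant rather than the algorithm.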
This episode is the structural argument: why wafer-scale matters now, how 84 dies become one chip via custom scribe-line lithography, what the $237.8 million GAAP net income actually means (driven by a $363 million non-cash forward-contract gain — operating loss was $145.9 million), the 86 percent UAE customer concentration that migrated rather than disappeared, the circular OpenAI deal that makes the IPO possible, the Graphcore precedent ($2.8 billion peak to $500 million SoftBank sale), and three signals to watch — first-day open versus offering, Q2 earnings, and OpenAI Titan production timing. Cerebras is being IPO'd as AI infrastructure. It may end up trading as a single-customer business. November 2026 is when we find out which.

CHAPTERS
00:00 Cold open — last independent
01:20 Intro
01:40 Why now — reasoning models break GPU economics
04:36 The chip — wafer-scale architecture
06:48 The IPO — what the numbers say
09:12 Customer concentration — 86 percent UAE
11:04 The OpenAI deal
11:55 The circular financing structure
13:50 The existential bet — OpenAI Titan
15:17 The cautionary frame — Graphcore precedent
16:15 Three signals to watch
17:30 Closing thesis

SOURCES
Cerebras S-1 (April 2026, SEC EDGAR)
Bloomberg + Yahoo Finance + CNBC — IPO mechanics + Groq acquisition
TechCrunch — "OpenAI's cozy partner Cerebras"
The Information — OpenAI $20B+ Cerebras MRA
Tom's Hardware — OpenAI Titan + Broadcom 10GW
Cerebras Hot Chips 2024 — WSE-3 architecture
Artificial Analysis — independent inference benchmarks (2,100 tok/s Llama 70B)
arXiv 2402.16363 — LLM inference roofline analysis
arXiv 2503.11698 — independent academic WSE-3 vs H100/B200 comparison
Reuters + CNBC + SiliconANGLE — CFIUS clearance March 2025 + G42 context
Sacra + TSGInvest — SambaNova Series E down round
CNBC — NVIDIA acquires Groq for ~$20B (Christmas Eve 2025)
TechCrunch — Graphcore peak valuation $2.77B Series E (Dec 2020)
Constellation Research + SemiAnalysis — hyperscaler captive silicon (Trainium3, TPU v7, Maia)

    18 min
  3. Platform Engineering at AI-Native Companies: What's Actually Different

    3 DAYS AGO

    Meta's Llama 3.1 training run, 405 billion parameters, used 16,384 H100 GPUs for 54 days. Over those 54 days, the cluster experienced 419 unexpected interruptions — roughly one failure every three hours. And that's the run Meta calls a success. They hit 90 percent effective training time. This is the substrate platform engineers at AI-native companies are operating on. This episode is what's actually different about platform engineering at companies like OpenAI and Anthropic, compared to the traditional shape — Stripe, Netflix, Block, Google. Engineering tone, not hype. The verified primary-source view: OpenAI's two Kubernetes scaling posts at 2,500 and 7,500 nodes (5 API servers, 5 etcd, 70 GB heap per API server, 200,000 IPs in use at peak, MPI gang scheduling via the Coscheduling plugin). OpenAI's Postgres scaled for 800 million ChatGPT users on a single primary plus 50 read replicas. Anthropic's September 2025 postmortem disclosing three serving platforms (first-party, Bedrock, Vertex), three hardware backends (Trainium, NVIDIA, TPU), sticky routing, tens of chips per request. The compute portfolios: Anthropic with roughly 7 gigawatts disclosed across AWS Project Rainier (~500K Trainium2 chips), Google plus Broadcom (up to 1M TPUs), Microsoft-NVIDIA ($30B / 1 GW Grace Blackwell + Vera Rubin), and SpaceX Colossus 1 (220K NVIDIA GPUs / 300 MW). OpenAI's Stargate at $500B / 10 GW. The new problem classes: training cluster reliability (Meta cluster MTTF goes from 47.7 days at 8 GPUs to 14 minutes at 131,072 GPUs — reliability collapses as the fleet scales). NCCL collectives. Gang scheduling primitives (Kueue versus Volcano, properly distinguished). Inference at p99 (PagedAttention, RadixAttention, continuous batching — three independent optimizations). Prefill versus decode disaggregation. Heterogeneous fleets across H100, H200, B200, GB200, Trainium2, TPU v5p, Ironwood. HBM and U.S. energy as the binding constraints, not GPU FLOPS.
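The MTTF collapse follows from independent-failure arithmetic: if one GPU fails on average every M hours, a fleet of n fails roughly every M/n hours. A hedged sketch calibrated on the quoted 47.7-day figure; the independence assumption is ours, and the fact that it predicts about 4 minutes at 131,072 GPUs rather than the quoted 14 suggests the cited paper's fit to real failure data is milder than this naive inverse-linear model:

```python
# Naive independent-failure model: fleet MTTF = per-part MTTF / n_parts.
def cluster_mttf_hours(single_gpu_mttf_hours: float, n_gpus: int) -> float:
    return single_gpu_mttf_hours / n_gpus

# Calibrate per-GPU MTTF from the quoted 47.7 days across 8 GPUs:
per_gpu = 47.7 * 24 * 8                            # ~9,158 GPU-hours per failure
print(cluster_mttf_hours(per_gpu, 131_072) * 60)   # ~4.2 minutes
```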
What stays the same: the reliability discipline. SLOs, error budgets, on-call, blameless postmortems, observability. Anthropic's September 2025 postmortem reads like a Google SRE Book chapter. What doesn't transfer: substrate-specific tooling. You can't canary a 16,000-GPU job mid-flight. Three platforms inside one company. Training is a batch-scheduler problem. Inference is a request/response problem. Agents are a durable-workflow problem. Above all three, a chip-portability layer. Same craft. Different physics.

CHAPTERS
00:00 Cold open — Llama 3.1 reliability data
00:33 Intro
00:59 The traditional platform charter
02:24 What's disclosed at OpenAI + Anthropic
04:40 Anthropic infrastructure deep dive
07:10 Team structure (OpenAI by workload, Anthropic by portability)
07:48 The new problem classes
08:20 Training cluster reliability + Meta MTTF curve
09:52 Gang scheduling — Kueue vs Volcano
10:26 Training frameworks — DeepSpeed, FSDP, Megatron
11:15 Inference at p99 — PagedAttention, RadixAttention
11:58 Prefill vs decode disaggregation
12:38 Heterogeneous fleets
13:14 Capacity planning + HBM as the binding constraint
14:28 What stays the same
15:46 Why "more load-bearing"
16:59 Closing thesis

SOURCES
OpenAI Kubernetes posts (2018, 2021) + Postgres scaling
Anthropic September 2025 postmortem
Anthropic Managed Agents + Code Execution with MCP
AWS Project Rainier, Google-Broadcom, MS-NVIDIA, SpaceX Colossus
OpenAI Stargate (Jan 2025)
Llama 3.1 paper + Meta cluster MTTF (arXiv 2410.21680)
DeepSeek V3 paper · vLLM PagedAttention (SOSP 2023) · SGLang RadixAttention
Latent Space — NVIDIA Dynamo team (prefill/decode disaggregation)
Google Borg paper · Netflix Tech Blog (Spinnaker, Atlas, Eureka)

    18 min
  4. How LLMs Got 3× Faster Without Getting Smarter: Speculative Decoding, Explained

    3 DAYS AGO

    Two language models running side by side are faster than one. A 60-million-parameter model drafting tokens for an 11-billion-parameter model gave Google a 2-to-3× speedup with mathematically guaranteed identical output. The smaller model is wrong about a third of the time. The bigger model only verifies in parallel. And somehow you come out ahead. That's speculative decoding. The original paper landed the same day as ChatGPT — November 30, 2022. Today it runs inside Google Search, vLLM, TensorRT, and every major LLM serving stack on the planet. This episode is the sequel to "How LLM Inference Actually Works." The mechanism. The four-line proof that says you cannot lose quality, ever. The Leviathan formula — three numbers (acceptance rate, draft length, cost ratio) that determine the speedup. Plug them in and you get the answer. The architecture progression: small-LLM drafts (2022) → MEDUSA (2024, prediction heads on the target) → EAGLE (2024, predict feature vectors) → EAGLE-3 (2025, multi-layer feature fusion, 3.0-6.5×) → Lookahead Decoding (no draft model at all). Block Verification (ICLR 2025) — the original inventor still evolving the algorithm. The honest production reality. Research papers say 5-6×. vLLM at production concurrency reports 1.2 to 2.5×. The Red Hat gpt-oss-120B benchmark hits +9.5 to 20.7 percent throughput improvement, not 3×. Acceptance rate below 0.55 turns the technique net-negative. Math at 0.518 actively hurts; code above 0.8 hits 6×+. Two case studies: Cursor's 13× speedup from using the file you're editing as the draft (not a draft model, structural prior). Morph Fast Apply at 10,500 tokens per second on a 7B model. The whole AI-code-editor category runs on this trick. MagicDec — counterintuitive long-context exception where speculative decoding helps MORE at larger batch. Five testable predictions. Closing thesis: two LLMs running together are faster than one. The math is as old as ChatGPT itself. 
And it is the reason your AI is faster every six months.

CHAPTERS
00:00 Cold open — Two LLMs faster than one
01:10 EP2 recap — memory-bound inference
02:04 The mechanism — draft + verify
04:20 The four-line proof — why it's lossless
06:03 The Leviathan formula
07:26 Architecture progression: small-LLM → MEDUSA → EAGLE → EAGLE-3 → Lookahead
09:33 Block Verification (ICLR 2025)
10:07 Production reality — research vs serving
11:15 SpecDecode-Bench falloff + MagicDec exception
12:41 The α=0.55 floor + domain spread
13:12 Cursor 13× (file-as-draft) + Morph 10,500 tps
14:28 What spec decoding enabled (Realtime Voice, AI-code-editor)
14:59 The frontier — SSD, DFlash, speculative cascades
16:30 Five predictions
17:49 Closing thesis

SOURCES
Nov 30 2022 — Leviathan, Kalman, Matias (Google) "Fast Inference from Transformers via Speculative Decoding"
Feb 2 2023 — Chen et al. (DeepMind) "Accelerating LLM Decoding with Speculative Sampling"
Jan 2024 — MEDUSA paper (multiple decoding heads)
Jan 2024 — EAGLE paper (feature-level autoregression)
Mar 2025 — EAGLE-3 (NeurIPS 2025, multi-layer feature fusion)
Nov 2023 — Lookahead Decoding (LMSYS / Hao AI Lab)
ICLR 2025 — Block Verification (Leviathan co-authored)
Aug 2024 — MagicDec long-context paper
ICLR 2026 — Speculative Speculative Decoding
Dec 2025 — Google DFlash (block-diffusion on TPU v5p)
Apr 2026 — Red Hat gpt-oss-120B production benchmark on H200
Oct 2024 — vLLM speculative decoding blog (2.8× CNN/DailyMail at QPS=1)
May 2024 — Cursor "Editing files at 1000 tokens/sec"
Apr 2026 — Anthropic "Code execution with MCP" (98.7% token reduction)
Berkeley EECS-2025-224 — Liu, "Efficient LLM System with Speculative Decoding"
Google Research 2025 — "Looking back at speculative decoding"
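The Leviathan formula named in the episode can be written out. With acceptance rate α, draft length γ, and draft-to-target cost ratio c, one draft-and-verify step yields (1 - α^(γ+1)) / (1 - α) expected tokens at a cost of γc + 1 target-equivalent runs; the ratio is the expected speedup. A sketch, with example numbers that are illustrative rather than taken from the paper:

```python
# Expected speedup from speculative decoding (Leviathan et al., 2022):
# expected accepted tokens per step divided by relative cost per step.
def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    tokens_per_step = (1 - alpha ** (gamma + 1)) / (1 - alpha)
    cost_per_step = gamma * c + 1      # gamma draft runs + one target run
    return tokens_per_step / cost_per_step

print(expected_speedup(0.8, 5, 0.05))  # ~2.95, the paper's 2-3x regime
print(expected_speedup(0.5, 5, 0.05))  # ~1.58, low acceptance bites
```

Push α low enough while c stays nonzero and the ratio dips below 1, which is the net-negative floor the episode describes.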

    19 min
  5. The Regulation Anthropic Asked For: How a Withheld Model Triggered Trump's AI Pre-Release Vetting EO

    4 DAYS AGO

    April 29, 2026: the Trump White House drafts an executive order to bring Anthropic back for federal use. May 4: the same White House is now considering a separate executive order — mandatory pre-release government vetting of frontier AI models, routed through NSA, ONCD, and DNI. Five days. The catalyst, every outlet agreed, was Mythos — the model Anthropic withheld in April. Anthropic spent three years asking for AI regulation. They got it. From the administration that doesn't trust them. This episode is what that paradox means. The lobbying record is real. Anthropic publicly endorsed SB 1047. Detailed support on the company blog, August 2024. Lobbying disclosures show systematic engagement with the Biden EO, NIST AI RMF, and AISI's voluntary testing framework. The advocacy was for a specific kind of regulation — public-facing, NIST-administered, voluntary, transparent. What's getting drafted is a different thing. NSA evaluations are classified. The Office of the National Cyber Director is not a research lab. Mandatory pre-release vetting routed through national security agencies inverts the accountability surface from public-and-adversarial to classified-and-deferential. On the UK AI Safety Institute's capture-the-flag cyber benchmark, Mythos hit 73 percent — up from Opus 4.6 at 16 percent. RSP v3.0 dropped cyber operations from the formal Responsible Scaling framework five weeks before Mythos's preview. The withholding decision was a real product call against a real capability surface — and the catalyst for an EO Anthropic almost certainly didn't want. Five testable predictions. Closing thesis: the regulation Anthropic asked for got built. The administration that doesn't trust Anthropic is now writing it.

CHAPTERS
00:00 Cold open — April 7, April 29, May 4
00:27 The Trump pre-release vetting EO
00:59 Recap for returning listeners
01:35 Anthropic asked for regulation. They got it.
02:24 Mythos capabilities — UK AISI cyber benchmark
03:29 Anthropic's lobbying history (SB 1047, AI EO, AISI, NIST)
04:48 RSP v3.0 — dropping cyber ops five weeks before Mythos
07:53 Pre-release vetting — what classified evals change
09:27 National-security-flavored regulation vs civilian framework
11:37 First-access vs blocking — why the design matters
14:11 Five predictions
15:16 Closing thesis

SOURCES
May 4 — NYT broke story (Trump considering mandatory pre-release vetting EO)
May 4-5 — Axios, Bloomberg, Tom's Hardware, USNews, MSN syndication
April 29 — Draft EO to bring Anthropic back for federal use
April 27 — Dean Ball, WBUR On Point + Techdirt follow-up
April 8 — Project Glasswing launch
April 7 — Anthropic Mythos Preview announcement
February 24 — Anthropic Responsible Scaling Policy v3.0
August 2024 — Anthropic public endorsement of SB 1047
Trump Day 1 — Biden AI EO rescinded
Past episodes — Mythos (EP14), Mythos Bifurcation (EP24)

    16 min
  6. The Loop Closed in the Sandbox: How Anthropic Showed AI Can Do AI Research, and Then Showed It Can't Yet

    5 DAYS AGO

    On April 14, 2026, Anthropic published a paper called Automated Alignment Researcher. Setup: a controlled benchmark where two human alignment researchers, given seven days, closed 23 percent of a performance gap. Nine instances of Claude Opus 4.6, given five days and about $18,000 total, closed 97 percent. Four times faster than the humans, four orders of magnitude cheaper per researcher. Then Anthropic published the second result. The methods transferred to math at PGR 0.94, transferred to code at PGR 0.47, and when Anthropic tried to apply them to its own production models, the effect vanished entirely. Both findings are in the same paper. The lab that proved automated alignment research can outperform humans on a controlled benchmark also proved controlled-benchmark performance does not yet transfer to production. This episode is what that gap means. The benchmark progression is measurable. SWE-Bench Verified went from 1.96 percent (Claude 2, October 2023) to 93.9 percent (Mythos, April 2026). METR's 50-percent task horizon: 30 seconds in 2022 to four hours forty-nine minutes by Opus 4.5. Doubling time accelerated from seven months to 4.3 months. AlphaEvolve, in production at Google for over a year, beat the 1969 Strassen matrix-multiplication record after 56 years. The capital is short the LLM-scaling moat. Recursive Superintelligence raised $500 million at $4 billion pre-money from Google Ventures and NVIDIA. Four months old. No public product. Sam Altman's stated OpenAI target on X October 28: automated AI research intern by September 2026, true automated AI researcher by March 2028. The verification problem is the structural rot. Anthropic's own April 2025 paper measured Claude 3.7 Sonnet's chain-of-thought faithfulness at 25 percent. Under reward hacks, less than 2 percent. The audit surface is wrong about what the model is doing 75 percent of the time.
The labor numbers: Pang $200M, an unnamed engineer turned down $1.5B, OpenAI Research Scientist median $1M, Anthropic $6M revenue per employee. Software developers aged 22 to 25 are down 20 percent in employment since 2022. Jack Clark's compounding-error arithmetic: 99.9 percent accuracy per generation compounds down to roughly 60.6 percent after 500 generations. Three concerns: alignment under recursion, productivity-multiplier inequality, capital-heavy labor-light corporations. Five predictions. Closing thesis: the loop closed in the sandbox. The audit hasn't started.

CHAPTERS
00:00 Cold open — The AAR sandbox win + production failure
02:13 Intro + preview
02:47 Four layers of automating AI research
04:00 The benchmark progression
06:56 AlphaEvolve in production
07:36 Sakana, Kosmos, long-running Claude
08:21 The capital — RSI + OpenAI
09:48 Why now (four inflections)
11:33 The skeptics — LeCun, Bengio, Marcus, MIRI
13:23 The verification crisis
16:14 Compounding error + Clark's three concerns
17:29 Labor reality
18:00 Five predictions
19:05 Closing thesis

SOURCES
Apr 14 — Anthropic Automated Alignment Researcher paper
Apr 8 — Anthropic Claude Mythos Preview system card (SWE-Bench 93.9%)
Apr 2025 — Anthropic CoT faithfulness paper (Claude 3.7 Sonnet 25%)
Mar 2025 — Lindsey et al. "Biology of a Large Language Model"
Feb 24 2026 — Anthropic Responsible Scaling Policy v3.0
Mar 19 2025 — METR original "Measuring AI Ability to Complete Long Tasks"
Jan 29 2026 — METR Time Horizon 1.1 update (4.3-month doubling)
Sep 2024 — CORE-Bench (Princeton HAL leaderboard)
May 2025 — Google DeepMind AlphaEvolve announcement
Dec 18 2025 — DOE Genesis Mission program
Aug 2024 — Sakana AI Scientist paper
Nov 2025 — Edison Scientific Kosmos
Oct 28 2025 — Sam Altman X post (intern Sep 2026, researcher Mar 2028)
May 4 2026 — Import AI #455 (Jack Clark)
Apr 2026 — Recursive Superintelligence $500M / $4B (FT)
Jul 2025 — Levels.fyi compensation data
Q1 2026 — BLS / TechCrunch labor data
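Clark's compounding arithmetic quoted above is a single exponent: per-step accuracy p survives n self-referential generations as p to the n. A one-function sketch using the 0.999 and 500 inputs from the episode:

```python
# Error compounding across model generations: accuracy decays as p**n.
def compound_accuracy(p: float, n: int) -> float:
    return p ** n

print(compound_accuracy(0.999, 500))  # ~0.606, from 99.9% per step
```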

    20 min
  7. The Two Apples: How One Company Buys AI From Google While Writing Its Code With Anthropic

    6 DAYS AGO

    On April 30, 2026, Apple shipped a Support app update with internal Claude Code project instructions accidentally embedded in the app bundle. Twenty-four hours later it was on Hacker News. Bloomberg's Mark Gurman had reported the same fact three months earlier in a sentence: Apple runs on Anthropic. The April 30 leak was the artifact. There are two Apples right now. One sells consumer products and is paying Google about a billion dollars a year to license Gemini to power Siri. The other writes the iOS code that runs on those products and does it on Anthropic's Claude. Same company, same buildings, same week. This episode is what that gap means. The capex comparison: Apple's full-year AI infrastructure spend in fiscal 2025 was $12.7 billion. Google's was $90 billion. Microsoft's was over $150 billion. A factor of seven to twelve lower. A company spending one-seventh what its competitor spends on AI infrastructure is not building a competitive frontier model. It is buying one and integrating it. It covers the three AI deals Apple tried to make: OpenAI integration with no money changing hands, the Anthropic deal that died over price (Anthropic reportedly asked for several billion a year, doubling), the Google deal at about a billion a year — against Google's $20B/year Safari search payments. Apple is a net beneficiary of $19B in that triangle. The 18-month Siri delay turned into the most expensive vaporware in modern consumer software history. iOS 26.4 launched March 24, no Siri features. iOS 26.5 beta in April, confirmed no Siri features. Spring 2026 promises broken. Pushed to iOS 27 in September. Apple has paid Google about $300 million already. By the time iOS 27 ships, the figure will be roughly $750 million. For features Apple has not shipped to a single consumer. Leadership exodus: Giannandrea walked April 13. The new AI VP, Amar Subramanya, came from running engineering for Google's Gemini Assistant.
Apple hired the person who built Google's Gemini Assistant to manage Apple's relationship with Google's Gemini Assistant. China paradox: Q1 2026 iPhone shipments in China grew 20% YoY without Apple Intelligence available. The supercycle thesis is broken. Antitrust trap: April 14 DOJ remedies order prohibits Google from exclusive contracts for Gemini app distribution. 92 days after Apple signed the deal. Five predictions, then the closing thesis: Apple Intelligence is a distribution play, not an AI play. Apple bet that owning the surface beats owning the brain. The capex line is the tell. The Houston datacenter, which builds servers and not models, confirms it. The new VP, hired from Google, makes it operational. And the CLAUDE.md file shipped in a Support app update — that's the artifact.

CHAPTERS
00:00 Cold open — Two Apples
01:42 Intro + preview
02:17 The capex comparison
03:54 Three deals — OpenAI / Anthropic / Google
06:53 Steelman — AFM + Private Cloud Compute
08:19 The $750M vaporware
11:59 Leadership — Giannandrea + Subramanya
14:14 China paradox
15:12 Antitrust trap
16:07 Five predictions
18:20 Closing thesis

SOURCES
Apr 30 — Apple Support app v5.13 CLAUDE.md leak (HN)
Apr 19 — Apple WWDC 2026 promotional graphic teasing iOS 27 Siri (MacRumors / 9to5Mac)
Apr 14 — DOJ remedies order in Google antitrust case
Apr 13 — Giannandrea officially departs Apple (9to5Mac)
Jan 30 — Bloomberg / Mark Gurman: "Apple runs on Anthropic"
Jan 12 — Apple-Google Gemini deal (~$1B/year) announced
Apr 17 — Counterpoint Q1 2026: iPhone China shipments +20% YoY (13.1M vs 9.2M)
2025 — Apple AI capex $12.7B; Google $90B; Microsoft $150B+
Mid-2025 — Apple-Anthropic deal collapse (Bloomberg/Gurman)
May 2025 — Anthropic ships Claude in Xcode
Apr 8, 2026 — Anthropic Project Glasswing launch with Apple as partner ($100M Mythos credits)
May 1, 2026 — Tim Cook Q2 FY26 earnings: M-series Mac shortage
Apr 2025 — MacRumors: "AIMLess" Siri team investigation
WSJ — Apple AI talent exodus (Foundation Model researchers to OpenAI/Anthropic/Meta)

    20 min
  8. AI Deanonymization: How Claude Identifies Writers from 125 Words

    2 MAY

    A journalist named Kelsey Piper handed Claude Opus 4.7 a 125-word draft of a political column she had never published. Incognito mode. No login. Through the API. She asked: who wrote this? Claude identified her. ChatGPT guessed Matthew Yglesias. Gemini guessed Scott Alexander. Both wrong. Then four more tests across genres and decades — a Pokémon school report, a 1942 movie review, a 500-word heist novel, a college essay from 15 years ago. Claude went 5 for 5. Same writer recoverable from prose nobody had ever published. This episode is what that threshold collapse means. In 1964, the canonical stylometric study — Mosteller and Wallace on the Federalist Papers — needed about 1,500 words per essay and a closed list of two candidates. In 2013, identifying J.K. Rowling as Robert Galbraith required an entire 80,000-word novel and a list of four candidates. In 2026, a frontier language model needs 125 words and the open set of every public writer on the internet. The text required dropped from a full novel to 125 words. The candidate pool expanded by a factor of millions. It covers the mechanism: how classical stylometry — Burrows' Delta counting commas and function words — became latent-vector matching inside a transformer. Why Claude specifically when ChatGPT and Gemini didn't match its accuracy. The Huang et al. EMNLP 2024 anchor: 84% accuracy at 60 words on a 10-author benchmark, with 2024 GPT-4 numbers. It covers Anthropic's own research on chain-of-thought faithfulness. Their April 2025 paper found Claude 3.7 Sonnet's reasoning chains acknowledge planted hints only about 25% of the time. The other 75%, the chain reasons through alternative arguments. Larger, more capable models produce less faithful reasoning, not more. Apply that to deanonymization: Claude identifies the writer correctly, then generates a plausible reason. Sub-symbolic identification. Symbolic confabulation. It covers the institutional fallout.
Anthropic's December 2025 release of 1,250 anonymized interview transcripts — deanonymized 25% in roughly one day. Snowden's 2013 stylometric hedge. Reality Winner. Glassdoor reviewers under a threat that doesn't require a subpoena. Talley v. California and McIntyre v. Ohio — anonymous-speech doctrine that protects against government compulsion but not private inference. And the 15-year fingerprint persistence. Five predictions with horizons. Closing thesis: anonymity, which used to be the default state of writing, is now a capability deficit.

CHAPTERS
00:00 Cold open — Piper × Claude Opus 4.7
01:56 Intro + preview
03:19 History — 1964 / 1996 / 2013
05:40 Mechanism — function words to latent vectors
08:30 Why Claude specifically
09:54 Right ID, wrong reasoning — Anthropic faithfulness
13:24 Implications — Anthropic dataset, Snowden, Glassdoor, First Amendment
18:32 15-year fingerprint persistence
20:12 Five predictions
22:37 Closing thesis

SOURCES
Apr 2026 — Kelsey Piper, The Argument: "I can never talk to an AI anonymously again"
Apr 2025 — Anthropic: Reasoning Models Don't Always Say What They Think (CoT faithfulness)
Mar 2025 — Anthropic: On the Biology of a Large Language Model (Lindsey et al.)
2024 — Huang, Chen, Shu (EMNLP): Can Large Language Models Identify Authorship?
Feb 2026 — Tianshi Li (Northeastern Khoury): deanonymizing the Anthropic Interviewer dataset
Dec 2025 — Anthropic: anonymized interview transcript release (~1,250 transcripts)
2013 — Patrick Juola (Duquesne): Galbraith / Rowling identification
1996 — FBI / James Fitzgerald: Unabomber stylometric attribution
1964 — Mosteller and Wallace: The Federalist Papers Bayesian authorship study
2013 — Snowden: stylometry hedge in initial Greenwald/Poitras contact
2017 — Reality Winner: NSA leak, printer-microdot identification
2020 — Kraken / Glassdoor: defamation suits against anonymous reviewers (EFF)
1995 — McIntyre v. Ohio Elections Commission (anonymous speech, Justice Stevens)
1960 — Talley v. California (handbill identification ordinance struck)
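The classical baseline named in the mechanism chapter, Burrows' Delta, fits in a few lines: profile each candidate's function-word frequencies, z-score each feature across the candidate corpus, and attribute the unknown text to the author with the smallest mean absolute z-distance. A toy sketch; the word list and mini-corpus below are invented for illustration, not taken from any study the episode cites:

```python
# Toy Burrows' Delta: function-word z-score distance between texts.
from collections import Counter
import statistics

FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it"]

def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    return [counts[w] / len(words) for w in FUNCTION_WORDS]

def attribute(candidates: dict[str, str], unknown_text: str) -> str:
    """Smallest mean |z-score difference| against the unknown text wins."""
    profs = {a: profile(t) for a, t in candidates.items()}
    k = len(FUNCTION_WORDS)
    means = [statistics.mean(p[i] for p in profs.values()) for i in range(k)]
    stds = [max(statistics.pstdev([p[i] for p in profs.values()]), 1e-9)
            for i in range(k)]
    z = lambda p: [(p[i] - means[i]) / stds[i] for i in range(k)]
    unk = z(profile(unknown_text))
    deltas = {a: statistics.mean(abs(u - v) for u, v in zip(unk, z(p)))
              for a, p in profs.items()}
    return min(deltas, key=deltas.get)

authors = {
    "A": "the cat sat on the mat and the dog ran to the yard",
    "B": "of all things none of it was of note in spite of that",
}
print(attribute(authors, "the bird flew over the wall and the fox hid in the den"))  # A
```

The point of the contrast in the episode: this pipeline needs hand-picked features and a closed candidate list, which is exactly what latent-vector matching in a frontier model dispenses with.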

    24 min
