Semi Doped

Vikram Sekar and Austin Lyons

The business and technology of semiconductors. Alpha for engineers and investors alike.

  1. 4 DAYS AGO

    Cerebras IPO

    Cerebras IPO is the only thing to talk about this week. 🔥 The IPO priced at $185/share and popped nearly 70% right after. Cerebras is the first wafer-scale chip company to go public, after a 40-year curse killed every prior attempt.

    A water-cooler-style convo on what Cerebras actually builds, why a 23 kW wafer is a power and cooling nightmare, why 44 GB of SRAM is both the magic and the wall for LLM inference, and the cursed Trilogy Systems saga that Gene Amdahl tried, and failed, to pull off in 1983.

    Why does Cerebras leave the whole wafer intact instead of dicing it? How do they route around defects to harvest ~900K working cores out of ~1M? Why is power delivery vertical, and why does the wafer literally expand a tenth of a millimeter when it heats up? What does the OpenAI deal actually buy: wafers or tokens? And why does that distinction matter?

    Chapters:
    0:00 Cold open: 23 kW per wafer
    0:15 Cerebras IPO day at $185
    2:39 What's a wafer-scale engine
    10:30 Power, cooling, and thermal expansion
    18:12 The 44 GB wall
    26:35 The Trilogy Systems curse
    32:11 Supercomputing → training → inference
    39:36 The OpenAI deal and the Wild West

    Relevant reading:
    Vik's Substack post on the Cerebras IPO and OpenAI deal: https://www.viksnewsletter.com/

    Follow Chipstrat:
    Newsletter: https://www.chipstrat.com
    X: https://x.com/austinsemis

    Follow Vik:
    Newsletter: https://www.viksnewsletter.com/
    X: https://x.com/vikramskr

    Follow Semi Doped:
    Get more of Austin and Vik daily, free! Sign up: https://www.semidoped.com/

    51 min
  2. MAY 12

    Gimlet's Cross-Vendor Inference Cloud

    Gimlet Labs runs an inference cloud built on heterogeneous silicon. Their software traces a PyTorch workload, segments it into its component parts, and schedules each piece onto the best-suited hardware, connecting chips from different vendors on a single high-speed fabric.

    In this interview, Gimlet co-founder Natalie Serrino and former Intel executive Beltir walk through the architecture (graph trace, optimal split points, lowering each segment to TensorRT on NVIDIA and equivalents elsewhere) and the three customer segments they sell into (frontier labs, sovereign clouds, AI natives). They also show a concrete demo on GPT-OSS 120B at 8K input / 1K output: running the speculative decoder on a d-Matrix Corsair card while NVIDIA B200s handle the verifier shifts the throughput-vs-interactivity Pareto frontier roughly 4× over GPU-only speculative decode.

    The most surprising takeaway: most Neoclouds gave significant equity to a single silicon vendor in exchange for capacity. Hardware amortization is around 70% of their annual costs, and the equity terms prevent them from diversifying their silicon. So the only software innovation they can ship is disaggregation on top of one vendor's stack, never across vendors. Gimlet's two-track model (deploying orchestration software inside customer data centers, plus running their own Neocloud built on mixed silicon) is the answer to that constraint.

    Read the full transcript on Chipstrat.

    Chapters:
    0:00 Intro and the chips no one's connected before
    0:33 Inference cloud for agents
    1:02 From Intel to Gimlet
    2:14 The case for heterogeneous inference
    4:03 Disaggregating inference by resource profile
    6:24 Tracing PyTorch into a schedulable graph
    8:08 Connecting chips never connected before
    10:52 CPUs as the agentic workhorse
    12:01 Tool calls in the same data center as the LLM
    13:21 Latency vs throughput on a shared fabric
    14:57 Three customer buckets
    15:54 Sovereigns: make an API call, not a porting project
    19:37 "Cracked software is the platform"
    22:24 Why merchant silicon vendors need partners
    25:18 Hyperscalers outsourcing CapEx, not just kernels
    28:49 AI natives: latency budgets, not just price
    32:06 The d-Matrix partnership
    33:31 The Pareto frontier chart
    35:56 Speculative decode on Corsair: 4× shift
    37:27 4× faster, or 3× more customers?
    41:22 Why most Neoclouds can't follow this model
    42:34 Gimlet's two-track business model
    44:30 CoreWeave vs Together vs Gimlet
    45:15 Series A and hiring

    Relevant reading:
    The Information on Gimlet helping OpenAI optimize for Cerebras: https://www.theinformation.com/newsletters/ai-agenda/startup-helping-openai-optimize-ai-cerebras-chips
    Sachin Katti and Zain Asgar coauthored research at Stanford: https://arxiv.org/abs/2507.19635

    Follow Chipstrat:
    Newsletter: https://www.chipstrat.com
    X: https://x.com/chipstrat

    49 min
  3. MAY 8

    Power as the Next Physics Wall for AI

    What do optics and power have in common that ruins everything in the era of AI? Resistance. The same physics that drove interconnects to optics is now driving low-voltage power delivery up to 800V. Austin Lyons (Chipstrat) and Vik Sekar (Vik's Newsletter) unpack it using the Kyber rack as an example. At 600kW and 48V, you're pushing 12,500 amps through a single rack. Power loss scales with I². The math doesn't work. The fix is 800V, and the parts come straight from the EV traction inverter ecosystem (SiC, GaN, IGBTs).

    We cover the full grid-to-GPU power conversion chain (substation, utility room, PSU, intermediate bus converter, VRM), why vertical power delivery is the CPO equivalent for power, and why the power industry is a much more wide-open problem than optics or HBM. Plus the new topology fight: 800V → 48V (reuse the existing 48V infrastructure) vs 800V → 6V (skip 48V entirely, as TI and Navitas are pushing). We also touch on Coherent's six-inch indium phosphide ramp at Järfälla, Sweden, and why margins are the real read-through next quarter.

    Relevant reading:
    Vik's Substack post on power: https://www.viksnewsletter.com/p/power-delivery-as-the-next-physics-wall
    Google TPU 8i / 8t blog (Boardfly deep dive): https://cloud.google.com/blog/products/compute/tpu-8t-and-tpu-8i-technical-deep-dive

    Get more of Austin and Vik daily, free! Sign up here: https://www.semidoped.com/

    Follow Chipstrat:
    Newsletter: https://www.chipstrat.com
    X: https://x.com/austinsemis

    Follow Vik:
    Newsletter: https://www.viksnewsletter.com/
    X: https://x.com/vikramskr

    Chapters:
    (00:00) Intro
    (01:41) Memory tax: inflation, not innovation
    (03:46) Boardfly: 16 hops to 7
    (05:12) Coherent's six-inch indium phosphide ramp
    (12:15) Power is the next physics wall
    (15:08) Why 48V breaks at 600kW: 12,500 amps
    (23:05) 800V and vertical power delivery: CPO for power
    (30:34) Grid to GPU: every stage is a different supply chain
    (39:20) 800V → 48V or skip straight to 6V?
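The 48V versus 800V arithmetic in this episode's notes can be checked with a short sketch. The 600 kW rack power and the two bus voltages come from the notes; the path resistance below is a hypothetical placeholder chosen only to make the I² scaling visible, and the function names are illustrative.

```python
# Worked arithmetic for the 48 V vs 800 V rack comparison.
# RACK_POWER_W and the two voltages come from the episode notes;
# PATH_RESISTANCE_OHM is a made-up illustrative value, not a measured one.

RACK_POWER_W = 600_000          # 600 kW rack, as quoted in the episode notes
PATH_RESISTANCE_OHM = 1e-4      # hypothetical 0.1 milliohm distribution path


def bus_current(power_w: float, voltage_v: float) -> float:
    """Current drawn at a given bus voltage: I = P / V."""
    return power_w / voltage_v


def conduction_loss(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """Resistive loss in the distribution path: P_loss = I^2 * R."""
    current = bus_current(power_w, voltage_v)
    return current ** 2 * resistance_ohm


current_48v = bus_current(RACK_POWER_W, 48)
current_800v = bus_current(RACK_POWER_W, 800)
loss_48v = conduction_loss(RACK_POWER_W, 48, PATH_RESISTANCE_OHM)
loss_800v = conduction_loss(RACK_POWER_W, 800, PATH_RESISTANCE_OHM)
loss_ratio = loss_48v / loss_800v

print(f"48 V bus:  {current_48v:,.0f} A, {loss_48v / 1000:.2f} kW lost")
print(f"800 V bus: {current_800v:,.0f} A, {loss_800v / 1000:.3f} kW lost")
print(f"Loss ratio (48 V / 800 V): {loss_ratio:.0f}x")
```

For the same path, the loss ratio depends only on the voltages: (800/48)², or about 278, whatever resistance is assumed. The absolute loss figures move with the placeholder resistance and should not be read as real rack numbers.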

    42 min
  4. APR 24

    Masterclass on Google's TPU v8 Networking

    Google's Cloud Next 2026 keynote? Fire. 🔥 The TPU is now two chips instead of one (8t for training, 8i for inference), but more interestingly, it's two scale-up networking topologies too. Austin Lyons (Chipstrat) and Vik Sekar (Vik's Newsletter) walk through what actually changed, one day after the announcement. OCS? Yes. AECs? Yep. Copper? Yep. Optics? Yep.

    We cover Virgo (Google's 47 petabit/second scale-out fabric, built entirely on OCS), Boardfly (the new scale-up topology for MoE inference that cuts hop count from 16 to 7), and the 3D torus Google still uses for training. Why is optical circuit switching the substrate of Google's data center? Why do active electrical cables still carry scale-up traffic inside racks? Why did Google split the CPU layer too, with custom ARM Axion head nodes to keep the TPUs fed?

    Along the way we trace the Dragonfly topology lineage to a 2008 paper by John Kim, Bill Dally, Steve Scott, and Dennis Abts. Abts went on to build Groq's rack-scale interconnect before landing at Nvidia.

    Chapters:
    0:00 Intro
    0:21 Two TPUs for two workloads
    2:31 HBM, SRAM, and Axion CPUs
    7:22 Why networking is the new bottleneck
    17:14 Virgo: rebuilding scale-out on optics
    25:24 3D torus Rubik's Cube scale-up for training
    34:50 Boardfly: scale-up for MoE inference
    42:07 Workload-specific everything

    Follow Chipstrat:
    Newsletter: https://www.chipstrat.com
    X: https://x.com/austinsemis

    Follow Vik:
    Newsletter: https://www.viksnewsletter.com/
    X: https://x.com/vikramskr

    47 min
  5. APR 20

    Meta VP Matt Steiner on Ads Infra, GPUs, MTIA, and LLM-Written Kernels

    Matt Steiner, VP of Monetization Infrastructure, Ranking & AI Foundations at Meta, walks through how Meta's ad system actually works, and why the infrastructure behind it differs from what you'd build for LLMs.

    We cover Andromeda (retrieval on a custom NVIDIA Grace Hopper SKU Meta co-designed), Lattice (consolidating N ranking models into one), GEM (Meta's Generative Ads Recommendation foundation model), and the adaptive ranking model, a roughly one-trillion-parameter recommender served at sub-second latency.

    We get into why recommender workloads aren't embarrassingly parallel like LLMs (the "personalization blob"), what that means for Meta's MTIA custom silicon roadmap, and how LLM-written kernels (KernelEvolve) flipped the economics of running a heterogeneous hardware fleet. Demand for software engineering has actually gone up as the price has come down. Meta now wants ~100x more optimized kernels per chip.

    Read the full transcript at https://www.chipstrat.com/p/an-interview-with-meta-vp-matt-steiner

    Chapters:
    0:00 Intro and scale
    0:39 How Meta's ad system works
    2:00 Meta Andromeda and the custom NVIDIA SKU
    3:30 Lattice: consolidating ranking models
    5:00 GEM, Meta's ads foundation model
    6:30 Adaptive ranking for power users
    8:17 The scale: 3B DAUs at sub-second latency
    9:40 Why longer interaction histories matter
    10:45 The anniversary gift analogy
    12:57 A decade of compute evolution
    15:21 Meta's infra as a CP-SAT problem
    16:07 Co-designing Grace Hopper with NVIDIA
    17:47 Matching compute shape to workload
    18:26 Influencing hardware and software roadmaps
    20:23 MTIA: why ads aren't LLMs
    22:07 The personalization blob and I/O ratios
    26:38 One trillion parameters at sub-second latency
    28:26 Heterogeneous hardware trade-offs
    29:30 KernelEvolve: LLMs writing custom kernels
    33:30 GenAI and recommender systems cross-pollination
    35:21 The 2-year infrastructure outlook
    37:00 Why demand for software engineering is rising
    38:53 How Matt stays on top of it all

    Relevant reading:
    KernelEvolve (Meta Engineering): https://engineering.fb.com/2026/04/02/developer-tools/kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure/

    Follow Chipstrat:
    Newsletter: https://www.chipstrat.com
    X: https://x.com/chipstrat

    40 min
