Semi Doped

Vikram Sekar and Austin Lyons

The business and technology of semiconductors. Alpha for engineers and investors alike.

Episodes

  1. A New Era of Context Memory with Val Bercovici from WEKA

    42 MIN AGO

    Vik and Val Bercovici discuss the evolution of storage solutions in the context of AI, focusing on Weka's innovative approaches to context memory, high-bandwidth flash, and the importance of optimizing GPU usage. Val shares insights from his extensive experience in the storage industry, highlighting the challenges and advancements in memory requirements for AI models, the significance of latency, and the future of storage technologies.

    Takeaways
    - Context memory is crucial for AI performance.
    - The demand for memory has drastically increased.
    - Latency issues can hinder AI efficiency.
    - High-bandwidth flash offers new storage capabilities.
    - Weka's Axon software enhances GPU storage utilization.
    - Token warehouses can significantly reduce costs.
    - Augmented memory grids improve memory access speeds.
    - Networking innovations are essential for AI storage solutions.
    - Understanding memory hierarchies is vital for optimization.
    - The future of storage will involve more advanced technologies.

    Chapters
    00:00 Introduction to Weka and AI Storage Solutions
    05:18 The Evolution of Context Memory in AI
    09:30 Understanding Memory Hierarchies and Their Impact
    16:24 Latency Challenges in Modern Storage Solutions
    21:32 The Role of Networking in AI Storage Efficiency
    29:42 Dynamic Resource Utilization in AI Networks
    30:04 Introducing the Context Memory Network
    31:13 High Bandwidth Flash: A Game Changer
    32:54 Weka's Neural Mesh and Storage Solutions
    35:01 Axon: Transforming GPU Storage into Memory
    39:00 Augmented Memory Grid Explained
    42:00 Pooling DRAM and CXL Innovations
    46:02 Token Warehouses and Inference Economics
    52:10 The Future of Storage Innovations

    Resources
    Manus AI $2B Blog: https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus
    Also listen to this podcast on your favorite platform: https://www.semidoped.fm/
    Check out Vik's Substack: https://www.viksnewsletter.com/
    Check out Austin's Substack: https://www.chipstrat.com/
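    Listener note: much of the memory pressure discussed in this episode comes from the KV cache that inference servers keep per session. As a rough, illustrative calculation (the model shape below is a generic 70B-class configuration, not a figure from the episode), here is a minimal sketch of the standard KV-cache sizing arithmetic:

    ```python
    # Back-of-the-envelope KV-cache sizing for one long-context session.
    # The model shape is illustrative (roughly a 70B-class model with
    # grouped-query attention), not any vendor's published figure.

    def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                       context_tokens, bytes_per_value=2):
        """KV-cache size for one sequence.

        The factor of 2 covers keys and values; bytes_per_value=2 assumes FP16/BF16.
        """
        return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value

    # Illustrative shape: 80 layers, 8 KV heads, head_dim 128.
    per_session = kv_cache_bytes(num_layers=80, num_kv_heads=8,
                                 head_dim=128, context_tokens=128_000)
    print(f"~{per_session / 1e9:.0f} GB of KV cache per 128k-token session")
    # ~42 GB per session; multiply by concurrent sessions and a single GPU's HBM
    # fills quickly, which is why offload tiers (DRAM, flash, network) matter.
    ```

    This is the arithmetic behind the token-warehouse argument: if that cache can be parked on cheaper media and streamed back faster than it can be recomputed, inference cost drops.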

    54 min
  2. OpenClaw Makes AI Agents and CPUs Get Real

    3 DAYS AGO

    Austin and Vik discuss the emerging trend of AI agents, particularly focusing on Claude Code and OpenClaw, and the resulting hardware implications.

    Key Takeaways:
    - 2026 is expected to be a pivotal year for AI agents.
    - The rise of agentic AI is moving beyond marketing to practical applications.
    - Claude Code is being used for more than just coding; it aids in research and organization.
    - Integrating AI with tools like Google Drive enhances productivity.
    - Security concerns arise with giving AI agents access to personal data.
    - Local computing options for AI can reduce costs and increase control.
    - AI agents can automate repetitive tasks, freeing up human time for creative work.
    - The demand for CPUs is increasing due to the needs of AI agents.
    - AI can help summarize and organize information but may lack deep insights.
    - The future of AI will involve balancing automation with human oversight.

    Chapters
    (00:00) Introduction: Why 2026 may be the year of AI agents
    (01:12) What people mean by agents and the OpenClaw naming chaos
    (02:41) Agents behaving badly: crypto losses and social posting
    (03:38) Claude Code as a research tool, not a coding tool
    (05:54) Terminal-first workflows vs GUI-based agents
    (07:44) Connecting Claude Code to Gmail, Drive, and Calendar via MCP
    (09:12) Token waste, authentication friction, and workflow optimization
    (10:54) Automating newsletter ingestion and research archives
    (12:33) Giving agents login credentials and security tradeoffs
    (13:50) Filtering signal from noise with topic constraints
    (16:36) AI-driven idea generation and its limitations
    (17:34) When automation effort is not worth it
    (19:02) Are agents ready for non-technical users?
    (20:55) Why OpenClaw should not run on your personal laptop
    (21:33) Safe agent deployment: VPS vs local servers
    (23:33) The true cost of agents: infrastructure plus inference
    (24:18) What OpenClaw adds beyond Claude Code
    (26:53) Agents require managerial thinking and self-awareness
    (28:18) Local inference vs cloud APIs
    (30:46) Cost control with OpenRouter and model hierarchies
    (32:31) Scaling agents forces model and cost optimization
    (33:00) AI aggregation vs creator analytics
    (35:58) AI as discovery, not a replacement for reading
    (38:17) When summaries are enough and when they are not
    (39:47) Why AI cannot understand what is not said
    (41:18) Agentic AI is driving unexpected CPU demand
    (41:49) Intel caught off guard by CPU shortages
    (44:53) Security, identity, and encryption shift work to CPUs
    (46:10) Closing thoughts: agents are real, early, and uneven

    Deploy your secure OpenClaw instance with DigitalOcean: https://www.digitalocean.com/blog/moltbot-on-digitalocean
    Visit the podcast website: https://www.semidoped.fm
    Austin's Substack: https://www.chipstrat.com/
    Vik's Substack: https://www.viksnewsletter.com/
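    Listener note on the cost-control discussion (30:46): the "model hierarchy" idea is simply to route high-volume, low-stakes steps to a cheap model and reserve an expensive model for the final synthesis. A minimal sketch against OpenRouter's OpenAI-compatible endpoint; the model IDs and the routing rule are illustrative placeholders, not recommendations from the episode:

    ```python
    # Two-tier model hierarchy over OpenRouter's OpenAI-compatible API.
    # Model IDs below are illustrative placeholders.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    CHEAP_MODEL = "meta-llama/llama-3.1-8b-instruct"   # bulk summarization
    EXPENSIVE_MODEL = "anthropic/claude-sonnet-4"      # final synthesis / reasoning

    def run(task: str, prompt: str) -> str:
        # Route bulk work to the cheap tier, reasoning-heavy work to the expensive tier.
        model = EXPENSIVE_MODEL if task == "synthesize" else CHEAP_MODEL
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Example: summarize many newsletters cheaply, then synthesize one digest.
    summaries = [run("summarize", f"Summarize this newsletter:\n{text}")
                 for text in ["<newsletter 1>", "<newsletter 2>"]]
    digest = run("synthesize",
                 "Write a one-page research digest from:\n" + "\n".join(summaries))
    ```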

    48 min
  3. An Interview with Microsoft's Saurabh Dighe About Maia 200

    JAN 28

    Maia 100 was a pre-GPT accelerator. Maia 200 is explicitly post-GPT, built for large multimodal inference. Saurabh Dighe says that if Microsoft were chasing peak performance or trying to span training and inference, Maia would look very different: higher TDPs, different tradeoffs. Those paths were pruned early to optimize for one thing: inference price-performance. That focus drives the claim of ~30% better performance per dollar versus the latest hardware in Microsoft’s fleet.

    Interesting topics include:
    • What “30% better price-performance” actually means
    • Who Maia 200 is built for
    • Why Microsoft bet on inference when designing Maia back in 2022/2023
    • Large SRAM + high-capacity HBM
    • Massive scale-up, no scale-out
    • On-die NIC integration

    Maia is a portfolio platform: many internal customers, varied inference profiles, one goal. Lower inference cost at planetary scale.

    Chapters:
    (00:00) Introduction
    (01:00) What Maia 200 is and who it’s for
    (02:45) Why custom silicon isn’t just a margin play
    (04:45) Inference as an efficient frontier
    (06:15) Portfolio thinking and heterogeneous infrastructure
    (09:00) Designing for LLMs and reasoning models
    (10:45) Why Maia avoids training workloads
    (12:00) Betting on inference in 2022–2023, before reasoning models
    (14:40) Hyperscaler advantage in custom silicon
    (16:00) Capacity allocation and internal customers
    (17:45) How third-party customers access Maia
    (18:30) Software, compilers, and time-to-value
    (22:30) Measuring success and the Maia 300 roadmap
    (28:30) What “30% better price-performance” actually means
    (32:00) Scale-up vs scale-out architecture
    (35:00) Ethernet and custom transport choices
    (37:30) On-die NIC integration
    (40:30) Memory hierarchy: SRAM, HBM, and locality
    (49:00) Long context and KV cache strategy
    (51:30) Wrap-up
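    Listener note: to make the headline claim concrete, "X% better performance per dollar" just compares throughput against fully loaded cost. The figures below are invented purely to show the arithmetic; Microsoft has not published them:

    ```python
    # Toy illustration of "30% better performance per dollar".
    # All inputs are hypothetical; only the arithmetic is the point.
    baseline_tokens_per_sec = 10_000   # hypothetical fleet GPU throughput
    baseline_cost_per_hour = 5.00      # hypothetical fully loaded $/hour

    baseline_tokens_per_dollar = baseline_tokens_per_sec * 3600 / baseline_cost_per_hour
    maia_tokens_per_dollar = 1.30 * baseline_tokens_per_dollar  # "30% better perf/$"

    print(f"baseline: ${1e6 / baseline_tokens_per_dollar:.3f} per million tokens")
    print(f"new part: ${1e6 / maia_tokens_per_dollar:.3f} per million tokens "
          f"({1 - 1 / 1.30:.0%} cheaper at equal throughput)")
    ```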

    53 min
  4. Can Pre-GPT AI Accelerators Handle Long Context Workloads?

    JAN 26

    OpenAI's partnership with Cerebras and Nvidia's announcement of context memory storage raise a fundamental question: as agentic AI demands long sessions with massive context windows, can SRAM-based accelerators designed before the LLM era keep up—or will they converge with GPUs?

    Key Takeaways
    1. Context is the new bottleneck. As agentic workloads demand long sessions with massive codebases, storing and retrieving KV cache efficiently becomes critical.
    2. There's no one-size-fits-all. Sachin Khatti (OpenAI, ex-Intel) signals a shift toward heterogeneous compute—matching specific accelerators to specific workloads.
    3. Cerebras has 44GB of SRAM per wafer — orders of magnitude more than typical chips — but the question remains: where does the KV cache go for long context?
    4. Pre-GPT accelerators may converge toward GPUs. If they need to add HBM or external memory for long context, some of their differentiation erodes.
    5. Post-GPT accelerators (Etched, MatX) are the ones to watch. Designed specifically for transformer inference, they may solve the KV cache problem from first principles.

    Chapters
    - 00:00 — Intro
    - 01:20 — What is context memory storage?
    - 03:30 — When Claude runs out of context
    - 06:00 — Tokens, attention, and the KV cache explained
    - 09:07 — The AI memory hierarchy: HBM → DRAM → SSD → network storage
    - 12:53 — Nvidia's G1/G2/G3 tiers and the missing G0 (SRAM)
    - 14:35 — Bluefield DPUs and GPU Direct Storage
    - 15:53 — Token economics: cache hits vs misses
    - 20:03 — OpenAI + Cerebras: 750 megawatts for faster Codex
    - 21:29 — Why Cerebras built a wafer-scale engine
    - 25:07 — 44GB SRAM and running Llama 70B on four wafers
    - 25:55 — Sachin Khatti on heterogeneous compute strategy
    - 31:43 — The big question: where does Cerebras store KV cache?
    - 34:11 — If SRAM offloads to HBM, does it lose its edge?
    - 35:40 — Pre-GPT vs Post-GPT accelerators
    - 36:51 — Etched raises $500M at $5B valuation
    - 38:48 — Wrap up
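    Listener note on the 44GB-per-wafer point (25:07): a quick back-of-the-envelope check of whether a 70B-parameter model's weights fit across four wafers of SRAM, and how little headroom is left for long-context KV cache. The FP16 precision assumption and the KV-cache figure are illustrative:

    ```python
    # Back-of-the-envelope: Llama-70B weights vs. four wafers of SRAM.
    # Illustrative arithmetic only.
    params = 70e9
    bytes_per_param = 2                       # FP16/BF16 weights
    weight_bytes = params * bytes_per_param   # ~140 GB

    sram_per_wafer = 44e9                     # 44 GB of on-wafer SRAM
    total_sram = 4 * sram_per_wafer           # ~176 GB across four wafers

    headroom = total_sram - weight_bytes      # ~36 GB left for everything else
    print(f"weights: {weight_bytes / 1e9:.0f} GB, "
          f"SRAM: {total_sram / 1e9:.0f} GB, "
          f"headroom: {headroom / 1e9:.0f} GB")

    # A single 128k-token session for a 70B-class model can need ~40 GB of KV
    # cache at FP16, so long-context serving has to spill somewhere (HBM, DRAM,
    # or network storage), which is exactly the convergence question above.
    ```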

    38 min
  5. An Interview with Innoviz CEO Omer Keilaf about current LiDAR market dynamics

    JAN 22

    Innoviz CEO Omer Keilaf believes the LIDAR market is down to its final players—and that Innoviz has already won its seat. In this conversation, we cover the Level 4 gold rush sparked by Waymo, why stalled Level 3 programs are suddenly accelerating, the technical moat that separates L4-grade LIDAR from everything else, how a one-year-old startup won BMW, and why Keilaf thinks his competitors are already out of the race.

    Omer Keilaf founded Innoviz in 2016. Today it's a publicly traded Tier 1 supplier to BMW, Volkswagen, Daimler Truck, and other global OEMs.

    Chapters
    00:00 Introduction
    00:17 Why Start a LIDAR Company in 2016?
    01:32 The Personal Story Behind Innoviz
    03:12 Transportation Is Still Our Biggest Daily Risk
    04:28 The 2012 Spark: Xbox Kinect and 3D Sensing
    06:32 From Mobile to Automotive: Finding the Right Platform
    07:54 "I Didn't Know What LIDAR Was, But I'd Do It Better"
    08:19 How a One-Year-Old Startup Won BMW
    10:04 Surviving the First Product
    11:23 From Tier 2 to Tier 1: The Volkswagen Win
    13:47 Lessons Learned Scaling Through Partners
    14:45 The SPAC Decision: A Wake-Up Call from a Competitor
    16:42 From 200 LIDAR Companies to a Handful
    17:27 NREs: How Tier 1 Status Funds R&D
    18:44 Why Automotive-First Is the Right Strategy
    19:45 Consolidation Patterns: Cameras, Radars, Airbags
    20:31 "The Music Has Stopped"
    21:07 Non-Automotive: Underserved Markets
    23:51 Working with Secretive OEMs
    25:27 The Press Release They Tried to Stop
    26:42 CES 2025: 85% of Meetings Were Level 4
    27:40 Why Level 3 Programs Are Suddenly Accelerating
    28:33 The EV/ADAS Coupling Problem
    29:49 Design Is Everything: The Holy Grail Is Behind the Windshield
    31:13 The Three-Year RFQ: Grill → Roof → Windshield
    32:32 Innoviz3: Small Enough for Behind-the-Windshield
    34:40 Innoviz2 for L4, Innoviz3 for Consumer L3
    36:38 What's the Real Difference Between L2, L3, and L4 LIDAR?
    38:51 The Mud Test: Why L4 Demands 100% Availability
    40:50 "We're the Only LIDAR Designed for Level 4"
    42:52 Patents and the Maslow Pyramid of Autonomy
    44:15 Non-Automotive Markets: Agriculture, Mining, Security
    46:15 Closing

    47 min

About

The business and technology of semiconductors. Alpha for engineers and investors alike.
